Artificial intelligence has proven that complicated board and video games are no longer the exclusive domain of the human mind.

Finding the gap in reinforcement learning


Early research on deep reinforcement learning relied on agents being pretrained on gameplay data from human players.

A new study suggests that reinforcement learning makes for terrible AI teammates in co-op games.

Dr. Ross Allen, an AI researcher at Lincoln Laboratory and co-author of the paper, discussed the findings with TechTalks.

One famous example was a move made by DeepMind's AlphaGo in its matchup against Go world champion Lee Sedol.

Analysts first thought the move was a mistake because it went against the intuitions of human experts.

[Image: A depiction of reinforcement learning used by an AI in the game Dota 2]

But the same move ended up turning the tide in favor of the AI player and defeating Sedol.

Allen thinks the same kind of ingenuity can come into play when RL is teamed up with humans.

Players must hold their cards backward and can't see the cards' faces.

[Image: Hanabi reinforcement learning and symbolic AI systems]

Accordingly, each player can see the faces of their teammates' cards.

Players can use a limited number of tokens to provide each other clues about the cards they're holding.
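To make the clue mechanic concrete, here is a minimal Python sketch of Hanabi's information structure, assuming simplified rules; the names Card, Player, give_clue, and CLUE_TOKENS are illustrative, not taken from the paper or any Hanabi library.

```python
# A minimal sketch of Hanabi's clue mechanic (simplified, illustrative names).
from dataclasses import dataclass

CLUE_TOKENS = 8  # standard Hanabi starts with 8 clue tokens

@dataclass
class Card:
    color: str
    rank: int
    known_color: bool = False  # what the card's owner has learned from clues
    known_rank: bool = False

@dataclass
class Player:
    name: str
    hand: list  # visible to teammates, hidden from the player themself

def give_clue(tokens: int, teammate: Player, *, color=None, rank=None) -> int:
    """Spend one clue token to reveal every card in the teammate's hand
    that matches the named color or rank (a clue names exactly one)."""
    if tokens <= 0:
        raise ValueError("no clue tokens left")
    if (color is None) == (rank is None):
        raise ValueError("a clue names exactly one color or one rank")
    for card in teammate.hand:
        if color is not None and card.color == color:
            card.known_color = True
        if rank is not None and card.rank == rank:
            card.known_rank = True
    return tokens - 1  # clue tokens are a limited, shared resource

# Example: Alice clues "red" to Bob, marking both of his red cards.
bob = Player("Bob", [Card("red", 1), Card("blue", 3), Card("red", 4)])
tokens = give_clue(CLUE_TOKENS, bob, color="red")
print(tokens, [(c.known_color, c.known_rank) for c in bob.hand])
```

Because tokens are scarce, each clue carries implicit meaning beyond the literal information it reveals, and reading that implicit meaning is where skilled teammates shine.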

"In the pursuit of real-world problems, we have to start simple," Allen said. "Thus we focus on the benchmark collaborative game of Hanabi."

In recent years, several research teams have explored the development of AI bots that can play Hanabi.

"This work directly extends previous work on RL for training Hanabi agents," Allen said. "In particular, we study the Other-Play RL agent from Jakob Foerster's lab."

Other-Play had produced state-of-the-art performance in Hanabi when teamed with other AI agents it had not met during training.

Human-AI cooperation

In the experiments, human participants played several games of Hanabi with an AI teammate.

The players were exposed to both SmartBot and Other-Play but weren't told which algorithm was working behind the scenes.

The researchers evaluated the level of human-AI cooperation based on objective and subjective metrics.

Objective metrics included scores, error rates, and the like.

There was no significant difference in the objective performance of the two AI agents.

"In short, they hated it," Allen said.

One of the keys to success in Hanabi is the skill of providing subtle hints to other players.

An experienced player would pick up on such a hint immediately.

But providing the same kind of information to the AI teammate proved to be much more difficult.

Another said, "At this point, I don't know what the point is."

This makes Other-Play an optimal teammate for AI algorithms that weren't part of its training regime.

But Other-Play still makes assumptions about the types of teammates it will encounter, the researchers note: "Notably, [Other-Play] assumes that teammates are also optimized for zero-shot coordination. In contrast, human Hanabi players typically do not learn with this assumption."
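To see why that assumption matters, here is a toy "lever game" often used to explain zero-shot coordination, sketched in Python; the setup, names, and payoffs are illustrative assumptions, not the authors' code.

```python
# Toy "lever game" illustrating zero-shot coordination (illustrative only,
# not the Other-Play authors' code). Nine identical levers pay 1.0 when both
# players pick the same one; a tenth, visually distinct lever pays 0.9.
import random

N_SYMMETRIC = 9          # interchangeable levers: any labeling is arbitrary
SYMMETRIC_PAYOFF = 1.0
DISTINCT_PAYOFF = 0.9

def crossplay_value(policy, trials=10_000) -> float:
    """Average payoff when the partner's lever labels are randomly permuted,
    mimicking a teammate who never shared your training conventions."""
    total = 0.0
    for _ in range(trials):
        a, b = policy(), policy()
        if a == "distinct" and b == "distinct":
            total += DISTINCT_PAYOFF
        elif isinstance(a, int) and isinstance(b, int):
            # under a random relabeling, a specific symmetric lever matches
            # the partner's only 1 time in N_SYMMETRIC
            if random.randrange(N_SYMMETRIC) == 0:
                total += SYMMETRIC_PAYOFF
    return total / trials

convention = lambda: 3        # self-play habit: "always pull lever 3"
robust = lambda: "distinct"   # symmetry-aware choice, robust to relabeling

print(f"convention: {crossplay_value(convention):.2f}")  # ~0.11
print(f"robust:     {crossplay_value(robust):.2f}")      # 0.90
```

Vanilla self-play happily settles on an arbitrary lever and scores 1.0 with its training partner, but collapses to roughly 1/9 against a stranger, while the relabeling-robust choice keeps 0.9. Human players, however, are not optimizing for this kind of worst-case stranger, which is part of the gap the study exposes.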

This mismatch between objective performance and subjective experience raises the question: which objective metrics do correlate with subjective human preferences?

Nonetheless, the results can have important implications for the future of reinforcement learning research.

There is a lot of buzz about reinforcement learning in tech and academic circles, and rightfully so.

You can read the original article here.
