Demonstrating once again the potential of video games to advance AI and machine learning research, Facebook researchers propose a game-like language challenge, Read to Fight Monsters (RTFM), in a paper accepted to the International Conference on Learning Representations (ICLR) 2020. RTFM tasks an AI agent, dropped into a procedurally generated environment, with learning the environment's dynamics by reading a description of them, so that it can generalize to new worlds whose dynamics it hasn't seen before.
Facebook’s work could form the cornerstone of AI models capable of capturing the interplay among goals, documents, and observations in complex tasks. If RTFM agents perform well on objectives that require reasoning, it suggests that language understanding is a promising way to learn policies — i.e., mappings that tell an agent which action to take in a given state.
In RTFM, which takes inspiration from roguelikes (a subgenre of role-playing games that use a lot of procedurally generated elements) such as NetHack, Diablo, and Darkest Dungeon, the dynamics consist of:
- Monsters like wolves, bats, jaguars, ghosts, and goblins
- Teams like “Order of the Forest” and “fire goblin”
- Element types like fire and poison
- Item modifiers like “fanatical” and “arcane”
- Items like swords and hammers
At the beginning of a run, RTFM generates a large number of dynamics along with descriptions of those dynamics (for example, “Blessed items are effective against poison monsters”) and goals (“Defeat the Order of the Forest”). Groups, monsters, modifiers, and elements are randomized, as are monsters’ team assignments and the effectiveness of modifiers against various elements. One element, team, and a monster from that team are designated as the “target” monster, while an element, team, and monster from a different team are designated as a “distractor” monster, along with an element that defeats the distractor monster. The positions of the target and distractor monsters — both of which move to attack the agent at a fixed speed — are also randomized so that the agent can’t memorize their patterns.
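The randomized setup described above can be sketched as follows. This is a minimal illustration, not the RTFM codebase: the entity names come from the article, but the function, the dictionary layout, and the assignment scheme are assumptions.

```python
import random

# Entity pools drawn from the article's examples; the real game has more.
MONSTERS = ["wolf", "bat", "jaguar", "ghost", "goblin"]
TEAMS = ["Order of the Forest", "fire goblin"]
ELEMENTS = ["fire", "poison"]
MODIFIERS = ["fanatical", "arcane", "blessed"]

def generate_dynamics(seed=None):
    """Randomly assign a target and a distractor from different teams,
    and map each element to a modifier that defeats it (a simplification
    of RTFM's randomized modifier-vs-element effectiveness)."""
    rng = random.Random(seed)
    target_team, distractor_team = rng.sample(TEAMS, 2)
    target_monster, distractor_monster = rng.sample(MONSTERS, 2)
    target_element, distractor_element = rng.sample(ELEMENTS, 2)
    beats = dict(zip(ELEMENTS, rng.sample(MODIFIERS, len(ELEMENTS))))
    return {
        "goal": f"Defeat the {target_team}",
        "target": (target_team, target_monster, target_element),
        "distractor": (distractor_team, distractor_monster, distractor_element),
        "doc": [f"{beats[e]} items are effective against {e} monsters"
                for e in ELEMENTS],
    }
```

Because everything is resampled each run, an agent cannot memorize which monster, team, or modifier matters; it has to read the generated document.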
Human-written templates indicate which monster belongs to which team, which modifiers are effective against which element, and which team the agent should defeat. The researchers note that there are 2 million possible games within RTFM without considering the natural language templates (and 200 million with them), and that with random ordering of the templates, the number of unique documents exceeds 15 billion.
Agents are given a text document describing the dynamics and observations of the environment, in addition to a partial goal instruction. In order to achieve the goal, they must cross-reference relevant information in the document — which also lists their inventory — as well as in the observations.
Specifically, RTFM agents must:
- Identify the target team from the goal
- Identify the monster belonging to that team
- Identify that monster’s element and the modifiers that are effective against it
- Find which modifier is present and the item with the modifier
- Pick up the correct item
- Engage the correct monster in combat
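The chain of lookups above can be sketched as a short function. This is an illustrative reduction under the assumption that the document has already been parsed into lookup tables; the function name, arguments, and table layout are hypothetical, not RTFM's API (the real agent must extract this information from raw text).

```python
def choose_item_and_target(goal_team, team_of, element_of,
                           effective_modifiers, items):
    """Walk the reasoning steps: goal team -> monster -> element
    -> effective modifier -> item to pick up."""
    # Steps 1-2: identify the monster on the target team.
    target_monster = next(m for m, t in team_of.items() if t == goal_team)
    # Step 3: look up that monster's element and the modifiers that beat it.
    element = element_of[target_monster]
    good_mods = effective_modifiers[element]
    # Step 4: find an available item carrying one of those modifiers.
    item = next(i for i in items
                if any(i.startswith(mod) for mod in good_mods))
    # Steps 5-6: pick up that item, then engage the target monster.
    return item, target_monster
```

For example, given a goal of “Defeat the Order of the Forest”, a goblin on that team with the poison element, and “blessed” listed as effective against poison, the function selects a “blessed hammer” over a “fanatical sword” and targets the goblin.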
The researchers leveraged reinforcement learning, a technique that spurs agents toward goals via rewards, to train an RTFM model they refer to as txt2π. By receiving a reward of “1” for wins and “-1” for losses, txt2π learned to build representations that capture interactions among the goal, documents describing dynamics, and observations.
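The sparse win/loss signal can be sketched with a toy rollout loop. The environment class below is a stand-in invented for illustration (the real RTFM is a grid world with text observations, and txt2π is a neural model); only the reward structure, +1 on a win and -1 on a loss at the end of the episode, mirrors the setup described above.

```python
class ToyRTFMEnv:
    """Toy stand-in for RTFM, used only to illustrate the terminal
    +1/-1 reward; episodes end after a fixed number of steps."""
    def reset(self):
        self.steps = 0
        return "start"

    def step(self, action):
        self.steps += 1
        done = self.steps >= 5
        won = done and action == "engage"  # toy win condition
        return "obs", done, won

def run_episode(env, policy):
    """Roll out one episode; reward is paid only at the terminal step."""
    obs = env.reset()
    done, won = False, False
    while not done:
        action = policy(obs)
        obs, done, won = env.step(action)
    return 1 if won else -1  # the win/loss signal the agent learns from
```

Because all reward arrives at the end of an episode, the agent has to learn which early reading and item-collection choices eventually led to the win, which is part of what makes the task hard.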
The team ran experiments in which they trained txt2π for a minimum of 50 million frames. While the final model’s performance trailed that of human players, who consistently solved RTFM, txt2π beat two baselines, aided by curriculum learning. During training on large environments (10 by 10 blocks) with new dynamics and world configurations, the model had a 61% win rate (plus or minus 18%), and during evaluation, it had a 43% win rate (plus or minus 13%).
“[The results suggest] that there is ample room for improvement in grounded policy learning on complex RTFM problems,” conceded the coauthors, who hope to explore in future work how supporting evidence in external documents might be used to train an agent to reason about plans.