Facebook researchers believe the dungeon-crawling game NetHack is well suited to training, testing, and evaluating AI models. To this end, they today released the NetHack Learning Environment, a research tool for benchmarking the robustness and generalization of reinforcement learning agents.
For decades, games have served as benchmarks for AI. But things really kicked into gear in 2013 — the year Google subsidiary DeepMind demonstrated an AI system that could play Pong, Breakout, Space Invaders, Seaquest, Beamrider, Enduro, and Q*bert at superhuman levels. The advancements aren’t merely improving game design, according to folks like DeepMind cofounder Demis Hassabis. Rather, they’re informing the development of systems that might one day diagnose illnesses, predict complicated protein structures, and segment CT scans.
NetHack, which was first released in 1987, is more sophisticated than might be assumed. It tasks players with descending more than 50 dungeon levels to retrieve a magical amulet, during which they must use hundreds of items and fight monsters while contending with rich interactions between the two. Levels in NetHack are procedurally generated and every game is different, which the Facebook researchers note tests the generalization limits of current state-of-the-art AI.
NetHack has another advantage in its lightweight architecture. Its complexity is captured by a turn-based, ASCII-art world and a game engine written primarily in C. Importantly, it forgoes all but the simplest physics and renders symbols instead of pixels, allowing models to learn quickly without wasting computational resources on simulating dynamics or rendering observations.
Indeed, training sophisticated machine learning models in the cloud remains prohibitively expensive. According to a recent Synced report, the University of Washington’s Grover, which is tailored for both the generation and detection of fake news, cost $25,000 to train over the course of two weeks. OpenAI racked up $256 per hour to train its GPT-2 language model, and Google spent an estimated $6,912 training BERT, a bidirectional transformer model that redefined the state of the art for 11 natural language processing tasks.
By contrast, a single high-end graphics card is sufficient to train AI-driven NetHack agents for hundreds of millions of steps a day using the TorchBeast framework, which supports further scaling by adding more graphics cards or machines. Agents can even experience billions of steps in the environment in a reasonable time frame while still challenging the limits of what current AI techniques can achieve.
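To put those throughput claims in perspective, here is a rough back-of-the-envelope calculation in Python. The figure of 200 million steps a day is an assumption chosen only to stand in for "hundreds of millions"; it is not a number reported by the researchers.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def steps_per_second(steps_per_day):
    """Convert a daily step budget into an average per-second rate."""
    return steps_per_day / SECONDS_PER_DAY


def days_to_reach(target_steps, steps_per_day):
    """Days of continuous training needed to accumulate target_steps."""
    return target_steps / steps_per_day


# Assumed value for illustration only, not a reported benchmark.
assumed_daily_steps = 200_000_000

print(round(steps_per_second(assumed_daily_steps)))        # 2315
print(days_to_reach(1_000_000_000, assumed_daily_steps))   # 5.0
```

Under that assumption, a single card would average a little over 2,000 environment steps per second and reach one billion steps in about five days.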
“NetHack presents a challenge that’s on the frontier of current methods, without the computational costs of other challenging simulation environments. Standard deep [reinforcement learning] agents currently operating on NetHack explore only a fraction of the overall game of NetHack,” the Facebook researchers said in a paper published this week. “Progress in this challenging new environment will require [reinforcement learning] agents to move beyond tabula rasa learning.”
The NetHack Learning Environment consists of three components: a Python interface to NetHack using the popular OpenAI Gym API, a suite of benchmark tasks, and a baseline agent. The suite includes seven benchmark tasks designed to measure agents’ progress, specifically:
- Staircase: descend to lower levels of the dungeon
- Pet: take care of your pet (keep it alive and take it with you deeper into the dungeon)
- Eat: find sources of nonpoisonous food and eat them, to avoid starving
- Gold: collect gold throughout the dungeon
- Scout: see as much of the dungeon as you can
- Score: achieve high in-game score (e.g., killing monsters, descending, collecting gold)
- Oracle: reach an important landmark, the Oracle (appears 4–9 levels into the dungeon)
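For readers unfamiliar with the Gym convention mentioned above, the following is a minimal sketch of the reset/step loop that such an interface exposes. `ToyDungeonEnv` is a hypothetical stand-in written for this example, loosely modeled on the Staircase task; it is not the NetHack Learning Environment, and its observations, actions, and rewards are invented. With the real package, the environment would presumably be created through Gym's registry, using an environment ID from the project's own documentation.

```python
import random


class ToyDungeonEnv:
    """Hypothetical stand-in that follows the classic Gym reset/step convention.

    It only mimics the shape of the interface; none of its dynamics
    come from NetHack or from the NetHack Learning Environment.
    """

    def __init__(self, depth_goal=5, max_steps=200, seed=None):
        self.depth_goal = depth_goal
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.actions = ["north", "south", "east", "west", "descend"]

    def reset(self):
        """Start a new episode and return the first observation."""
        self.depth = 1
        self.steps = 0
        self.on_stairs = False
        return self._obs()

    def _obs(self):
        return {"depth": self.depth, "on_stairs": self.on_stairs}

    def step(self, action):
        """Apply an action index; return (observation, reward, done, info)."""
        self.steps += 1
        reward = 0.0
        name = self.actions[action]
        if name == "descend":
            if self.on_stairs:
                # Reward one point per level descended.
                self.depth += 1
                reward = 1.0
                self.on_stairs = False
        else:
            # Invented rule: each move has a 25% chance of finding stairs.
            self.on_stairs = self.rng.random() < 0.25
        done = self.depth >= self.depth_goal or self.steps >= self.max_steps
        return self._obs(), reward, done, {}


def run_episode(env, policy_rng):
    """Run one episode with a uniformly random policy."""
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy_rng.randrange(len(env.actions))
        obs, reward, done, _info = env.step(action)
        total_reward += reward
    return total_reward, obs


if __name__ == "__main__":
    env = ToyDungeonEnv(seed=0)
    total, final_obs = run_episode(env, random.Random(1))
    print("levels descended:", total, "final depth:", final_obs["depth"])
```

The point of the convention is that the agent loop in `run_episode` does not need to know anything about the game itself, so the same loop can be pointed at any of the benchmark tasks by swapping the environment.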
The coauthors note that NetHack contains a large body of external resources, which they expect will be used to improve agents’ performance. For example, repositories of replay data from human players exist from which a model could learn directly, as well as resources like the official NetHack Guidebook, the NetHack Wiki, and online videos and forum discussions.
“We believe that the NetHack Learning Environment will inspire further research on robust exploration strategies in [reinforcement learning], planning with long-term horizons, and transferring commonsense knowledge from resources outside of the simulation,” continued the researchers. “[It] provides … agents with plenty of experience to learn from so that we as researchers can spend more time testing new ideas instead of waiting for results to come in. In addition, we believe it democratizes access for researchers in more resource-constrained labs without sacrificing the difficulty and richness of the environment.”