# Available games
Statuses:

- 🟢: thoroughly-tested. In many cases, we verified against known values and/or reproduced results from papers.
- 🔶: implemented but lightly tested.
- ❌: known issues (see notes below and code for details).
Status | Game | Players | Deterministic | Perfect info | Description |
---|---|---|---|---|---|
🔶 | 2048 | 1 | ❌ | ✅ | A single-player game where the player aims to create a 2048 tile by merging other tiles. |
🔶 | Amazons | 2 | ✅ | ✅ | Move pieces on a board trying to block opponents from moving. |
🔶 | Atari | 1 | ✅ (most games) | ✅ | Agent plays classic games from Gym's Atari Environments, such as Breakout. |
🟢 | Backgammon | 2 | ❌ | ✅ | Players move their pieces through the board based on the rolls of dice. |
🔶 | Bargaining | 2 | ❌ | ❌ | Agents negotiate for items in a pool with different (hidden) valuations. References: DeVault et al. '15. Lewis et al. '17. |
🔶 | Battleship | 2 | ✅ | ❌ | Players place ships and shoot at each other in turns. References: Farina et al. '19, Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks. |
🔶 | Blackjack | 1 | ❌ | ❌ | Simplified version of blackjack, with only HIT/STAND moves. |
🔶 | Block Dominoes | 2 | ❌ | ❌ | The simplest version of dominoes. Consists of 28 tiles, featuring all combinations of spot counts (also called pips or dots) between zero and six. |
🟢 | Breakthrough | 2 | ✅ | ✅ | Simplified chess using only pawns. |
🟢 | Bridge | 4 | ❌ | ❌ | A card game where players compete in pairs. |
🟢 | (Uncontested) Bridge bidding | 2 | ❌ | ❌ | Two partners bid, with no opposing bids, to reach the best contract for their hands. |
🔶 | Catch | 1 | ❌ | ✅ | Agent must move horizontally to 'catch' a descending ball. Designed to test basic learning. References: Mnih et al. '14, Recurrent Models of Visual Attention. Osband et al. '19, Behaviour Suite for Reinforcement Learning, Appendix A. |
🔶 | Checkers | 2 | ✅ | ✅ | Players move pieces around the board with the goal of eliminating the opposing pieces. |
🔶 | Cliff Walking | 1 | ✅ | ✅ | Agent must find the goal without falling off a cliff. Designed to demonstrate exploration-with-danger. References: Sutton & Barto '18, page 132. |
🔶 | Clobber | 2 | ✅ | ✅ | Simplified checkers, where tokens can capture neighbouring tokens. Designed to be amenable to combinatorial analysis. |
🔶 | Coin Game | 2 | ❌ | ❌ | Agents must collect their and their collaborator's tokens while avoiding a third kind of token. Designed to test divining of collaborator's intentions. References: Raileanu et al. '18, Modeling Others using Oneself in Multi-Agent Reinforcement Learning. |
🔶 | Colored Trails | 3 | ❌ | ❌ | Agents negotiate for chips that they play on a colored grid to move closer to the goal. References: Ya'akov et al. '10. Ficici & Pfeffer '08. de Jong et al. '11. |
🟢 | Connect Four | 2 | ✅ | ✅ | Players drop tokens into columns to try and form a pattern. |
🔶 | Cooperative Box-Pushing | 2 | ✅ | ❌ | Agents must collaborate to push a box into the goal. Designed to test collaboration. References: Seuken & Zilberstein '12, Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs. |
🟢 | Chess | 2 | ✅ | ✅ | Players move pieces around the board with the goal of eliminating the opposing pieces. |
🔶 | Crazy Eights | 2 | ❌ | ❌ | A precursor of UNO. |
🔶 | Dark Hex | 2 | ✅ | ❌ | Hex, except the opponent's tokens are hidden (imperfect-information version). |
🔶 | Deep Sea | 1 | ✅ | ❌ | Agent must explore to find reward (first version) or penalty (second version). Designed to test exploration. References: Osband et al. '17, Deep Exploration via Randomized Value Functions. |
🟢 | Dots and Boxes | 2 | ✅ | ✅ | Players draw lines between dots to form boxes and score points. |
🔶 | Dou Dizhu | 3 | ❌ | ❌ | A three-player game where one player (the dizhu) plays against a team of two (the peasants). |
🔶 | Euchre | 4 | ❌ | ❌ | Trick-taking card game where players compete in pairs. |
🟢 | First-price Sealed-Bid Auction | 2-10 | ❌ | ❌ | Agents submit bids simultaneously; highest bid wins, and that's the price paid. |
🟢 | Gin Rummy | 2 | ❌ | ❌ | Players score points by forming specific sets with the cards in their hands. |
🟢 | Go | 2 | ✅ | ✅ | Players place tokens on the board with the goal of encircling territory. |
🟢 | Goofspiel | 2-10 | ❌ | ❌ | Players bid with their cards to win other cards. |
🟢 | Hanabi | 2-5 | ❌ | ❌ | Players can see only the other players' cards, and everyone must cooperate to win. References: Bard et al. '19, The Hanabi Challenge: A New Frontier for AI Research. Implemented via the Hanabi Learning Environment. |
🟢 | Havannah | 2 | ✅ | ✅ | Players add tokens to a hex grid to try and form a winning structure. |
🟢 | Hearts | 3-6 | ❌ | ❌ | A card game where players try to avoid playing the highest card in each round. |
🔶 | Hex | 2 | ✅ | ✅ | Players add tokens to a hex grid to try and link opposite sides of the board. References: Hex, the full story by Ryan Hayward and Bjarne Toft. |
🔶 | Kriegspiel | 2 | ✅ | ❌ | Chess with the opponent's pieces unknown. Illegal moves have no effect: it remains the same player's turn until they make a legal move. References: Ciancarini & Favini '10, Monte Carlo tree search in Kriegspiel. Parker et al. '05, Game-Tree Search with Combinatorially Large Belief States. |
🟢 | Kuhn poker | 2 | ❌ | ❌ | Simplified poker amenable to game-theoretic analysis. |
🔶 | Laser Tag | 2 | ❌ | ❌ | Agents see a local part of the grid, and attempt to tag each other with beams. References: Leibo et al. '17. Lanctot et al. '17. |
🟢 | Leduc poker | 2 | ❌ | ❌ | Simplified poker amenable to game-theoretic analysis. References: Southey et al. '05, Bayes' bluff: Opponent modelling in poker. |
🔶 | Lewis Signaling | 2 | ❌ | ❌ | Receiver must choose an action dependent on the sender's hidden state. Designed to demonstrate the use of conventions. |
🟢 | Liar's Dice | 2 | ❌ | ❌ | Players bid and bluff on the state of all the dice together, given only the state of their dice. |
🔶 | Liar's Poker | 2+ | ❌ | ❌ | Players bid and bluff on the state of all hands, given only the state of their hand. |
🔶 | Mensch ärgere Dich nicht | 2-4 | ❌ | ✅ | Players roll dice to move their pegs toward their home row while knocking other players' pegs back to the start area. |
🔶 | Mancala | 2 | ✅ | ✅ | Players take turns sowing beans on the board and try to capture more beans than the opponent. |
🔶 | Markov Soccer | 2 | ❌ | ❌ | Agents must take the ball to their goal, and can 'tackle' the opponent by predicting their next move. References: Littman '94, Markov games as a framework for multi-agent reinforcement learning. He et al. '16, Opponent Modeling in Deep Reinforcement Learning. |
🟢 | Matching Pennies (3-player) | 3 | ✅ | ❌ | Players must predict and match/oppose another player. Designed to have an unstable Nash equilibrium. References: Jordan '93. |
🟢 | Mean Field Game: crowd modelling | n/a | n/a | n/a | References: Scaling up Mean Field Games with Online Mirror Descent, Scalable Deep Reinforcement Learning Algorithms for Mean Field Games, Learning in Mean Field Games: A Survey. |
🟢 | Mean Field Game: crowd modelling 2d | n/a | n/a | n/a | References: Scaling up Mean Field Games with Online Mirror Descent, Scalable Deep Reinforcement Learning Algorithms for Mean Field Games, Learning in Mean Field Games: A Survey. |
🟢 | Mean Field Game: linear-quadratic | n/a | ❌ | ✅ | Players are uniformly distributed and are incentivized to gather at the same point (the lower the distance to the distribution's mean position, the higher the reward). A mean-reverting term pushes the players towards the mean, and a Gaussian noise term perturbs them. The players' actions alter their states linearly (alpha * a * dt), and the cost of acting is quadratic (K * a^2 * dt), hence the name. An exact, closed-form solution exists for the fully continuous version of this game. References: Perrin et al. '19. |
🟢 | Mean Field Game: predator prey | n/a | n/a | n/a | References: Scaling up Mean Field Games with Online Mirror Descent, Scalable Deep Reinforcement Learning Algorithms for Mean Field Games, Learning in Mean Field Games: A Survey. |
🟢 | Mean Field Game: routing | n/a | ✅ | ✅ | A representative player chooses, at each node, where to go next. The player has an origin, a destination and a departure time, and chooses a route to minimize travel time. Time spent on each link is a function of the distribution of players on the link when the player reaches it. References: Cabannes et al. '21, Solving N-player dynamic routing games with congestion: a mean field approach. |
🔶 | Morpion Solitaire (4D) | 1 | ✅ | ✅ | A single-player game where the player aims to maximize the number of lines drawn on a grid, under certain limitations. |
🟢 | Negotiation | 2 | ❌ | ❌ | Agents with different utilities must negotiate an allocation of resources. References: Lewis et al. '17. Cao et al. '18. |
🔶 | Nim | 2 | ✅ | ✅ | Two agents take objects from distinct piles, trying either to take the last object or to avoid taking it, depending on the variant. Any positive number of objects can be taken on each turn, provided they all come from the same pile. |
🔶 | Nine men's morris | 2 | ✅ | ✅ | Two players place and move stones on the board, trying to form mills (three adjacent stones in a line) in order to capture the other player's stones. |
🔶 | Oh Hell | 3-7 | ❌ | ❌ | A card game where players try to win exactly a declared number of tricks. |
🟢 | Oshi-Zumo | 2 | ✅ | ❌ | Players must repeatedly bid to push a token off the other side of the board. References: Buro '04, Solving the oshi-zumo game. Bosansky et al. '16, Algorithms for Computing Strategies in Two-Player Simultaneous Move Games. |
🟢 | Oware | 2 | ✅ | ✅ | Players redistribute tokens from their half of the board to capture tokens in the opponent's part of the board. |
🔶 | Pathfinding | 1-10 | ❌ | ✅ | Agents must move to their destination. References: Austerweil et al. '15. Greenwald & Hall '03. Littman '01. |
🟢 | Pentago | 2 | ✅ | ✅ | Players place tokens on the board, then rotate part of the board to a new orientation. |
🔶 | Phantom Go | 2 | ✅ | ❌ | Go, except the opponent's stones are hidden. The analogue of Kriegspiel for Go. References: Cazenave '05, A Phantom Go Program. |
🔶 | Phantom Tic-Tac-Toe | 2 | ✅ | ❌ | Tic-tac-toe, except the opponent's tokens are hidden. Designed as a simple, imperfect-information game. References: Auger '11, Multiple Tree for Partially Observable Monte-Carlo Tree Search. Lisy '14, Alternative Selection Functions for Information Set Monte Carlo Tree Search. Lanctot '13. |
🟢 | Pig | 2-10 | ❌ | ✅ | Each player rolls a die until they get a 1 or they 'hold'; the rolled total is added to their score. |
🟢 | Prisoner's Dilemma | 2 | ✅ | ❌ | Players decide whether to cooperate or defect, given a situation with different payoffs. |
🔶 | Poker (Hold 'em) | 2-10 | ❌ | ❌ | Players bet on whether their hand of cards plus some communal cards will form a special set. Implemented via ACPC. |
❌ (#1158) | Quoridor | 2-4 | ✅ | ✅ | Each turn, players can either move their agent or add a small wall to the board. |
❌ (#811) | Reconnaissance Blind Chess | 2 | ✅ | ❌ | Chess with the opponent's pieces unknown, plus sensing moves. A chess variant invented by the Johns Hopkins University Applied Physics Lab. Used in a NeurIPS competition and the Hidden Information Game Competition. References: Markowitz et al. '18, On the Complexity of Reconnaissance Blind Chess. Newman et al. '16, Reconnaissance blind multi-chess: an experimentation platform for ISR sensor fusion and resource management. |
🟢 | Routing game | 1+ | ✅ | ✅ | Players choose, at each node, where to go next. Each player has an origin, a destination and a departure time, and chooses a route to minimize travel time. Time spent on each link is a function of the number of players on the link when the player reaches it. References: Cabannes et al. '21, Solving N-player dynamic routing games with congestion: a mean field approach. |
🔶 | Sheriff | 2 | ✅ | ❌ | Bargaining game, useful as a benchmark for correlated equilibria. Based on the board game Sheriff of Nottingham. References: Farina et al. '19, Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks. |
🔶 | Slovenian Tarok | 3-4 | ❌ | ❌ | Trick-based card game with bidding. References: Luštrek et al. '03, A program for playing Tarok. |
🔶 | Skat (simplified bidding) | 3 | ❌ | ❌ | Each turn, players bid to compete against the other two players. |
🔶 | Solitaire (K+) | 1 | ❌ | ❌ | A single-player card game. References: Bjarnason et al. '07, Searching solitaire in real time. |
🔶 | Spades | 4 | ❌ | ❌ | A four-player card game. |
🔶 | Team Dominoes | 4 | ❌ | ❌ | Team version of dominoes. Consists of 28 tiles, featuring all combinations of spot counts (also called pips or dots) between zero and six. |
🟢 | Tic-Tac-Toe | 2 | ✅ | ✅ | Players place tokens to try and form a pattern. |
🟢 | Tiny Bridge | 2,4 | ❌ | ❌ | Simplified Bridge with fewer cards and tricks. |
🟢 | Tiny Hanabi | 2-10 | ❌ | ❌ | Simplified Hanabi with just two turns. References: Foerster et al. '18, Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning. |
🟢 | Trade Comm | 2 | ❌ | ❌ | Players with different utilities and items communicate and then trade. |
🔶 | TwixT | 2 | ✅ | ✅ | Players place pegs and links on a 24x24 square to connect a line between opposite sides. |
🔶 | Ultimate Tic-Tac-Toe | 2 | ✅ | ✅ | Players try and form a pattern in local boards and a meta-board. |
🔶 | Weighted Voting Games | 1+ | ❌ | ✅ | Classic coalitional game. Each player i has a weight w_i, and there is a quota q. Let p denote the binary vector representing a coalition over n players. The utility is 1 if p · w ≥ q, and 0 otherwise. References: Chalkiadakis, Elkind, & Wooldridge '12. |
🟢 | Y | 2 | ✅ | ✅ | Players place tokens to try and connect sides of a triangular board. |
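
Every game above is exposed through the same API and is loaded by its registered short name. Below is a minimal random-rollout sketch using the Python bindings (`pyspiel`), assuming OpenSpiel is installed; `kuhn_poker` stands in for whichever short name you want to try.

```python
# Minimal random rollout of one of the games listed above.
# Assumes the OpenSpiel Python bindings are installed; "kuhn_poker" is
# a registered short name and can be swapped for another game's name.
import random

import pyspiel

game = pyspiel.load_game("kuhn_poker")
state = game.new_initial_state()
while not state.is_terminal():
  if state.is_chance_node():
    # Sample a chance outcome (e.g. the card deal) from its distribution.
    outcomes, probs = zip(*state.chance_outcomes())
    state.apply_action(random.choices(outcomes, weights=probs)[0])
  else:
    # Play a uniformly random legal action for the player to move.
    state.apply_action(random.choice(state.legal_actions()))
print(state.returns())  # Final utilities, one entry per player.
```

The same loop structure applies to the simultaneous-move games in the table, except that a joint action (one per player) is applied with `apply_actions`.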