Available algorithms

Green circle: thoroughly tested. In many cases, we verified against known values and/or reproduced results from papers.

~: implemented but lightly tested.

X: known problems; please see the GitHub issues.

| Algorithms | Category | Reference | Status |
| --- | --- | --- | --- |
| Information Set Monte Carlo Tree Search (IS-MCTS) | Search | Cowling et al. '12 | ~ |
| Max^n | Search | Luckhardt & Irani '86 | ~ |
| Minimax (and Alpha-Beta) Search | Search | Wikipedia1, Wikipedia2, Knuth and Moore '75 | |
| Monte Carlo Tree Search | Search | Wikipedia, UCT paper, Coulom '06, Cowling et al. survey | |
| Perfect Information Monte Carlo (PIMC) | Search | Long et al. '10 | ~ |
| Lemke-Howson (via nashpy) | Opt. | Wikipedia, Shoham & Leyton-Brown '09 | |
| ADIDAS | Opt. | Gemp et al. '22 | ~ |
| Sequence-form linear programming | Opt. | Koller, Megiddo, and von Stengel '94, Shoham & Leyton-Brown '09 | |
| Stackelberg equilibrium solver | Opt. | Conitzer & Sandholm '06 | ~ |
| MIP-Nash | Opt. | Sandholm et al. '05 | ~ |
| Magnetic Mirror Descent (MMD) with dilated entropy | Opt. | Sokota et al. '22 | ~ |
| Counterfactual Regret Minimization (CFR) | Tabular | Zinkevich et al. '08, Neller & Lanctot '13 | |
| CFR against a best responder (CFR-BR) | Tabular | Johanson et al. '12 | |
| Exploitability / Best response | Tabular | Shoham & Leyton-Brown '09 | |
| External sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | |
| Fixed Strategy Iteration CFR (FSICFR) | Tabular | Neller & Hnath '11 | ~ |
| Mean-field Fictitious Play for MFG | Tabular | Perrin et al. '20 | ~ |
| Online Mirror Descent for MFG | Tabular | Perolat et al. '21 | ~ |
| Munchausen Online Mirror Descent for MFG | Tabular | Lauriere et al. '22 | ~ |
| Fixed Point for MFG | Tabular | Huang et al. '06 | ~ |
| Boltzmann Policy Iteration for MFG | Tabular | Lauriere et al. '22 | ~ |
| Outcome sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | |
| Policy Iteration | Tabular | Sutton & Barto '18 | |
| Q-learning | Tabular | Sutton & Barto '18 | |
| Regret Matching | Tabular | Hart & Mas-Colell '00 | |
| Restricted Nash Response (RNR) | Tabular | Johanson et al. '08 | ~ |
| SARSA | Tabular | Sutton & Barto '18 | |
| Value Iteration | Tabular | Sutton & Barto '18 | |
| Advantage Actor-Critic (A2C) | RL | Mnih et al. '16 | |
| Deep Q-networks (DQN) | RL | Mnih et al. '15 | |
| Ephemeral Value Adjustments (EVA) | RL | Hansen et al. '18 | ~ |
| Proximal Policy Optimization (PPO) | RL | Schulman et al. '18 | ~ |
| AlphaZero (C++/LibTorch) | MARL | Silver et al. '18 | |
| AlphaZero (Python/TF) | MARL | Silver et al. '18 | |
| Correlated Q-Learning | MARL | Greenwald & Hall '03 | ~ |
| Asymmetric Q-Learning | MARL | Kononen '04 | ~ |
| Deep CFR | MARL | Brown et al. '18 | |
| DiCE: The Infinitely Differentiable Monte-Carlo Estimator (LOLA-DiCE) | MARL | Foerster, Farquhar, Al-Shedivat et al. '18 | ~ |
| Exploitability Descent (ED) | MARL | Lockhart et al. '19 | |
| (Extensive-form) Fictitious Play (XFP) | MARL | Heinrich, Lanctot, & Silver '15 | |
| Learning with Opponent-Learning Awareness (LOLA) | MARL | Foerster, Chen, Al-Shedivat, et al. '18 | ~ |
| Nash Q-Learning | MARL | Hu & Wellman '03 | ~ |
| Neural Fictitious Self-Play (NFSP) | MARL | Heinrich & Silver '16 | |
| Neural Replicator Dynamics (NeuRD) | MARL | Omidshafiei, Hennes, Morrill, et al. '19 | X |
| Regret Policy Gradients (RPG, RMPG) | MARL | Srinivasan, Lanctot, et al. '18 | |
| Policy-Space Response Oracles (PSRO) | MARL | Lanctot et al. '17 | |
| Q-based ("all-actions") Policy Gradient (QPG) | MARL | Srinivasan, Lanctot, et al. '18 | |
| Regularized Nash Dynamics (R-NaD) | MARL | Perolat, De Vylder, et al. '22 | |
| Regression CFR (RCFR) | MARL | Waugh et al. '15, Morrill '16 | |
| Rectified Nash Response (PSRO_rn) | MARL | Balduzzi et al. '19 | ~ |
| Win-or-Learn-Fast Policy-Hill Climbing (WoLF-PHC) | MARL | Bowling & Veloso '02 | ~ |
| α-Rank | Eval. / Viz. | Omidshafiei et al. '19, arXiv | |
| Nash Averaging | Eval. / Viz. | Balduzzi et al. '18 | ~ |
| Replicator / Evolutionary Dynamics | Eval. / Viz. | Hofbauer & Sigmund '98, Sandholm '10 | |