Available algorithms

Green circle (_images/green_circ10.png; shown as an empty Status cell in the table below): thoroughly tested. In many cases, we verified against known values and/or reproduced results from papers.

~: implemented but lightly tested.

X: known problems; please see GitHub issues.

Algorithms	Category	Reference	Status
Information Set Monte Carlo Tree Search (IS-MCTS)	Search	Cowling et al. '12	~
Max^n	Search	Luckhardt & Irani '86	~
Minimax (and Alpha-Beta) Search	Search	Wikipedia1, Wikipedia2, Knuth and Moore '75
Monte Carlo Tree Search	Search	Wikipedia, UCT paper, Coulom '06, Cowling et al. survey
Perfect Information Monte Carlo (PIMC)	Search	Long et al. '10	~
Lemke-Howson (via `nashpy`)	Opt.	Wikipedia, Shoham & Leyton-Brown '09
ADIDAS	Opt.	Gemp et al. '22	~
Least Core via Linear Programming	Opt.	Yan & Procaccia '21	~
Least Core via Saddle-Point (Lagrangian) Programming	Opt.	Gemp et al. '24	~
Sequence-form linear programming	Opt.	Koller, Megiddo, and von Stengel '94, Shoham & Leyton-Brown '09
Sequence-form LP for Sequential Equilibrium	Opt.	Miltersen & Sørensen '06
Shapley Values (incl. approximations via Monte Carlo sampling)	Opt.	Mitchell et al. '22	~
Stackelberg equilibrium solver	Opt.	Conitzer & Sandholm '06	~
MIP-Nash	Opt.	Sandholm et al. '05	~
Magnetic Mirror Descent (MMD) with dilated entropy	Opt.	Sokota et al. '22	~
Counterfactual Regret Minimization (CFR)	Tabular	Zinkevich et al. '08, Neller & Lanctot '13
CFR against a best responder (CFR-BR)	Tabular	Johanson et al. '12
Exploitability / Best response	Tabular	Shoham & Leyton-Brown '09
External sampling Monte Carlo CFR	Tabular	Lanctot et al. '09, Lanctot '13
Fixed Strategy Iteration CFR (FSICFR)	Tabular	Neller & Hnath '11	~
Extensive-form Regret Minimization	Tabular	Morrill et al. '22	~
Mean-field Fictitious Play for MFG	Tabular	Perrin et al. '20	~
Online Mirror Descent for MFG	Tabular	Perolat et al. '21	~
Munchausen Online Mirror Descent for MFG	Tabular	Lauriere et al. '22	~
Fixed Point for MFG	Tabular	Huang et al. '06	~
Boltzmann Policy Iteration for MFG	Tabular	Lauriere et al. '22	~
Outcome sampling Monte Carlo CFR	Tabular	Lanctot et al. '09, Lanctot '13
Policy Iteration	Tabular	Sutton & Barto '18
Q-learning	Tabular	Sutton & Barto '18
Regret Matching	Tabular	Hart & Mas-Colell '00
Restricted Nash Response (RNR)	Tabular	Johanson et al. '08	~
SARSA	Tabular	Sutton & Barto '18
Value Iteration	Tabular	Sutton & Barto '18
Advantage Actor-Critic (A2C)	RL	Mnih et al. '16
Deep Q-networks (DQN)	RL	Mnih et al. '15
Ephemeral Value Adjustments (EVA)	RL	Hansen et al. '18	~
Proximal Policy Optimization (PPO)	RL	Schulman et al. '18	~
Mean Field Proximal Policy Optimization (MF-PPO)	RL	Algumaei et al. '23	~
AlphaZero (C++/LibTorch)	MARL	Silver et al. '18
AlphaZero (Python/TF)	MARL	Silver et al. '18
Correlated Q-Learning	MARL	Greenwald & Hall '03	~
Asymmetric Q-Learning	MARL	Kononen '04	~
Deep CFR	MARL	Brown et al. '18
ESCHER	MARL	McAleer et al. '22	~
DiCE: The Infinitely Differentiable Monte-Carlo Estimator (LOLA-DiCE)	MARL	Foerster, Farquhar, Al-Shedivat et al. '18	~
Exploitability Descent (ED)	MARL	Lockhart et al. '19
(Extensive-form) Fictitious Play (XFP)	MARL	Heinrich, Lanctot, & Silver '15
Learning with Opponent-Learning Awareness (LOLA)	MARL	Foerster, Chen, Al-Shedivat, et al. '18	~
Nash Q-Learning	MARL	Hu & Wellman '03	~
Neural Fictitious Self-Play (NFSP)	MARL	Heinrich & Silver '16
Neural Replicator Dynamics (NeuRD)	MARL	Omidshafiei, Hennes, Morrill, et al. '19	X
Regret Policy Gradients (RPG, RMPG)	MARL	Srinivasan, Lanctot, et al. '18
Policy-Space Response Oracles (PSRO)	MARL	Lanctot et al. '17
Q-based ("all-actions") Policy Gradient (QPG)	MARL	Srinivasan, Lanctot, et al. '18
Regression CFR (RCFR)	MARL	Waugh et al. '15, Morrill '16
Rectified Nash Response (PSRO_rn)	MARL	Balduzzi et al. '19	~
Mean-Field PSRO (MFPSRO)	MARL	Muller et al. '21	~
Win-or-Learn-Fast Policy-Hill Climbing (WoLF-PHC)	MARL	Bowling & Veloso '02	~
α-Rank	Eval. / Viz.	Omidshafiei et al. '19, arXiv
Nash Averaging	Eval. / Viz.	Balduzzi et al. '18	~
Replicator / Evolutionary Dynamics	Eval. / Viz.	Hofbauer & Sigmund '98, Sandholm '10
Voting-as-Evaluation (VasE)	Eval. / Viz.	Lanctot et al. '23
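
As a small illustration of one tabular entry above, Regret Matching (Hart & Mas-Colell '00), the following is a minimal standalone sketch in plain Python. It does not use or reproduce this library's implementation: the function names, the rock-paper-scissors payoff matrix, and the initial-regret nudge for the first player are all assumptions made for the example. Each player mixes in proportion to its positive cumulative regret, and the time-averaged strategies approach the uniform equilibrium.

```python
# Row player's payoff in rock-paper-scissors (illustrative game, zero-sum).
# Action order: rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching_strategy(cum_regret):
    """Mix proportionally to positive cumulative regret; uniform if none."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def self_play(iterations, initial_regret=(1.0, 0.0, 0.0)):
    """Run regret matching in self-play and return both average strategies.

    initial_regret is a hypothetical nudge for player 0 so that play does
    not start exactly at the uniform equilibrium.
    """
    n = len(PAYOFF)
    regrets = [list(initial_regret), [0.0] * n]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        strategies = [regret_matching_strategy(r) for r in regrets]
        # Expected payoff of each pure action against the opponent's mix.
        utils_row = [
            sum(PAYOFF[a][b] * strategies[1][b] for b in range(n))
            for a in range(n)
        ]
        utils_col = [
            sum(-PAYOFF[a][b] * strategies[0][a] for a in range(n))
            for b in range(n)
        ]
        for player, utils in enumerate((utils_row, utils_col)):
            expected = sum(s * u for s, u in zip(strategies[player], utils))
            for a in range(n):
                # Regret: how much better action a would have done.
                regrets[player][a] += utils[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(self_play(10000)):
        print(player, [round(p, 3) for p in avg])
```

The sketch uses expected utilities instead of sampled actions so that runs are deterministic; a sampled variant would update regrets from the realized action pair instead.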