MuZero is a computer program developed by artificial intelligence research company DeepMind to master games without knowing their rules.
MuZero was trained via self-play, with no access to rules, opening books, or endgame tablebases.
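Self-play here means the agent generates its own training data by playing both sides of each game and labeling every position with the eventual outcome. The sketch below illustrates the idea on a toy Nim game with a random stand-in policy; the game, policy, and labeling scheme are illustrative assumptions, not DeepMind's actual setup.

```python
import random

# Toy Nim: players alternately take 1-3 stones; taking the last stone wins.
# The learner never sees these rules directly, only trajectories.

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def self_play_game(rng):
    """Play one game against itself and return labeled training examples."""
    stones, player, trajectory = 10, 0, []
    while stones > 0:
        move = rng.choice(legal_moves(stones))  # current policy (random here)
        trajectory.append((player, stones, move))
        stones -= move
        player = 1 - player
    winner = 1 - player  # the player who took the last stone
    # Label each recorded position with +1/-1 from the mover's point of view.
    return [(s, m, 1 if p == winner else -1) for p, s, m in trajectory]

rng = random.Random(0)
replay_buffer = [ex for _ in range(100) for ex in self_play_game(rng)]
```

In the real system, a neural network trained on such a replay buffer replaces the random policy, and the improved network then generates the next round of games.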
According to DeepMind, MuZero discovers for itself how to build a model of its environment and understand it from first principles.
MuZero combines the high-performance planning of the AlphaZero algorithm with approaches from model-free reinforcement learning.
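Concretely, MuZero plans not in the real environment but in a learned hidden-state space, using three learned functions the paper calls h (representation), g (dynamics), and f (prediction). The sketch below uses tiny random-weight linear "networks" and a greedy stand-in for MCTS; all sizes and weights are illustrative assumptions, not the trained system.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, STATE_DIM, N_ACTIONS = 8, 4, 3  # illustrative sizes

# Random weights stand in for trained networks in this sketch.
W_h = rng.normal(size=(OBS_DIM, STATE_DIM))                 # representation h
W_g = rng.normal(size=(STATE_DIM + N_ACTIONS, STATE_DIM))   # dynamics g
W_r = rng.normal(size=(STATE_DIM + N_ACTIONS,))             # reward head of g
W_p = rng.normal(size=(STATE_DIM, N_ACTIONS))               # policy head of f
W_v = rng.normal(size=(STATE_DIM,))                         # value head of f

def representation(obs):
    """h: embed a real observation into a hidden state."""
    return np.tanh(obs @ W_h)

def dynamics(state, action):
    """g: predict the next hidden state and reward, with no access to rules."""
    x = np.concatenate([state, np.eye(N_ACTIONS)[action]])
    return np.tanh(x @ W_g), float(x @ W_r)

def prediction(state):
    """f: policy distribution and value estimate used for planning."""
    logits = state @ W_p
    policy = np.exp(logits - logits.max())
    return policy / policy.sum(), float(state @ W_v)

# Plan by unrolling the learned model, never querying the environment:
state = representation(rng.normal(size=OBS_DIM))
for _ in range(5):
    policy, value = prediction(state)
    action = int(np.argmax(policy))  # greedy stand-in for MCTS search
    state, reward = dynamics(state, action)
```

The model-free flavor comes from the fact that g is trained only to be useful for predicting rewards, values, and policies, not to reconstruct the true environment state.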
MuZero was derived directly from AlphaZero (AZ) code and shares its rules for setting hyperparameters.
MuZero surpassed R2D2 in both mean and median performance across the Atari suite, though it did not do better in every game.
MuZero used 16 third-generation tensor processing units (TPUs) for training and 1,000 TPUs for self-play on board games, with 800 simulations per step; for Atari games it used 8 TPUs for training and 32 TPUs for self-play, with 50 simulations per step.
MuZero matched AlphaZero's performance in chess and shogi after roughly one million training steps.
MuZero was viewed as a significant advancement over AlphaZero, and a generalizable step forward in unsupervised learning techniques.
MuZero has been used as a reference implementation in other work, for instance as a way to generate model-based behavior.