MuZero is a computer program developed by artificial intelligence research company DeepMind to master games without knowing their rules.
MuZero was trained via self-play, with no access to rules, opening books, or endgame tablebases.
MuZero discovers for itself how to build a model of its environment and understand it purely from first principles.
MuZero combines the high-performance planning of the AlphaZero algorithm with approaches from model-free reinforcement learning, planning over a learned model rather than the true environment dynamics.
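The idea of planning with a learned model can be sketched with MuZero's three learned functions: a representation function h, a dynamics function g, and a prediction function f. The toy implementations below are illustrative assumptions only (the real functions are deep neural networks); only the roles of the three functions follow the published description.

```python
def representation(observation):
    # h: encode a raw observation into an internal hidden state.
    # Toy version: the hidden state is just the observation itself.
    return tuple(observation)

def dynamics(hidden_state, action):
    # g: predict the next hidden state and immediate reward for an action,
    # without ever consulting the real environment's rules.
    next_state = tuple(s + action for s in hidden_state)
    reward = float(action)  # placeholder reward model
    return next_state, reward

def prediction(hidden_state):
    # f: predict a policy (action probabilities) and a value estimate
    # for a hidden state.  Fixed toy policy over two actions.
    policy = {0: 0.4, 1: 0.6}
    value = float(sum(hidden_state))
    return policy, value

def plan(observation, depth=3):
    # Unroll the learned model a few steps entirely "in imagination",
    # accumulating predicted rewards -- the essence of MuZero-style
    # planning (the real algorithm uses Monte Carlo tree search).
    state = representation(observation)
    total_reward = 0.0
    for _ in range(depth):
        policy, _value = prediction(state)
        action = max(policy, key=policy.get)  # greedy w.r.t. toy policy
        state, reward = dynamics(state, action)
        total_reward += reward
    return total_reward
```

Because planning happens in hidden-state space, the agent never needs the game's rules: it only needs h, g, and f to agree with reality well enough for search to be useful.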
MuZero was derived directly from the AlphaZero (AZ) code, sharing its rules for setting hyperparameters.
MuZero surpassed both R2D2's mean and median performance across the suite of games, though it did not do better in every game.
For board games, MuZero used 16 third-generation tensor processing units (TPUs) for training and 1,000 TPUs for self-play, with 800 simulations per step; for Atari games, it used 8 TPUs for training and 32 TPUs for self-play, with 50 simulations per step.
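The compute budgets reported above can be summarized as follows; the numbers come from the text, while the dictionary layout itself is just an illustrative way to organize them.

```python
# Reported MuZero hardware and search budgets, per domain.
MUZERO_COMPUTE = {
    "board_games": {
        "training_tpus": 16,        # third-generation TPUs for training
        "selfplay_tpus": 1000,      # TPUs generating self-play games
        "simulations_per_step": 800,
    },
    "atari": {
        "training_tpus": 8,
        "selfplay_tpus": 32,
        "simulations_per_step": 50,  # far shallower search than board games
    },
}
```

The contrast is the interesting part: Atari used a 16x smaller search budget per step than the board games, yet the same algorithm worked in both domains.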
MuZero matched AlphaZero's performance in chess and shogi after roughly one million training steps.
MuZero was viewed as a significant advancement over AlphaZero, and a generalizable step forward in unsupervised learning techniques.
MuZero has been used as a reference implementation in other work, for instance as a way to generate model-based behavior.