Model-Based Reinforcement Learning

An approach to Reinforcement Learning that learns a model of the environment (the transition dynamics and reward function) and uses this model for planning to improve the policy or value function. It contrasts with model-free methods, which learn value functions or policies directly from experience.

Intuition

Learning a Mental Model

Model-free RL learns “what to do” from trial and error. Model-based RL first learns “how the world works” (a model), then uses that model to figure out what to do. This is like learning the rules of chess before deciding on a strategy, rather than just memorizing which moves worked in past games.

Model Learning

The model is learned from collected data using supervised learning:

Setting                 Approach
Tabular                 Store transition counts and compute $\hat{p}(s' \mid s, a) = N(s, a, s') / N(s, a)$
Function approximation  Train a neural network, Gaussian process, etc. to predict $s'$ and $r$ from $(s, a)$
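The tabular case can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it estimates each transition probability by dividing the count of an observed next state by the number of visits to the state-action pair, and estimates the reward as a running mean. The class name `TabularModel`, its method names, and the toy transitions are all hypothetical.

```python
from collections import defaultdict


class TabularModel:
    """Count-based estimate of transition probabilities and mean rewards."""

    def __init__(self):
        self.next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sum = defaultdict(float)                      # (s, a) -> summed reward
        self.visits = defaultdict(int)                            # (s, a) -> N(s, a)

    def update(self, s, a, r, s_next):
        """Record one real transition (s, a, r, s')."""
        self.next_counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def transition_prob(self, s, a, s_next):
        """Empirical estimate N(s, a, s') / N(s, a); 0.0 if (s, a) is unseen."""
        n = self.visits.get((s, a), 0)
        if n == 0:
            return 0.0
        return self.next_counts[(s, a)].get(s_next, 0) / n

    def expected_reward(self, s, a):
        """Mean observed reward for (s, a); 0.0 if (s, a) is unseen."""
        n = self.visits.get((s, a), 0)
        return self.reward_sum[(s, a)] / n if n else 0.0


# Hypothetical toy data: four observed transitions from the same (s, a) pair.
model = TabularModel()
for s, a, r, s_next in [
    ("s0", "right", 0.0, "s1"),
    ("s0", "right", 0.0, "s1"),
    ("s0", "right", 0.0, "s1"),
    ("s0", "right", 1.0, "s2"),
]:
    model.update(s, a, r, s_next)
```

With function approximation, the count tables would be replaced by a regressor trained on the same `(s, a) -> (r, s')` pairs.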

Using the Model: Two Approaches

1. Background Planning

Generate simulated experience from the model and use it to update value functions/policies between real environment steps. Example: Dyna.
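As an illustration of background planning, here is a small Dyna-Q-style sketch. It assumes a deterministic toy chain environment (`chain_step`, invented for this example) and a table that stores the last observed outcome for each state-action pair as the model. The hyperparameters are arbitrary choices, not recommended values.

```python
import random
from collections import defaultdict


def chain_step(state, action, n_states=5):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.
    Reaching the last state yields reward 1 and ends the episode."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def dyna_q(episodes=30, planning_steps=20, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = (0, 1)
    q = defaultdict(float)  # (state, action) -> value estimate
    model = {}              # (state, action) -> (reward, next_state, done)

    def q_update(s, a, r, s2, done):
        # One-step Q-learning update, used for both real and simulated experience.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    def greedy(s):
        best = max(q[(s, b)] for b in actions)
        return rng.choice([b for b in actions if q[(s, b)] == best])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s2, r, done = chain_step(s, a)
            q_update(s, a, r, s2, done)    # direct RL from the real step
            model[(s, a)] = (r, s2, done)  # model learning (deterministic environment)
            for _ in range(planning_steps):
                # Background planning: replay a previously seen pair through the model.
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                q_update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


q = dyna_q()
```

Each real step is followed by `planning_steps` simulated updates, which is where the extra learning per real interaction comes from. Setting `planning_steps=0` reduces the sketch to plain Q-learning.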

2. Decision-Time Planning

Plan from the current state at the moment a decision is needed. Examples: Rollout Algorithm, Monte Carlo Tree Search (MCTS).
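A rollout-style planner can be sketched as follows. For each candidate action it simulates several trajectories through the model under a fixed rollout policy, averages the discounted returns, and picks the action with the highest estimate. The names `chain_model`, `random_policy`, and `rollout_action` are hypothetical, and the toy chain model stands in for whatever known or learned model is available.

```python
import random


def chain_model(state, action, n_states=5):
    """Hypothetical model of a toy chain: action 1 moves right, action 0 moves left.
    Reaching the last state yields reward 1 and terminates."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def random_policy(state, rng):
    """Rollout policy: choose uniformly between the two actions."""
    return rng.choice((0, 1))


def rollout_action(state, actions, model, policy,
                   n_rollouts=50, horizon=20, gamma=0.95, seed=0):
    rng = random.Random(seed)

    def simulate(s, a):
        # Take action a first, then follow the rollout policy inside the model.
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            s, r, done = model(s, a)
            ret += discount * r
            if done:
                break
            discount *= gamma
            a = policy(s, rng)
        return ret

    # Monte Carlo estimate of the return for each first action.
    estimates = {
        a: sum(simulate(state, a) for _ in range(n_rollouts)) / n_rollouts
        for a in actions
    }
    return max(estimates, key=estimates.get), estimates


best, estimates = rollout_action(state=3, actions=(0, 1),
                                 model=chain_model, policy=random_policy)
```

Nothing is stored between decisions here: the estimates are computed from scratch for the current state and discarded afterwards. That is the main difference from background planning.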

Advantages

  • Sample efficient: can extract more learning from each real interaction by replaying or simulating experience with the model
  • Transfer: a good model can be reused across different reward functions or tasks
  • Interpretability: the model captures environment dynamics explicitly

Disadvantages

Model Errors Compound

Model errors accumulate over multi-step rollouts: each predicted state is fed back in as the input to the next prediction, so a small error per step can lead to highly inaccurate predictions over long horizons. This is the fundamental challenge of model-based RL (MBRL).
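A toy calculation makes the compounding concrete. The sketch below assumes scalar linear dynamics and a learned model whose one-step prediction is off by 1%. Both functions are invented for illustration only.

```python
def rollout(step_fn, x0, horizon):
    """Apply step_fn repeatedly, feeding each output back in as the next input."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(step_fn(xs[-1]))
    return xs


def true_step(x):
    """Hypothetical true dynamics."""
    return 1.02 * x


def model_step(x):
    """Hypothetical learned model: 1% multiplicative error on every step."""
    return 1.02 * 1.01 * x


def relative_error(horizon, x0=1.0):
    """Relative gap between the model's and the true state after `horizon` steps."""
    true_x = rollout(true_step, x0, horizon)[-1]
    model_x = rollout(model_step, x0, horizon)[-1]
    return abs(model_x - true_x) / abs(true_x)


for h in (1, 10, 50, 100):
    print(h, round(relative_error(h), 3))
```

In this setup the relative error after `h` steps is `1.01**h - 1`: about 1% after one step, but well over 100% after 100 steps. This is one reason short model rollouts are often preferred over long ones.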

  • Model learning is itself a hard problem (especially in high-dimensional spaces)
  • Computational overhead of maintaining and querying the model
  • Model bias can lead to suboptimal policies that exploit model inaccuracies

Big Picture

                Known model             Learned model
Rollout/MCTS    Rollout algorithm,      Rollout algorithm,
                MCTS, AlphaGo           MCTS, AlphaGo

Planning with   Planning                Model-based RL
MF RL tools                             (e.g., Dyna)
