Dyna

Definition

Dyna is a model-based reinforcement learning architecture that integrates learning, planning, and acting. It uses real experience to simultaneously improve a policy/value function (direct RL) and learn a model of the environment. This model is then used to generate simulated experience for planning (indirect RL).

Intuition

Learning from the Simulated Past

In traditional model-free RL (like Q-learning), you only learn from what actually happened. In Dyna, you use your experience to build a mental model of the world (the Model). After every real step you take, you pause and imagine simulated steps using that model. This "imaginary" experience updates your value function just like real experience does, making learning much more sample-efficient.

Update Logic

Dyna-Q combines direct Q-learning with random replay from the model:

  1. Direct RL: Q(S, A) ← Q(S, A) + α [R + γ max_a Q(S', a) − Q(S, A)]
  2. Model Learning: Model(S, A) ← (R, S') (storing the transition)
  3. Planning: Repeat n times:
    • Select a previously visited state S and action A at random.
    • Query model: (R, S') ← Model(S, A)
    • Update Q: Q(S, A) ← Q(S, A) + α [R + γ max_a Q(S', a) − Q(S, A)]
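The update logic above can be sketched as a single tabular Dyna-Q iteration. This is a minimal sketch, not a reference implementation; the function name `dyna_q_step` and the dict-based representations of Q and the model are illustrative choices, and the model assumes a deterministic environment (it stores only the last observed outcome per state-action pair).

```python
import random
from collections import defaultdict

def dyna_q_step(Q, model, s, a, r, s_next, actions,
                alpha=0.1, gamma=0.95, n_planning=10):
    """One Dyna-Q iteration: direct RL, model learning, then planning.

    Q:      dict mapping (state, action) -> value
    model:  dict mapping (state, action) -> (reward, next_state)
    actions: list of all discrete actions
    """
    # 1. Direct RL: one-step Q-learning on the real transition
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    # 2. Model learning: store the observed transition
    #    (deterministic-environment assumption)
    model[(s, a)] = (r, s_next)

    # 3. Planning: n simulated one-step updates sampled from the model
    for _ in range(n_planning):
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        best = max(Q[(ps_next, a2)] for a2 in actions)
        Q[(ps, pa)] += alpha * (pr + gamma * best - Q[(ps, pa)])
```

Because the planning loop replays the same one-step backup on remembered transitions, a single real reward can propagate through the value table much faster than in plain Q-learning.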

Key Components

  • Direct RL: Improving value functions/policies from real experience.
  • Model Learning: Learning the transition and reward dynamics: Model(S, A) → (R, S').
  • Planning: Improving value functions/policies using simulated experience from the model.
  • Search Control: The process of selecting starting states and actions for the simulated experience.

Dyna-Q vs Dyna-Q+

  • Dyna-Q: Strategy: randomly samples stored (S, A) pairs for planning updates. Purpose: standard planning.
  • Dyna-Q+: Strategy: adds a "curiosity" bonus to simulated rewards, R + κ√τ, where τ is the time since (S, A) was last tried in the real environment. Purpose: encourages exploration in changing environments.
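The Dyna-Q+ bonus can be sketched as below. This is a hypothetical helper, not library code: the name `dyna_q_plus_reward` and the bookkeeping of last-visit times are illustrative, with κ a small tuning constant.

```python
import math

def dyna_q_plus_reward(r, last_tried, current_time, kappa=0.001):
    """Planning reward used by Dyna-Q+: the modeled reward R plus an
    exploration bonus kappa * sqrt(tau), where tau is the number of
    time steps since this (state, action) pair was last tried for real."""
    tau = current_time - last_tried
    return r + kappa * math.sqrt(tau)
```

The longer a pair goes untried, the larger its simulated reward becomes, so planning eventually drives the agent back to stale parts of the state space, which is what lets Dyna-Q+ notice when the environment has changed.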
