Optimal Policy

Definition

Optimal Policy ( )

A policy is defined to be better than or equal to a policy if its expected return is greater than or equal to that of for all states. An optimal policy is any policy that is better than or equal to all other policies.

Key Properties

  • Existence: At least one optimal policy always exists for any Markov Decision Process (MDP).
  • Shared Value Function: All optimal policies share the same optimal state-value function and optimal action-value function .
  • Greedy Selection: Once is known, an optimal policy can be found by being greedy with respect to it:
  • Uniqueness: While the value functions and are unique, the optimal policy may not be (e.g., if multiple actions share the same maximum value).

Mathematical Relation

Relationship to Optimal Value Function

The Bellman Optimality Equation for is:

Intuition

The Ceiling of Performance

The optimal policy represents the best possible way to behave in an environment. Reinforcement Learning algorithms (like Q-Learning or REINFORCE) are essentially searching for this policy or its corresponding value function.

Connections

  • Attained when solving: Bellman Equation (optimality version)
  • Goal of: Q-Learning (approximates )
  • Foundation for: MDP theory

Appears In