Optimal Policy
Definition
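Optimal Policy ($\pi_*$)
A policy $\pi$ is defined to be better than or equal to a policy $\pi'$ (written $\pi \geq \pi'$) if its expected return is greater than or equal to that of $\pi'$ for all states, i.e. $v_\pi(s) \geq v_{\pi'}(s)$ for every $s \in \mathcal{S}$. An optimal policy $\pi_*$ is any policy that is better than or equal to all other policies.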
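To make the "better than or equal to" relation concrete, here is a minimal Python sketch that evaluates two deterministic policies on a small MDP and compares their state values state by state. The MDP, the transition format `P[s][a] = [(probability, next_state, reward)]`, the discount factor, and the names `evaluate` and `better_or_equal` are all hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical transition format: P[s][a] = [(probability, next_state, reward), ...]
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]

# Hypothetical two-state MDP: action 0 stays put, action 1 switches state.
# Only "stay in state 1" is rewarded (+1 per step); every other transition pays 0.
P: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def evaluate(policy: Dict[int, int], P: Transitions, gamma: float,
             tol: float = 1e-10) -> Dict[int, float]:
    """Iterative policy evaluation for a deterministic policy (state -> action)."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Expected one-step return of the action the policy picks in s.
            new_v = sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


def better_or_equal(v_a: Dict[int, float], v_b: Dict[int, float],
                    tol: float = 1e-6) -> bool:
    """True if v_a(s) >= v_b(s) in every state (up to numerical tolerance)."""
    return all(v_a[s] >= v_b[s] - tol for s in v_a)


v_go_then_stay = evaluate({0: 1, 1: 0}, P, GAMMA)  # approximately {0: 9.0, 1: 10.0}
v_always_stay = evaluate({0: 0, 1: 0}, P, GAMMA)   # approximately {0: 0.0, 1: 10.0}
print(better_or_equal(v_go_then_stay, v_always_stay))  # True
print(better_or_equal(v_always_stay, v_go_then_stay))  # False
```

Because the comparison must hold in every state, this relation is only a partial order: two policies can each be better in a different state, in which case neither is better than or equal to the other.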
Key Properties
- Existence: At least one optimal policy always exists for any finite Markov Decision Process (MDP).
- Shared Value Function: All optimal policies share the same optimal state-value function $v_*(s) = \max_\pi v_\pi(s)$ and optimal action-value function $q_*(s, a) = \max_\pi q_\pi(s, a)$.
- Greedy Selection: Once $q_*$ is known, an optimal policy can be found by acting greedily with respect to it: $\pi_*(s) = \arg\max_{a} q_*(s, a)$.
- Uniqueness: While the value functions $v_*$ and $q_*$ are unique, the optimal policy may not be: if several actions attain the maximum of $q_*(s, a)$ in some state, choosing any of them (or randomizing among them) is optimal.
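The greedy-selection and uniqueness properties can be sketched in a few lines of Python. The `q_star` table below is hypothetical (made-up numbers, with a deliberate tie in state 1), and the names `greedy_actions` and `greedy_policy` are illustrative. `greedy_actions` returns the full set of maximizing actions; `greedy_policy` picks one of them per state.

```python
from typing import Dict, List

# Hypothetical q* table: q_star[s][a] is assumed to be already known.
# State 1 deliberately has a tie between its two actions.
q_star: Dict[int, List[float]] = {
    0: [8.1, 9.0],
    1: [10.0, 10.0],
}


def greedy_actions(q_values: List[float], tol: float = 1e-9) -> List[int]:
    """All actions whose value is within tol of the maximum (the argmax set)."""
    best = max(q_values)
    return [a for a, q in enumerate(q_values) if q >= best - tol]


def greedy_policy(q: Dict[int, List[float]]) -> Dict[int, int]:
    """One deterministic greedy policy: break ties by taking the lowest-index action."""
    return {s: greedy_actions(values)[0] for s, values in q.items()}


print(greedy_actions(q_star[0]))  # [1]
print(greedy_actions(q_star[1]))  # [0, 1]: two optimal choices in state 1
print(greedy_policy(q_star))      # {0: 1, 1: 0}
```

Any rule that picks from `greedy_actions(q_star[s])` in each state, including a random one, gives an optimal policy with the same value function; the lowest-index tie-break is just one such choice.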
Mathematical Relation
Relationship to Optimal Value Function
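The Bellman optimality equation for $v_*$ is:
$$
v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr]
$$
where $p(s', r \mid s, a)$ is the probability of reaching state $s'$ with reward $r$ after taking action $a$ in state $s$, and $\gamma$ is the discount factor. An optimal policy follows from $v_*$ by a one-step lookahead:
$$
\pi_*(s) = \arg\max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr]
$$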
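Value iteration is one standard way to solve the Bellman optimality equation numerically: it repeatedly applies the right-hand side as an update until the values stop changing, then reads off an optimal policy by a one-step lookahead. This is a minimal sketch assuming a tiny hypothetical two-state MDP with known dynamics stored as `P[s][a] = [(probability, next_state, reward)]`; the MDP, the names `q_from_v` and `value_iteration`, and $\gamma = 0.9$ are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical transition format: P[s][a] = [(probability, next_state, reward), ...]
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]

# Hypothetical two-state MDP: action 0 stays put, action 1 switches state;
# only "stay in state 1" is rewarded (+1 per step).
P: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_from_v(P: Transitions, v: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """One-step lookahead: sum over (s', r) of p(s', r | s, a) * (r + gamma * v(s'))."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])


def value_iteration(P: Transitions, gamma: float, tol: float = 1e-10) -> Dict[int, float]:
    """Apply the Bellman optimality backup in place until the values stop changing."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_from_v(P, v, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


v_star = value_iteration(P, GAMMA)
# Greedy one-step lookahead on v_star yields an optimal policy.
pi_star = {s: max(P[s], key=lambda a: q_from_v(P, v_star, s, a, GAMMA)) for s in P}
print(v_star)   # approximately {0: 9.0, 1: 10.0}
print(pi_star)  # {0: 1, 1: 0}
```

Here $v_*(1) = 1 / (1 - 0.9) = 10$ (stay forever in the rewarding state) and $v_*(0) = 0.9 \cdot 10 = 9$ (switch once, then stay), which the iteration reproduces.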
Intuition
The Ceiling of Performance
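The optimal policy represents the best possible way to behave in an environment: no other policy achieves a higher expected return from any state. Reinforcement learning algorithms search for this policy or for its value function. Value-based methods such as Q-learning estimate $q_*$ and act greedily on it, while policy-gradient methods such as REINFORCE adjust a parameterized policy directly to increase expected return.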
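For the value-based route, a minimal tabular Q-learning sketch shows a learner recovering $q_*$ from sampled transitions $(s, a, r, s')$ and then acting greedily on it. The two-state MDP, the uniformly random exploration policy, the constant step size $\alpha = 0.5$, the 20,000-step budget, and the names `step` and `q_learning` are illustrative assumptions. Standard convergence guarantees are stated for decaying step sizes; a constant one is used here only because this toy problem is deterministic.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical two-state MDP: P[s][a] = [(probability, next_state, reward), ...].
# Action 0 stays put, action 1 switches state; only staying in state 1 pays +1.
P: Dict[int, Dict[int, List[Tuple[float, int, float]]]] = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def step(rng: random.Random, s: int, a: int) -> Tuple[int, float]:
    """Environment step: sample (next_state, reward) according to P[s][a]."""
    outcomes = P[s][a]
    _, s2, r = rng.choices(outcomes, weights=[p for p, _, _ in outcomes])[0]
    return s2, r


def q_learning(steps: int = 20_000, alpha: float = 0.5, seed: int = 0) -> Dict[int, List[float]]:
    """Tabular Q-learning with a uniformly random (off-policy) behaviour policy."""
    rng = random.Random(seed)
    q = {s: [0.0] * len(P[s]) for s in P}
    s = 0
    for _ in range(steps):
        a = rng.randrange(len(q[s]))  # explore uniformly at random
        s2, r = step(rng, s, a)
        target = r + GAMMA * max(q[s2])  # bootstrap from the best next action
        q[s][a] += alpha * (target - q[s][a])
        s = s2
    return q


q = q_learning()
greedy = {s: max(range(len(vals)), key=vals.__getitem__) for s, vals in q.items()}
print(greedy)  # {0: 1, 1: 0}
```

Because the update bootstraps from the maximum next-state action value regardless of which action the explorer takes next, Q-learning is off-policy: the random behaviour policy never follows $\pi_*$, yet the learned table approaches $q_*$ (here $q_*(0,\cdot) = [8.1, 9]$ and $q_*(1,\cdot) = [10, 8.1]$).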
Connections
- Obtained by solving: Bellman equation (optimality version)
- Goal of: Q-Learning (approximates $q_*$)
- Foundation for: MDP theory