Policy Improvement

Policy Improvement Theorem

If $q_\pi(s, \pi'(s)) \geq v_\pi(s)$ for all $s$, then $\pi'$ is at least as good as $\pi$: $v_{\pi'}(s) \geq v_\pi(s)$ for all $s$.
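Why the condition is enough can be seen by unrolling it one step at a time. The following is a sketch of the standard argument, not a full proof. It assumes notation that these notes do not define: a discount factor $\gamma$, rewards $R_{t+1}$, states $S_t$, actions $A_t$, and $\mathbb{E}_{\pi'}$ for the expectation when actions are chosen by $\pi'$. Each $\leq$ applies the theorem's hypothesis at the next state.

```latex
\begin{aligned}
v_\pi(s) &\leq q_\pi(s, \pi'(s)) \\
&= \mathbb{E}\left[ R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s, A_t = \pi'(s) \right] \\
&= \mathbb{E}_{\pi'}\left[ R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s \right] \\
&\leq \mathbb{E}_{\pi'}\left[ R_{t+1} + \gamma q_\pi(S_{t+1}, \pi'(S_{t+1})) \mid S_t = s \right] \\
&= \mathbb{E}_{\pi'}\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 v_\pi(S_{t+2}) \mid S_t = s \right] \\
&\;\;\vdots \\
&\leq \mathbb{E}_{\pi'}\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s \right] \\
&= v_{\pi'}(s).
\end{aligned}
```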

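The greedy policy $\pi'(s) = \arg\max_a q_\pi(s, a)$ satisfies this condition by definition of the max: $q_\pi(s, \pi'(s)) = \max_a q_\pi(s, a) \geq q_\pi(s, \pi(s)) = v_\pi(s)$. This is written for a deterministic $\pi$; for a stochastic $\pi$, $v_\pi(s)$ is an average of $q_\pi(s, a)$ over actions, so the max is still at least as large. Therefore, making a policy greedy w.r.t. its own value function never makes it worse. If the greedy policy gives no improvement, i.e. $v_{\pi'} = v_\pi$, then $v_\pi$ satisfies the Bellman optimality equation and the policy was already optimal.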
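A single greedy improvement step can be illustrated on a toy problem. The sketch below is only an illustration under stated assumptions: the two-state deterministic MDP, the discount `GAMMA = 0.9`, and the helper names `q_value`, `evaluate`, and `greedy` are all hypothetical and do not come from the lecture or the book. It evaluates a poor starting policy, makes it greedy with respect to its own values, and re-evaluates.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical deterministic MDP: MDP[state][action] = (next_state, reward)
MDP = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (1, 2.0), 1: (0, 0.0)},
}

def q_value(v, s, a, gamma=GAMMA):
    """One-step lookahead: q(s, a) = r + gamma * v(s')."""
    s_next, r = MDP[s][a]
    return r + gamma * v[s_next]

def evaluate(policy, gamma=GAMMA, tol=1e-10):
    """Iterative policy evaluation: sweep states until values stop changing."""
    v = {s: 0.0 for s in MDP}
    while True:
        delta = 0.0
        for s in MDP:
            new = q_value(v, s, policy[s], gamma)
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v

def greedy(v, gamma=GAMMA):
    """Greedy policy: in each state pick the action with the largest q(s, a)."""
    return {s: max(MDP[s], key=lambda a: q_value(v, s, a, gamma)) for s in MDP}

pi = {0: 0, 1: 1}          # starting policy: never collects any reward
v_pi = evaluate(pi)        # {0: 0.0, 1: 0.0}
pi_new = greedy(v_pi)      # {0: 1, 1: 0}
v_new = evaluate(pi_new)   # approximately {0: 19.0, 1: 20.0}

for s in MDP:
    # Hypothesis of the theorem, then its conclusion, checked state by state.
    assert q_value(v_pi, s, pi_new[s]) >= v_pi[s]
    assert v_new[s] >= v_pi[s]
```

In this example the hypothesis $q_\pi(s, \pi'(s)) \geq v_\pi(s)$ holds in both states, and the re-evaluated values are no smaller anywhere, as the theorem predicts.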

This theorem justifies the improvement step in Policy Iteration and Generalized Policy Iteration.

Appears In

RL-L02 - Dynamic Programming, RL-Book Ch4 - Dynamic Programming
