Policy Improvement

Policy Improvement Theorem

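If $q_\pi(s, \pi'(s)) \ge v_\pi(s)$ for all $s \in \mathcal{S}$, then $\pi'$ is at least as good as $\pi$: $v_{\pi'}(s) \ge v_\pi(s)$ for all $s \in \mathcal{S}$.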
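For reference, here is a sketch of the usual proof. It assumes a deterministic $\pi'$, a discount factor $\gamma$, rewards $R_{t+1}$, and $\mathbb{E}_{\pi'}$ for the expectation over trajectories that follow $\pi'$ (this notation is assumed here, not defined elsewhere in the note). Apply the hypothesis at $s$, expand $q_\pi$ by one step, and repeat at every later state:

$$
\begin{aligned}
v_\pi(s) &\le q_\pi(s, \pi'(s)) \\
&= \mathbb{E}\left[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s, A_t = \pi'(s)\right] \\
&= \mathbb{E}_{\pi'}\left[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s\right] \\
&\le \mathbb{E}_{\pi'}\left[R_{t+1} + \gamma q_\pi(S_{t+1}, \pi'(S_{t+1})) \mid S_t = s\right] \\
&= \mathbb{E}_{\pi'}\left[R_{t+1} + \gamma R_{t+2} + \gamma^2 v_\pi(S_{t+2}) \mid S_t = s\right] \\
&\;\;\vdots \\
&\le \mathbb{E}_{\pi'}\left[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s\right] \\
&= v_{\pi'}(s).
\end{aligned}
$$

The leftover term $\gamma^k v_\pi(S_{t+k})$ vanishes as $k \to \infty$ when $\gamma < 1$ and the values are bounded.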

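The greedy policy $\pi'(s) = \arg\max_a q_\pi(s, a)$ satisfies this condition, because $q_\pi(s, \pi'(s)) = \max_a q_\pi(s, a) \ge \sum_a \pi(a \mid s)\, q_\pi(s, a) = v_\pi(s)$: a maximum is never smaller than a weighted average. Therefore, making the policy greedy with respect to its own value function never makes it worse. If the greedy policy is only as good as the old one ($v_{\pi'} = v_\pi$), then $v_\pi$ satisfies the Bellman optimality equation and both policies are already optimal; otherwise the improvement is strict in at least one state.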
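A minimal numerical check of the greedy step, assuming a small tabular MDP stored as NumPy arrays: `P[s, a, s2]` holds transition probabilities and `R[s, a]` expected rewards. The array layout, the helper names `evaluate`, `q_from_v` and `greedy`, and the random MDP are all hypothetical, chosen only for illustration. The sketch evaluates an arbitrary deterministic policy exactly, acts greedily on $q_\pi$, and asserts both the condition and the conclusion of the theorem.

```python
import numpy as np

def evaluate(P, R, policy, gamma):
    """Exact evaluation of a deterministic policy (hypothetical helper).

    P: (S, A, S) transition probabilities, R: (S, A) expected rewards,
    policy: (S,) action indices. Solves v = r_pi + gamma * P_pi v.
    """
    n_states = P.shape[0]
    idx = np.arange(n_states)
    P_pi = P[idx, policy]          # (S, S) transitions under the policy
    r_pi = R[idx, policy]          # (S,) rewards under the policy
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def q_from_v(P, R, v, gamma):
    """q_pi(s, a) = R(s, a) + gamma * sum_s2 P(s2 | s, a) * v_pi(s2)."""
    return R + gamma * P @ v       # P @ v has shape (S, A)

def greedy(q):
    """pi'(s) = argmax_a q_pi(s, a); np.argmax breaks ties by lowest index."""
    return np.argmax(q, axis=1)

# Hypothetical random MDP, used only to exercise the theorem.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalise each (s, a) row into a distribution
R = rng.standard_normal((S, A))

pi = rng.integers(A, size=S)       # arbitrary deterministic starting policy
v_pi = evaluate(P, R, pi, gamma)
q = q_from_v(P, R, v_pi, gamma)
pi_new = greedy(q)
v_new = evaluate(P, R, pi_new, gamma)

# Condition of the theorem: q_pi(s, pi'(s)) >= v_pi(s) for every s ...
assert np.all(q[np.arange(S), pi_new] >= v_pi - 1e-9)
# ... and its conclusion: v_pi'(s) >= v_pi(s) for every s.
assert np.all(v_new >= v_pi - 1e-9)
```

The small tolerances only absorb floating-point error from `np.linalg.solve`; mathematically both inequalities hold exactly.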

This theorem justifies the improvement step in Policy Iteration and Generalized Policy Iteration.
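As a sketch of where the improvement step sits inside Policy Iteration, the loop below alternates exact evaluation with greedy improvement on the same kind of hypothetical tabular MDP (arrays `P[s, a, s2]` and `R[s, a]`; the function name `policy_iteration`, the starting policy, and the tie tolerance are assumptions). It stops once the greedy step leaves the policy unchanged: a policy that is greedy with respect to its own value function satisfies the Bellman optimality equation, so the returned policy is optimal.

```python
import numpy as np

def policy_iteration(P, R, gamma, max_iters=1000, tie_tol=1e-10):
    """Alternate exact evaluation and greedy improvement until the policy is stable.

    P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
    Returns the final deterministic policy and its value function.
    """
    S = P.shape[0]
    idx = np.arange(S)
    pi = np.zeros(S, dtype=int)               # start from "always take action 0"
    for _ in range(max_iters):
        # Evaluation: solve v = r_pi + gamma * P_pi v exactly.
        v = np.linalg.solve(np.eye(S) - gamma * P[idx, pi], R[idx, pi])
        # Improvement: act greedily with respect to q_pi.
        q = R + gamma * P @ v
        new_pi = np.argmax(q, axis=1)
        # Keep the current action when it is (numerically) tied with the best,
        # so the loop cannot cycle between equally good policies.
        keep = q[idx, pi] >= q[idx, new_pi] - tie_tol
        new_pi = np.where(keep, pi, new_pi)
        if np.array_equal(new_pi, pi):
            return pi, v                      # greedy step changed nothing: stable
        pi = new_pi
    raise RuntimeError("policy iteration did not converge")

# Usage on a hypothetical random MDP.
rng = np.random.default_rng(1)
S, A, gamma = 6, 4, 0.95
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
R = rng.standard_normal((S, A))
pi_star, v_star = policy_iteration(P, R, gamma)
```

Because each accepted change strictly improves the value somewhere and there are finitely many deterministic policies, the loop terminates.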

Appears In