Bellman Error

Bellman Error

The Bellman error at a state measures how far the current value estimate is from satisfying the Bellman Equation:

Mean Squared Bellman Error ( )

Bellman Error Is Not Learnable

A key result from Ch 11.6: the cannot be learned from data alone — different MDPs can produce identical data but have different values. This is why gradient methods that minimize directly are problematic.

Alternative objectives: Projected Bellman Error (PBE), Mean Squared TD Error — these are learnable and used by Gradient-TD Methods.

Appears In