Study Notes

Jun 06, 2026 · 1 min read

approximation
exam-topic

Bellman Error

The Bellman error at a state $s$ measures how far the current value estimate is from satisfying the Bellman Equation: $\bar{\delta}_{w}(s) = \left( \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \hat{v}(s', w) \right] \right) - \hat{v}(s, w)$

Mean Squared Bellman Error ($\overline{BE}$)

$\overline{BE}(w) = \sum_{s} \mu(s) \left[ \bar{\delta}_{w}(s) \right]^{2}$
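As a rough illustration, the two formulas above can be evaluated directly when the model is known. The sketch below makes several assumptions that are not part of these notes: the policy is already folded into a state-to-state matrix `P[s, s2]` with expected rewards `R[s, s2]`, the two-state chain is invented for the example, and the features are one-hot, so $\hat{v}(s, w) = w_s$.

```python
import numpy as np

def bellman_error(P, R, gamma, v):
    """Per-state Bellman error: expected one-step target minus current estimate.

    P[s, s2] is the transition probability under the fixed policy,
    R[s, s2] is the expected reward on that transition (both assumed known).
    """
    expected_target = (P * (R + gamma * v[None, :])).sum(axis=1)
    return expected_target - v

def mean_squared_bellman_error(P, R, gamma, v, mu):
    """Squared Bellman errors weighted by the state distribution mu."""
    delta = bellman_error(P, R, gamma, v)
    return float(np.sum(mu * delta ** 2))

# Hypothetical two-state chain: 0 -> 1 with reward 1, 1 -> 0 with reward 0.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
R = np.array([[0.0, 1.0],
              [0.0, 0.0]])
gamma = 0.5
mu = np.array([0.5, 0.5])   # stationary distribution of this chain
X = np.eye(2)               # one-hot features, so v_hat(s, w) = w[s]

# With w = 0 the estimate violates the Bellman equation at state 0.
w_zero = np.zeros(2)
be_zero = mean_squared_bellman_error(P, R, gamma, X @ w_zero, mu)

# The true values solve (I - gamma * P) v = expected one-step reward.
r_bar = (P * R).sum(axis=1)
w_true = np.linalg.solve(np.eye(2) - gamma * P, r_bar)
be_true = mean_squared_bellman_error(P, R, gamma, X @ w_true, mu)

print(be_zero, be_true)
```

Here `be_zero` is 0.5 (only state 0 has a nonzero error, weighted by 0.5), while `be_true` is zero up to floating-point error, since the true value function satisfies the Bellman equation at every state.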

Bellman Error Is Not Learnable

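A key result from Ch 11.6: the $\overline{BE}$ is not learnable, meaning it cannot be estimated from observed data alone, no matter how much data is available. Two different MDPs can generate exactly the same distribution of observed features and rewards yet have different $\overline{BE}$ values for the same $w$. This is why gradient methods that minimize $\overline{BE}$ directly are problematic: they need access to the underlying states or a model, not just the observed trajectories.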
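The sketch below illustrates the idea with a pair of small MDPs modeled on the counterexample in Ch 11.6. The exact transition tables are reconstructed here as an assumption and should be checked against the book. In the first MDP, `A` goes to `B` with reward 0, and `B` either stays in `B` (reward -1) or returns to `A` (reward 1) with equal probability. In the second, `B` is split into `B` (always reward 1, back to `A`) and `B'` (always reward -1, then `B` or `B'`), and the two share one feature. An observer who sees only features and rewards gets the same data distribution from both.

```python
import numpy as np

def msbe(P, R, gamma, features, w, mu):
    """Mean squared Bellman error for a linear value function features @ w."""
    v = features @ w
    delta = (P * (R + gamma * v[None, :])).sum(axis=1) - v
    return float(mu @ delta ** 2)

gamma = 0.9      # arbitrary; with w = 0 it does not affect the result
w = np.zeros(2)  # w[0] is the value of A, w[1] the value of B (and B')

# MDP 1: states (A, B), each with its own feature.
P1 = np.array([[0.0, 1.0],
               [0.5, 0.5]])
R1 = np.array([[0.0, 0.0],
               [1.0, -1.0]])
X1 = np.eye(2)
mu1 = np.array([1 / 3, 2 / 3])

# MDP 2: states (A, B, B'); B and B' share the second feature.
P2 = np.array([[0.0, 0.5, 0.5],
               [1.0, 0.0, 0.0],
               [0.0, 0.5, 0.5]])
R2 = np.array([[0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, -1.0, -1.0]])
X2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 1.0]])
mu2 = np.array([1 / 3, 1 / 3, 1 / 3])

be_1 = msbe(P1, R1, gamma, X1, w, mu1)
be_2 = msbe(P2, R2, gamma, X2, w, mu2)
print(be_1, be_2)
```

With $w = 0$, `be_1` is 0 because the rewards of +1 and -1 cancel in expectation at `B`. `be_2` is 2/3 because `B` and `B'` each have a squared Bellman error of 1 and each carries weight 1/3. The observable data are the same but the $\overline{BE}$ differs, so no estimator that sees only the data can recover it.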

Alternative objectives: the Projected Bellman Error (PBE) and the Mean Squared TD Error (TDE). Both are learnable from data. Gradient-TD Methods minimize the PBE; the naive residual-gradient algorithm minimizes the TDE.
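For reference, the two objectives can be written in the notation used above. These follow the definitions in Ch 11; the symbols $\Pi$ (projection onto the space of representable value functions), $\lVert \cdot \rVert_{\mu}$ (the $\mu$-weighted norm), and $\delta_{t}$ (the sampled TD error) do not appear elsewhere in this note and are introduced only for this summary.

$\overline{PBE}(w) = \lVert \Pi \bar{\delta}_{w} \rVert_{\mu}^{2}$, where $\lVert v \rVert_{\mu}^{2} = \sum_{s} \mu(s) \, v(s)^{2}$

$\overline{TDE}(w) = \sum_{s} \mu(s) \, \mathbb{E}\left[ \delta_{t}^{2} \mid S_{t} = s, A_{t} \sim \pi \right]$, where $\delta_{t} = R_{t+1} + \gamma \hat{v}(S_{t+1}, w) - \hat{v}(S_{t}, w)$

The difference from $\overline{BE}$ is where the expectation sits: $\overline{TDE}$ averages the squared sampled error, whereas $\overline{BE}$ squares the expected error $\bar{\delta}_{w}(s) = \mathbb{E}[\delta_{t} \mid S_{t} = s]$.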

Appears In

RL-L07 - Off-Policy RL with Approximation
RL-Book Ch11 - Off-Policy Methods with Approximation (§11.5-11.6)


Backlinks

Bellman Equation
RL-Book Ch11 - Off-Policy Methods with Approximation
RL-L14 - Recap
