Study Notes

TD Error

Jun 06, 2026 · 1 min read

tabular-methods
key-formula


$δ_{t} = R_{t + 1} + γV (S_{t + 1}) - V (S_{t})$

where:

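$R_{t + 1} + γV (S_{t + 1})$: the TD target, a one-step bootstrapped estimate of $V (S_{t})$ built from the observed reward and the current value estimate of the next state

$V (S_{t})$: the current estimate

$γ$: the discount factor, $0 \le γ \le 1$

$δ_{t}$: the “surprise”, i.e. how much better (or worse) the observed step was than the current estimate predicted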
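
As a small worked illustration of the formula above, the sketch below computes $δ_{t}$ for a single transition. The function name `td_error` and the numbers (reward 1.0, $V (S_{t + 1}) = 2.0$, $V (S_{t}) = 2.5$, $γ = 0.9$) are made up for this example and do not come from the lecture material.

```python
def td_error(reward, v_next, v_current, gamma=0.9):
    """One-step TD error: delta = R + gamma * V(S') - V(S)."""
    td_target = reward + gamma * v_next  # bootstrapped estimate of V(S)
    return td_target - v_current         # positive: better than expected


# Hypothetical transition: TD target = 1.0 + 0.9 * 2.0 = 2.8,
# so delta is roughly 2.8 - 2.5 = 0.3 (up to floating-point rounding).
delta = td_error(reward=1.0, v_next=2.0, v_current=2.5, gamma=0.9)
print(delta)
```

Here the TD target (2.8) exceeds the current estimate (2.5), so the error is positive: the step went slightly better than predicted.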

What It Measures

The TD error is the difference between a new estimate of a state's value (the TD target, based on what just happened) and the old estimate. If $δ_{t} > 0$ , the transition went better than expected; if $δ_{t} < 0$ , worse. The update $V (S_{t}) \leftarrow V (S_{t}) + α δ_{t}$ , with step size $α \in (0, 1]$ , moves the estimate a fraction $α$ of the way toward the TD target.
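
The update rule can be sketched as a tabular TD(0) step. This is a minimal sketch, assuming the value table is a plain Python dict; the name `td0_update`, the state labels, the default hyperparameters, and the `terminal` flag (which treats the value of a terminal state as 0) are illustrative choices rather than anything fixed by these notes.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular TD(0) update in place and return the TD error."""
    # Assumption: a terminal next state contributes no future value.
    v_next = 0.0 if terminal else V[next_state]
    delta = reward + gamma * v_next - V[state]  # TD error
    V[state] += alpha * delta                   # move V(S) toward the TD target
    return delta


# Hypothetical two-state value table.
V = {"A": 0.0, "B": 1.0}
delta = td0_update(V, "A", reward=0.5, next_state="B")
print(delta, V["A"])
```

With these numbers the TD target is 0.5 + 0.9 · 1.0 = 1.4, so the error is 1.4 and $V$ for state A moves from 0 to about 0.14, one tenth of the way to the target.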

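The TD error is the fundamental learning signal for all Temporal Difference Learning methods. The same quantity, with action values or a parameterized value function in place of $V$ , appears in:

SARSA: $δ_{t} = R_{t + 1} + γ Q (S_{t + 1}, A_{t + 1}) - Q (S_{t}, A_{t})$
Q-Learning: $δ_{t} = R_{t + 1} + γ \max_{a} Q (S_{t + 1}, a) - Q (S_{t}, A_{t})$
Semi-Gradient Methods: the same form with an approximate value $\hat{v} (S, \mathbf{w})$ in place of $V (S)$ ; here $δ_{t}$ scales the gradient in the weight update $\mathbf{w} \leftarrow \mathbf{w} + α δ_{t} \nabla \hat{v} (S_{t}, \mathbf{w})$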
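
The SARSA and Q-Learning errors differ only in how the next state's value is chosen, which a short sketch can make concrete. It assumes the action values are stored as a nested dict `Q[state][action]`; the function names, state and action labels, and numbers are hypothetical.

```python
def sarsa_td_error(Q, s, a, r, s_next, a_next, gamma=0.9):
    """On-policy: bootstrap from the action actually taken in s_next."""
    return r + gamma * Q[s_next][a_next] - Q[s][a]


def q_learning_td_error(Q, s, a, r, s_next, gamma=0.9):
    """Off-policy: bootstrap from the greedy (maximum-value) action in s_next."""
    return r + gamma * max(Q[s_next].values()) - Q[s][a]


# Hypothetical action-value table with two states and two actions.
Q = {
    "s0": {"left": 0.0, "right": 0.5},
    "s1": {"left": 1.0, "right": 2.0},
}

# Same transition (s0, right, reward 1.0, s1); the agent then picks "left" in s1.
sarsa_delta = sarsa_td_error(Q, "s0", "right", 1.0, "s1", "left")
q_delta = q_learning_td_error(Q, "s0", "right", 1.0, "s1")
print(sarsa_delta, q_delta)
```

SARSA bootstraps from the non-greedy action "left" (value 1.0) and gets an error of about 1.4, while Q-Learning bootstraps from the best action "right" (value 2.0) and gets about 2.3. The two errors coincide whenever the next action taken is the greedy one.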

Appears In

RL-L04 - Temporal Difference Learning, RL-L05 - Tabular to Approximation
