TD Error
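$$\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$$
where:
- $R_{t+1} + \gamma V(S_{t+1})$: the TD target (a better estimate of $V(S_t)$)
- $V(S_t)$: the current estimate
- $\delta_t$: the “surprise”, i.e. how much better (or worse) the next step was than expected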
What It Measures
The TD error $\delta_t$ is the difference between a new estimate of the value (based on what just happened) and the old estimate. If $\delta_t > 0$, things went better than expected; if $\delta_t < 0$, worse. The update $V(S_t) \leftarrow V(S_t) + \alpha \delta_t$, with step size $\alpha$, nudges the estimate toward the new evidence.
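As a minimal sketch of the tabular update described above, assuming a dictionary-backed value table (the names `td_error`, `td0_update`, `alpha`, `gamma`, and the `terminal` flag are illustrative and not taken from the original text):

```python
def td_error(V, s, r, s_next, gamma=0.9, terminal=False):
    """Return delta = r + gamma * V[s_next] - V[s].

    Assumption: the value of a terminal next state is taken to be 0.
    """
    next_value = 0.0 if terminal else V[s_next]
    return r + gamma * next_value - V[s]


def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) step to V in place and return the TD error."""
    delta = td_error(V, s, r, s_next, gamma, terminal)
    V[s] += alpha * delta  # move the estimate a fraction alpha toward the target
    return delta


# Hypothetical two-state example: transition A -> B with reward 1.0.
V = {"A": 0.5, "B": 1.0}
delta = td0_update(V, "A", r=1.0, s_next="B", alpha=0.1, gamma=0.9)
```

In this example the target is 1.0 + 0.9 × 1.0 = 1.9 against an estimate of 0.5, so the TD error is 1.4 (positive: better than expected) and the estimate for state A moves from 0.5 to about 0.64.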
The TD error is the fundamental signal for all Temporal Difference Learning methods. It also appears in:
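- SARSA: $\delta_t = R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)$
- Q-Learning: $\delta_t = R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)$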
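- Semi-Gradient Methods: the same form with a parameterized estimate $\hat{v}(S, \mathbf{w})$ in place of the table; $\delta_t$ scales the gradient step on the weights $\mathbf{w}$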
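The variants listed above differ only in what stands in for the value of the next state. A rough sketch, assuming a nested-list table indexed as `Q[state][action]` and, for the semi-gradient case, a linear value function over feature vectors (all function and parameter names here are illustrative assumptions):

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sarsa_td_error(Q, s, a, r, s_next, a_next, gamma=0.9):
    # On-policy: bootstrap from the action actually taken in the next state.
    return r + gamma * Q[s_next][a_next] - Q[s][a]


def q_learning_td_error(Q, s, a, r, s_next, gamma=0.9):
    # Off-policy: bootstrap from the greedy (max-value) action in the next state.
    return r + gamma * max(Q[s_next]) - Q[s][a]


def semi_gradient_td0_step(w, x, r, x_next, alpha=0.1, gamma=0.9):
    # Assumes a linear estimate v(s) = w . x(s), whose gradient w.r.t. w is x(s).
    # "Semi-gradient": the target r + gamma * w . x_next is treated as a constant.
    delta = r + gamma * dot(w, x_next) - dot(w, x)
    w_new = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
    return w_new, delta


# Hypothetical 2-state, 2-action table.
Q = [[0.0, 1.0],
     [2.0, 4.0]]
d_sarsa = sarsa_td_error(Q, s=0, a=1, r=1.0, s_next=1, a_next=0)
d_qlearn = q_learning_td_error(Q, s=0, a=1, r=1.0, s_next=1)

# Hypothetical one-hot features for two states, weights starting at zero.
w_new, d_semi = semi_gradient_td0_step([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0])
```

With the same transition, SARSA bootstraps from `Q[1][0] = 2.0` while Q-Learning bootstraps from the maximum `4.0`, so the two errors differ (about 1.8 versus 3.6) even though the formula has the same shape.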