TD Error

TD Error

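$$\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$$

where:

  • $R_{t+1} + \gamma V(S_{t+1})$: the TD target (a better estimate of $V(S_t)$)
  • $V(S_t)$: the current estimate
  • $\delta_t$: the “surprise”, i.e. how much better (or worse) the next step was than expected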

What It Measures

The TD error is the difference between a new estimate of the value (based on what just happened) and the old estimate. If $\delta_t > 0$, things went better than expected; if $\delta_t < 0$, worse. The update $V(S_t) \leftarrow V(S_t) + \alpha \delta_t$, with step size $\alpha$, nudges the estimate toward the new evidence.
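As a rough illustration of the sign and the nudge described above, here is a minimal tabular sketch in Python. The function names, the dictionary-based value table, and the values chosen for the step size `alpha` and discount `gamma` are assumptions made for this example, not part of the original note.

```python
def td_error(reward, v_next, v_current, gamma=0.9, terminal=False):
    """Return the TD error: (TD target) minus (current estimate)."""
    # At a terminal transition there is no next state to bootstrap from.
    target = reward + (0.0 if terminal else gamma * v_next)
    return target - v_current


def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """Move V[state] a fraction alpha of the way toward the TD target."""
    v_next = V.get(next_state, 0.0)  # unseen states default to 0 (an assumption)
    delta = td_error(reward, v_next, V[state], gamma, terminal)
    V[state] += alpha * delta
    return delta


# Hypothetical two-state value table and a single observed transition A -> B.
V = {"A": 0.5, "B": 1.0}
delta = td0_update(V, "A", reward=1.0, next_state="B")
print(delta, V["A"])
```

With these numbers the target is 1.0 + 0.9 * 1.0 = 1.9 against an estimate of 0.5, so the error is positive (about 1.4) and the estimate for "A" moves up by a tenth of that, to about 0.64.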

The TD error is the fundamental signal for all Temporal Difference Learning methods. It also appears in:

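  • SARSA: $\delta_t = R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)$
  • Q-Learning: $\delta_t = R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)$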
  • Semi-Gradient Methods: Same form, drives weight updates
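
The three variants listed above differ only in which estimate supplies the target. The sketch below shows them side by side in plain Python. The nested-list Q-table, the linear value function used for the semi-gradient case, and all function names are hypothetical choices for illustration.

```python
def sarsa_td_error(Q, s, a, r, s_next, a_next, gamma=0.9):
    """On-policy: bootstrap from the action actually taken in the next state."""
    return r + gamma * Q[s_next][a_next] - Q[s][a]


def q_learning_td_error(Q, s, a, r, s_next, gamma=0.9):
    """Off-policy: bootstrap from the greedy (max-valued) next action."""
    return r + gamma * max(Q[s_next]) - Q[s][a]


def semi_gradient_td0_step(w, x, r, x_next, alpha=0.1, gamma=0.9):
    """Linear value function v(s) = w . x(s); the same TD error scales the weight update."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = r + gamma * v_next - v
    # The gradient of a linear v with respect to w is the feature vector x.
    new_w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
    return new_w, delta


# Hypothetical Q-table: Q[state][action], two states and two actions.
Q = [[0.0, 1.0], [2.0, 0.5]]
d_sarsa = sarsa_td_error(Q, s=0, a=1, r=1.0, s_next=1, a_next=1)
d_qlearn = q_learning_td_error(Q, s=0, a=1, r=1.0, s_next=1)
new_w, d_semi = semi_gradient_td0_step([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0])
print(d_sarsa, d_qlearn, new_w, d_semi)
```

On the same transition, SARSA and Q-learning give different errors here because the next action taken (action 1) is not the greedy one (action 0). The semi-gradient step treats the target as fixed and differentiates only the current estimate, which is why the feature vector `x` and not `x_next` appears in the weight update.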

Appears In