Gradient-TD Methods

Gradient-TD methods (GTD, GTD2, TDC) are true stochastic gradient descent methods that converge even with off-policy data and Linear Function Approximation. They avoid the Deadly Triad by performing gradient descent on a proper objective function, the mean squared projected Bellman error (MSPBE).

Why Needed

Semi-Gradient Methods can diverge when off-policy data is combined with function approximation. Gradient-TD methods fix this by following the true gradient of an objective, including the term that flows through the bootstrapped target, rather than treating the target as a constant.

Key Algorithms

GTD2

GTD2 Update

Uses an auxiliary weight vector v that maintains a running least-squares estimate of the expected TD error as a linear function of the features; this estimate supplies the inner expectation needed for the gradient.
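
The update equations appear to be missing here; reconstructed below in standard notation (on-policy form, importance-sampling ratios omitted), where w are the main weights, v the auxiliary weights, x_t the feature vector, and α, β the two step sizes:

```latex
\begin{aligned}
\delta_t &= R_{t+1} + \gamma\,\mathbf{w}_t^\top \mathbf{x}_{t+1} - \mathbf{w}_t^\top \mathbf{x}_t \\
\mathbf{w}_{t+1} &= \mathbf{w}_t + \alpha\,(\mathbf{x}_t - \gamma\,\mathbf{x}_{t+1})\,(\mathbf{x}_t^\top \mathbf{v}_t) \\
\mathbf{v}_{t+1} &= \mathbf{v}_t + \beta\,(\delta_t - \mathbf{x}_t^\top \mathbf{v}_t)\,\mathbf{x}_t
\end{aligned}
```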

TDC (TD with Gradient Correction)

TDC Update

The first term is the ordinary semi-gradient TD update; the second is a correction term that makes the expected update a true gradient step on the MSPBE.
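
The update equations appear to be missing here; reconstructed below in standard notation (on-policy form, importance-sampling ratios omitted), with δ_t and v updated as in GTD2:

```latex
\begin{aligned}
\mathbf{w}_{t+1} &= \mathbf{w}_t + \alpha\,\delta_t\,\mathbf{x}_t - \alpha\gamma\,(\mathbf{x}_t^\top \mathbf{v}_t)\,\mathbf{x}_{t+1} \\
\mathbf{v}_{t+1} &= \mathbf{v}_t + \beta\,(\delta_t - \mathbf{x}_t^\top \mathbf{v}_t)\,\mathbf{x}_t
\end{aligned}
```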

Trade-offs

  • ✅ Converges off-policy with linear FA (solves the deadly triad)
  • ❌ Two sets of weights to maintain (main weights w and auxiliary weights v)
  • ❌ Two step sizes to tune (α and β, with β typically larger so v adapts on a faster timescale)
  • ❌ Slower convergence than semi-gradient TD (when semi-gradient converges)
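
A minimal NumPy sketch of a single TDC update, to make the two weight vectors and two step sizes concrete. The function name and the toy single-state MDP below are illustrative, not from the original note:

```python
import numpy as np

def tdc_update(w, v, x, r, x_next, gamma, alpha, beta):
    """One TDC step for linear value estimation v_hat(s) = w @ x(s) (sketch)."""
    # TD error with a linear value estimate
    delta = r + gamma * (x_next @ w) - x @ w
    # Main weights: semi-gradient term plus the gradient-correction term
    w = w + alpha * (delta * x - gamma * (x @ v) * x_next)
    # Auxiliary weights: LMS estimate of the expected TD error given x
    # (beta > alpha so v tracks on a faster timescale)
    v = v + beta * (delta - x @ v) * x
    return w, v

# Sanity check: a single self-looping state with reward 1 and gamma = 0.5
# has true value 1 / (1 - 0.5) = 2.
w, v = np.zeros(1), np.zeros(1)
x = np.ones(1)
for _ in range(5000):
    w, v = tdc_update(w, v, x, r=1.0, x_next=x, gamma=0.5, alpha=0.01, beta=0.1)
```

On this toy problem w approaches 2 and the auxiliary estimate v decays toward zero as the TD error vanishes.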

Connections

Appears In