Gradient-TD Methods

Gradient-TD methods (GTD, GTD2, TDC) are true stochastic gradient descent methods that converge even with off-policy data and Linear Function Approximation. They avoid the Deadly Triad by performing gradient descent on a proper objective function, the mean squared projected Bellman error (MSPBE).

Why Needed

Semi-Gradient Methods can diverge when off-policy data is combined with function approximation. Gradient-TD methods fix this by following the true gradient of an objective, including the term that flows through the bootstrapped target, rather than treating the target as a constant.

Key Algorithms

GTD2

GTD2 Update

Uses an auxiliary weight vector v that maintains a running least-squares estimate of the expected TD error as a linear function of the features; this estimate supplies the inner expectation needed for the gradient.
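
The update equations appear to be missing here; reconstructed below in standard notation (on-policy form, importance-sampling ratios omitted), where w are the main weights, v the auxiliary weights, x_t the feature vector, and α, β the two step sizes:

```latex
\begin{aligned}
\delta_t &= R_{t+1} + \gamma\,\mathbf{w}_t^\top \mathbf{x}_{t+1} - \mathbf{w}_t^\top \mathbf{x}_t \\
\mathbf{w}_{t+1} &= \mathbf{w}_t + \alpha\,(\mathbf{x}_t - \gamma\,\mathbf{x}_{t+1})\,(\mathbf{x}_t^\top \mathbf{v}_t) \\
\mathbf{v}_{t+1} &= \mathbf{v}_t + \beta\,(\delta_t - \mathbf{x}_t^\top \mathbf{v}_t)\,\mathbf{x}_t
\end{aligned}
```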

TDC (TD with Gradient Correction)

TDC Update

The first term is the ordinary semi-gradient TD update; the second is a correction term that makes the expected update a true gradient step on the MSPBE.
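
The update equations appear to be missing here; reconstructed below in standard notation (on-policy form, importance-sampling ratios omitted), with δ_t and v updated as in GTD2:

```latex
\begin{aligned}
\mathbf{w}_{t+1} &= \mathbf{w}_t + \alpha\,\delta_t\,\mathbf{x}_t - \alpha\gamma\,(\mathbf{x}_t^\top \mathbf{v}_t)\,\mathbf{x}_{t+1} \\
\mathbf{v}_{t+1} &= \mathbf{v}_t + \beta\,(\delta_t - \mathbf{x}_t^\top \mathbf{v}_t)\,\mathbf{x}_t
\end{aligned}
```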

Trade-offs

  • ✅ Converges off-policy with linear FA (solves the deadly triad)
  • ❌ Two sets of weights to maintain (main weights w and auxiliary weights v)
  • ❌ Two step sizes to tune (α and β, with β typically larger so v adapts on a faster timescale)
  • ❌ Slower convergence than semi-gradient TD (when semi-gradient converges)
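
A minimal NumPy sketch of a single TDC update, to make the two weight vectors and two step sizes concrete. The function name and the toy single-state MDP below are illustrative, not from the original note:

```python
import numpy as np

def tdc_update(w, v, x, r, x_next, gamma, alpha, beta):
    """One TDC step for linear value estimation v_hat(s) = w @ x(s) (sketch)."""
    # TD error with a linear value estimate
    delta = r + gamma * (x_next @ w) - x @ w
    # Main weights: semi-gradient term plus the gradient-correction term
    w = w + alpha * (delta * x - gamma * (x @ v) * x_next)
    # Auxiliary weights: LMS estimate of the expected TD error given x
    # (beta > alpha so v tracks on a faster timescale)
    v = v + beta * (delta - x @ v) * x
    return w, v

# Sanity check: a single self-looping state with reward 1 and gamma = 0.5
# has true value 1 / (1 - 0.5) = 2.
w, v = np.zeros(1), np.zeros(1)
x = np.ones(1)
for _ in range(5000):
    w, v = tdc_update(w, v, x, r=1.0, x_next=x, gamma=0.5, alpha=0.01, beta=0.1)
```

On this toy problem w approaches 2 and the auxiliary estimate v decays toward zero as the TD error vanishes.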

Connections

Appears In