Semi-Gradient Methods
Definition
Semi-Gradient Methods
Semi-gradient methods are function approximation methods where the gradient is computed only with respect to the weight vector w in the estimate being updated, ignoring the effect of w on the target. They are called "semi-gradient" because they don't follow the true gradient of any objective function.
Why “Semi”?
The Missing Half of the Gradient
Consider the TD(0) update target: R + γv̂(S', w).
The true gradient of the squared TD error would require differentiating through both v̂(S, w) AND v̂(S', w) (which appears in the target). Semi-gradient methods treat the target as a constant: they only differentiate v̂(S, w).
This makes the update simpler and often works well in practice, but it means we’re not doing true gradient descent on any well-defined loss function.
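For the linear case this difference is concrete. A minimal NumPy sketch (the feature vectors and reward here are invented for illustration) comparing the semi-gradient update direction to the true gradient of the squared TD error:

```python
import numpy as np

# Hypothetical linear setup: v̂(s, w) = w @ x(s)
w = np.array([0.5, -0.2])
x_s = np.array([1.0, 0.0])    # features of current state S
x_s2 = np.array([0.0, 1.0])   # features of next state S'
R, gamma = 1.0, 0.9

delta = R + gamma * w @ x_s2 - w @ x_s   # TD error

# Semi-gradient: treat the target as a constant and
# differentiate only v̂(S, w) -> update direction delta * x(S)
semi_grad_step = delta * x_s

# True gradient of 0.5 * delta^2 differentiates the target too
# -> direction delta * (gamma * x(S') - x(S))
true_grad = delta * (gamma * x_s2 - x_s)

print(semi_grad_step)   # [0.32, 0.0]
print(true_grad)        # [-0.32, 0.288]: extra gamma * x(S') term
```

The semi-gradient step moves only along x(S); the true gradient also has a component along x(S'), which is exactly the part semi-gradient methods drop.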
Semi-Gradient TD(0)
Semi-Gradient TD(0) Update
w ← w + α[R + γv̂(S', w) − v̂(S, w)] ∇v̂(S, w)
Note: the gradient ∇v̂(S, w) is only of the estimate v̂(S, w), NOT of the target R + γv̂(S', w).
Algorithm: Semi-Gradient TD(0) for estimating v̂ ≈ v_π
──────────────────────────────────────────────────────
Input: policy π, step-size α, differentiable v̂(s,w)
Initialize: w arbitrarily (e.g., w = 0)
Loop for each episode:
    Initialize S
    Loop for each step of episode:
        Choose A ~ π(·|S)
        Take action A, observe R, S'
        If S' is terminal:
            w ← w + α[R - v̂(S,w)] ∇v̂(S,w)
            Go to next episode
        w ← w + α[R + γv̂(S',w) - v̂(S,w)] ∇v̂(S,w)
        S ← S'

For Linear Function Approximation
With v̂(s, w) = wᵀx(s), the gradient simplifies: ∇v̂(s, w) = x(s).
So the update becomes: w ← w + α[R + γwᵀx(S') − wᵀx(S)] x(S)
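As a concrete sketch, here is linear semi-gradient TD(0) with one-hot features on a toy two-state chain (the chain and its rewards are invented for illustration: state 0 → state 1 with reward 0, state 1 → terminal with reward 1):

```python
import numpy as np

def semi_gradient_td0(episodes=2000, alpha=0.05, gamma=0.9):
    """Linear semi-gradient TD(0) on a toy deterministic chain:
    state 0 -> state 1 (reward 0), state 1 -> terminal (reward 1)."""
    x = np.eye(2)       # one-hot features, so w holds v̂ per state
    w = np.zeros(2)
    for _ in range(episodes):
        # Step from state 0: R = 0, S' = 1 (non-terminal)
        delta = 0.0 + gamma * (w @ x[1]) - w @ x[0]
        w += alpha * delta * x[0]
        # Step from state 1: R = 1, S' is terminal (v̂(terminal) = 0)
        delta = 1.0 - w @ x[1]
        w += alpha * delta * x[1]
    return w

w = semi_gradient_td0()
print(w)  # ≈ [0.9, 1.0], the true values v_π(0) = γ·1, v_π(1) = 1
```

With one-hot features the weights are just tabular state values, so the learned w can be checked directly against v_π.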
Convergence Properties
- Linear semi-gradient TD(0) converges to the TD Fixed Point: w_TD = A⁻¹b, where A = E[x(S)(x(S) − γx(S'))ᵀ] and b = E[R x(S)]
- Not guaranteed to converge to the global minimum of the mean squared value error VE(w)
- With non-linear approximators (neural nets): no convergence guarantees in general
- Off-policy: can diverge (see Deadly Triad)
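For the linear case the fixed point can be checked in closed form. A sketch (reusing the toy two-state chain from above, an invented example) that builds A and b from the chain's transitions and solves w_TD = A⁻¹b:

```python
import numpy as np

gamma = 0.9
x = np.eye(2)           # one-hot features for states 0 and 1
x_term = np.zeros(2)    # terminal state has zero features

# Transitions (x(S), R, x(S')) of the chain 0 -> 1 -> terminal,
# each visited once per episode, so expectations are plain averages.
transitions = [(x[0], 0.0, x[1]), (x[1], 1.0, x_term)]

A = np.mean([np.outer(xs, xs - gamma * xn) for xs, r, xn in transitions],
            axis=0)
b = np.mean([r * xs for xs, r, xn in transitions], axis=0)

w_td = np.linalg.solve(A, b)
print(w_td)  # ≈ [0.9, 1.0], matching what iterative TD(0) converges to
```

Solving the system directly is exactly what LSTD does, trading the step-size parameter for an O(d³) solve (or O(d²) incremental inverse updates).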
Semi-Gradient ≠ Convergence to Optimal
Linear semi-gradient TD doesn't find the w that minimizes VE(w). It finds the TD fixed point w_TD, which satisfies VE(w_TD) ≤ (1/(1−γ)) min_w VE(w), i.e., its error is bounded by 1/(1−γ) times the best possible error. For γ close to 1, this bound can be loose.
Semi-Gradient Control
Semi-Gradient Sarsa
See Episodic Semi-Gradient Control for the full algorithm.
Connections
- Extends: Temporal Difference Learning to function approximation
- Types: Linear Function Approximation, Neural Network Function Approximation
- Alternative: LSTD (closed-form solution for linear case)
- Danger: Deadly Triad (off-policy + bootstrapping + FA)
- True gradient alternatives: Gradient-TD Methods (TDC, GTD2)