Semi-Gradient Methods
Definition
Semi-Gradient Methods
Semi-gradient methods are function approximation methods where the gradient is computed only with respect to the weight vector w in the estimate being updated, ignoring the effect of w on the target. They are called "semi-gradient" because they don't follow the true gradient of any objective function.
Why “Semi”?
The Missing Half of the Gradient
Consider the TD(0) update target: R + γv̂(S', w).
The true gradient of the squared TD error would require differentiating through both v̂(S, w) AND v̂(S', w) (which appears in the target). Semi-gradient methods treat the target as a constant: they only differentiate v̂(S, w).
This makes the update simpler and often works well in practice, but it means we’re not doing true gradient descent on any well-defined loss function.
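For the linear case this difference is concrete. A minimal NumPy sketch (the feature vectors and reward here are invented for illustration) comparing the semi-gradient update direction to the true gradient of the squared TD error:

```python
import numpy as np

# Hypothetical linear setup: v̂(s, w) = w @ x(s)
w = np.array([0.5, -0.2])
x_s = np.array([1.0, 0.0])    # features of current state S
x_s2 = np.array([0.0, 1.0])   # features of next state S'
R, gamma = 1.0, 0.9

delta = R + gamma * w @ x_s2 - w @ x_s   # TD error

# Semi-gradient: treat the target as a constant and
# differentiate only v̂(S, w) -> update direction delta * x(S)
semi_grad_step = delta * x_s

# True gradient of 0.5 * delta^2 differentiates the target too
# -> direction delta * (gamma * x(S') - x(S))
true_grad = delta * (gamma * x_s2 - x_s)

print(semi_grad_step)   # [0.32, 0.0]
print(true_grad)        # [-0.32, 0.288]: extra gamma * x(S') term
```

The semi-gradient step moves only along x(S); the true gradient also has a component along x(S'), which is exactly the part semi-gradient methods drop.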
Semi-Gradient TD(0)
Semi-Gradient TD(0) Update
w ← w + α[R + γv̂(S', w) − v̂(S, w)] ∇v̂(S, w)
Note: the gradient ∇v̂(S, w) is only of the estimate v̂(S, w), NOT of the target R + γv̂(S', w).
Algorithm: Semi-Gradient TD(0) for estimating v̂ ≈ v_π
──────────────────────────────────────────────────────
Input: policy π, step-size α, differentiable v̂(s,w)
Initialize: w arbitrarily (e.g., w = 0)
Loop for each episode:
    Initialize S
    Loop for each step of episode:
        Choose A ~ π(·|S)
        Take action A, observe R, S'
        If S' is terminal:
            w ← w + α[R - v̂(S,w)] ∇v̂(S,w)
            Go to next episode
        w ← w + α[R + γv̂(S',w) - v̂(S,w)] ∇v̂(S,w)
        S ← S'

For Linear Function Approximation
With v̂(s, w) = wᵀx(s), the gradient simplifies: ∇v̂(s, w) = x(s).
So the update becomes: w ← w + α[R + γwᵀx(S') − wᵀx(S)] x(S)
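As a concrete sketch, here is linear semi-gradient TD(0) with one-hot features on a toy two-state chain (the chain and its rewards are invented for illustration: state 0 → state 1 with reward 0, state 1 → terminal with reward 1):

```python
import numpy as np

def semi_gradient_td0(episodes=2000, alpha=0.05, gamma=0.9):
    """Linear semi-gradient TD(0) on a toy deterministic chain:
    state 0 -> state 1 (reward 0), state 1 -> terminal (reward 1)."""
    x = np.eye(2)       # one-hot features, so w holds v̂ per state
    w = np.zeros(2)
    for _ in range(episodes):
        # Step from state 0: R = 0, S' = 1 (non-terminal)
        delta = 0.0 + gamma * (w @ x[1]) - w @ x[0]
        w += alpha * delta * x[0]
        # Step from state 1: R = 1, S' is terminal (v̂(terminal) = 0)
        delta = 1.0 - w @ x[1]
        w += alpha * delta * x[1]
    return w

w = semi_gradient_td0()
print(w)  # ≈ [0.9, 1.0], the true values v_π(0) = γ·1, v_π(1) = 1
```

With one-hot features the weights are just tabular state values, so the learned w can be checked directly against v_π.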
Convergence Properties
- Linear semi-gradient TD(0) converges to the TD Fixed Point: w_TD = A⁻¹b, where A = E[x(S)(x(S) − γx(S'))ᵀ] and b = E[R x(S)]
- Not guaranteed to converge to the global minimum of the mean squared value error VE(w)
- With non-linear approximators (neural nets): no convergence guarantees in general
- Off-policy: can diverge (see Deadly Triad)
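For the linear case the fixed point can be checked in closed form. A sketch (reusing the toy two-state chain from above, an invented example) that builds A and b from the chain's transitions and solves w_TD = A⁻¹b:

```python
import numpy as np

gamma = 0.9
x = np.eye(2)           # one-hot features for states 0 and 1
x_term = np.zeros(2)    # terminal state has zero features

# Transitions (x(S), R, x(S')) of the chain 0 -> 1 -> terminal,
# each visited once per episode, so expectations are plain averages.
transitions = [(x[0], 0.0, x[1]), (x[1], 1.0, x_term)]

A = np.mean([np.outer(xs, xs - gamma * xn) for xs, r, xn in transitions],
            axis=0)
b = np.mean([r * xs for xs, r, xn in transitions], axis=0)

w_td = np.linalg.solve(A, b)
print(w_td)  # ≈ [0.9, 1.0], matching what iterative TD(0) converges to
```

Solving the system directly is exactly what LSTD does, trading the step-size parameter for an O(d³) solve (or O(d²) incremental inverse updates).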
Semi-Gradient ≠ Convergence to Optimal
Linear semi-gradient TD doesn't find the w that minimizes VE(w). It finds the TD fixed point w_TD, which satisfies VE(w_TD) ≤ (1/(1−γ)) min_w VE(w), i.e., its error is bounded by 1/(1−γ) times the best possible error. For γ close to 1, this bound can be loose.
Semi-Gradient Control
Semi-Gradient Sarsa
See Episodic Semi-Gradient Control for the full algorithm.
Connections
- Extends: Temporal Difference Learning to function approximation
- Types: Linear Function Approximation, Neural Network Function Approximation
- Alternative: LSTD (closed-form solution for linear case)
- Danger: Deadly Triad (off-policy + bootstrapping + FA)
- True gradient alternatives: Gradient-TD Methods (TDC, GTD2)