Episodic Semi-Gradient Control
Semi-gradient methods extend to the control setting by approximating action values q̂(s,a,w) rather than state values.
Semi-Gradient Sarsa Update
Algorithm: Episodic Semi-Gradient Sarsa
───────────────────────────────────────
Initialize w arbitrarily
Loop for each episode:
    S ← initial state; A ← ε-greedy(q̂(S,·,w))
    Loop for each step:
        Take A, observe R, S'
        If S' is terminal:
            w ← w + α[R - q̂(S,A,w)] ∇q̂(S,A,w)
            Go to next episode
        A' ← ε-greedy(q̂(S',·,w))
        w ← w + α[R + γq̂(S',A',w) - q̂(S,A,w)] ∇q̂(S,A,w)
        S ← S'; A ← A'

With linear FA: q̂(s,a,w) = wᵀx(s,a), so ∇q̂(s,a,w) = x(s,a).
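
The algorithm above can be sketched in Python for the linear case, where the gradient of q̂ is just the feature vector. This is a minimal illustration under stated assumptions, not a reference implementation: the five-state corridor environment, the one-hot features, the helper names (`features`, `q_hat`, `epsilon_greedy`, `step`, `sarsa`), and the hyperparameter values are all hypothetical choices made for the example.

```python
import random

# Hypothetical toy problem: a corridor of states 0..4, where state 4 is terminal.
N_STATES = 5
ACTIONS = (-1, +1)                      # move left or move right
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1   # illustrative hyperparameters


def features(s, a):
    """One-hot feature vector x(s, a); an assumed feature choice for this sketch."""
    x = [0.0] * (N_STATES * len(ACTIONS))
    x[s * len(ACTIONS) + ACTIONS.index(a)] = 1.0
    return x


def q_hat(s, a, w):
    """Linear action-value estimate: the dot product of w and x(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, features(s, a)))


def epsilon_greedy(s, w, rng):
    """Random action with probability EPSILON, otherwise greedy with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    values = [q_hat(s, a, w) for a in ACTIONS]
    best = max(values)
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])


def step(s, a):
    """Toy dynamics: reward -1 per step, walls at both ends, episode ends at the right end."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    return -1.0, s_next, s_next == N_STATES - 1


def sarsa(num_episodes=500, seed=0):
    rng = random.Random(seed)
    w = [0.0] * (N_STATES * len(ACTIONS))       # "initialize w arbitrarily"
    for _ in range(num_episodes):
        s = 0                                    # initial state
        a = epsilon_greedy(s, w, rng)
        while True:
            r, s_next, done = step(s, a)
            grad = features(s, a)                # gradient of a linear q_hat is x(s, a)
            if done:
                # Terminal transition: the target is R alone, with no bootstrap term.
                delta = r - q_hat(s, a, w)
                w = [wi + ALPHA * delta * gi for wi, gi in zip(w, grad)]
                break
            a_next = epsilon_greedy(s_next, w, rng)
            # Semi-gradient: the bootstrapped target is treated as a constant.
            delta = r + GAMMA * q_hat(s_next, a_next, w) - q_hat(s, a, w)
            w = [wi + ALPHA * delta * gi for wi, gi in zip(w, grad)]
            s, a = s_next, a_next
    return w


if __name__ == "__main__":
    weights = sarsa()
    for s in range(N_STATES - 1):
        print(s, [round(q_hat(s, a, weights), 2) for a in ACTIONS])
```

With one-hot features each weight is the estimate for a single state-action pair, so the sketch reduces to tabular Sarsa. Swapping `features` for a coarser encoding (tile coding, for instance) would leave the update lines unchanged.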