Function Approximation
Definition
Function Approximation in RL
Function approximation replaces the lookup table used in tabular RL with a parameterized function $\hat{v}(s, \mathbf{w})$ (or $\hat{q}(s, a, \mathbf{w})$). Instead of storing one value per state, we learn a weight vector $\mathbf{w} \in \mathbb{R}^d$ where $d \ll |\mathcal{S}|$.
Why We Need It
Tabular methods store $V(s)$ for every state. Real problems have millions or billions of states (or continuous state spaces). You can’t visit every state, let alone store a value for each. Function approximation lets you generalize — update from one state, and the values of similar states change too.
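To see generalization concretely, here is a toy sketch (the feature design and numbers are invented for illustration): a single gradient update at one state also moves the estimated value of a nearby state that shares features with it.

```python
import numpy as np

def features(s, n=5):
    # Overlapping triangular "bump" features: each state activates
    # its own index and, partially, its neighbors.
    return np.array([max(0.0, 1.0 - abs(s - i)) for i in range(n)])

w = np.zeros(5)
v_before = features(2.5) @ w            # value of state 2.5 before any update

# One gradient step at state 2.0 toward an invented target of 1.0
alpha, target, s = 0.5, 1.0, 2.0
x = features(s)
w += alpha * (target - x @ w) * x

v_after = features(2.5) @ w             # state 2.5 changed too, never updated
```

Because states 2.0 and 2.5 share active features, the update at 2.0 generalizes to 2.5 — the efficiency (or interference) discussed under Challenges below.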
The Prediction Objective
Mean Squared Value Error ($\overline{VE}$)

$$\overline{VE}(\mathbf{w}) = \sum_{s \in \mathcal{S}} \mu(s)\,\big[v_\pi(s) - \hat{v}(s, \mathbf{w})\big]^2$$
where:
- $\mu(s)$ — on-policy distribution (how often state $s$ is visited under $\pi$)
- $v_\pi(s)$ — true value (unknown)
- $\hat{v}(s, \mathbf{w})$ — our approximation
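For intuition, computing $\overline{VE}$ by hand for a made-up 3-state problem where $\mu$ and $v_\pi$ are assumed known (they never are in practice):

```python
import numpy as np

mu = np.array([0.5, 0.3, 0.2])        # on-policy state distribution (sums to 1)
v_pi = np.array([1.0, 2.0, 3.0])      # "true" values (invented for the example)
v_hat = np.array([1.1, 1.8, 3.5])     # current approximation

# Visitation-weighted mean squared value error
ve = np.sum(mu * (v_pi - v_hat) ** 2)
```

Note how the weighting by $\mu$ means errors in frequently visited states dominate the objective.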
Stochastic Gradient Descent
SGD Update for Value Prediction

$$\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha\,\big[v_\pi(S_t) - \hat{v}(S_t, \mathbf{w}_t)\big]\,\nabla \hat{v}(S_t, \mathbf{w}_t)$$

Problem: we don’t know $v_\pi(S_t)$. Replace it with a target:
- MC target: $G_t$ → true gradient method (converges to a local minimum of $\overline{VE}$)
- TD target: $R_{t+1} + \gamma\,\hat{v}(S_{t+1}, \mathbf{w})$ → semi-gradient (not a true gradient because the target depends on $\mathbf{w}$)
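A minimal sketch of semi-gradient TD(0) with linear features, on an assumed 5-state random-walk chain (environment, step size, and episode count are illustrative choices, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma, alpha = 5, 1.0, 0.05

def x(s):
    # One-hot features for simplicity (the tabular special case)
    v = np.zeros(n_states)
    v[s] = 1.0
    return v

w = np.zeros(n_states)
for episode in range(2000):
    s = 2                                   # start in the middle
    while True:
        s2 = s + (1 if rng.random() < 0.5 else -1)
        if s2 < 0:
            r, done = 0.0, True             # fall off the left end
        elif s2 >= n_states:
            r, done = 1.0, True             # reach the right end
        else:
            r, done = 0.0, False
        target = r + (0.0 if done else gamma * (w @ x(s2)))
        # Semi-gradient: differentiate only v_hat(s, w), not the target
        w += alpha * (target - w @ x(s)) * x(s)
        if done:
            break
        s = s2
```

The key line is the update: the TD target `r + gamma * w @ x(s2)` is treated as a constant even though it depends on `w` — that is exactly what makes this a semi-gradient rather than a true gradient method.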
Types of Function Approximators
Linear Function Approximation
- $\hat{v}(s, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$, where $\mathbf{x}(s)$ is a feature vector
- Simple, well-understood convergence guarantees
- Feature design matters: Tile Coding, polynomials, Fourier basis, RBFs
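As one concrete feature choice, a sketch of a one-dimensional Fourier basis (the order and the normalization of the state to $[0, 1]$ are assumed for illustration):

```python
import numpy as np

def fourier_features(s, order=4):
    # Fourier cosine basis: x_i(s) = cos(pi * i * s), i = 0..order.
    # Assumes s has been normalized to [0, 1].
    return np.cos(np.pi * np.arange(order + 1) * s)

w = np.zeros(5)
v_hat = w @ fourier_features(0.3)   # linear value estimate under weights w
```

Each basis function varies at a different frequency, so the linear combination can represent smooth value functions without hand-designed state partitions.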
Neural Network Function Approximation
- Non-linear, can represent complex functions
- Trained with backpropagation
- Fewer convergence guarantees
- Foundation of Deep Q-Network (DQN) and modern deep RL
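A toy illustration of a nonlinear approximator: a one-hidden-layer network trained by backpropagation to fit an assumed target function $v(s) = s^2$ (the architecture and hyperparameters are arbitrary choices, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (16, 1)); b1 = np.zeros(16)   # hidden layer
W2 = rng.normal(0, 0.5, (1, 16)); b2 = np.zeros(1)    # output layer

def forward(s):
    h = np.tanh(W1 @ s + b1)
    return (W2 @ h + b2)[0], h

lr = 0.05
for step in range(5000):
    s = rng.uniform(-1, 1, (1,))
    target = s[0] ** 2                  # stand-in for a value target
    v, h = forward(s)
    err = v - target
    # Backpropagate the squared-error gradient through both layers
    gW2 = err * h; gb2 = np.array([err])
    gz = (err * W2[0]) * (1 - h ** 2)   # tanh derivative
    gW1 = np.outer(gz, s); gb1 = gz
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
```

The same idea scales to deep networks fitting TD targets, which is the core of DQN — at the cost of the weaker convergence guarantees noted above.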
Key Distinction: Tabular as Special Case
Tabular = Function Approximation with One-Hot Features
A lookup table is actually a special case of linear function approximation where the feature vector is a one-hot vector (1 in position $s$, 0 elsewhere). Then $\hat{v}(s, \mathbf{w}) = w_s$: each state has its own weight. All tabular convergence guarantees follow from the more general FA framework.
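A quick check of the equivalence (the state count and weights are made up): with one-hot features, the dot product $\mathbf{w}^\top \mathbf{x}(s)$ just reads out the table entry $w_s$.

```python
import numpy as np

n = 4
w = np.array([0.1, 0.2, 0.3, 0.4])   # one weight per state = the "table"

def one_hot(s, n=n):
    x = np.zeros(n)
    x[s] = 1.0
    return x

# v_hat(s, w) = w . x(s) = w_s for every state
for s in range(n):
    assert w @ one_hot(s) == w[s]
```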
Challenges
- Generalization: Updating one state affects nearby states — can be good (efficiency) or bad (interference)
- The Deadly Triad: Function approximation + bootstrapping + off-policy = potential divergence
- Non-stationarity: Target values change as $\mathbf{w}$ updates
Connections
- Extends: Value Function (from tables to functions)
- Methods: Semi-Gradient Methods, LSTD, Linear Function Approximation
- Features: Tile Coding, Feature Construction
- Deep version: Deep Q-Network (DQN), Neural Network Function Approximation
- Danger: Deadly Triad