Optimality and Approximation
The Limit of Exact Solutions
In large or continuous state spaces, finding the exact optimal policy or value function is computationally infeasible. Approximation refers to using parameterized functions (like neural networks or linear models) to represent the value functions, accepting a small amount of error to gain the ability to generalize across states.
Coverage vs. Accuracy
When we have millions of states (like in Go or Chess), we cannot store a table with one entry per state. Instead, we use a compact representation (a “function approximator”). This means:
- Generalization: Learning about one state helps us estimate the value of similar, unvisited states.
- Prioritization: We focus our limited “accuracy budget” on the states the agent actually visits.
Core Concepts
- Online Learning: By learning as the agent interacts with the world, the approximation naturally focuses on the most relevant parts of the state space (the On-Policy Distribution).
- Functional Forms: We typically approximate the value function as , where are the weights of the model.
- The “Deadly Triad”: A warning in RL that combining Function Approximation, Bootstrapping, and Off-policy learning can lead to instability and divergence.
Connections
- Solved using: Gradient Descent, Stochastic Gradient Descent
- Metric for success: Mean Squared Value Error (MSVE)
- Fundamental shift from: Tabular RL (where exact solutions are possible).