Value Function

Definition

A value function estimates “how good” it is for an agent to be in a given state (or to take a given action in a state). “How good” is defined in terms of expected future rewards — specifically, the expected Return.
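For concreteness, the Return referenced here is the discounted sum of future rewards (standard convention, with discount factor $\gamma \in [0, 1]$):

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}$$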

There are two types:

State-Value Function

The expected Return when starting in state $s$ and following Policy $\pi$ thereafter:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[G_t \mid S_t = s\right]$$

Action-Value Function

The expected Return when starting in state $s$, taking action $a$, and following $\pi$ thereafter:

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[G_t \mid S_t = s, A_t = a\right]$$

Relationship

$V^{\pi}$ vs $Q^{\pi}$

  • $V^{\pi}(s)$: “How good is this state?” (averaged over what my policy would do)
  • $Q^{\pi}(s, a)$: “How good is taking this specific action in this state?”

For control (finding the best policy), we usually need $Q^{\pi}$: choosing the greedy action from $V^{\pi}$ requires knowing the model $p(s', r \mid s, a)$ to look one step ahead, while acting greedily on $Q^{\pi}$ doesn’t.
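The averaging relationship between the two functions can be made concrete. A minimal sketch, using a hypothetical 3-state, 2-action problem with made-up numbers:

```python
import numpy as np

# Hypothetical action-values Q[s, a] and policy probabilities pi[s, a]
# (numbers are illustrative only).
Q = np.array([[1.0, 2.0],
              [0.5, 0.2],
              [0.0, 3.0]])
pi = np.array([[0.5, 0.5],
               [0.9, 0.1],
               [0.2, 0.8]])

# V(s) = sum_a pi(a|s) * Q(s, a): the state-value is the
# policy-weighted average of the action-values.
V = (pi * Q).sum(axis=1)  # [1.5, 0.47, 2.4]
```

Note that computing the greedy action from `Q` is just an argmax over a row, with no transition model needed, which is the point made above.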

Optimal Value Functions

Optimal State-Value Function

The best possible value of state $s$ under any policy:

$$V^{*}(s) = \max_{\pi} V^{\pi}(s)$$

Optimal Action-Value Function

The best possible value of taking action $a$ in state $s$:

$$Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a)$$

If we know $Q^{*}$, the optimal Policy is trivially:

$$\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)$$
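In code this greedy readout is a single argmax per state. A sketch with a hypothetical 2-state, 2-action $Q^{*}$ table:

```python
import numpy as np

# Hypothetical optimal action-value table Q*[s, a] (illustrative numbers).
Q_star = np.array([[1.0, 2.0],
                   [0.5, 0.2]])

# pi*(s) = argmax_a Q*(s, a); the same table also gives V*(s) = max_a Q*(s, a).
pi_star = Q_star.argmax(axis=1)  # best action per state
V_star = Q_star.max(axis=1)      # best achievable value per state
```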

Estimation Methods

| Method | How it estimates $V$ or $Q$ |
| --- | --- |
| Dynamic Programming | Solves the Bellman Equation exactly (requires a model) |
| Monte Carlo Methods | Averages sampled returns |
| Temporal Difference Learning | Bootstraps: $V(S_t) \leftarrow V(S_t) + \alpha\left[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\right]$ |
| Function Approximation | Parameterized $\hat{v}(s, \mathbf{w})$ trained with SGD |
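The TD row of the table can be sketched end to end. A minimal tabular TD(0) example on a toy 5-state random walk (states 0 and 4 terminal, reward 1 only on reaching state 4; the environment and constants are illustrative, not from the source):

```python
import random

random.seed(0)
alpha, gamma = 0.1, 1.0
V = [0.0] * 5  # one entry per state; terminals stay at 0

for _ in range(1000):
    s = 2  # episodes start in the middle
    while s not in (0, 4):
        s_next = s + random.choice((-1, 1))      # uniform random policy
        r = 1.0 if s_next == 4 else 0.0
        # TD(0) update: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

# For this walk the true values of states 1..3 are 0.25, 0.5, 0.75,
# and the estimates hover near them.
```

Each update bootstraps off the current estimate `V[s_next]` rather than waiting for the episode's full return, which is what distinguishes TD from Monte Carlo in the table above.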

Key Properties

  • Value functions satisfy the Bellman Equation (recursive relationship)
  • Value functions define a partial ordering over policies: $\pi \ge \pi'$ iff $V^{\pi}(s) \ge V^{\pi'}(s)$ for all $s$
  • At least one policy is better than or equal to all others: the optimal policy $\pi^{*}$
  • All optimal policies share the same $V^{*}$ and $Q^{*}$
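The recursive relationship in the first bullet can be written out. For the state-value function under $\pi$ (the Bellman expectation equation):

$$V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma V^{\pi}(s')\right]$$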

Tabular vs Approximate

In tabular settings, $V(s)$ is stored as a table with one entry per state. With Function Approximation, we use $\hat{v}(s, \mathbf{w})$, a parameterized function. The fundamental concept is the same, but convergence guarantees differ.
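The contrast can be sketched side by side. A minimal example, assuming a hypothetical feature map $x(s) = (1, s, s^2)$ and a linear approximator $\hat{v}(s, \mathbf{w}) = \mathbf{w} \cdot x(s)$ (both are illustrative choices, not from the source):

```python
import numpy as np

# Tabular: one stored value per state.
V_table = {s: 0.0 for s in range(5)}

# Approximate: a weight vector plus a feature map stands in for the table.
w = np.zeros(3)

def x(s):
    # Hypothetical feature map; in practice this is problem-specific.
    return np.array([1.0, s, s * s])

def v_hat(s, w):
    return w @ x(s)

def sgd_update(w, s, G, alpha=0.01):
    # Semi-gradient SGD step toward a target G (e.g. a sampled return):
    # w <- w + alpha * [G - v_hat(s, w)] * grad_w v_hat = ... * x(s)
    return w + alpha * (G - v_hat(s, w)) * x(s)

w = sgd_update(w, s=2, G=1.0)  # one update nudges all weights at once
```

One update to `w` changes the estimate at every state that shares features with `s`, whereas a tabular update touches exactly one entry; that generalization is also why the convergence guarantees differ.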

Connections

Appears In