Policy

Definition

Policy

A policy $\pi$ is a mapping from states to probabilities of selecting each possible action. If an agent follows policy $\pi$ at time $t$, then $\pi(a \mid s)$ is the probability that $A_t = a$ given $S_t = s$.

  • Deterministic policy: $\pi(s) = a$, which maps each state to exactly one action
  • Stochastic policy: $\pi(a \mid s)$, a probability distribution over actions for each state, where $\sum_a \pi(a \mid s) = 1$
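
The two forms above can be sketched as plain dictionaries. This is only an illustration: the state names, action names, and helper functions are hypothetical and not part of the note.

```python
import random

# Hypothetical two-state, two-action example; all names are illustrative.
ACTIONS = ["left", "right"]

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"s0": "left", "s1": "right"}

# Stochastic policy: each state maps to a distribution over actions.
stochastic_policy = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.5, "right": 0.5},
}


def is_valid_policy(policy, tol=1e-9):
    """Check that every state's probabilities are non-negative and sum to 1."""
    return all(
        all(p >= 0.0 for p in dist.values())
        and abs(sum(dist.values()) - 1.0) <= tol
        for dist in policy.values()
    )


def sample_action(policy, state, rng=random):
    """Draw an action a with probability pi(a | s)."""
    dist = policy[state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def as_stochastic(det_policy, actions):
    """View a deterministic policy as a distribution with all mass on one action."""
    return {
        s: {a: (1.0 if a == chosen else 0.0) for a in actions}
        for s, chosen in det_policy.items()
    }
```

Treating a deterministic policy as a one-hot distribution, as `as_stochastic` does, lets the same sampling code serve both cases.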

Types of Policies

Greedy Policy

Always picks the action with the highest estimated value. Pure exploitation, no exploration.
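
A minimal sketch of greedy selection for a single state, assuming the value estimates are held in a dictionary from action to value (the names below are hypothetical):

```python
def greedy_action(q_values):
    """Return the action with the highest estimated value.

    q_values: dict mapping action -> estimated value for one state.
    Ties go to the first maximal action in the dict's iteration order.
    """
    return max(q_values, key=q_values.get)


# Hypothetical estimates for one state.
q_s = {"left": 0.1, "right": 0.7, "stay": 0.4}
print(greedy_action(q_s))  # right
```

Because ties are always broken the same way, this rule never tries an action whose current estimate is lower, which is why it does no exploration.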

ε-Greedy Policy (Epsilon-Greedy Policy)

Mostly greedy, but with probability $\varepsilon$ picks a random action. Balances Exploration vs Exploitation.
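
The rule can be sketched as follows. The function names are hypothetical, and the sketch assumes the random action is drawn uniformly from all actions, including the greedy one.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon pick uniformly at random, otherwise pick greedily."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities implied by the rule above (ties broken by first max)."""
    n = len(q_values)
    best = max(q_values, key=q_values.get)
    return {
        a: epsilon / n + ((1.0 - epsilon) if a == best else 0.0)
        for a in q_values
    }
```

Under this assumption the greedy action is chosen with probability $1 - \varepsilon + \varepsilon / n$ and every other action with probability $\varepsilon / n$, where $n$ is the number of actions.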

Softmax / Boltzmann Policy

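Each action is chosen with probability $\pi(a \mid s) = \dfrac{e^{Q(s,a)/\tau}}{\sum_{a'} e^{Q(s,a')/\tau}}$. Temperature $\tau$ controls exploration: high $\tau$ → near-uniform, low $\tau$ → near-greedy.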
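
A minimal sketch of a Boltzmann policy, assuming the common form in which each action's probability is proportional to $\exp(Q(s,a)/\tau)$. The function name and the dictionary representation are hypothetical.

```python
import math


def softmax_probs(q_values, temperature):
    """Boltzmann distribution: p(a) proportional to exp(Q(a) / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    q_max = max(q_values.values())
    exps = {a: math.exp((q - q_max) / temperature) for a, q in q_values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}
```

Calling `softmax_probs` with a very large temperature gives probabilities close to uniform, while a very small temperature puts almost all mass on the highest-valued action.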

Optimal Policy

A policy $\pi_*$ is optimal if $v_{\pi_*}(s) \ge v_\pi(s)$ for all $s \in \mathcal{S}$ and all policies $\pi$. There always exists at least one optimal policy for any finite MDP.

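All optimal policies share the same optimal value functions $v_*$ and $q_*$. Given $q_*$, an optimal policy is obtained by acting greedily:

$$\pi_*(s) = \arg\max_a q_*(s, a)$$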
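
To illustrate how an optimal policy can be read off the optimal action values, here is a sketch on a hypothetical two-state deterministic MDP. The transitions, rewards, and discount factor are invented for the example, and the Bellman optimality backup is used only as one convenient way to obtain the action values.

```python
# Hypothetical deterministic MDP: (state, action) -> (next_state, reward).
TRANSITIONS = {
    ("A", "stay"): ("A", 0.0),
    ("A", "move"): ("B", 1.0),
    ("B", "stay"): ("B", 2.0),
    ("B", "move"): ("A", 0.0),
}
STATES = ["A", "B"]
ACTIONS = ["stay", "move"]
GAMMA = 0.5  # assumed discount factor


def optimal_q(tol=1e-12):
    """Iterate the Bellman optimality backup on q until it stops changing."""
    q = {sa: 0.0 for sa in TRANSITIONS}
    while True:
        new_q = {}
        for (s, a), (s_next, r) in TRANSITIONS.items():
            best_next = max(q[(s_next, b)] for b in ACTIONS)
            new_q[(s, a)] = r + GAMMA * best_next
        delta = max(abs(new_q[sa] - q[sa]) for sa in q)
        q = new_q
        if delta < tol:
            return q


def greedy_policy(q):
    """pi(s) = argmax_a q(s, a)."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


q_star = optimal_q()
pi_star = greedy_policy(q_star)
v_star = {s: max(q_star[(s, a)] for a in ACTIONS) for s in STATES}
```

In this example the greedy policy moves from A to B and then stays in B, giving values of about 3 in A and 4 in B.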

Policy in Different RL Methods

| Method | How policy is used |
|---|---|
| Policy Iteration | Explicit policy; alternates evaluation and improvement |
| Value Iteration | Implicit policy (greedy w.r.t. current $V$) |
| Monte Carlo Methods | Generates episodes; improved via ε-greedy |
| SARSA | On-policy: follows and improves ε-greedy |
| Q-Learning | Off-policy: follows ε-greedy, learns about greedy |
| REINFORCE | Directly parameterized: $\pi_\theta(a \mid s)$ |
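
The SARSA and Q-Learning entries differ only in which next action enters the update target. A minimal sketch, assuming a tabular estimate stored as nested dictionaries (function names and numbers are hypothetical):

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: uses the action the agent actually takes next."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: uses the greedy action in the next state."""
    return reward + gamma * max(q[next_state].values())


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha toward the target, in place."""
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical estimates; suppose the behavior explored with "left" in s1.
q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 3.0}}
print(sarsa_target(q, 1.0, "s1", "left", gamma=0.5))  # 1.5
print(q_learning_target(q, 1.0, "s1", gamma=0.5))     # 2.5
```

When the next action happens to be the greedy one, the two targets coincide; they differ only on exploratory steps.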

On-Policy vs Off-Policy

Key Distinction

  • Behavior policy $b$: the policy used to generate data (select actions)
  • Target policy $\pi$: the policy being evaluated or improved
  • On-policy: $b = \pi$ (same policy)
  • Off-policy: $b \neq \pi$ (different policies; often requires an Importance Sampling correction, although one-step Q-Learning does not)
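
When an importance sampling correction is used, returns generated by the behavior policy are reweighted by the ratio of target to behavior action probabilities along the trajectory. A minimal sketch, assuming both policies are stored as nested dictionaries of probabilities (all names are hypothetical):

```python
def importance_ratio(trajectory, target_policy, behavior_policy):
    """Product over steps of pi(a | s) / b(a | s) for (state, action) pairs."""
    ratio = 1.0
    for state, action in trajectory:
        b_prob = behavior_policy[state][action]
        if b_prob == 0.0:
            raise ValueError("behavior policy must cover every action in the data")
        ratio *= target_policy[state][action] / b_prob
    return ratio


# Hypothetical policies: a greedy target and a uniform behavior policy.
target = {"s0": {"left": 0.0, "right": 1.0}}
behavior = {"s0": {"left": 0.5, "right": 0.5}}

print(importance_ratio([("s0", "right"), ("s0", "right")], target, behavior))  # 4.0
print(importance_ratio([("s0", "left")], target, behavior))                    # 0.0
```

In the on-policy case the two policies are identical, so every factor is 1 and no correction is needed.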

See On-Policy vs Off-Policy for details.
