Exploring Starts

The assumption that every episode begins with a randomly chosen state-action pair, with every pair having a non-zero probability of being selected. This guarantees that all state-action pairs are visited infinitely often in the limit.

  • Used in Monte Carlo ES (Exploring Starts) algorithm for MC control
  • Ensures sufficient exploration for convergence to the optimal policy
  • Unrealistic in practice — can’t always control the starting state (e.g., in real-world environments)
  • Alternatives: Epsilon-Greedy Policy (on-policy), Importance Sampling (off-policy)
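The idea can be sketched in a few lines. The following is a minimal, illustrative implementation of Monte Carlo ES on a hypothetical two-state chain MDP (the environment, state/action names, and episode count are assumptions, not from the source): each episode starts from a uniformly random state-action pair, then follows the current greedy policy, and first-visit returns update the action-value estimates.

```python
import random
from collections import defaultdict

# Hypothetical 2-state chain MDP (an illustrative assumption):
# from state 1, action 1 ("right") reaches the goal for reward +1;
# every other transition gives reward 0.
STATES = [0, 1]
ACTIONS = [0, 1]  # 0 = left (terminates), 1 = right

def step(state, action):
    """Return (next_state, reward, done)."""
    if state == 0:
        return (1, 0.0, False) if action == 1 else (None, 0.0, True)
    # state == 1
    return (None, 1.0, True) if action == 1 else (None, 0.0, True)

def mc_es(episodes=2000, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)          # action-value estimates
    returns = defaultdict(list)     # observed returns per (s, a)
    policy = {s: rng.choice(ACTIONS) for s in STATES}
    for _ in range(episodes):
        # Exploring start: every (s, a) pair has non-zero probability
        s, a = rng.choice(STATES), rng.choice(ACTIONS)
        episode, done = [], False
        while not done:
            s2, r, done = step(s, a)
            episode.append((s, a, r))
            if not done:
                s, a = s2, policy[s2]  # follow the greedy policy afterwards
        # First-visit Monte Carlo update, working backwards through the episode
        G, visited = 0.0, set()
        for t in range(len(episode) - 1, -1, -1):
            s_t, a_t, r_t = episode[t]
            G = gamma * G + r_t
            if (s_t, a_t) not in visited:
                visited.add((s_t, a_t))
                returns[(s_t, a_t)].append(G)
                Q[(s_t, a_t)] = sum(returns[(s_t, a_t)]) / len(returns[(s_t, a_t)])
                # Greedy policy improvement at the visited state
                policy[s_t] = max(ACTIONS, key=lambda a_: Q[(s_t, a_)])
    return Q, policy

Q, policy = mc_es()
print(policy)  # both states should learn to go right
```

Because the exploring starts guarantee that even (state, action) pairs the greedy policy would never choose are sampled, the estimates for all four pairs keep improving and the greedy policy converges to "always right".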

Appears In