Exploration vs Exploitation
Definition
Exploration vs Exploitation Dilemma
The fundamental trade-off in RL between:
- Exploitation: Choosing the action with the highest estimated value (using current knowledge to maximize immediate reward)
- Exploration: Choosing non-greedy actions to gather more information about their values (potentially discovering better long-term strategies)
The Restaurant Analogy
You know a good restaurant nearby (exploitation). But there might be an even better one you haven’t tried (exploration). Every night you eat at the known-good place, you miss the chance to discover something better. But every night you try somewhere random, you risk a bad meal. The optimal strategy is somewhere in between.
Why It Matters
- If you only exploit: You may converge to a suboptimal policy because you never discovered better actions
- If you only explore: You waste time on clearly bad actions and never capitalize on what you’ve learned
- The optimal balance depends on the time horizon: more exploration early, more exploitation later
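The "explore early, exploit later" idea is often implemented as a decaying exploration rate. A minimal sketch, assuming an exponential decay schedule (the function name and default constants here are illustrative, not from the text):

```python
import math

def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay=0.001):
    # Exploration rate shrinks from eps_start toward eps_end as training
    # progresses; early steps explore almost always, late steps mostly exploit.
    return eps_end + (eps_start - eps_end) * math.exp(-decay * step)

print(epsilon_at(0))      # start: fully exploratory
print(epsilon_at(5000))   # late: close to the eps_end floor
```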
Methods for Balancing
| Method | Type | Key Idea |
|---|---|---|
| Epsilon-Greedy Policy | Action selection | Random action with probability ε, greedy action otherwise |
| Upper Confidence Bound | Action selection | Bonus for uncertainty: prefer under-explored actions |
| Optimistic Initial Values | Initialization | High initial values drive early exploration |
| Gradient Bandit | Policy gradient | Softmax over learned preferences |
| Exploring Starts | Episode init | Random starting state-action pairs |
| Boltzmann/Softmax | Action selection | Temperature-controlled randomness |
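Two of the action-selection rules above can be sketched in a few lines, assuming we track per-arm value estimates `q` and visit counts `n` (these names, and the exploration constant `c`, are our own choices):

```python
import math
import random

def epsilon_greedy(q, epsilon, rng=random):
    # With probability epsilon, explore with a uniformly random arm;
    # otherwise exploit the arm with the highest estimated value.
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])

def ucb(q, n, t, c=2.0):
    # Upper Confidence Bound: add an uncertainty bonus that is large for
    # under-explored arms, so they get tried without being chosen at random.
    def score(a):
        if n[a] == 0:
            return float("inf")  # untried arms are selected first
        return q[a] + c * math.sqrt(math.log(t) / n[a])
    return max(range(len(q)), key=score)

q = [0.2, 0.5, 0.1]   # value estimates
n = [10, 10, 1]       # visit counts
print(epsilon_greedy(q, epsilon=0.0))  # pure exploitation -> arm 1
print(ucb(q, n, t=21))                 # bonus favors under-explored arm 2
```

Note how UCB picks arm 2 despite its low value estimate: its single visit makes the uncertainty bonus dominate, which is exactly the "prefer under-explored actions" idea from the table.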
In Different RL Contexts
- Multi-Armed Bandit: the trade-off in its purest form (single state, no transitions)
- Monte Carlo Methods: ε-greedy or exploring starts for exploration
- Temporal Difference Learning: ε-greedy during training (SARSA, Q-Learning)
- Deep RL: ε-greedy with decay, entropy regularization, curiosity-driven exploration
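In the TD setting, ε-greedy sits inside the learning loop. A minimal sketch of tabular Q-Learning with ε-greedy exploration, on a tiny two-state MDP invented here for illustration (the environment and all constants are assumptions):

```python
import random

random.seed(0)
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def env_step(s, a):
    # Toy dynamics: action 1 in state 0 reaches a rewarding terminal state;
    # anything else stays in state 0 with no reward.
    if s == 0 and a == 1:
        return 1, 1.0, True
    return 0, 0.0, False

for _ in range(200):
    s, done = 0, False
    while not done:
        if random.random() < epsilon:
            a = random.randrange(n_actions)                    # explore
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])   # exploit
        s2, r, done = env_step(s, a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])                  # Q-Learning update
        s = s2

print(Q[0])  # the estimate for the rewarding action dominates
```

Without the exploratory branch the greedy tie-break would pick action 0 forever and never discover the reward, illustrating the "if you only exploit" failure mode above.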
Connections
- Core to: Multi-Armed Bandit, Monte Carlo Control
- Methods: Epsilon-Greedy Policy, Upper Confidence Bound, Exploring Starts
- Affects: convergence speed, optimality of learned policy