Monte Carlo Control
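Finding optimal policies using Monte Carlo methods. Learns action values Q(s, a) (not state values V(s)) to enable model-free policy improvement: the greedy action can be read directly from Q, whereas improving a policy from V requires a model of the transitions.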
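A minimal sketch of why action values make improvement model-free: given a table of Q estimates, the greedy policy is a per-state argmax and needs no transition model. The table layout, the state and action names, and the `greedy_policy` helper are assumptions made for illustration only.

```python
def greedy_policy(Q):
    """Pick, for each state, the action with the highest estimated value.

    Q is assumed to be a dict mapping state -> {action: estimated return}.
    """
    return {state: max(values, key=values.get) for state, values in Q.items()}


# Hypothetical Q estimates for two states and two actions.
Q = {
    "s0": {"left": 0.1, "right": 0.5},
    "s1": {"left": 0.7, "right": 0.2},
}

policy = greedy_policy(Q)
print(policy)
```

With V(s) alone, the same step would need a one-step lookahead through the environment's transition probabilities and rewards, which a model-free method does not have.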
Three main variants:
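- MC with Exploring Starts: Guarantees every state-action pair is visited, but requires starting episodes from arbitrary pairs, which is often unrealistic
- On-policy MC (Epsilon-Greedy Policy): Practical, converges to the best ε-soft policy
- Off-policy MC (Importance Sampling): Learns the optimal target policy from data generated by an exploratory behavior policy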
All follow the Generalized Policy Iteration framework.
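As a rough sketch of the on-policy variant within this framework, the code below alternates first-visit Monte Carlo evaluation of Q with ε-greedy improvement. The corridor environment, the constants, and the function names (`step`, `epsilon_greedy`, `mc_control`) are all assumptions invented for this example, not the algorithm as given in the referenced lecture.

```python
import random
from collections import defaultdict

# Toy corridor (an assumption for this sketch): states 0..4, both ends terminal.
N_STATES = 5
START = 2
ACTIONS = (0, 1)   # 0 = left, 1 = right
GAMMA = 0.9
EPSILON = 0.2


def step(state, action):
    """Move one cell; reward 1 only when the right end is reached."""
    next_state = state + (1 if action == 1 else -1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state in (0, N_STATES - 1)
    return next_state, reward, done


def epsilon_greedy(Q, state, rng):
    """Explore with probability EPSILON, otherwise act greedily (random ties)."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def mc_control(n_episodes=10000, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)     # action-value estimates
    counts = defaultdict(int)  # first-visit counts per (state, action)
    for _ in range(n_episodes):
        # Generate one episode with the current epsilon-greedy policy.
        episode, state, done = [], START, False
        while not done:
            action = epsilon_greedy(Q, state, rng)
            next_state, reward, done = step(state, action)
            episode.append((state, action, reward))
            state = next_state
        # Record the first time step at which each (state, action) occurs.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)
        # Evaluation: walk backwards, averaging first-visit returns into Q.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, a, r = episode[t]
            G = GAMMA * G + r
            if first_visit[(s, a)] == t:
                counts[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]
        # Improvement is implicit: the next episode acts epsilon-greedily on Q.
    return Q


Q = mc_control()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, N_STATES - 1)}
print(greedy)
```

In this toy setup the learned greedy action should be "right" in every non-terminal state, since only the right end pays a reward. The exploring-starts variant would instead sample the initial state-action pair at random, and the off-policy variant would weight returns by importance-sampling ratios.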
See RL-L03 - Monte Carlo Methods for full algorithms.