Offline Reinforcement Learning

Offline RL

Offline RL (also called batch RL) learns a policy from a fixed dataset of previously collected transitions, without any further interaction with the environment.

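Key challenge: distribution shift. The learned policy may select actions that are poorly represented in the dataset, so the value estimates for those actions are unreliable. Because no new data is collected, these errors are never corrected and can propagate through bootstrapped targets.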
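The effect can be seen in a toy tabular setting. The sketch below is illustrative only: the two-state MDP, the function name offline_q_iteration, and the optimistic initial value of 5.0 (standing in for an arbitrary approximation error) are all assumptions made for this example. The dataset contains only action 0, so the value of action 1 is never updated, yet the max in the Q-learning target still reads it.

```python
import numpy as np

def offline_q_iteration(dataset, n_states, n_actions, q_init,
                        gamma=0.9, sweeps=200, lr=0.5):
    """Tabular Q-learning over a fixed dataset (no environment access)."""
    q = np.full((n_states, n_actions), q_init, dtype=float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in dataset:
            # The max ranges over ALL actions, including ones never observed.
            bootstrap = 0.0 if done else gamma * q[s_next].max()
            q[s, a] += lr * (r + bootstrap - q[s, a])
    return q

# Hypothetical dataset: (state, action, reward, next_state, done).
# Only action 0 ever appears, so action 1 is out of distribution.
dataset = [
    (0, 0, 0.0, 1, False),
    (1, 0, 1.0, 1, True),
]

q = offline_q_iteration(dataset, n_states=2, n_actions=2, q_init=5.0)
greedy = q.argmax(axis=1)

print(q)       # Q(s, 1) keeps its initial value; Q(0, 0) is inflated by it
print(greedy)  # the greedy policy prefers the action the data never covers
```

With data-supported actions only, Q(0, 0) would be 0.9 * 1.0 = 0.9. Here it settles near 0.9 * 5.0 = 4.5 because the target bootstraps from the unverified value of the unseen action, and nothing in the fixed dataset can correct it.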

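Solutions: Conservative Q-Learning (CQL), Batch-Constrained deep Q-learning (BCQ), Bootstrapping Error Accumulation Reduction (BEAR), and Implicit Q-Learning (IQL). All of them limit the influence of out-of-distribution actions: BCQ and BEAR constrain the learned policy to stay close to the dataset's action distribution, CQL penalizes the Q-values of actions outside the dataset, and IQL avoids evaluating actions that do not appear in the dataset.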
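As one concrete instance, the conservative term used by CQL for discrete actions can be sketched as the gap between a log-sum-exp over all actions and the Q-value of the action actually taken in the data. This is a minimal sketch of the regularizer only; the Bellman error and the weighting coefficient that a full training loss would add are omitted, and the name cql_penalty and the example values are assumptions for illustration.

```python
import numpy as np

def cql_penalty(q_values, data_actions):
    """Mean over the batch of logsumexp_a Q(s, a) - Q(s, a_data).

    q_values:     float array of shape (batch, n_actions)
    data_actions: int array of shape (batch,), the actions in the dataset
    """
    # Numerically stable log-sum-exp over the action axis.
    row_max = q_values.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(q_values - row_max).sum(axis=1))
    # Q-value of the action that the dataset actually contains.
    q_data = q_values[np.arange(len(data_actions)), data_actions]
    return float((lse - q_data).mean())

# Hypothetical Q-table rows where the unseen action 1 looks much better.
q_values = np.array([[1.0, 5.0],
                     [1.0, 5.0]])

penalty_unseen_high = cql_penalty(q_values, np.array([0, 0]))
penalty_data_high = cql_penalty(q_values, np.array([1, 1]))
print(penalty_unseen_high, penalty_data_high)
```

The penalty is never negative, and it is large when an action outside the dataset has a higher Q-value than the dataset action. Minimizing it therefore pushes down values for unsupported actions and pushes up values for actions the data contains.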

Appears In

RL-L08 - Deep RL Value-Based
