Reinforcement Learning

Reinforcement Learning is a computational approach to learning from interaction. An agent learns to make decisions by taking actions in an environment and receiving reward signals, with the goal of maximizing cumulative reward over time.
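The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `CoinFlipEnv`, its `reset`/`step` methods, and the `always_heads` policy are hypothetical names invented here, and the environment is deliberately trivial.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the outcome of a biased coin."""

    def __init__(self, p_heads=0.7, horizon=10, seed=0):
        self.p_heads = p_heads
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the state is simply the current time step

    def step(self, action):
        # action: 1 = guess heads, 0 = guess tails
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


def run_episode(env, policy):
    """Run one episode and return the cumulative (undiscounted) reward."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)                  # agent picks an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward                  # accumulate the reward signal
    return total_reward


def always_heads(state):
    """A fixed policy that ignores the state and always guesses heads."""
    return 1


total = run_episode(CoinFlipEnv(), always_heads)
print(total)
```

A learning agent would replace `always_heads` with a policy that it updates from the rewards it observes; the loop itself stays the same.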

Key Characteristics

Trial and error: No supervisor tells the agent what to do; it must discover good actions through experience
Delayed reward: Actions may not yield immediate benefit; their consequences play out over time
Exploration-exploitation trade-off: The agent must balance trying new actions against exploiting strategies it already knows to be good
Sequential decision-making: Current actions affect future states and rewards
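One common way to handle the exploration-exploitation trade-off is an epsilon-greedy rule: with a small probability the agent picks a random action, otherwise it picks the action with the best current estimate. The sketch below applies it to a multi-armed bandit. The function name, the Gaussian reward noise, and the arm means are assumptions chosen for illustration only.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=2000,
                          noise_std=1.0, seed=0):
    """Estimate arm values by trial and error with epsilon-greedy selection."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward per arm
    counts = [0] * n_arms       # number of pulls per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: true mean plus Gaussian noise
        reward = rng.gauss(true_means[arm], noise_std)

        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total_reward = epsilon_greedy_bandit([0.2, 0.5, 1.0])
print(estimates, counts)
```

Setting `epsilon=0` makes the agent purely greedy, so it can lock onto whichever arm looked good first; larger values spend more pulls on arms that are already known to be worse.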

Elements of RL

Policy $π$: Maps states to actions (what to do)
Value function: Estimates the long-term value of states or actions (what’s good in the long run)
Reward signal: Immediate feedback (what’s good right now)
Model (optional): The agent’s representation of environment dynamics
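To make the four elements concrete, the sketch below represents each one as plain Python data for a tiny two-state problem and then computes the value function of a fixed policy by iterative policy evaluation. The states, actions, rewards, and the discount factor `gamma=0.9` are all hypothetical values invented for this example.

```python
# Model: model[state][action] -> list of (probability, next_state, reward).
# The reward signal is the last entry of each transition tuple.
model = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(1.0, "high", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.5, "high", 2.0), (0.5, "low", 2.0)],
    },
}

# Policy: a deterministic mapping from state to action.
policy = {"low": "charge", "high": "work"}


def evaluate_policy(model, policy, gamma=0.9, tol=1e-10):
    """Value function: expected discounted return from each state under the policy."""
    values = {state: 0.0 for state in model}
    while True:
        delta = 0.0
        for state in model:
            transitions = model[state][policy[state]]
            # Bellman expectation backup using the model's transitions
            new_value = sum(p * (r + gamma * values[nxt])
                            for p, nxt, r in transitions)
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        if delta < tol:
            return values


values = evaluate_policy(model, policy)
print(values)
```

Note that the immediate reward for `charge` is negative, yet the state `low` still ends up with a positive value under this policy, because the value function accounts for the rewards that follow.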

The RL Landscape

graph TD
    RL[Reinforcement Learning] --> TB[Tabular Methods]
    RL --> AP[Approximate Methods]
    TB --> DP[Dynamic Programming]
    TB --> MC[Monte Carlo]
    TB --> TD[Temporal Difference]
    AP --> VFA[Value Function Approx]
    AP --> PG[Policy Gradient]
    VFA --> Linear[Linear FA]
    VFA --> Deep[Deep RL]
