Return
Definition
Return
$$G_t = R_{t+1} + R_{t+2} + R_{t+3} + \cdots$$
The return $G_t$ is the total accumulated reward from time step $t$ onward. It is the quantity that RL agents seek to maximize (in expectation).
Return (Discounted)
$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$
Variants
Episodic (undiscounted or discounted):
$$G_t = \sum_{k=t+1}^{T} \gamma^{k-t-1} R_k$$
where $T$ is the terminal time step. With $\gamma = 1$, this is just the sum of all remaining rewards.
Continuing (must have $\gamma < 1$):
$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$
Converges as long as $\gamma < 1$ and rewards are bounded.
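The two variants can be checked numerically. A minimal sketch (the reward list and discount value are made-up numbers) that computes the discounted return directly from the definition:

```python
def discounted_return(rewards, gamma):
    """Compute G_t = sum_k gamma^k * R_{t+k+1} from a list of rewards."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Episodic, gamma = 1: just the sum of the remaining rewards.
print(discounted_return([1.0, 1.0, 1.0], gamma=1.0))  # 3.0

# With discounting, later rewards count for less:
print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))  # 1 + 0.9 + 0.81 = 2.71
```

With $\gamma < 1$ each term is bounded by $\gamma^k \max|r|$, which is why the infinite continuing-case sum converges.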
Recursive Property
Recursive Return
$$G_t = R_{t+1} + \gamma G_{t+1}$$
This is the key recursive relationship that enables Bootstrapping and the Bellman Equation.
Why This Matters
You don’t need to compute the entire sum from scratch. The return at time $t$ equals the immediate reward plus the discounted return from the next step. This decomposition is the foundation of Dynamic Programming and Temporal Difference Learning.
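The recursion suggests an efficient way to compute every return in an episode: sweep backward from the terminal step (where $G_T = 0$) and apply $G_t = R_{t+1} + \gamma G_{t+1}$ once per step. A sketch with made-up rewards:

```python
def returns_from_rewards(rewards, gamma):
    """Compute G_t for every time step of an episode via the
    recursion G_t = R_{t+1} + gamma * G_{t+1}, sweeping backward
    from the terminal step (G_T = 0)."""
    G = 0.0
    returns = []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    return returns

# G_0 = 1 + 0.5 * (2 + 0.5 * 3) = 2.75
print(returns_from_rewards([1.0, 2.0, 3.0], gamma=0.5))  # [2.75, 3.5, 3.0]
```

This is linear in the episode length, versus quadratic if each $G_t$ were summed independently.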
Role in RL Methods
- Monte Carlo Methods: Estimate $v_\pi(s)$ by averaging the actual returns $G_t$ observed after visiting state $s$
- Temporal Difference Learning: Approximates $G_t$ with $R_{t+1} + \gamma V(S_{t+1})$ (one-step bootstrap)
- Value Function: Defined as the expected return: $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$
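The Monte Carlo and TD targets above can be contrasted side by side. A sketch (all rewards and the value estimate `V_next` are made-up numbers, not from the text):

```python
gamma = 0.9
# Hypothetical tail of an episode: rewards R_{t+1}, R_{t+2}, R_{t+3}.
rewards = [0.0, 0.0, 1.0]

# Monte Carlo target: the full actual return G_t, available only
# after the episode finishes.
mc_target = sum(gamma ** k * r for k, r in enumerate(rewards))

# TD(0) target: available after one step, bootstrapping from the
# current (assumed) estimate V(S_{t+1}).
V_next = 0.75
td_target = rewards[0] + gamma * V_next

print(mc_target)  # 0.9^2 * 1 = 0.81
print(td_target)  # 0 + 0.9 * 0.75 = 0.675
```

The MC target is unbiased but must wait for the episode to end; the TD target is available immediately but inherits any error in the current estimate $V(S_{t+1})$.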
Connections
- Used to define: Value Function
- Discounted by: Discount Factor
- Estimated by: Monte Carlo Methods (full), Temporal Difference Learning (bootstrapped)
- Recursive structure enables: Bellman Equation