Discount Factor ($\gamma$)
Definition
Discount Factor
The discount factor $\gamma \in [0, 1]$ determines the present value of future rewards. A reward received $k$ time steps in the future is worth $\gamma^k$ times what it would be worth if received immediately.
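As a minimal sketch of this definition, the function below computes the present value $\sum_k \gamma^k r_k$ of a finite reward list. The name `discounted_return` and the list-of-floats input are illustrative assumptions, not a library API. It sums backwards using $G = r + \gamma G_{\text{next}}$, which gives the same result without computing powers of $\gamma$.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Present value of a reward sequence: sum of gamma**k * rewards[k].

    rewards[0] is received immediately (k = 0), rewards[1] one step later, etc.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Walk backwards: G_k = r_k + gamma * G_{k+1}, equivalent to the forward sum.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# A reward of 10 received 3 steps in the future, with gamma = 0.9,
# is worth 0.9**3 * 10 = 7.29 today.
print(round(discounted_return([0, 0, 0, 10], 0.9), 4))  # → 7.29
```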
What $\gamma$ Controls
- $\gamma = 0$: Myopic; only the immediate reward counts
- $\gamma = 1$: Far-sighted; future rewards are weighted just as heavily as immediate ones (only safe for episodic tasks)
- $0.9 \le \gamma \le 0.99$: Typical values; balances short-term and long-term rewards
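To make the three regimes concrete, here is a small sketch that scores one hypothetical episode (a reward of 1 now and 10 five steps later) under several values of $\gamma$. The reward list and the helper names `discounted_return` and `returns_by_gamma` are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], computed backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def returns_by_gamma(rewards, gammas):
    """Map each discount factor to the return it assigns the same episode."""
    return {gamma: discounted_return(rewards, gamma) for gamma in gammas}


# Hypothetical episode: small reward now, large reward 5 steps later.
rewards = [1.0, 0.0, 0.0, 0.0, 0.0, 10.0]
for gamma, g in returns_by_gamma(rewards, (0.0, 0.9, 0.99, 1.0)).items():
    print(f"gamma={gamma:<4}  return={g:.4f}")
```

With $\gamma = 0$ only the immediate reward of 1 counts; with $\gamma = 1$ the delayed 10 counts in full, for a return of 11. Intermediate values land in between, discounting the delayed reward by $\gamma^5$.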
Why Discount?
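- Mathematical convenience: Ensures the Return is finite for continuing (infinite-horizon) tasks when rewards are bounded. If $|R_t| \le R_{\max}$ for all $t$ and $\gamma < 1$, then $|G_t| \le \sum_{k=0}^{\infty} \gamma^k R_{\max} = \frac{R_{\max}}{1 - \gamma}$
- Uncertainty: The future is less predictable, so discounting acts as a form of "doubt" about far-future rewards
- Preference for sooner rewards: As with interest rates in economics, a reward now is worth more than the same reward later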
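The finiteness argument can be checked numerically. This sketch assumes the worst case for the bound, where the maximum reward $R_{\max}$ arrives on every step, and shows the truncated return approaching $R_{\max} / (1 - \gamma)$ from below. The helpers `return_bound` and `constant_reward_return` are hypothetical names for illustration.

```python
def return_bound(r_max: float, gamma: float) -> float:
    """Upper bound R_max / (1 - gamma) on |G_t| when every |R_t| <= r_max."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("the bound only holds for 0 <= gamma < 1")
    return r_max / (1.0 - gamma)


def constant_reward_return(r_max: float, gamma: float, steps: int) -> float:
    """Discounted return when r_max is received on each of `steps` steps."""
    return sum(r_max * gamma ** k for k in range(steps))


r_max, gamma = 1.0, 0.9
bound = return_bound(r_max, gamma)
for steps in (10, 100, 1000):
    partial = constant_reward_return(r_max, gamma, steps)
    # The partial sums grow toward, but never exceed, the geometric-series bound.
    print(f"{steps:>5} steps: return={partial:.6f}  bound={bound:.6f}")
```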
$\gamma = 1$ in Continuing Tasks
If $\gamma = 1$ and the task never terminates, the return can diverge to infinity. Only use $\gamma = 1$ for episodic tasks.
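A small sketch of the failure mode, assuming a hypothetical continuing task that pays a reward of 1 on every step forever. The helper `running_returns` (an illustrative name) truncates the infinite sum at increasing horizons: with $\gamma = 1$ the value keeps growing with the horizon, while with $\gamma = 0.99$ it levels off near $1 / (1 - 0.99) = 100$.

```python
def running_returns(reward, gamma, horizons):
    """Discounted return of a continuing task paying `reward` each step,
    truncated at each horizon in `horizons` (assumed sorted ascending)."""
    results = []
    g, weight, t = 0.0, 1.0, 0
    for h in horizons:
        # Extend the running sum up to step h.
        while t < h:
            g += weight * reward
            weight *= gamma
            t += 1
        results.append(g)
    return results


horizons = [10, 100, 1000]
print("gamma=1.0 :", running_returns(1.0, 1.0, horizons))   # grows without bound
print("gamma=0.99:", running_returns(1.0, 0.99, horizons))  # approaches 100
```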
Connections
- Scales: Return
- Appears in: Bellman Equation, Value Function
- Affects convergence of: Dynamic Programming, Temporal Difference Learning
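One way to see how $\gamma$ affects convergence in Dynamic Programming is iterative policy evaluation, which repeatedly applies the Bellman expectation backup $v \leftarrow R + \gamma P v$. The sketch below assumes a hypothetical two-state Markov reward process (the transition matrix `P` and rewards `R` are made up for illustration) and counts how many sweeps are needed to reach a fixed tolerance. Because each backup shrinks the error by at most a factor of $\gamma$, values of $\gamma$ closer to 1 need more sweeps.

```python
from typing import List, Tuple


def evaluate(P: List[List[float]], R: List[float], gamma: float,
             tol: float = 1e-8, max_sweeps: int = 100_000) -> Tuple[List[float], int]:
    """Iterative policy evaluation on a Markov reward process.

    Returns the value estimate and the number of sweeps used.
    """
    n = len(R)
    v = [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        # Bellman expectation backup for every state.
        new_v = [R[s] + gamma * sum(P[s][s2] * v[s2] for s2 in range(n))
                 for s in range(n)]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            return v, sweep
    raise RuntimeError("did not converge within max_sweeps")


# Hypothetical 2-state process: rows of P are next-state probabilities.
P = [[0.5, 0.5], [0.2, 0.8]]
R = [1.0, 0.0]
for gamma in (0.5, 0.9, 0.99):
    _, sweeps = evaluate(P, R, gamma)
    print(f"gamma={gamma}: {sweeps} sweeps")
```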