Bootstrapping

In RL, bootstrapping means updating an estimate based partly on other estimates (rather than exclusively on actual observed values). The update target includes a current estimate of a value function.

Examples

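  • Dynamic Programming: $V(s) \leftarrow \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma V(s')]$; uses $V(s')$, which is itself an estimate
  • TD(0): $V(S_t) \leftarrow V(S_t) + \alpha\,[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$; uses $V(S_{t+1})$
  • Monte Carlo Methods: $V(S_t) \leftarrow V(S_t) + \alpha\,[G_t - V(S_t)]$; uses actual return $G_t$, NOT bootstrapping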
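
To make the difference concrete, here is a minimal Python sketch of a Monte Carlo target next to a TD(0) target for the same first step of an episode. The episode rewards, the state names `s0` and `s1`, and the current value estimates are hypothetical numbers chosen only for illustration.

```python
def mc_target(rewards, gamma):
    """Discounted return G_t computed only from observed rewards (no bootstrapping)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td0_target(reward, next_value, gamma):
    """One-step target R_{t+1} + gamma * V(S_{t+1}); bootstraps on the estimate V(S_{t+1})."""
    return reward + gamma * next_value


def update(value, target, alpha):
    """Move the current estimate a fraction alpha of the way toward the target."""
    return value + alpha * (target - value)


# Hypothetical data: rewards observed after leaving s0, and current estimates.
rewards = [0.0, 0.0, 1.0]
gamma = 0.9
alpha = 0.5
V = {"s0": 0.0, "s1": 0.5}

mc = mc_target(rewards, gamma)               # needs the whole episode
td = td0_target(rewards[0], V["s1"], gamma)  # needs one reward plus V["s1"]

v_s0_after_mc = update(V["s0"], mc, alpha)
v_s0_after_td = update(V["s0"], td, alpha)
print(mc, td, v_s0_after_mc, v_s0_after_td)
```

The TD(0) target changes whenever the estimate for `s1` changes, while the Monte Carlo target depends only on the rewards that were actually observed.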

Trade-Off

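Bootstrapping introduces bias because the target depends on the current value estimate, which is generally inaccurate early in learning. It reduces variance because the target depends on a single reward and transition rather than on the full noisy return. The Monte Carlo (MC) return is an unbiased sample of the true value but has high variance; the TD target is biased but typically has much lower variance.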
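
The trade-off can be checked empirically with a small simulation. The sketch below assumes a hypothetical chain of `n_steps` transitions in which every reward is +1 or -1 with equal probability and there is no discounting, so the true value of every state is 0. The next state's estimate is deliberately set to a wrong value (`v_next_estimate`) to expose the bias of the bootstrapped target.

```python
import random
import statistics


def sample_targets(n_steps, v_next_estimate, n_samples, seed=0):
    """Sample MC and TD(0) targets for the first state of the chain."""
    rng = random.Random(seed)
    mc_targets, td_targets = [], []
    for _ in range(n_samples):
        rewards = [rng.choice((-1.0, 1.0)) for _ in range(n_steps)]
        # MC target: sum of all rewards in the episode (gamma = 1).
        mc_targets.append(sum(rewards))
        # TD(0) target: first reward plus the (wrong) estimate of the next state.
        td_targets.append(rewards[0] + v_next_estimate)
    return mc_targets, td_targets


mc_targets, td_targets = sample_targets(
    n_steps=10, v_next_estimate=0.5, n_samples=10_000
)

mc_mean = statistics.fmean(mc_targets)
td_mean = statistics.fmean(td_targets)
mc_var = statistics.pvariance(mc_targets)
td_var = statistics.pvariance(td_targets)

print(f"MC: mean={mc_mean:.3f}, variance={mc_var:.3f}")
print(f"TD: mean={td_mean:.3f}, variance={td_var:.3f}")
```

Under these assumptions the MC targets should average close to the true value 0 with variance near 10 (one unit per random reward), while the TD targets should average close to the erroneous 0.5 with variance near 1.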

Role in the Deadly Triad

Bootstrapping is one of the three elements of the deadly triad. Combined with Function Approximation and off-policy learning, it can make value estimates unstable and cause them to diverge.
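
A small two-state example, commonly used to illustrate off-policy divergence, shows how the three elements interact. The sketch below assumes linear function approximation with one shared weight `w`, where the first state has value estimate `w` and the second `2w`, the reward is 0, and the update is applied only to the transition from the first state to the second (as can happen under off-policy sampling). The function name and parameter values are illustrative assumptions.

```python
def semi_gradient_td_updates(w, alpha, gamma, n_updates):
    """Repeat the semi-gradient TD(0) update on a single transition.

    Features: x(s) = 1 and x(s') = 2, so V(s) = w and V(s') = 2w.
    The reward on the transition is 0.
    """
    history = [w]
    for _ in range(n_updates):
        v_s = 1.0 * w
        v_next = 2.0 * w  # bootstrapped estimate of the next state
        td_error = 0.0 + gamma * v_next - v_s
        # Gradient of V(s) with respect to w is the feature x(s) = 1.
        w = w + alpha * td_error * 1.0
        history.append(w)
    return history


diverging = semi_gradient_td_updates(w=1.0, alpha=0.1, gamma=0.99, n_updates=100)
shrinking = semi_gradient_td_updates(w=1.0, alpha=0.1, gamma=0.4, n_updates=100)
print(diverging[-1], shrinking[-1])
```

Each update multiplies `w` by `1 + alpha * (2 * gamma - 1)`, so in this setup the weight grows without bound whenever `gamma > 0.5` and shrinks toward 0 otherwise. The growth comes from the bootstrapped target `2w` moving away faster than the estimate `w` can follow.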
