Reparameterization Trick
A technique for computing gradients through stochastic sampling operations. Instead of sampling $x \sim p_\theta(x)$ directly (which blocks gradient flow), express the sample as a deterministic, differentiable function of the parameters and independent noise: $x = g_\theta(\epsilon)$, where $\epsilon \sim p(\epsilon)$.
Intuition
Making Randomness Differentiable
Normally, you can’t backpropagate through a random sampling step. The reparameterization trick moves the randomness into an input variable $\epsilon$ that doesn’t depend on the parameters $\theta$. The actual sample becomes a deterministic transformation $x = g_\theta(\epsilon)$, so gradients can flow through $g_\theta$.
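A minimal numerical sketch of this idea (pure NumPy; the function name `g` and the parameter values are illustrative): once the noise is drawn separately, the sample is an ordinary differentiable function of the parameters, so a finite-difference derivative with the noise held fixed is well-defined.

```python
import numpy as np

def g(theta, eps):
    """Deterministic transform: theta = (mu, sigma), sample x = mu + sigma * eps."""
    mu, sigma = theta
    return mu + sigma * eps

rng = np.random.default_rng(0)
eps = rng.standard_normal()   # the randomness lives here, independent of theta

theta = (1.0, 0.5)
x = g(theta, eps)

# With eps held fixed, dx/dmu is exactly 1 for this linear transform;
# a finite difference through g confirms it:
h = 1e-6
dx_dmu = (g((theta[0] + h, theta[1]), eps) - x) / h
print(round(dx_dmu, 3))  # ≈ 1.0
```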
For Gaussian Policies
If $x \sim \mathcal{N}(\mu_\theta, \sigma_\theta^2)$, then:

$$x = \mu_\theta + \sigma_\theta \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)$$

Now $\nabla_\theta x$ is well-defined:

$$\nabla_\theta x = \nabla_\theta \mu_\theta + \epsilon \cdot \nabla_\theta \sigma_\theta$$
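A quick Monte Carlo check of the Gaussian case (NumPy; the objective $f(x) = x^2$ and sample count are illustrative): since $\mathbb{E}[x^2] = \mu^2 + \sigma^2$, the reparameterized per-sample gradients $2x$ and $2x\epsilon$ should average to the analytic values $2\mu$ and $2\sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7
n = 200_000

eps = rng.standard_normal(n)
x = mu + sigma * eps                 # reparameterized samples

# Pathwise per-sample gradients of f(x) = x**2, using dx/dmu = 1, dx/dsigma = eps:
grad_mu = (2 * x).mean()             # estimates d/dmu    E[x^2] = 2*mu    = 3.0
grad_sigma = (2 * x * eps).mean()    # estimates d/dsigma E[x^2] = 2*sigma = 1.4

print(grad_mu, grad_sigma)           # ≈ 3.0, ≈ 1.4
```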
Application in SAC
In Soft Actor-Critic (SAC), actions are written as $a = f_\phi(\epsilon; s)$, and the reparameterization trick enables computing the policy gradient as:

$$\nabla_\phi J_\pi(\phi) = \mathbb{E}_{s \sim \mathcal{D},\, \epsilon \sim \mathcal{N}}\left[\nabla_\phi \left(\alpha \log \pi_\phi(f_\phi(\epsilon; s) \mid s) - Q_\theta(s, f_\phi(\epsilon; s))\right)\right]$$

The expectation over $\epsilon$ doesn’t depend on $\phi$, so the gradient moves inside the expectation.
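A toy single-state illustration of this gradient (NumPy; the quadratic critic $Q(a) = -(a-2)^2$, the fixed $\sigma$, and all constants are hypothetical, and the tanh squashing used in real SAC is omitted): with $a = \mu + \sigma\epsilon$, the reparameterized log-density $\log\pi(a) = -\epsilon^2/2 - \log(\sigma\sqrt{2\pi})$ contributes nothing to the $\mu$-gradient, so the pathwise estimate of $\nabla_\mu\,\mathbb{E}[\alpha\log\pi(a) - Q(a)]$ reduces to averaging $-\partial Q/\partial a$, which should match the exact value $2(\mu - 2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, alpha = 0.5, 0.3, 0.2     # hypothetical policy mean/std and temperature
n = 200_000

eps = rng.standard_normal(n)
a = mu + sigma * eps                  # reparameterized action

# Hypothetical critic: Q(a) = -(a - 2)**2, so dQ/da = -2*(a - 2).
dQ_da = -2 * (a - 2)

# Under reparameterization, log pi(a) = -eps**2/2 - log(sigma*sqrt(2*pi))
# does not depend on mu, so its mu-gradient is zero; the pathwise gradient
# of the objective E[alpha*log pi(a) - Q(a)] w.r.t. mu is just -dQ/da:
grad_mu = (-dQ_da).mean()

exact = 2 * (mu - 2)                  # d/dmu of (mu - 2)**2 + sigma**2
print(grad_mu, exact)                 # both ≈ -3.0
```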
Key Properties
- Enables lower-variance gradient estimates compared to the log-derivative trick (REINFORCE)
- Works for continuous distributions that can be expressed as transformations of base distributions
- Standard technique in variational autoencoders (VAEs) and modern deep RL
- Requires the sampling distribution to be reparameterizable (Gaussian, etc.)
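The variance claim in the first bullet can be checked numerically (NumPy; the toy objective $f(x) = x^2$ and parameter values are illustrative): both the pathwise estimator $2x$ and the REINFORCE score-function estimator $x^2 \, (x - \mu)/\sigma^2$ are unbiased for $\nabla_\mu \mathbb{E}[x^2] = 2\mu$, but their empirical variances differ sharply.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0
n = 100_000

eps = rng.standard_normal(n)
x = mu + sigma * eps

# Both per-sample estimators are unbiased for d/dmu E[x^2] = 2*mu = 2.0:
pathwise = 2 * x                         # reparameterization trick
score = x**2 * (x - mu) / sigma**2       # log-derivative trick (REINFORCE)

print(pathwise.mean(), score.mean())     # both ≈ 2.0
print(pathwise.var(), score.var())       # ≈ 4 vs roughly 30 at these parameters
```

The gap grows with the scale of the objective, which is why pathwise gradients are preferred whenever the distribution admits them.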
Connections
- Used in Soft Actor-Critic (SAC) for policy optimization
- Related to Policy Gradient Methods — an alternative way to compute policy gradients
- Also used in Variational Autoencoders (VAEs)
- Provides lower variance than the REINFORCE log-derivative estimator