Reparameterization Trick
A technique for computing gradients through stochastic sampling operations. Instead of sampling $x \sim p_\theta(x)$ directly (which blocks gradient flow), express the sample as a deterministic, differentiable function of the parameters and independent noise: $x = g_\theta(\epsilon)$, where $\epsilon \sim p(\epsilon)$.
Intuition
Making Randomness Differentiable
Normally, you can’t backpropagate through a random sampling step. The reparameterization trick moves the randomness into an input variable $\epsilon$ that doesn’t depend on the parameters $\theta$. The actual sample becomes a deterministic transformation $x = g_\theta(\epsilon)$, so gradients can flow through $g_\theta$.
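A minimal numerical sketch of this idea (pure NumPy; the function name `g` and the parameter values are illustrative): once the noise is drawn separately, the sample is an ordinary differentiable function of the parameters, so a finite-difference derivative with the noise held fixed is well-defined.

```python
import numpy as np

def g(theta, eps):
    """Deterministic transform: theta = (mu, sigma), sample x = mu + sigma * eps."""
    mu, sigma = theta
    return mu + sigma * eps

rng = np.random.default_rng(0)
eps = rng.standard_normal()   # the randomness lives here, independent of theta

theta = (1.0, 0.5)
x = g(theta, eps)

# With eps held fixed, dx/dmu is exactly 1 for this linear transform;
# a finite difference through g confirms it:
h = 1e-6
dx_dmu = (g((theta[0] + h, theta[1]), eps) - x) / h
print(round(dx_dmu, 3))  # ≈ 1.0
```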
For Gaussian Policies
If $x \sim \mathcal{N}(\mu_\theta, \sigma_\theta^2)$, then:

$$x = \mu_\theta + \sigma_\theta \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)$$

Now $\nabla_\theta x$ is well-defined:

$$\nabla_\theta x = \nabla_\theta \mu_\theta + \epsilon \cdot \nabla_\theta \sigma_\theta$$
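A quick Monte Carlo check of the Gaussian case (NumPy; the objective $f(x) = x^2$ and sample count are illustrative): since $\mathbb{E}[x^2] = \mu^2 + \sigma^2$, the reparameterized per-sample gradients $2x$ and $2x\epsilon$ should average to the analytic values $2\mu$ and $2\sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7
n = 200_000

eps = rng.standard_normal(n)
x = mu + sigma * eps                 # reparameterized samples

# Pathwise per-sample gradients of f(x) = x**2, using dx/dmu = 1, dx/dsigma = eps:
grad_mu = (2 * x).mean()             # estimates d/dmu    E[x^2] = 2*mu    = 3.0
grad_sigma = (2 * x * eps).mean()    # estimates d/dsigma E[x^2] = 2*sigma = 1.4

print(grad_mu, grad_sigma)           # ≈ 3.0, ≈ 1.4
```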
Application in SAC
In Soft Actor-Critic (SAC), actions are written as $a = f_\phi(\epsilon; s)$, and the reparameterization trick enables computing the policy gradient as:

$$\nabla_\phi J_\pi(\phi) = \mathbb{E}_{s \sim \mathcal{D},\, \epsilon \sim \mathcal{N}}\left[\nabla_\phi \left(\alpha \log \pi_\phi(f_\phi(\epsilon; s) \mid s) - Q_\theta(s, f_\phi(\epsilon; s))\right)\right]$$

The expectation over $\epsilon$ doesn’t depend on $\phi$, so the gradient moves inside the expectation.
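A toy single-state illustration of this gradient (NumPy; the quadratic critic $Q(a) = -(a-2)^2$, the fixed $\sigma$, and all constants are hypothetical, and the tanh squashing used in real SAC is omitted): with $a = \mu + \sigma\epsilon$, the reparameterized log-density $\log\pi(a) = -\epsilon^2/2 - \log(\sigma\sqrt{2\pi})$ contributes nothing to the $\mu$-gradient, so the pathwise estimate of $\nabla_\mu\,\mathbb{E}[\alpha\log\pi(a) - Q(a)]$ reduces to averaging $-\partial Q/\partial a$, which should match the exact value $2(\mu - 2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, alpha = 0.5, 0.3, 0.2     # hypothetical policy mean/std and temperature
n = 200_000

eps = rng.standard_normal(n)
a = mu + sigma * eps                  # reparameterized action

# Hypothetical critic: Q(a) = -(a - 2)**2, so dQ/da = -2*(a - 2).
dQ_da = -2 * (a - 2)

# Under reparameterization, log pi(a) = -eps**2/2 - log(sigma*sqrt(2*pi))
# does not depend on mu, so its mu-gradient is zero; the pathwise gradient
# of the objective E[alpha*log pi(a) - Q(a)] w.r.t. mu is just -dQ/da:
grad_mu = (-dQ_da).mean()

exact = 2 * (mu - 2)                  # d/dmu of (mu - 2)**2 + sigma**2
print(grad_mu, exact)                 # both ≈ -3.0
```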
Key Properties
- Enables lower-variance gradient estimates compared to the log-derivative trick (REINFORCE)
- Works for continuous distributions that can be expressed as transformations of base distributions
- Standard technique in variational autoencoders (VAEs) and modern deep RL
- Requires the sampling distribution to be reparameterizable (Gaussian, etc.)
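The variance claim in the first bullet can be checked numerically (NumPy; the toy objective $f(x) = x^2$ and parameter values are illustrative): both the pathwise estimator $2x$ and the REINFORCE score-function estimator $x^2 \, (x - \mu)/\sigma^2$ are unbiased for $\nabla_\mu \mathbb{E}[x^2] = 2\mu$, but their empirical variances differ sharply.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0
n = 100_000

eps = rng.standard_normal(n)
x = mu + sigma * eps

# Both per-sample estimators are unbiased for d/dmu E[x^2] = 2*mu = 2.0:
pathwise = 2 * x                         # reparameterization trick
score = x**2 * (x - mu) / sigma**2       # log-derivative trick (REINFORCE)

print(pathwise.mean(), score.mean())     # both ≈ 2.0
print(pathwise.var(), score.var())       # ≈ 4 vs roughly 30 at these parameters
```

The gap grows with the scale of the objective, which is why pathwise gradients are preferred whenever the distribution admits them.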
Connections
- Used in Soft Actor-Critic (SAC) for policy optimization
- Related to Policy Gradient Methods — an alternative way to compute policy gradients
- Also used in Variational Autoencoders (VAEs)
- Provides lower variance than the REINFORCE log-derivative estimator