RMSProp

Definition

RMSProp (Root Mean Square Propagation) is an adaptive learning rate optimization method designed to address AdaGrad's monotonically shrinking learning rates. Instead of summing all past squared gradients, it keeps an exponentially decaying average of them, which limits the influence of older gradients.

Intuition

Normalizing the Gradient

If a gradient is consistently large, we want to slow down in that direction to avoid overshooting. If a gradient is consistently small, we want to speed up. RMSProp does this by dividing the current gradient by the square root of a running average of recent squared gradients, that is, by a root mean square of recent gradient magnitudes. This keeps the updates at a similar scale across all dimensions.
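A small numeric sketch of this normalization, using made-up gradient values and assuming the gradient stays constant long enough for the running average to settle:

```python
import numpy as np

# Made-up gradients: one steep direction and one nearly flat direction.
g = np.array([100.0, 0.01])
rho, eps = 0.9, 1e-8

# Feed the same gradient repeatedly so the decaying average of g**2 settles.
v = np.zeros_like(g)
for _ in range(200):
    v = rho * v + (1 - rho) * g ** 2

# Dividing by the root of the running average puts both directions on the same scale.
normalized = g / (np.sqrt(v) + eps)
print(normalized)
```

Although the raw gradients differ by four orders of magnitude, both normalized values end up close to 1, so the learning rate alone sets the step size in each direction.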

Mathematical Formulation

For each parameter $w$:

Accumulate Squared Gradient: $v_{t} = ρ v_{t - 1} + (1 - ρ) g_{t}^{2}$
Update Weights: $w \leftarrow w - \frac{α}{\sqrt{v_{t}} + ϵ} g_{t}$

where:

$α$ — learning rate
$ρ$ — forgetting factor (decay rate, typically 0.9)
$g_{t}$ — current gradient $\frac{\partial L}{\partial w}$
$ϵ$ — small constant for stability
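The two steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `rmsprop_step`, the default hyperparameters, and the toy quadratic loss are assumptions chosen for the example, and the update divides by $\sqrt{v_{t}} + ϵ$ as in the standard formulation.

```python
import numpy as np

def rmsprop_step(w, g, v, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSProp update; returns the new weights and the new accumulator."""
    v = rho * v + (1.0 - rho) * g ** 2      # decaying average of squared gradients
    w = w - lr * g / (np.sqrt(v) + eps)     # per-parameter scaled step
    return w, v

# Toy problem (hypothetical): minimize L(w) = 0.5 * sum(c * w**2),
# whose curvature differs by a factor of 100 between the two dimensions.
c = np.array([100.0, 1.0])
w = np.array([1.0, 1.0])
v = np.zeros_like(w)

for _ in range(500):
    g = c * w                               # gradient of the toy loss
    w, v = rmsprop_step(w, g, v)

final_loss = 0.5 * np.sum(c * w ** 2)
print(final_loss)
```

Because each step is bounded by roughly $α / \sqrt{1 - ρ}$ regardless of the raw gradient size, both coordinates move toward the minimum at a comparable pace despite the difference in curvature.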

RMSProp vs AdaGrad

AdaGrad: Accumulates all past squared gradients. Because this sum never decreases, the effective learning rate shrinks monotonically toward zero, which can stop learning prematurely.
RMSProp: Only “remembers” recent gradients via the decay factor $ρ$. The effective learning rate can recover when gradients become small again, so the optimizer keeps learning in non-stationary settings.
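The difference can be seen by tracking the effective step size of both methods under a constant gradient. This is a sketch under simplifying assumptions: the gradient value, learning rate, and iteration count are made up, and the AdaGrad update is written in its common form $α g_{t} / (\sqrt{G_{t}} + ϵ)$, where $G_{t}$ is the running sum of squared gradients.

```python
import numpy as np

lr, rho, eps = 0.01, 0.9, 1e-8
g = 1.0          # constant gradient (made-up stand-in for a steady signal)

G = 0.0          # AdaGrad: running sum of squared gradients
v = 0.0          # RMSProp: decaying average of squared gradients
adagrad_steps, rmsprop_steps = [], []

for _ in range(1000):
    G += g ** 2
    v = rho * v + (1 - rho) * g ** 2
    adagrad_steps.append(lr * g / (np.sqrt(G) + eps))
    rmsprop_steps.append(lr * g / (np.sqrt(v) + eps))

print(adagrad_steps[0], adagrad_steps[-1])   # keeps shrinking like 1 / sqrt(t)
print(rmsprop_steps[0], rmsprop_steps[-1])   # settles near lr
```

AdaGrad's step keeps decaying as the sum grows, while RMSProp's step levels off near the base learning rate once the average has settled.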

Connections

Sub-component of: Adam
Improved version of: AdaGrad
Context: Optimization for Neural Networks

Appears In

Deep Learning Optimization
RL-L08 - Deep RL Value-Based (often used in A3C)
