RMSProp

Definition

RMSProp (Root Mean Square Propagation) is an adaptive learning rate optimization method designed to tackle the radically diminishing learning rates of AdaGrad. It limits the influence of historical gradients by using an exponentially decaying average of squared gradients.

Intuition

Normalizing the Gradient

If a gradient is consistently large, we want to slow down in that direction to avoid overshooting. If a gradient is small, we want to speed up. RMSProp does this by dividing the current gradient by a “running average” of recent gradient magnitudes. This keeps the updates at a similar scale across all dimensions.

Mathematical Formulation

For each parameter θ:

  1. Accumulate Squared Gradient: E[g²]_t = γ · E[g²]_{t−1} + (1 − γ) · g_t²
  2. Update Weights: θ_{t+1} = θ_t − (η / √(E[g²]_t + ε)) · g_t

where:

  • η — learning rate
  • γ — forgetting factor (decay rate, typically 0.9)
  • g_t — current gradient at step t
  • ε — small constant for numerical stability
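
The two update rules above can be sketched in a few lines of Python. This is a minimal single-parameter illustration, not a production optimizer; the helper name `rmsprop_step` and the toy objective f(θ) = θ² are chosen here for demonstration.

```python
import math

def rmsprop_step(theta, grad, avg_sq, lr=0.01, gamma=0.9, eps=1e-8):
    """One RMSProp update for a single scalar parameter.

    avg_sq holds the running average E[g^2]; returns the updated
    parameter and the updated running average.
    """
    # 1. Accumulate squared gradient with exponential decay
    avg_sq = gamma * avg_sq + (1 - gamma) * grad ** 2
    # 2. Scale the step by the root mean square of recent gradients
    theta = theta - lr * grad / math.sqrt(avg_sq + eps)
    return theta, avg_sq

# Toy example: minimize f(theta) = theta^2, whose gradient is 2*theta
theta, avg_sq = 5.0, 0.0
for _ in range(2000):
    theta, avg_sq = rmsprop_step(theta, 2 * theta, avg_sq)
print(theta)  # approaches 0
```

Because the gradient is divided by √E[g²], the effective step size is roughly η regardless of the raw gradient's magnitude, which is exactly the normalization described in the Intuition section.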

RMSProp vs AdaGrad

  • AdaGrad: Accumulates all past squared gradients. This causes the learning rate to eventually shrink to zero, stopping learning prematurely.
  • RMSProp: Only “remembers” recent gradients via the decay factor γ. This allows the optimizer to continue learning indefinitely in non-stationary environments.
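
The difference between the two accumulators can be seen numerically. In this hypothetical setup, both optimizers receive the same constant gradient g = 1; AdaGrad's sum grows without bound, so its effective step scale 1/√(accumulator) shrinks toward zero, while RMSProp's decayed average stays bounded:

```python
import math

g, gamma = 1.0, 0.9
adagrad_acc, rmsprop_acc = 0.0, 0.0

for _ in range(1000):
    adagrad_acc += g ** 2                                      # sum of ALL past g^2: grows forever
    rmsprop_acc = gamma * rmsprop_acc + (1 - gamma) * g ** 2   # decayed average: bounded

adagrad_scale = 1 / math.sqrt(adagrad_acc)   # shrinks toward 0 as steps accumulate
rmsprop_scale = 1 / math.sqrt(rmsprop_acc)   # stays near 1 / |g|
print(adagrad_scale, rmsprop_scale)
```

After 1000 steps AdaGrad's scale has collapsed to about 0.03, while RMSProp's remains near 1, so RMSProp keeps taking usefully sized steps.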

Connections

Appears In