Deep Recurrent Q-Learning (DRQN)

DRQN is an extension of Deep Q-Network (DQN) for partially observable environments. It replaces the first fully connected layer of DQN with an LSTM (Long Short-Term Memory) layer, so the network can carry an internal memory across timesteps instead of relying only on the current observation.

Architecture

Standard DQN:    [Observation] → [Conv layers] → [FC layers] → Q(s,a)

DRQN:            [Observation] → [Conv layers] → [LSTM] → [FC layers] → Q(o,h,a)

The LSTM processes a sequence of observations over time, maintaining a hidden state that aggregates information from past observations. This hidden state serves as an approximate internal state for the POMDP.
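The data flow in the DRQN diagram can be written as a recurrence. This is a sketch only: the symbols x_t (conv features) and c_t (LSTM cell state) are notation introduced here for illustration, and the zero initial state is an assumption that matches the zero-initialized case described under Training Strategies.

```latex
\begin{aligned}
x_t &= \mathrm{Conv}(o_t) \\
(h_t, c_t) &= \mathrm{LSTM}(x_t,\, h_{t-1},\, c_{t-1}), \qquad h_0 = c_0 = 0 \\
Q(o_t, h_{t-1}, a) &= \bigl[\mathrm{FC}(h_t)\bigr]_a
\end{aligned}
```

The Q-value at step t depends on the current observation o_t and, through h_{t-1}, on every earlier observation in the sequence. Standard DQN has no such term, so its output depends on the current input alone.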

Training Strategies

Two approaches for unrolling the LSTM during training:

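Bootstrapped random updates: Sample a random episode from replay memory and a random starting point within it, then unroll the LSTM for a fixed number of steps. The LSTM hidden state is zeroed at the start of each update, so it must be rebuilt from the sampled window alone.
Bootstrapped sequential updates: Sample an episode from replay memory and run the update from its first step through to its end, carrying the LSTM hidden state forward. The hidden state is more accurate, but consecutive samples are strongly correlated, which violates the random sampling that DQN relies on.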
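The difference between the two strategies comes down to what is drawn from replay memory. The sketch below illustrates only the sampling step, under several assumptions: episodes are stored as whole lists of transitions, and the function names `sample_random_window` and `sample_full_episode` are hypothetical and not taken from the paper or any library. The network update itself is omitted.

```python
import random


def sample_random_window(episodes, unroll_steps, rng=random):
    """Random update: a random episode, then a random start point inside it.

    Returns `unroll_steps` consecutive transitions. The caller is assumed
    to zero the LSTM hidden state before processing the first one.
    """
    # Only episodes long enough to supply a full window are eligible.
    eligible = [ep for ep in episodes if len(ep) >= unroll_steps]
    if not eligible:
        raise ValueError("no episode is long enough for the requested unroll")
    episode = rng.choice(eligible)
    start = rng.randrange(len(episode) - unroll_steps + 1)
    return episode[start:start + unroll_steps]


def sample_full_episode(episodes, rng=random):
    """Sequential update: a random episode, replayed from its first step.

    The caller is assumed to zero the hidden state once at the start and
    then carry it forward through every transition in the episode.
    """
    return list(rng.choice(episodes))
```

With random windows, one batch can mix short fragments from many episodes. With full episodes, every step of an update comes from the same trajectory.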

Key Properties

Handles partial observability by learning to aggregate information over time
The LSTM hidden state acts as a learned internal state (approximating a Belief State)
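Works with Experience Replay, though the LSTM state must be initialized for each sampled sequence (zeroed at a random start point, or carried forward from the start of the episode)
Simple modification to DQN: replace the first fully connected layer with an LSTM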

Connections

Extends Deep Q-Network (DQN) to partially observable settings
Addresses Partial Observability / POMDP
Alternative to frame stacking (a simpler approximation that feeds the network a fixed window of the most recent observations)
Uses LSTM (a type of recurrent neural network)
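For contrast with the recurrent approach, frame stacking can be sketched in a few lines. This is a minimal illustration, not code from the paper: the class name `FrameStack`, the parameter `k`, and the choice to pad the window with copies of the first observation at reset are all assumptions made here.

```python
from collections import deque


class FrameStack:
    """Keeps the last k observations and exposes them as one network input."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)  # the oldest frame is dropped automatically

    def reset(self, first_obs):
        # Assumed convention: fill the window with copies of the first observation.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return tuple(self.frames)

    def step(self, obs):
        self.frames.append(obs)
        return tuple(self.frames)
```

Anything older than `k` steps is discarded. An LSTM hidden state has no fixed cutoff of this kind, although what it retains is learned rather than guaranteed.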

Appears In

RL-L13 - Partial Observability
Hausknecht & Stone, “Deep Recurrent Q-Learning for Partially Observable MDPs” (2015)
