Partial Observability

A setting where the agent cannot directly observe the true state of the environment. Instead, it receives observations that provide incomplete or noisy information about the underlying state. The standard MDP assumption of full state access is violated.

Intuition

Seeing Through a Keyhole

Full observability is like playing chess — you can see the entire board. Partial observability is like playing poker — you can only see your own cards. Your decisions must account for uncertainty about what you can’t see.

Handling Partial Observability

Three approaches (from least to most approximate):

| Approach | Idea | Requirements | Limitations |
| --- | --- | --- | --- |
| Belief state | Probability distribution over hidden states | Full model (transitions, observations) | Discrete states only; model needed |
| Predictive state representation | Predictions about future observations | Core tests | Tabular setting |
| Approximate | Use recent observations as state | Nothing extra | Not optimal, no guarantees |
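The belief-state row above can be made concrete with a Bayes-filter update: after taking action a and seeing observation o, the new belief is b'(s') ∝ O(o | s') Σ_s T(s' | s, a) b(s). Below is a minimal sketch for a hypothetical 2-state POMDP; the matrices `T` and `O` and the action name `"move"` are invented for illustration, not from any particular environment.

```python
# Hypothetical 2-state POMDP: the agent never sees the state directly,
# only a noisy observation correlated with it.

# T[a][s][s2]: P(next state s2 | current state s, action a)
T = {"move": [[0.7, 0.3],
              [0.4, 0.6]]}

# O[s2][o]: P(observation o | arriving in state s2)
O = [[0.9, 0.1],
     [0.2, 0.8]]

def belief_update(belief, action, obs):
    """Bayes filter: b'(s') is proportional to O(o|s') * sum_s T(s'|s,a) b(s)."""
    n = len(belief)
    new_b = [O[s2][obs] * sum(T[action][s][s2] * belief[s] for s in range(n))
             for s2 in range(n)]
    norm = sum(new_b)              # normalize so the belief sums to 1
    return [p / norm for p in new_b]

b = [0.5, 0.5]                     # uniform prior over the hidden states
b = belief_update(b, "move", 1)    # observation 1 is more likely in state 1
print(b)                           # belief mass shifts toward state 1
```

Note the "full model" requirement from the table: both `T` and `O` must be known exactly, which is what the approximate methods below avoid.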

Approximate Methods in Practice

  1. Single observation: use s_t = o_t — simplest, often “good enough”
  2. Frame stacking: use s_t = (o_{t-3}, o_{t-2}, o_{t-1}, o_t) — used in Atari DQN (4 frames)
  3. Recurrent networks: Deep Recurrent Q-Learning — an LSTM maintains internal memory
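Option 2 above is simple enough to sketch directly: keep a fixed-length buffer of recent observations and hand the concatenation to the agent as its “state”. The class name `FrameStacker` and the padding-on-reset choice are illustrative, not the DeepMind implementation.

```python
from collections import deque

class FrameStacker:
    """Present the last k observations as a single pseudo-state
    (illustrative sketch of the Atari-DQN trick, k=4 frames)."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)   # oldest frame is evicted automatically

    def reset(self, obs):
        self.frames.clear()
        for _ in range(self.k):         # pad by repeating the first observation
            self.frames.append(obs)
        return tuple(self.frames)

    def step(self, obs):
        self.frames.append(obs)
        return tuple(self.frames)

stacker = FrameStacker(k=4)
s = stacker.reset("o0")
s = stacker.step("o1")
print(s)  # ('o0', 'o0', 'o0', 'o1')
```

Stacking recovers short-range temporal information (e.g. the velocity of a ball across frames), but anything outside the k-frame window remains hidden — which is the gap recurrent networks address.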

Practical Reality

In practice, many successful RL systems simply treat observations as states (s_t = o_t). With function approximation, there is typically no guarantee that the features define a Markov state anyway. As long as the system is “close enough” to Markov, this works well enough.
