# Belief State
A belief state is a probability distribution over the hidden states of a POMDP, representing the agent’s uncertainty about which state it is in given its history of observations and actions:

$$b_t(s) = P(S_t = s \mid a_0, o_1, a_1, o_2, \ldots, a_{t-1}, o_t)$$
## Bayesian Update

### Belief State Update
After taking action $a$ and observing $o$, the belief is updated via Bayes’ rule:

$$b'(s') = \frac{P(o \mid s', a) \sum_{s} P(s' \mid s, a)\, b(s)}{\sum_{s''} P(o \mid s'', a) \sum_{s} P(s'' \mid s, a)\, b(s)}$$
where:
- $P(o \mid s', a)$ — observation likelihood (how likely this observation is if the true next state is $s'$)
- $P(s' \mid s, a)$ — transition probability
- $b(s)$ — prior belief about state $s$
- the denominator normalizes the result so the new belief sums to 1
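In code, this update is a predict step (push the belief through the transition model) followed by a correct-and-normalize step (weight by the observation likelihood). A minimal sketch, assuming the models are given as nested dictionaries `T[s][a][s2]` for $P(s' \mid s, a)$ and `O[s2][a][o]` for $P(o \mid s', a)$ — these names are illustrative, not from the lecture:

```python
def belief_update(b, a, o, T, O):
    """One Bayes' rule update of belief b after taking action a and observing o.

    b: dict mapping state -> prior probability
    T: transition model, T[s][a][s2] = P(s2 | s, a)
    O: observation model, O[s2][a][o] = P(o | s2, a)
    """
    new_b = {}
    for s2 in b:
        # Predict: sum_s P(s2 | s, a) * b(s)
        predicted = sum(T[s][a][s2] * b[s] for s in b)
        # Correct: weight the prediction by the observation likelihood P(o | s2, a)
        new_b[s2] = O[s2][a][o] * predicted
    # Normalize so the posterior sums to 1
    total = sum(new_b.values())
    return {s2: p / total for s2, p in new_b.items()}
```

With a uniform two-state prior and an 85%-accurate sensor, a single update after a correct reading yields a posterior of 0.85 for the indicated state.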
## Intuition

### Tracking Where You Might Be
Imagine you’re in a dark room. You can’t see where you are (hidden state), but you can feel around (observations). Your belief state is your mental map of where you think you are — a probability over all possible locations. Each time you move and get a new sensory input, you update this mental map using Bayes’ rule.
### The Tiger Problem (Classic Example)
A tiger is behind one of two doors, and treasure behind the other. The agent can:
- Open Left (OL): reward +100 if treasure, -100 if tiger
- Open Right (OR): reward +100 if treasure, -100 if tiger
- Listen (L): reward -1, get noisy observation (85% correct)
Belief evolution (from the lecture, starting at $b(\text{tiger left}) = 0.5$):
| Step | Action | Observation | $b(\text{tiger left})$ | Best action |
|---|---|---|---|---|
| Start | — | — | 0.50 | Listen |
| 1 | Listen | Hear Left | 0.85 | Listen |
| 2 | Listen | Hear Left | 0.97 | Listen |
| 3 | Listen | Hear Left | ~0.995 | Open Right |
After enough consistent observations, the agent becomes confident enough to open the door.
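The numbers in the table can be reproduced directly from the update rule. Since listening does not move the tiger, the transition sum drops out and the update collapses to a single Bayes’ rule step on $b = b(\text{tiger left})$ — a sketch using the 85% listening accuracy above:

```python
def listen_update(b, p_correct=0.85):
    # Posterior probability the tiger is left after hearing it on the left;
    # listening does not change the hidden state, so no transition sum is needed.
    num = p_correct * b
    return num / (num + (1 - p_correct) * (1 - b))

b = 0.5
for step in range(1, 4):
    b = listen_update(b)
    print(f"after listen {step}: b(left) = {b:.3f}")
# prints 0.850, then 0.970, then 0.995
```

Note how each consistent observation moves the belief less in absolute terms but multiplies the odds by the same factor (0.85/0.15 ≈ 5.7).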
## Key Properties
- The belief state is a sufficient statistic for the history — it captures all relevant information
- The belief state MDP is fully observable (we know what belief we’re in)
- Planning (e.g., Dynamic Programming) in belief space yields the optimal POMDP policy
- Belief states live in a continuous space (a probability simplex) even if the underlying state space is discrete
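To make the last point concrete: a belief over $\lvert S \rvert$ discrete states is any nonnegative vector summing to 1, so it lives in a continuous simplex with $\lvert S \rvert - 1$ free coordinates — a toy check:

```python
# A belief over |S| = 3 discrete states: a nonnegative vector summing to 1.
belief = [0.2, 0.5, 0.3]
assert all(p >= 0.0 for p in belief)
assert abs(sum(belief) - 1.0) < 1e-12

# The simplex is (|S| - 1)-dimensional: the normalization constraint
# determines the last coordinate, so only |S| - 1 numbers are free.
free_dims = len(belief) - 1
print(free_dims)  # 2
```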
## Advantages and Disadvantages
| Advantages | Disadvantages |
|---|---|
| Concrete meaning: probability over latent states | Requires knowledge of the underlying models $P(s' \mid s, a)$ and $P(o \mid s', a)$ |
| Relatively compact: $\dim(b) = \lvert S \rvert - 1$ | |
| Can be updated recursively | Only practical for discrete state spaces |
| Converts the POMDP to a (continuous) MDP | Continuous belief space makes planning hard |
## Connections
- Central concept in POMDP theory
- Updated via Bayes’ Theorem
- Alternative: Predictive State Representation (doesn’t require model knowledge)
- Planning in belief space uses Dynamic Programming or Value Iteration