RL-CA01: Coding Assignment 1 — Dynamic Programming

Overview

Implementation of Policy Iteration and Value Iteration algorithms on a gridworld environment.

Files:

  • DP_lab.ipynb — Main notebook
  • gridworld.py — Gridworld environment
  • dp_autograde.py — Autograding tests

What You Implement

  1. Policy Evaluation: Iteratively compute the state-value function V^π for a given policy π
  2. Policy Improvement: Compute the greedy policy w.r.t. the current value function
  3. Policy Iteration: Alternate evaluation + improvement until the policy is stable
  4. Value Iteration: A single update combining both steps in one sweep
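As a rough sketch of how these pieces fit together — on a hypothetical two-state MDP rather than the assignment's gridworld, using the Gym-style `env.P[s][a] = [(prob, next_state, reward, done)]` transition format — policy iteration might look like:

```python
import numpy as np

# Hypothetical 2-state MDP: action 1 in state 0 reaches terminal state 1
# with reward 1; everything else loops with reward 0.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]},
    1: {0: [(1.0, 1, 0.0, True)], 1: [(1.0, 1, 0.0, True)]},
}
nS, nA, gamma, theta = 2, 2, 0.9, 1e-8

def policy_eval(pi):
    """Iterative policy evaluation for a stochastic policy pi[s, a]."""
    V = np.zeros(nS)
    while True:
        delta = 0.0
        for s in range(nS):
            v = sum(pi[s, a] * sum(p * (r + gamma * V[ns])
                                   for p, ns, r, done in P[s][a])
                    for a in range(nA))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

def greedy(V):
    """Greedy policy improvement w.r.t. the value estimate V."""
    pi = np.zeros((nS, nA))
    for s in range(nS):
        q = [sum(p * (r + gamma * V[ns]) for p, ns, r, done in P[s][a])
             for a in range(nA)]
        pi[s, int(np.argmax(q))] = 1.0
    return pi

pi = np.full((nS, nA), 1.0 / nA)  # start from the uniform random policy
while True:
    V = policy_eval(pi)
    new_pi = greedy(V)
    if np.array_equal(new_pi, pi):  # policy stable -> optimal
        break
    pi = new_pi
```

Here the outer loop terminates when improvement no longer changes the policy, which is the stopping criterion step 3 refers to.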

Key Implementation Details

Policy Evaluation

# Core loop: sweep all states until the largest update falls below theta
delta = float('inf')
while delta > theta:
    delta = 0.0  # reset each sweep, otherwise the loop never terminates
    for s in states:
        v = V[s]
        V[s] = sum(pi[s, a] * sum(p * (r + gamma * V[s_next])
                                  for p, s_next, r, done in env.P[s][a])
                   for a in actions)
        delta = max(delta, abs(v - V[s]))
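Worth noting: for a fixed policy the Bellman expectation equation is linear in V, so on small state spaces the same fixed point can be obtained with a direct linear solve instead of iterating. A sketch with a made-up two-state transition matrix `P_pi` and reward vector `r_pi` (both hypothetical, not from the assignment):

```python
import numpy as np

# Policy evaluation as a linear solve: V = r_pi + gamma * P_pi @ V
# => (I - gamma * P_pi) V = r_pi
gamma = 0.9
P_pi = np.array([[0.5, 0.5],   # from state 0: stay or reach state 1
                 [0.0, 1.0]])  # state 1 is absorbing
r_pi = np.array([0.5, 0.0])    # expected one-step reward per state
V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
```

The iterative sweep above converges to exactly this solution; the direct solve is just impractical once the state space is large.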

Value Iteration

# Core: combine evaluation + improvement in one step
V[s] = max(sum(p * (r + gamma * V[s_next])
               for p, s_next, r, done in env.P[s][a])
           for a in actions)
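A complete value-iteration loop built around this update might look as follows — a self-contained sketch on a made-up three-state chain (not the assignment's gridworld), with a `P` dict mimicking the Gym-style `env.P` format:

```python
import numpy as np

# Hypothetical 3-state chain: action 1 moves right, reaching terminal
# state 2 yields reward 1; action 0 stays put with reward 0.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 1, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
nS, nA, gamma, theta = 3, 2, 0.9, 1e-8

V = np.zeros(nS)
while True:
    delta = 0.0
    for s in range(nS):
        v = max(sum(p * (r + gamma * V[ns]) for p, ns, r, done in P[s][a])
                for a in range(nA))
        delta = max(delta, abs(v - V[s]))
        V[s] = v
    if delta < theta:
        break

# Once V has converged, read off the greedy (optimal) policy.
policy = [max(range(nA),
              key=lambda a: sum(p * (r + gamma * V[ns])
                                for p, ns, r, done in P[s][a]))
          for s in range(nS)]
```

Unlike policy iteration, there is no explicit policy during the loop; the greedy policy is extracted once at the end.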

Key Takeaways

  • Policy iteration converges in fewer outer iterations but each is expensive (full evaluation)
  • Value iteration is simpler (one sweep per iteration) and often faster in total wall-clock time
  • Both require a full model of the environment (transition probabilities and rewards) — a limitation addressed by Monte Carlo Methods and Temporal Difference Learning

See RL-HW01 - Homework 1 for theoretical questions about these algorithms.