Beau-Coup/value-iteration

HJB-RL

A small reference implementation of grid-based value iteration for a 1D double-integrator (position/velocity) system with quadratic costs. The script discretizes the state and control spaces, performs value iteration with interpolation, and visualizes the resulting value function, policy, and a simulated closed-loop trajectory.

Requirements

  • Python 3.10+
  • matplotlib, numpy, scipy
  • Optional: uv for fast, locked installs (uv.lock is included)

Quickstart

  1. Clone and enter the repo.
  2. Create a virtual environment (example):
    python -m venv .venv
    source .venv/bin/activate
  3. Install dependencies:
    • With uv (uses uv.lock): uv sync
    • Or with pip: pip install matplotlib numpy scipy
  4. Run the demo (opens plots):
    python main.py

What the script does

  • Discretizes the state space (position, velocity) over a uniform grid and the control space over a set of accelerations.
  • Iterates the Bellman backup: for each grid point, sweep all controls, simulate the next state via the discrete dynamics, interpolate the current value estimate at that next state, and update the value and greedy policy.
  • Uses a quadratic stage cost (p^2 + v^2 + 0.5*u^2) and discount factor gamma=0.95.
  • After convergence, visualizes the value function (as sqrt(V) contours for readability), the optimal policy heatmap, and a sample closed-loop trajectory obtained by interpolating the policy during rollout.
  • Optionally runs a finite-horizon backward pass: specify a terminal cost D(x) for V_T(x), then compute V_t and time-varying policies back to t=0.
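
The backup and rollout described above can be sketched in a few dozen lines. This is a minimal illustration, not the code in main.py: the time step DT, the grid bounds and resolution, the control set, and the helper names bilinear, q_values, value_iteration, and rollout are all assumptions made for the example, and bilinear interpolation is written out by hand in NumPy so the block stays self-contained. Next states that leave the grid are clipped to its boundary, which is one possible convention among several.

```python
import numpy as np

GAMMA = 0.95  # discount factor from the README
DT = 0.1      # ASSUMED time step; the README does not state one


def discrete_dynamics(states, u):
    """Euler step of the double integrator for a batch of states, shape (N, 2)."""
    p, v = states[:, 0], states[:, 1]
    return np.stack([p + DT * v, v + DT * u], axis=1)


def discrete_cost(states, u):
    """Stage cost p^2 + v^2 + 0.5*u^2, evaluated row by row."""
    return states[:, 0] ** 2 + states[:, 1] ** 2 + 0.5 * u ** 2


def bilinear(grid, p_axis, v_axis, states):
    """Bilinearly interpolate grid[i, j] (defined on p_axis x v_axis) at states."""
    # Clip queries so out-of-range states reuse the boundary value (assumption).
    p = np.clip(states[:, 0], p_axis[0], p_axis[-1])
    v = np.clip(states[:, 1], v_axis[0], v_axis[-1])
    # Index of the lower-left corner of the cell containing each query.
    i = np.clip(np.searchsorted(p_axis, p, side="right") - 1, 0, len(p_axis) - 2)
    j = np.clip(np.searchsorted(v_axis, v, side="right") - 1, 0, len(v_axis) - 2)
    tp = (p - p_axis[i]) / (p_axis[i + 1] - p_axis[i])
    tv = (v - v_axis[j]) / (v_axis[j + 1] - v_axis[j])
    return ((1 - tp) * (1 - tv) * grid[i, j] + tp * (1 - tv) * grid[i + 1, j]
            + (1 - tp) * tv * grid[i, j + 1] + tp * tv * grid[i + 1, j + 1])


def q_values(V, states, p_axis, v_axis, controls):
    """One row per control: stage cost plus discounted interpolated value."""
    return np.stack([
        discrete_cost(states, u)
        + GAMMA * bilinear(V, p_axis, v_axis, discrete_dynamics(states, u))
        for u in controls
    ])


def value_iteration(p_axis, v_axis, controls, tol=1e-6, max_iter=1000):
    controls = np.asarray(controls, dtype=float)
    P, W = np.meshgrid(p_axis, v_axis, indexing="ij")
    states = np.stack([P.ravel(), W.ravel()], axis=1)  # flattened grid, (N, 2)
    V = np.zeros(P.shape)
    for _ in range(max_iter):
        V_new = q_values(V, states, p_axis, v_axis, controls).min(axis=0).reshape(P.shape)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    best = q_values(V, states, p_axis, v_axis, controls).argmin(axis=0)
    return V, controls[best].reshape(P.shape)


def rollout(policy, p_axis, v_axis, x0, steps=100):
    """Simulate the closed loop, interpolating the gridded policy at each step."""
    x = np.asarray(x0, dtype=float).reshape(1, 2)
    traj = [x[0].copy()]
    for _ in range(steps):
        u = bilinear(policy, p_axis, v_axis, x)
        x = discrete_dynamics(x, u)
        traj.append(x[0].copy())
    return np.array(traj)


p_axis = np.linspace(-2.0, 2.0, 21)   # assumed grid bounds and resolution
v_axis = np.linspace(-2.0, 2.0, 21)
controls = np.linspace(-1.0, 1.0, 5)  # assumed acceleration set
V, policy = value_iteration(p_axis, v_axis, controls)
traj = rollout(policy, p_axis, v_axis, x0=(1.0, 0.0))
print(V[10, 10], traj[-1])
```

Because interpolation weights are convex and gamma is below 1, each sweep is a contraction, so the loop stops once successive value estimates differ by less than tol. A finer grid or a larger control set reduces discretization error at the cost of more work per sweep.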

Files

  • main.py: Implementation of the discrete grid value-iteration solver and the double-integrator example with plotting and a sample rollout.
  • pyproject.toml: Minimal project metadata and dependencies.
  • uv.lock: Locked dependency set for reproducible installs with uv.

Adapting the solver

  • Replace discrete_dynamics and discrete_cost in main.py with your system; ensure they are vectorized over batches of states/actions.
  • Adjust grid_bounds, grid_res, and control_space when instantiating DiscreteGridVI to match your state/control ranges.
  • Tune gamma, tol, and max_iter in solve() to balance accuracy and runtime; larger grids or action sets increase compute.
  • To run a finite-horizon solve, define terminal_cost(state_batch) that returns D(x) for each flattened state row, then call solve_finite(horizon=T, terminal_cost=terminal_cost) to get V_0 and the time-varying policy list [pi_0, ..., pi_{T-1}].
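
The finite-horizon recursion in the last item can be illustrated with a standalone sketch. It does not reproduce DiscreteGridVI or the real solve_finite signature: backward_pass, step, stage_cost, and interp2 are hypothetical names, the time step, grid, control set, and the particular terminal cost 10*(p^2 + v^2) are assumptions, and since the README does not say whether the finite-horizon pass is discounted, gamma is exposed as a parameter. The part that mirrors the README is the shape contract: terminal_cost takes a flattened (N, 2) state batch and returns one value per row, and the result is V_0 plus the list [pi_0, ..., pi_{T-1}].

```python
import numpy as np

DT = 0.1  # ASSUMED time step; the README does not state one


def terminal_cost(state_batch):
    """D(x) per row of a flattened (N, 2) batch; this quadratic is an assumed example."""
    return 10.0 * np.sum(state_batch ** 2, axis=1)


def step(states, u):
    """Vectorized Euler step of the double integrator."""
    return np.stack([states[:, 0] + DT * states[:, 1], states[:, 1] + DT * u], axis=1)


def stage_cost(states, u):
    """Stage cost p^2 + v^2 + 0.5*u^2 from the README."""
    return states[:, 0] ** 2 + states[:, 1] ** 2 + 0.5 * u ** 2


def interp2(grid, p_axis, v_axis, states):
    """Bilinear interpolation with queries clipped to the grid (assumption)."""
    p = np.clip(states[:, 0], p_axis[0], p_axis[-1])
    v = np.clip(states[:, 1], v_axis[0], v_axis[-1])
    i = np.clip(np.searchsorted(p_axis, p, side="right") - 1, 0, len(p_axis) - 2)
    j = np.clip(np.searchsorted(v_axis, v, side="right") - 1, 0, len(v_axis) - 2)
    tp = (p - p_axis[i]) / (p_axis[i + 1] - p_axis[i])
    tv = (v - v_axis[j]) / (v_axis[j + 1] - v_axis[j])
    return ((1 - tp) * (1 - tv) * grid[i, j] + tp * (1 - tv) * grid[i + 1, j]
            + (1 - tp) * tv * grid[i, j + 1] + tp * tv * grid[i + 1, j + 1])


def backward_pass(p_axis, v_axis, controls, horizon, terminal_cost, gamma=0.95):
    """Return V_0 and [pi_0, ..., pi_{T-1}] via V_t = min_u [c + gamma * V_{t+1}]."""
    controls = np.asarray(controls, dtype=float)
    P, W = np.meshgrid(p_axis, v_axis, indexing="ij")
    states = np.stack([P.ravel(), W.ravel()], axis=1)
    V = terminal_cost(states).reshape(P.shape)  # V_T(x) = D(x)
    policies = []
    for _ in range(horizon):  # t = T-1, T-2, ..., 0
        q = np.stack([
            stage_cost(states, u) + gamma * interp2(V, p_axis, v_axis, step(states, u))
            for u in controls
        ])
        policies.append(controls[q.argmin(axis=0)].reshape(P.shape))
        V = q.min(axis=0).reshape(P.shape)
    policies.reverse()  # built from t = T-1 down to 0, so flip to time order
    return V, policies


T = 20                                  # assumed horizon length
p_axis = np.linspace(-2.0, 2.0, 21)     # assumed grid
v_axis = np.linspace(-2.0, 2.0, 21)
controls = np.linspace(-1.0, 1.0, 5)    # assumed acceleration set
V0, policies = backward_pass(p_axis, v_axis, controls, T, terminal_cost)
print(V0.shape, len(policies))
```

Unlike the infinite-horizon solve, this pass runs exactly T sweeps with no convergence test, and the policy applied at time t must be policies[t] rather than a single stationary map.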
