A small reference implementation of grid-based value iteration for a 1D double-integrator (position/velocity) system with quadratic costs. The script discretizes the state and control spaces, performs value iteration with interpolation, and visualizes the resulting value function, policy, and a simulated closed-loop trajectory.
- Python 3.10+
- `matplotlib`, `numpy`, `scipy`
- Optional: `uv` for fast, locked installs (`uv.lock` is included)
- Clone and enter the repo.
- Create a virtual environment (example): `python -m venv .venv && source .venv/bin/activate`
- Install dependencies:
  - With uv (uses `uv.lock`): `uv sync`
  - Or with pip: `pip install matplotlib numpy scipy`
- Run the demo (opens plots): `python main.py`
- Discretizes the state space (position, velocity) over a uniform grid and the control space over a discrete set of accelerations.
- Iterates the Bellman backup: for each grid point, sweep all controls, simulate the next state via the discrete dynamics, interpolate the current value estimate at that next state, and update the value and greedy policy.
- Uses a quadratic stage cost (`p^2 + v^2 + 0.5*u^2`) and discount factor `gamma=0.95`.
- After convergence, visualizes the value function (as `sqrt(V)` contours for readability), the optimal policy heatmap, and a sample closed-loop trajectory obtained by interpolating the policy during rollout.
- Optionally runs a finite-horizon backward pass: specify a terminal cost `D(x)` for `V_T(x)`, then compute `V_t` and time-varying policies back to `t=0`.
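The Bellman backup with interpolation can be sketched as follows. This is an illustrative standalone example, not the repo's exact implementation; the grid sizes, `dt`, and variable names here are assumptions, with the stage cost and discount from above.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Illustrative value iteration for the 1D double integrator x = (p, v).
# Grid resolution and dt are assumptions; cost and gamma match the README.
dt, gamma = 0.1, 0.95
pos = np.linspace(-2.0, 2.0, 21)
vel = np.linspace(-2.0, 2.0, 21)
controls = np.linspace(-1.0, 1.0, 9)
Pm, Vm = np.meshgrid(pos, vel, indexing="ij")  # all grid states at once

def step(p, v, u):
    # Euler-discretized double-integrator dynamics.
    return p + dt * v, v + dt * u

def stage_cost(p, v, u):
    # Quadratic stage cost: p^2 + v^2 + 0.5*u^2.
    return p**2 + v**2 + 0.5 * u**2

V = np.zeros_like(Pm)
for _ in range(200):
    # Interpolate the current value estimate at arbitrary next states.
    interp = RegularGridInterpolator((pos, vel), V,
                                     bounds_error=False, fill_value=None)
    Q = np.empty((len(controls),) + Pm.shape)
    for k, u in enumerate(controls):  # sweep all controls per grid point
        pn, vn = step(Pm, Vm, u)
        nxt = interp(np.stack([pn.ravel(), vn.ravel()], axis=-1))
        Q[k] = stage_cost(Pm, Vm, u) + gamma * nxt.reshape(Pm.shape)
    V_new = Q.min(axis=0)                  # Bellman backup
    policy = controls[Q.argmin(axis=0)]    # greedy policy
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new
```

The origin is a fixed point of the dynamics with zero cost, so its value stays at zero, while values grow toward the grid boundary.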
- `main.py`: Implementation of the discrete grid value-iteration solver and the double-integrator example with plotting and a sample rollout.
- `pyproject.toml`: Minimal project metadata and dependencies.
- `uv.lock`: Locked dependency set for reproducible installs with `uv`.
- Replace `discrete_dynamics` and `discrete_cost` in `main.py` with your system; ensure they are vectorized over batches of states/actions.
- Adjust `grid_bounds`, `grid_res`, and `control_space` when instantiating `DiscreteGridVI` to match your state/control ranges.
- Tune `gamma`, `tol`, and `max_iter` in `solve()` to balance accuracy and runtime; larger grids or action sets increase compute.
- To run a finite-horizon solve, define `terminal_cost(state_batch)` that returns `D(x)` for each flattened state row, then call `solve_finite(horizon=T, terminal_cost=terminal_cost)` to get `V_0` and the time-varying policy list `[pi_0, ..., pi_{T-1}]`.
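As a sketch of the vectorization requirement, here are hypothetical batched replacements for the double integrator itself; the actual signatures in `main.py` may differ, and `DT` is an assumed step size. States are `(N, 2)` rows of `(position, velocity)`, actions are `(N, 1)` accelerations.

```python
import numpy as np

DT = 0.1  # integration step (assumption, not taken from main.py)

def discrete_dynamics(states, actions):
    # Euler step, batched over N rows: p' = p + dt*v, v' = v + dt*u.
    p, v = states[:, 0], states[:, 1]
    u = actions[:, 0]
    return np.stack([p + DT * v, v + DT * u], axis=-1)

def discrete_cost(states, actions):
    # Quadratic stage cost p^2 + v^2 + 0.5*u^2, one scalar per row.
    p, v = states[:, 0], states[:, 1]
    u = actions[:, 0]
    return p**2 + v**2 + 0.5 * u**2

def terminal_cost(state_batch):
    # Example terminal cost D(x) = p^2 + v^2 for a finite-horizon solve.
    return (state_batch**2).sum(axis=1)
```

Everything operates on whole batches with no Python loops, which is what lets the solver evaluate all grid points per control in a single call.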