A small reference implementation of grid-based value iteration for a 1D double-integrator (position/velocity) system with quadratic costs. The script discretizes the state and control spaces, performs value iteration with interpolation, and visualizes the resulting value function, policy, and a simulated closed-loop trajectory.
- Python 3.10+
- `matplotlib`, `numpy`, `scipy`
- Optional: `uv` for fast, locked installs (`uv.lock` is included)
- Clone and enter the repo.
- Create a virtual environment (example): `python -m venv .venv && source .venv/bin/activate`
- Install dependencies:
  - With uv (uses `uv.lock`): `uv sync`
  - Or with pip: `pip install matplotlib numpy scipy`
- Run the demo (opens plots): `python main.py`
- Discretizes the state space (position, velocity) over a uniform grid and the control space over a discrete set of accelerations.
- Iterates the Bellman backup: for each grid point, sweep all controls, simulate the next state via the discrete dynamics, interpolate the current value estimate at that next state, and update the value and greedy policy.
- Uses a quadratic stage cost (`p^2 + v^2 + 0.5*u^2`) and discount factor `gamma=0.95`.
- After convergence, visualizes the value function (as `sqrt(V)` contours for readability), the optimal policy heatmap, and a sample closed-loop trajectory obtained by interpolating the policy during rollout.
- Optionally runs a finite-horizon backward pass: specify a terminal cost `D(x)` for `V_T(x)`, then compute `V_t` and time-varying policies back to `t=0`.
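The Bellman backup with interpolation can be sketched as follows. This is an illustrative standalone example, not the repo's exact implementation; the grid sizes, `dt`, and variable names here are assumptions, with the stage cost and discount from above.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Illustrative value iteration for the 1D double integrator x = (p, v).
# Grid resolution and dt are assumptions; cost and gamma match the README.
dt, gamma = 0.1, 0.95
pos = np.linspace(-2.0, 2.0, 21)
vel = np.linspace(-2.0, 2.0, 21)
controls = np.linspace(-1.0, 1.0, 9)
Pm, Vm = np.meshgrid(pos, vel, indexing="ij")  # all grid states at once

def step(p, v, u):
    # Euler-discretized double-integrator dynamics.
    return p + dt * v, v + dt * u

def stage_cost(p, v, u):
    # Quadratic stage cost: p^2 + v^2 + 0.5*u^2.
    return p**2 + v**2 + 0.5 * u**2

V = np.zeros_like(Pm)
for _ in range(200):
    # Interpolate the current value estimate at arbitrary next states.
    interp = RegularGridInterpolator((pos, vel), V,
                                     bounds_error=False, fill_value=None)
    Q = np.empty((len(controls),) + Pm.shape)
    for k, u in enumerate(controls):  # sweep all controls per grid point
        pn, vn = step(Pm, Vm, u)
        nxt = interp(np.stack([pn.ravel(), vn.ravel()], axis=-1))
        Q[k] = stage_cost(Pm, Vm, u) + gamma * nxt.reshape(Pm.shape)
    V_new = Q.min(axis=0)                  # Bellman backup
    policy = controls[Q.argmin(axis=0)]    # greedy policy
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new
```

The origin is a fixed point of the dynamics with zero cost, so its value stays at zero, while values grow toward the grid boundary.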
- `main.py`: Implementation of the discrete grid value-iteration solver and the double-integrator example with plotting and a sample rollout.
- `pyproject.toml`: Minimal project metadata and dependencies.
- `uv.lock`: Locked dependency set for reproducible installs with `uv`.
- Replace `discrete_dynamics` and `discrete_cost` in `main.py` with your system; ensure they are vectorized over batches of states/actions.
- Adjust `grid_bounds`, `grid_res`, and `control_space` when instantiating `DiscreteGridVI` to match your state/control ranges.
- Tune `gamma`, `tol`, and `max_iter` in `solve()` to balance accuracy and runtime; larger grids or action sets increase compute.
- To run a finite-horizon solve, define `terminal_cost(state_batch)` that returns `D(x)` for each flattened state row, then call `solve_finite(horizon=T, terminal_cost=terminal_cost)` to get `V_0` and the time-varying policy list `[pi_0, ..., pi_{T-1}]`.
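As a sketch of the vectorization requirement, here are hypothetical batched replacements for the double integrator itself; the actual signatures in `main.py` may differ, and `DT` is an assumed step size. States are `(N, 2)` rows of `(position, velocity)`, actions are `(N, 1)` accelerations.

```python
import numpy as np

DT = 0.1  # integration step (assumption, not taken from main.py)

def discrete_dynamics(states, actions):
    # Euler step, batched over N rows: p' = p + dt*v, v' = v + dt*u.
    p, v = states[:, 0], states[:, 1]
    u = actions[:, 0]
    return np.stack([p + DT * v, v + DT * u], axis=-1)

def discrete_cost(states, actions):
    # Quadratic stage cost p^2 + v^2 + 0.5*u^2, one scalar per row.
    p, v = states[:, 0], states[:, 1]
    u = actions[:, 0]
    return p**2 + v**2 + 0.5 * u**2

def terminal_cost(state_batch):
    # Example terminal cost D(x) = p^2 + v^2 for a finite-horizon solve.
    return (state_batch**2).sum(axis=1)
```

Everything operates on whole batches with no Python loops, which is what lets the solver evaluate all grid points per control in a single call.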