A custom POMDP maze environment for studying agent reasoning, uncertainty, partial observation, and world-model learning. It produces structured trajectories for training GPT-based reasoning models.

Maze_RL

This project explores Reinforcement Learning (RL) in a series of custom maze environments of progressively increasing complexity. It integrates GPT data generation and inference logic, but the actual fine-tuning process has been migrated to a new project: GPT-CoT.

🎬 Watch the demo 1 video on YouTube
🎬 Watch the demo 2 video on YouTube


Environment Overview

All environments are custom-built in Python and partially follow the gymnasium interface; a minimal usage sketch appears after the feature list below.

Maze ID   Key Features
maze1     Fully observable, deterministic grid; simple greedy path-finding.
maze2     Partial observability (1×1 view) and non-deterministic movement.
maze3     DFS/Prim-based maze generation with complex branching.
maze4     POMDP environment with Growing Tree maze generation (sketched below), designed for map exploration.
maze5     Adds traps to the environment; exploration must avoid trap tiles.
maze6     Multi-goal navigation with TSP-style shortest-path planning across several goals.
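
For reference, the Growing Tree generation behind maze4 can be sketched as below. This is a generic textbook version using a newest-cell selection rule, not the repository's exact generator:

import random

# Generic Growing Tree maze carving on an odd-sized grid (1 = wall, 0 = floor).
# Textbook sketch only; the repository's generator may differ.
def growing_tree(w, h, seed=0):
    rng = random.Random(seed)
    grid = [[1] * w for _ in range(h)]
    grid[1][1] = 0
    active = [(1, 1)]
    while active:
        x, y = active[-1]  # "newest cell" rule gives a DFS-like bias
        nbrs = [(x + dx, y + dy)
                for dx, dy in ((2, 0), (-2, 0), (0, 2), (0, -2))
                if 0 < x + dx < w and 0 < y + dy < h and grid[y + dy][x + dx] == 1]
        if nbrs:
            nx, ny = rng.choice(nbrs)
            grid[(y + ny) // 2][(x + nx) // 2] = 0  # carve the wall between cells
            grid[ny][nx] = 0
            active.append((nx, ny))
        else:
            active.pop()
    return grid

Always choosing the newest cell makes this behave like a recursive backtracker; choosing a random active cell instead yields Prim-like mazes, the same family used by maze3.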

All environments support:

  • Custom seed & size settings
  • Saving full exploration trajectories
  • Ground truth map extraction for comparison
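
A minimal usage sketch, assuming a gymnasium-style API; the class name and import path below are assumptions and may not match the actual modules in env/:

# Hypothetical usage sketch; class and module names are assumptions.
from env.maze4 import MazeEnv  # assumed import path

env = MazeEnv(seed=42, size=15)         # custom seed & size settings
obs, info = env.reset()

trajectory = []
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    trajectory.append((obs, action, reward))

# The logged trajectory can later be compared against the extracted
# ground-truth map.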

Key Modules

  • env/ & env_partial/: Maze environments (fully or partially observable)
  • train/: RL-inspired exploration or trajectory logging
  • run/: Path planning / TSP execution with visualization (see the TSP sketch after this list)
  • outputs/: Saved trajectories, maps, visuals
  • visual/: Standalone tools for GT visualization, exploration rendering
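
As a rough illustration of the TSP-style planning that run/ performs for maze6, the sketch below brute-forces goal orderings over a precomputed pairwise shortest-path table. The function and table names are illustrative, not the repository's actual code:

from itertools import permutations

# dist maps (from, to) pairs to shortest-path lengths in the maze,
# e.g. precomputed by BFS from the start "S" and from each goal.
def shortest_goal_tour(dist, goals, start="S"):
    best_order, best_cost = None, float("inf")
    for order in permutations(goals):
        cost = dist[(start, order[0])]
        cost += sum(dist[(a, b)] for a, b in zip(order, order[1:]))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

dist = {("S", "A"): 3, ("S", "B"): 5, ("A", "B"): 2, ("B", "A"): 2}
print(shortest_goal_tour(dist, ["A", "B"]))  # (('A', 'B'), 5)

Brute force scales factorially in the number of goals, so it is only practical for small goal counts; larger instances would call for Held-Karp dynamic programming or a heuristic.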

GPT Integration

The Maze environments are used to generate reasoning datasets for GPT fine-tuning.
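
As a hedged sketch of how a logged trajectory might become a fine-tuning record (the real schema lives in GPT-CoT; every field and helper name below is an assumption):

import json

ACTIONS = {0: "up", 1: "down", 2: "left", 3: "right"}  # assumed action encoding

# Hypothetical conversion of one (obs, action, reward) trajectory
# into a prompt/reasoning record for fine-tuning.
def trajectory_to_record(trajectory, goal):
    steps = [f"step {i}: obs={obs}, move {ACTIONS[a]}"
             for i, (obs, a, _r) in enumerate(trajectory)]
    return {"prompt": f"Navigate the maze to reach {goal}.",
            "reasoning": " ".join(steps)}

with open("maze_dataset.jsonl", "w") as f:
    record = trajectory_to_record([((0, 0), 3, 0.0), ((0, 1), 1, 1.0)], goal=(1, 1))
    f.write(json.dumps(record) + "\n")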

All model fine-tuning tasks are now maintained under a new repo:
GPT-CoT (Chain-of-Thought Reasoning Fine-Tuning)


Example Outputs

  • ✅ Visualization from run_tsp_theta_6.py: shows the shortest goal path found by TSP search
  • ✅ Trap avoidance coloring in maze5
  • ✅ Demo video and annotated PDF (maze.pdf)

Demo Materials

maze.pdf: Slide summary of motivation, environment, training logic, and output examples
run_demo.mp4: Real-time TSP navigation video
v3.py: Visualize gt_maze6_multi_SEED*.npy ground truth maps (a rendering sketch follows below)
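
A minimal rendering sketch, assuming the ground-truth maps are saved as 2D integer grids under outputs/ (v3.py may handle this differently):

import glob
import numpy as np
import matplotlib.pyplot as plt

# Load and display each saved ground-truth map.
for path in glob.glob("outputs/gt_maze6_multi_SEED*.npy"):
    gt = np.load(path)            # 2D grid of tile codes (walls, floor, goals)
    plt.imshow(gt, cmap="viridis")
    plt.title(path)
    plt.show()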


📌 Git Tips

To avoid tracking large output files:

# .gitignore
outputs/
experiments-linux/

If large files were already committed, rewrite history with git filter-repo (note: this rewrites commits, so a force-push is required afterwards):

git filter-repo --force --path outputs --path experiments-linux --invert-paths

✨ Future Work

  • Integrate Decision Transformer
  • Fine-tune with policy + trajectory pairs
  • Add language-based goal commands
  • Combine LLM inference with real-time RL

🔗 GitHub

This project is hosted at: Maze_RL
GPT model training: GPT-CoT
