A 3D generalization of the classic Go game, featuring an interactive pygame-based 3D board visualization and an AI opponent.
Status: Experimental / research project – great for playing, visualizing, and tinkering with game AI.
- 3D Board: Play Go on a 3D grid (default 5×5×5)
- Interactive 3D View: Rotate, zoom, and navigate through layers with perspective projection
- Territory Visualization: See which areas are controlled by each player
- Multiple AI Types: Play against heuristic AI, neural AI (REINFORCE), or AlphaZero-style AI (MCTS + ResNet)
- Real-time Scoring: Track stones and territory control
- AI vs AI Mode: Watch two AIs play against each other step-by-step
When you launch `python main.py`, a start screen lets you configure:
- Human vs AI: Play against a computer opponent
- Human vs Friend: Two-player local game
- AI vs AI: Watch two AIs play (step through moves with Space)
- Who plays Black (goes first)
- AI Type (Heuristic, Neural, or AlphaZero) when playing against the AI
- Which AI weight file to load (`.json` for Neural, `.pth` for AlphaZero, placed in `models/`)
In AI vs AI mode, you can select different AI types and weight files for each player.
Use ↑/↓ to navigate, ←/→ to change settings, and Enter to begin.
- A / D or Left Arrow / Right Arrow: Rotate board horizontally (yaw)
- W / S or Up Arrow / Down Arrow: Tilt board up/down (pitch)
- + / - or Mouse Wheel: Zoom in/out
- Page Up / Page Down or [ / ] or 9 / 0: Change active layer
- Left Click: Place a stone on the active layer
- P or Space: Pass turn
- G: Toggle territory visualization (shows which areas favor black/white with vibrant colors)
- H: Toggle grid visibility on/off
- T: Cycle through grid display modes (X-Y planes / Z-Y planes)
- F: Toggle axis helper (shows X/Y/Z axes with numeric labels)
- J: Toggle stone pillars (depth cues to show stone height)
- O: Toggle perspective transform (ON: depth-based scaling, OFF: orthographic projection)
- Q or Esc: Quit game
Train a neural AI using REINFORCE against the heuristic baseline:
```
python neural_training.py --board-size 5 --episodes 100 --save-path models/neural_ai.json
```

This script:
- Plays the neural policy against the heuristic AI
- Logs per-episode rewards, capture differentials, and running win-rate summaries (`--log-interval`)
- Uses reward and penalty functions based on win/loss, territory, and captures (see the sketch after this list)
- Updates a simple Elo rating (auto-saved to `{save_path}_elo.json`)
- Runs 100 evaluation games after training completes
- Saves updated weights that can be selected from the start screen
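The update rules behind these bullets can be summarized in a few lines. Below is a minimal, illustrative sketch of a REINFORCE policy-gradient step and a simple Elo update; the function names, the `baseline` argument, and `k=32` are assumptions for the example, not code lifted from `neural_training.py`.

```python
import torch

def reinforce_update(optimizer, log_probs, episode_return, baseline=0.0):
    """REINFORCE step: push up the log-probability of every move in an episode,
    weighted by how much the episode's return beat the baseline."""
    advantage = episode_return - baseline
    loss = -advantage * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def elo_update(rating, opponent_rating, score, k=32.0):
    """Standard Elo update; score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    expected = 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))
    return rating + k * (score - expected)
```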
Weight Naming: You can name your trained models by specifying a custom `--save-path`. For example:

```
python neural_training.py --save-path models/my_neural_v1.json --episodes 200
```

Options:

- `--log-interval`: How often to print win-rate/reward summaries (default: 10)
- `--load-path`: Resume training from saved weights
- `--elo-path`: Set a custom Elo tracking file (default: derived from `--save-path`)
- `--profile`: Enable detailed per-episode timing
Train an AlphaZero-style AI using self-play with MCTS:
```
python alpha_zero_training.py --num-games 1000 --num-simulations 50 --save-path models/alpha_zero.pth
```

This implements a full AlphaZero architecture:
- 3D ResNet Network: Convolutional neural network with residual blocks
- Input: `[batch, 3, 5, 5, 5]` (black, white, empty channels)
- Output: Policy `[batch, 126]` (125 positions + pass), Value `[batch, 1]`
- MCTS with PUCT: Monte Carlo Tree Search using PUCT algorithm
- Dirichlet noise at root for exploration
- Configurable simulations per move (default: 50)
- Temperature schedule (τ=1 early, τ→0 later)
- Self-Play Training: Games stored as `(state, π, z)` tuples
- Replay buffer for experience replay
- Loss: `(z - v)² - πᵀ log(p)` + L2 regularization (see the sketch after this list)
- Curriculum Learning: Starts training against heuristic baseline, then switches to self-play once win rate exceeds threshold (default: 55%)
- Elo Evaluation: Automatically runs 100 evaluation games against baseline after training, calculates Elo ratings
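The pieces above fit together roughly as sketched below. This is an illustrative stand-in, not the project's `resnet3d_network.py`/`mcts.py` code: the tiny network, the `c_puct` constant, and the helper names are assumptions chosen to show the tensor shapes, the PUCT score, and the loss from the list above.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPolicyValueNet3D(nn.Module):
    """Toy stand-in for the 3D ResNet: a Conv3d trunk with a policy head and a value head."""
    def __init__(self, board=5, channels=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv3d(3, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        flat = channels * board ** 3
        self.policy_head = nn.Linear(flat, board ** 3 + 1)  # 125 moves + pass = 126
        self.value_head = nn.Linear(flat, 1)

    def forward(self, x):                       # x: [batch, 3, 5, 5, 5]
        h = self.trunk(x).flatten(1)
        return self.policy_head(h), torch.tanh(self.value_head(h))

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """PUCT: exploit the value estimate q, explore in proportion to the prior."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def alphazero_loss(policy_logits, value, target_pi, target_z):
    """(z - v)^2 - pi^T log p; the L2 term typically comes from the optimizer's weight_decay."""
    value_loss = F.mse_loss(value.squeeze(-1), target_z)
    policy_loss = -(target_pi * F.log_softmax(policy_logits, dim=-1)).sum(dim=-1).mean()
    return value_loss + policy_loss

net = TinyPolicyValueNet3D()
policy, value = net(torch.zeros(2, 3, 5, 5, 5))
print(policy.shape, value.shape)  # torch.Size([2, 126]) torch.Size([2, 1])
```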
Options:
- `--num-games`: Number of self-play games (default: 1000)
- `--num-simulations`: MCTS simulations per move (default: 50)
- `--num-residual-blocks`: ResNet blocks (default: 5)
- `--channels`: Network channels (default: 64)
- `--learning-rate`: Learning rate (default: 0.001)
- `--batch-size`: Training batch size (default: 32)
- `--train-interval`: Train every N games (default: 10)
- `--save-path`: Path to save weights (can include a custom name, e.g., `models/my_alphazero_v1.pth`)
- `--load-path`: Path to load weights from
- `--no-curriculum`: Disable curriculum learning (start with self-play immediately)
- `--win-rate-threshold`: Win rate threshold for switching to self-play (default: 0.55)
- `--eval-games`: Number of games to evaluate before checking the win rate (default: 20)
- `--elo-path`: Custom Elo file path (default: derived from `--save-path` with an `_elo.json` suffix)
Weight Naming: You can name your trained models by specifying a custom `--save-path`. For example:

```
python alpha_zero_training.py --save-path models/alphazero_v2.pth --num-games 500
```

This creates:

- `models/alphazero_v2.pth` (network weights)
- `models/alphazero_v2_elo.json` (Elo ratings)
Trained AlphaZero models can be selected from the start screen when choosing "AlphaZero" as the AI type.
- Board: 3D grid where each cell can be empty (0), black stone (1), or white stone (-1)
- Adjacency: Stones are connected via 6 orthogonal neighbors (x±1, y±1, z±1)
- Groups: Connected stones of the same color form a group
- Liberties: Empty cells adjacent to a group (see the sketch after this list)
- Capture: Groups with no liberties are captured and removed
- Suicide Rule: You cannot place a stone that would create a group with no liberties unless it captures opponent stones
- Game End: Game ends when both players pass consecutively
- Scoring: Based on stones on board plus estimated territory control
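As referenced above, the group and liberty rules translate directly into a flood fill over the six orthogonal neighbors. The sketch below is a simplified illustration of those rules, not `game_state.py`'s actual implementation; the function name and the 1/-1/0 board encoding follow the description above.

```python
import numpy as np

# The six orthogonal neighbors in 3D: x±1, y±1, z±1.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def group_and_liberties(board: np.ndarray, start: tuple):
    """Flood-fill the group containing `start` and collect its liberties (empty neighbors)."""
    color = board[start]
    group, liberties, stack = {start}, set(), [start]
    while stack:
        x, y, z = stack.pop()
        for dx, dy, dz in NEIGHBORS:
            cell = (x + dx, y + dy, z + dz)
            if not all(0 <= c < s for c, s in zip(cell, board.shape)):
                continue                      # neighbor is off the board
            if board[cell] == 0:
                liberties.add(cell)           # empty neighbor is a liberty
            elif board[cell] == color and cell not in group:
                group.add(cell)               # same-color stone extends the group
                stack.append(cell)
    return group, liberties                   # a group with no liberties is captured

board = np.zeros((5, 5, 5), dtype=int)
board[2, 2, 2] = 1                            # a lone black stone in the interior
print(len(group_and_liberties(board, (2, 2, 2))[1]))  # 6 liberties
```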
Press G to toggle territory view. When enabled:
- Bright Blue: Areas where Black holds positional advantage
- Bright Pink: Areas where White holds positional advantage
- Territory is estimated from proximity to stones (closer stones exert more influence); see the sketch after this list
- Colors are deliberately saturated so controlled regions stand out clearly against the board
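A distance-weighted influence map like the one described above can be sketched in a few lines. This is a hypothetical example of the idea (Manhattan distance with `1 / (1 + dist)` weighting), not the game's exact territory formula.

```python
import numpy as np

def territory_influence(board: np.ndarray) -> np.ndarray:
    """Score each empty cell: positive favors Black (1), negative favors White (-1)."""
    influence = np.zeros(board.shape, dtype=float)
    stones = np.argwhere(board != 0)
    for cell in np.ndindex(*board.shape):
        if board[cell] != 0:
            continue                                   # only empty cells get a score
        for sx, sy, sz in stones:
            dist = abs(cell[0] - sx) + abs(cell[1] - sy) + abs(cell[2] - sz)
            influence[cell] += board[sx, sy, sz] / (1 + dist)  # closer stones weigh more
    return influence  # positive cells could be drawn blue, negative cells pink
```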
Press H to toggle grid visibility, and T to cycle through grid modes:
- Mode 0 - X-Y Planes:
- Gray grids show complete X-Y grid patterns for all Z layers
- Green grid highlights the active Z layer's X-Y grid
- Use Page Up/Down to change active Z layer
- Mode 1 - Z-Y Planes:
- Gray grids show complete Z-Y grid patterns for all X positions
- Green grid highlights the active Y position's Z-Y grid
- Use Page Up/Down to change active Y position
- When switching modes, the active selection automatically changes (Z layer ↔ Y position)
- Axis Helper: Press F to display colored X (red), Y (green), and Z (blue) axes with numeric labels for easier orientation
- Stone Pillars: Press J to toggle per-stone depth pillars that connect stones to the base layer for height cues
```
pip install -r requirements.txt
python main.py
```

Options:

- `--size N`: Set board size (default: 5)
- `--depth N`: AI search depth (default: 1)
- `--samples N`: AI move samples per ply (default: 60)
- `--ai-color 1|-1`: AI color (1 = black, -1 = white; default: -1)
Example:
```
python main.py --size 5 --depth 2 --samples 100
```

- `game_state.py`: Game rules, board state, group/liberty logic, territory estimation, Ko/Superko rules
- `ai.py`: Heuristic AI move selection with shallow search
- `view.py`: 3D rendering, camera controls, input handling, perspective projection
- `main.py`: Main game loop and mode selection
- `start_screen.py`: Start menu for game configuration
- `neural_ai.py`: Policy network using PyTorch (MLP architecture)
- `neural_training.py`: Training script using the REINFORCE algorithm
- `resnet3d_network.py`: 3D ResNet architecture for policy and value estimation
- `mcts.py`: Monte Carlo Tree Search with the PUCT algorithm
- `alpha_zero_ai.py`: AlphaZero AI wrapper combining ResNet3D and MCTS
- `alpha_zero_training.py`: Self-play training loop with replay buffer, curriculum learning, and Elo evaluation
- Uses a simple 3D projection (yaw/pitch camera) for rendering, with an optional perspective transform (see the sketch below)
- Multiple AI architectures: Heuristic (rule-based), Neural (REINFORCE), and AlphaZero (MCTS + ResNet)
- Territory estimation uses distance-based influence calculation
- Ko and Superko rules implemented to prevent infinite game cycles
- Optimized for modest hardware with configurable board sizes
- GPU acceleration available for neural network training (auto-detects CUDA)
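To make the projection note above concrete, the sketch below shows one way a yaw/pitch rotation plus an optional perspective divide can map a 3D board point to screen coordinates. It is an assumption-based illustration, not `view.py`'s actual code; `camera_distance` and `scale` are made-up parameters.

```python
import math

def project(point, yaw, pitch, perspective=True, camera_distance=10.0, scale=60.0):
    """Rotate a 3D point by yaw (around the vertical axis) then pitch (around the
    horizontal axis), and map it to 2D with an optional depth-based perspective divide."""
    x, y, z = point
    # Yaw rotation.
    xr = x * math.cos(yaw) - z * math.sin(yaw)
    zr = x * math.sin(yaw) + z * math.cos(yaw)
    # Pitch rotation.
    yr = y * math.cos(pitch) - zr * math.sin(pitch)
    depth = y * math.sin(pitch) + zr * math.cos(pitch)
    # Perspective ON: farther points shrink; OFF: orthographic (no depth scaling).
    factor = camera_distance / (camera_distance + depth) if perspective else 1.0
    return xr * factor * scale, yr * factor * scale

print(project((1.0, 2.0, 0.5), yaw=0.6, pitch=0.3))
```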
