3D Go Variant – AlphaZero + Heuristic AIs

[Screenshot: Neural Fish Tank]

A 3D generalization of the classic game of Go, featuring an interactive pygame-based 3D board visualization and multiple AI opponents.

Status: Experimental / research project – great for playing, visualizing, and tinkering with game AI.


Features

  • 3D Board: Play Go on a 3D grid (default 5×5×5)
  • Interactive 3D View: Rotate, zoom, and navigate through layers with perspective projection
  • Territory Visualization: See which areas are controlled by each player
  • Multiple AI Types: Play against heuristic AI, neural AI (REINFORCE), or AlphaZero-style AI (MCTS + ResNet)
  • Real-time Scoring: Track stones and territory control
  • AI vs AI Mode: Watch two AIs play against each other step-by-step

Start Screen & Modes

When you launch python main.py, a start screen lets you configure:

  • Human vs AI: Play against a computer opponent
  • Human vs Friend: Two-player local game
  • AI vs AI: Watch two AIs play (step through moves with Space)
  • Who plays Black (Black moves first)
  • AI Type (Heuristic, Neural, or AlphaZero) when playing against the AI
  • Which AI weight file to load (.json for Neural, .pth for AlphaZero, placed in models/)

In AI vs AI mode, you can select different AI types and weight files for each player.

Use ↑/↓ to navigate, ←/→ to change settings, and Enter to begin.

Controls

Camera & Navigation

  • A / D or Left Arrow / Right Arrow: Rotate board horizontally (yaw)
  • W / S or Up Arrow / Down Arrow: Tilt board up/down (pitch)
  • + / - or Mouse Wheel: Zoom in/out
  • Page Up / Page Down or [ / ] or 9 / 0: Change active layer

Gameplay

  • Left Click: Place a stone on the active layer
  • P or Space: Pass turn
  • G: Toggle territory visualization (shows which areas favor black/white with vibrant colors)
  • H: Toggle grid visibility on/off
  • T: Cycle through grid display modes (X-Y planes / Z-Y planes; see Grid Display Modes below)
  • F: Toggle axis helper (shows X/Y/Z axes with numeric labels)
  • J: Toggle stone pillars (depth cues to show stone height)
  • O: Toggle perspective transform (ON: depth-based scaling, OFF: orthographic projection)
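
To make the O toggle concrete: with perspective ON, each point is scaled by its depth; with it OFF, depth is ignored and the projection is orthographic. A minimal sketch of the idea in Python, with illustrative names (this is not the project's actual renderer in view.py):

    import math

    def project(point, yaw, pitch, focal=600.0, perspective=True):
        """Rotate a board point by yaw/pitch, then map it to 2D."""
        x, y, z = point
        # Yaw: rotate around the vertical axis
        x, z = (x * math.cos(yaw) + z * math.sin(yaw),
                -x * math.sin(yaw) + z * math.cos(yaw))
        # Pitch: rotate around the horizontal axis
        y, z = (y * math.cos(pitch) - z * math.sin(pitch),
                y * math.sin(pitch) + z * math.cos(pitch))
        # Perspective: nearer points appear larger; orthographic: flat scale
        scale = focal / (focal + z) if perspective else 1.0
        return x * scale, y * scale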

Other

  • Q or Esc: Quit game

AI Training

Neural AI Training (REINFORCE)

Train a neural AI using REINFORCE against the heuristic baseline:

python neural_training.py --board-size 5 --episodes 100 --save-path models/neural_ai.json

This script:

  • Plays the neural policy against the heuristic AI
  • Logs per-episode rewards, capture differentials, and running win-rate summaries (--log-interval)
  • Uses reward & penalty functions based on win/loss, territory, and captures
  • Updates a simple Elo rating (auto-saved to {save_path}_elo.json; see the sketch after the options below)
  • Runs 100 evaluation games after training completes
  • Saves updated weights that can be selected from the start screen
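
At its core, REINFORCE scales the log-probabilities of the moves actually played by the episode's final shaped reward. A minimal sketch, assuming a PyTorch policy and illustrative names (not the exact code in neural_training.py):

    import torch

    def reinforce_update(optimizer, log_probs, reward):
        """One policy-gradient step: log_probs is a list of scalar tensors
        (log-probabilities of the chosen moves); reward is the episode's
        shaped return from win/loss, territory, and captures."""
        loss = -torch.stack(log_probs).sum() * reward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()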

Weight Naming: You can name your trained models by specifying a custom --save-path. For example:

python neural_training.py --save-path models/my_neural_v1.json --episodes 200

Options:

  • --log-interval: how often to print win-rate/reward summaries (default 10)
  • --load-path: resume from saved weights
  • --elo-path: set custom Elo tracking file (default: derived from save-path)
  • --profile: enable detailed timing per episode
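
For reference, a simple Elo update such as the one tracked in {save_path}_elo.json generally takes this form (a generic formula; the script's exact K-factor and bookkeeping may differ):

    def elo_update(rating_a, rating_b, score_a, k=32):
        """score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
        expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
        return rating_a + k * (score_a - expected_a)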

AlphaZero Training (Self-Play)

Train an AlphaZero-style AI using self-play with MCTS:

python alpha_zero_training.py --num-games 1000 --num-simulations 50 --save-path models/alpha_zero.pth

This implements a full AlphaZero architecture:

  • 3D ResNet Network: Convolutional neural network with residual blocks
    • Input: [batch, 3, 5, 5, 5] (black, white, empty channels)
    • Output: Policy [batch, 126] (125 positions + pass), Value [batch, 1]
  • MCTS with PUCT: Monte Carlo Tree Search using PUCT algorithm
    • Dirichlet noise at root for exploration
    • Configurable simulations per move (default: 50)
    • Temperature schedule (τ=1 early, τ→0 later)
  • Self-Play Training: Games stored as (state, π, z) tuples
    • Replay buffer for experience replay
    • Loss: (z - v)² - πᵀ log(p) + L2 regularization (see the sketch after this list)
  • Curriculum Learning: Starts training against heuristic baseline, then switches to self-play once win rate exceeds threshold (default: 55%)
  • Elo Evaluation: Automatically runs 100 evaluation games against baseline after training, calculates Elo ratings
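
For reference, the PUCT selection score is the standard Q(s,a) + c_puct · P(s,a) · √N(s) / (1 + N(s,a)). The training loss above translates directly into PyTorch; a minimal sketch under the shapes listed, assuming the network emits log-probabilities and the optimizer's weight decay supplies the L2 term (names are illustrative, not the exact code):

    import torch
    import torch.nn.functional as F

    def alphazero_loss(log_policy, value, target_pi, target_z):
        """log_policy: [batch, 126] log-probs; value: [batch, 1];
        target_pi: MCTS visit distribution; target_z: outcome in {-1, +1}."""
        value_loss = F.mse_loss(value.squeeze(-1), target_z)         # (z - v)^2
        policy_loss = -(target_pi * log_policy).sum(dim=1).mean()    # -pi^T log p
        return value_loss + policy_loss   # + L2 via optimizer weight_decay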

Options:

  • --num-games: Number of self-play games (default: 1000)
  • --num-simulations: MCTS simulations per move (default: 50)
  • --num-residual-blocks: ResNet blocks (default: 5)
  • --channels: Network channels (default: 64)
  • --learning-rate: Learning rate (default: 0.001)
  • --batch-size: Training batch size (default: 32)
  • --train-interval: Train every N games (default: 10)
  • --save-path: Path to save weights (can include custom name, e.g., models/my_alphazero_v1.pth)
  • --load-path: Path to load weights from
  • --no-curriculum: Disable curriculum learning (start with self-play immediately)
  • --win-rate-threshold: Win rate threshold to switch to self-play (default: 0.55)
  • --eval-games: Number of games to evaluate before checking win rate (default: 20)
  • --elo-path: Custom Elo file path (default: derived from save-path with _elo.json suffix)
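
For example, to resume training from previously saved weights (using the flags listed above):

python alpha_zero_training.py --load-path models/alpha_zero.pth --save-path models/alpha_zero.pth --num-games 500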

Weight Naming: You can name your trained models by specifying a custom --save-path. For example:

python alpha_zero_training.py --save-path models/alphazero_v2.pth --num-games 500

This creates:

  • models/alphazero_v2.pth (network weights)
  • models/alphazero_v2_elo.json (Elo ratings)

Trained AlphaZero models can be selected from the start screen when choosing "AlphaZero" as the AI type.

Game Rules

  • Board: 3D grid where each cell can be empty (0), black stone (1), or white stone (-1)
  • Adjacency: Stones are connected via 6 orthogonal neighbors (x±1, y±1, z±1)
  • Groups: Connected stones of the same color form a group
  • Liberties: Empty cells adjacent to a group
  • Capture: Groups with no liberties are captured and removed
  • Suicide Rule: You cannot place a stone that would create a group with no liberties unless it captures opponent stones
  • Game End: Game ends when both players pass consecutively
  • Scoring: Based on stones on board plus estimated territory control
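
Groups, liberties, and captures all reduce to a flood fill over the six orthogonal neighbors. A minimal sketch, assuming board[x][y][z] holds 0/1/-1 as above (illustrative only; game_state.py may structure this differently):

    def group_and_liberties(board, start, size):
        """Flood-fill the group containing `start`; return (group, liberties)."""
        color = board[start[0]][start[1]][start[2]]
        stack, group, liberties = [start], set(), set()
        while stack:
            x, y, z = stack.pop()
            if (x, y, z) in group:
                continue
            group.add((x, y, z))
            for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                               (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                nx, ny, nz = x + dx, y + dy, z + dz
                if 0 <= nx < size and 0 <= ny < size and 0 <= nz < size:
                    v = board[nx][ny][nz]
                    if v == 0:
                        liberties.add((nx, ny, nz))    # empty neighbor
                    elif v == color:
                        stack.append((nx, ny, nz))     # same-color stone
        return group, liberties

A group whose liberties set comes back empty is captured and removed.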

Territory Visualization

Press G to toggle territory view. When enabled:

  • Bright Blue: Areas where Black has positional advantage
  • Bright Pink: Areas where White has positional advantage
  • Territory is calculated from proximity to stones (closer stones exert more influence); see the sketch below
  • Colors are deliberately vivid so controlled areas stand out at a glance
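
As a rough illustration of distance-based influence (the exact weighting in game_state.py may differ), each empty cell can be scored by summing stone contributions that decay with distance:

    def influence_at(board, cell, size):
        """Positive result favors Black (1), negative favors White (-1)."""
        cx, cy, cz = cell
        score = 0.0
        for x in range(size):
            for y in range(size):
                for z in range(size):
                    stone = board[x][y][z]
                    if stone != 0:
                        # Manhattan distance from the stone to the cell
                        dist = abs(x - cx) + abs(y - cy) + abs(z - cz)
                        score += stone / (1.0 + dist)
        return score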

Grid Display Modes

Press H to toggle grid visibility, and T to cycle through grid modes:

  • Mode 0 - X-Y Planes:
    • Gray grids show complete X-Y grid patterns for all Z layers
    • Green grid highlights the active Z layer's X-Y grid
    • Use Page Up/Down to change active Z layer
  • Mode 1 - Z-Y Planes:
    • Gray grids show complete Z-Y grid patterns for all X positions
    • Green grid highlights the active X position's Z-Y grid
    • Use Page Up/Down to change active X position
    • When switching modes, the active selection automatically changes (Z layer ↔ X position)
  • Axis Helper: Press F to display colored X (red), Y (green), and Z (blue) axes with numeric labels for easier orientation
  • Stone Pillars: Press J to toggle per-stone depth pillars that connect stones to the base layer for height cues

Installation

pip install -r requirements.txt

Running

python main.py

Command Line Options

  • --size N: Set board size (default: 5)
  • --depth N: AI search depth (default: 1)
  • --samples N: AI move samples per ply (default: 60)
  • --ai-color 1|-1: AI color (1=black, -1=white, default: -1)

Example:

python main.py --size 5 --depth 2 --samples 100
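
These flags map onto a standard argparse setup; a minimal sketch of how main.py might declare them (illustrative, not the actual source):

    import argparse

    parser = argparse.ArgumentParser(description="3D Go variant")
    parser.add_argument("--size", type=int, default=5,
                        help="board size N (N x N x N grid)")
    parser.add_argument("--depth", type=int, default=1,
                        help="AI search depth")
    parser.add_argument("--samples", type=int, default=60,
                        help="AI move samples per ply")
    parser.add_argument("--ai-color", type=int, default=-1, choices=(1, -1),
                        help="AI color: 1 = black, -1 = white")
    args = parser.parse_args()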

Project Structure

Core Game

  • game_state.py: Game rules, board state, group/liberty logic, territory estimation, Ko/Superko rules
  • ai.py: Heuristic AI move selection with shallow search
  • view.py: 3D rendering, camera controls, input handling, perspective projection
  • main.py: Main game loop and mode selection
  • start_screen.py: Start menu for game configuration

Neural AI (REINFORCE)

  • neural_ai.py: Policy network using PyTorch (MLP architecture)
  • neural_training.py: Training script using REINFORCE algorithm

AlphaZero AI

  • resnet3d_network.py: 3D ResNet architecture for policy and value estimation
  • mcts.py: Monte Carlo Tree Search with PUCT algorithm
  • alpha_zero_ai.py: AlphaZero AI wrapper combining ResNet3D and MCTS
  • alpha_zero_training.py: Self-play training loop with replay buffer, curriculum learning, and Elo evaluation

Development Notes

  • Uses simple 3D projection (yaw/pitch camera) for rendering with optional perspective transform
  • Multiple AI architectures: Heuristic (rule-based), Neural (REINFORCE), and AlphaZero (MCTS + ResNet)
  • Territory estimation uses distance-based influence calculation
  • Ko and Superko rules implemented to prevent infinite game cycles
  • Optimized for modest hardware with configurable board sizes
  • GPU acceleration available for neural network training (auto-detects CUDA)
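
The CUDA auto-detection is the usual PyTorch idiom; a one-line sketch:

    import torch

    # Use the GPU when available, otherwise fall back to the CPU;
    # the training scripts then move their networks to this device.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")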
