Input Challenge: Programming and Machine Learning

Credit: RL section is adapted from tetris-ai.

Project Overview

  • Train a bot to play Tetris using deep reinforcement learning (RL) or imitation learning (IL).
  • Play Tetris using trained AI or as a human.
  • Collect demonstration data from both human and RL agents to improve imitation learning.

Project Structure

./
├── assets/         # Images, gifs, diagrams
├── models/         # Saved Keras models
├── data/           # Collected demonstration data (human and RL)
├── src/            # Scripts for training and inference
│   ├── run.py              # Main training script for DQN agent
│   ├── run_model.py        # Script to run inference, collect RL demos, or test imitation policy
│   ├── behav_clone.py      # Train a policy network from demonstration data
│   ├── play_human.py       # Play Tetris as a human and collect data
│   ├── play_human_vs_ai.py # Play Tetris: human vs AI
│   ├── tetris.py           # Tetris game logic (used by scripts)
│   ├── dqn_agent.py        # DQN agent implementation
│   └── logs.py             # Custom logging utilities
├── tetris-ai/      # Original code from tetris-ai
├── logs/           # Training and evaluation logs
├── environment.yml # Conda environment file
├── requirements.txt
├── LICENSE
└── README.md

Setup

If you are using a Windows or Linux device, please follow this link to set up the virtual environment.

If you are using an M1 MacBook, you may find the information in this repo useful for setting up your virtual environment.

Afterwards, follow the steps below to set up the environment:

  1. Clone the repository
  2. Create the conda environment:
    conda env create -f environment.yml
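  3. Activate the environment before running any scripts. The name tetris below is an assumption; use whatever appears in the name: field of environment.yml:
    conda activate tetris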

Demo

First 10000 points, after some training.

[Demo GIF: first 10000 points]

Usage

1. Imitation Learning

We will use imitation learning to train an AI player. This AI player uses behavior cloning to mimic demonstrations from human players (you and your friends). To train it, follow these steps (a behavior-cloning sketch appears after the steps):

  1. Enter human play mode and use your controller to demonstrate your strategies for playing Tetris. This collects a set of demonstrations as the training dataset.
    python src/play_human.py
  • Each session is saved as a separate file in the data/ directory (e.g., human_demo_YYYYMMDD_HHMMSS.npy).
  • Play multiple games to collect more data.
  • To make your controller compatible with the Tetris controls in play_human.py, re-program its buttons by adding the following line to the controller firmware:
buttons.append(setup_button(board.GP5, Keycode.DOWN_ARROW))

The down arrow speeds up the drop of the falling block.
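To sanity-check a collected session, you can load a demo file with NumPy. This is a minimal sketch; the filename is only an example, and it assumes each .npy file stores an array of (state, action) pairs in the format described under State and Action Format below:

import numpy as np

# Load one recorded session (allow_pickle is needed for object arrays of pairs).
demo = np.load("data/human_demo_20240101_120000.npy", allow_pickle=True)
print(f"{len(demo)} recorded steps")
state, action = demo[0]
print("state:", state, "action:", action)  # e.g. [lines, holes, bumpiness, height], 2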

  2. Train an imitation policy from the human demonstrations.
python src/behav_clone.py
  • This script loads all human_demo_*.npy files from data/ and trains a policy network.
  • The trained policy is saved to models/policy_bc.keras.
  3. Test the trained policy.
python src/run_model.py models/policy_bc.keras

The script auto-detects the model type and runs in the appropriate mode, letting you visualize the performance of your trained AI player.
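For reference, the core of behavior cloning looks roughly like the sketch below. This is an illustration, not the exact contents of behav_clone.py; the network size and training settings are assumptions:

import glob
import numpy as np
import tensorflow as tf

# Gather all human demonstration sessions into one training set.
states, actions = [], []
for path in glob.glob("data/human_demo_*.npy"):
    for state, action in np.load(path, allow_pickle=True):
        states.append(state)
        actions.append(action)
X = np.array(states, dtype=np.float32)   # shape (N, 4): the board features
y = np.array(actions, dtype=np.int64)    # integer actions 0-3

# Small MLP mapping the 4-feature state to a distribution over the 4 actions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=64, validation_split=0.1)
model.save("models/policy_bc.keras")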

2. Reinforcement Learning

At first, the agent plays random moves, saving the states and the rewards received in a limited queue (the replay memory). At the end of each episode (game), the agent trains itself (using a neural network) on a random sample of the replay memory. As more games are played, the agent becomes smarter and achieves higher scores.

Since an RL agent tends to stick with a good 'path' once it discovers one, an exploration variable (which decreases over time) is also used, so that the agent sometimes picks a random action instead of the one it considers best. This way, it can discover new 'paths' that lead to higher scores.
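In rough outline, the loop looks like the sketch below. This illustrates the replay-memory and epsilon-greedy ideas only; it is not the exact code in dqn_agent.py, and the names and hyperparameters are assumptions:

import random
from collections import deque

import numpy as np

memory = deque(maxlen=20000)   # replay memory of (state, action, reward, next_state, done)
epsilon, epsilon_min, epsilon_decay = 1.0, 0.05, 0.995
gamma = 0.95                   # discount factor for future rewards

def choose_action(model, state, n_actions=4):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    q_values = model.predict(state[np.newaxis, :], verbose=0)[0]
    return int(np.argmax(q_values))

def replay(model, batch_size=512):
    # At the end of each episode, train on a random sample of the replay memory.
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)
    states = np.array([t[0] for t in batch])
    next_states = np.array([t[3] for t in batch])
    targets = model.predict(states, verbose=0)
    next_q = model.predict(next_states, verbose=0).max(axis=1)
    for i, (_, action, reward, _, done) in enumerate(batch):
        # Q-learning target: immediate reward plus discounted best future value.
        targets[i][action] = reward if done else reward + gamma * next_q[i]
    model.fit(states, targets, epochs=1, verbose=0)

# After each episode, decay exploration:
# epsilon = max(epsilon_min, epsilon * epsilon_decay)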

State and Action Format

  • State: [lines_cleared, holes, total_bumpiness, sum_height] (4 features from the board)
  • Action: Integer encoding:
    • 0 = left
    • 1 = right
    • 2 = down
    • 3 = rotate
  • RL data is automatically converted to this format for imitation learning.
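To make the four features concrete, here is one way they can be computed from a board grid (0 = empty, 1 = filled). This is an illustrative sketch; tetris.py may compute them differently:

import numpy as np

def board_features(board, lines_cleared=0):
    # board: 2D array of 0/1 with shape (rows, cols); row 0 is the top.
    # lines_cleared comes from the game step itself, not the static board.
    rows, _ = board.shape
    # Column height = distance from the topmost filled cell down to the floor.
    heights = np.where(board.any(axis=0), rows - board.argmax(axis=0), 0)
    # Holes: empty cells with at least one filled cell somewhere above them.
    holes = sum(int(np.sum(board[rows - h:, c] == 0)) for c, h in enumerate(heights))
    # Bumpiness: total absolute height difference between adjacent columns.
    bumpiness = int(np.sum(np.abs(np.diff(heights))))
    return [lines_cleared, holes, bumpiness, int(heights.sum())]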

Training

Training is based on the Q-learning algorithm for RL and on supervised learning for imitation. To train the RL agent, run

python src/run.py

(Hyperparameters can be changed in src/run.py.)

Once training completes, you can test the performance of the RL agent with

python src/run_model.py models/best.keras
  • In RL mode, the script saves RL agent demonstrations in data/ (e.g., rl_demo_YYYYMMDD_HHMMSS.npy). The data generated by the RL agent can then be used for imitation learning (see the Imitation Learning section).
  • You can interrupt with Ctrl+C to save partial data.

Environment

Main dependencies (see environment.yml for full list):

  • python 3.10
  • tensorflow
  • keras
  • numpy
  • tqdm
  • matplotlib
  • scikit-learn
  • opencv-python

Tips and Troubleshooting

  • Data Quality: The more diverse and skillful your demonstrations, the better your imitation policy will be.
  • Ctrl+C: You can safely interrupt data collection or RL runs with Ctrl+C; data will be saved.
  • Action Format: All actions are converted to integers for training, even if RL data originally used tuples.
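The interrupt-safe saving works along these lines (a sketch only; the function names here are hypothetical, not the actual API in src/):

import numpy as np

def run_and_save(demo_path, play_episode):
    transitions = []
    try:
        while True:
            transitions.extend(play_episode())  # one game's (state, action) pairs
    except KeyboardInterrupt:
        pass  # fall through and save whatever was collected so far
    finally:
        np.save(demo_path, np.array(transitions, dtype=object))
        print(f"Saved {len(transitions)} steps to {demo_path}")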

Useful Links

Deep Q Learning

Tetris
