MiniGPT is a lightweight training framework designed for large language models (LLMs). It provides a clean, modular implementation of a GPT-style transformer model with all essential components needed for training and evaluation.
- Modular Design: Clear abstractions and interfaces for easy extension
- Full Training Pipeline: Includes dataloader, trainer, loss functions, and metrics logging
- Lightning-style Components: Implementation inspired by PyTorch Lightning patterns
- Flexible Configuration: Pydantic-based configuration with validation
- Monitoring: Integrated with Weights & Biases for experiment tracking
- Checkpointing: Automatic saving of model states during training
Set up a virtual environment and install the package:

```bash
uv venv
source .venv/bin/activate && uv pip install -e .
```

Verify the codebase is working correctly:

```bash
pytest src/minigpt/tests
```

Run a training experiment:

```bash
python examples/training.py --config src/minigpt/config/gpt_config.json
```

Additional options:

- `--no-wandb`: Disable Weights & Biases logging
- `--overfit-batches N`: Overfit on N batches (useful for debugging)
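For example, to debug a run by overfitting on a handful of batches with Weights & Biases logging disabled:

```bash
python examples/training.py --config src/minigpt/config/gpt_config.json --no-wandb --overfit-batches 4
```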
You can modify the configuration file (gpt_config.json) to change model parameters, training settings, and more.
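Because the configuration is Pydantic-based, out-of-range or mistyped values are rejected when the file is loaded. The sketch below shows how such a config class might be defined and populated from the JSON file; the class name and fields are illustrative assumptions, not MiniGPT's actual schema.

```python
# Illustrative sketch only -- the class name and fields are assumptions,
# not the actual schema shipped in src/minigpt/config.
import json
from pydantic import BaseModel, Field


class GPTConfig(BaseModel):
    # Model size (hypothetical field names)
    vocab_size: int = 50257
    n_layers: int = Field(12, gt=0)
    n_heads: int = Field(12, gt=0)
    d_model: int = Field(768, gt=0)
    dropout: float = Field(0.1, ge=0.0, le=1.0)

    # Training settings
    batch_size: int = 32
    learning_rate: float = 3e-4
    max_steps: int = 1000


# Mirrors loading via --config src/minigpt/config/gpt_config.json;
# invalid values raise a validation error here rather than mid-training.
with open("src/minigpt/config/gpt_config.json") as f:
    config = GPTConfig(**json.load(f))
```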
The framework follows a modular design pattern with clear separation of concerns.
Key components (a sketch of how they fit together follows this list):
- Dataloader: Handles data loading and preprocessing
- Model: Implements the transformer architecture
- LightningModule: Encapsulates training/validation logic
- Trainer: Orchestrates the training loop
- Callbacks: Provides hooks for custom behavior
- Loggers: Tracks metrics and experiment progress
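As referenced above, here is a hedged sketch of how these components might be wired together in a training script. The import paths, class names, and constructor arguments are assumptions inferred from the component list, not MiniGPT's verbatim API.

```python
# Illustrative wiring only -- import paths, class names, and arguments
# are assumptions, not MiniGPT's exact API.
from minigpt.config import GPTConfig              # Pydantic-based config (assumed path)
from minigpt.data import build_dataloader         # data loading/preprocessing (assumed)
from minigpt.model import GPT                     # transformer architecture (assumed)
from minigpt.lightning import GPTLightningModule  # training/validation logic (assumed)
from minigpt.callbacks import CheckpointCallback  # periodic model saving (assumed)
from minigpt.loggers import WandbLogger           # Weights & Biases metrics (assumed)
from minigpt.trainer import Trainer               # orchestrates the training loop (assumed)

config = GPTConfig()                              # defaults; normally loaded from JSON
train_loader = build_dataloader(config, split="train")
val_loader = build_dataloader(config, split="val")

# Wrap the model with the training/validation step logic, then hand
# everything to the Trainer along with callbacks and loggers.
module = GPTLightningModule(GPT(config), config)
trainer = Trainer(
    config,
    callbacks=[CheckpointCallback()],             # save model states during training
    loggers=[WandbLogger()],                      # track metrics and experiment progress
)
trainer.fit(module, train_loader, val_loader)     # run the training loop
```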
MiniGPT implements a GPT-2-style transformer architecture.
The model includes (a minimal block sketch follows this list):
- Token and positional embeddings
- Multi-head self-attention layers
- Feed-forward networks
- Layer normalization
- Dropout regularization
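As referenced above, here is a minimal PyTorch sketch of one such transformer block, combining pre-layer-norm, causal multi-head self-attention, a feed-forward network, dropout, and residual connections. It illustrates the GPT-2-style layout rather than MiniGPT's exact implementation.

```python
# Minimal GPT-2-style block for illustration; not MiniGPT's exact code.
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, dropout: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)          # pre-norm before attention
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.ln2 = nn.LayerNorm(d_model)          # pre-norm before the MLP
        self.mlp = nn.Sequential(                 # position-wise feed-forward network
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True entries are blocked, so each position only
        # attends to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                          # residual connection
        x = x + self.mlp(self.ln2(x))             # residual connection
        return x
```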
You can view example training runs on the RedPajama dataset:
- Run 1 (Wandb): RedPajama dataset, 100M samples, 124M-parameter transformer, 1,000 train steps
- Run 2 (Wandb): RedPajama dataset, 100M samples, 124M-parameter transformer, 8,000 train steps
- Run 3 (Wandb): RedPajama dataset, 100M samples, 124M-parameter transformer, 8,000 train steps
```
src/
└── minigpt/
    ├── callbacks/    # Training callbacks (checkpointing, etc.)
    ├── config/       # Configuration definitions
    ├── data/         # Data handling and datasets
    ├── lightning/    # Lightning-style modules
    ├── loggers/      # Logging implementations
    ├── model/        # Model architecture
    ├── tests/        # Unit tests
    ├── trainer.py    # Main trainer implementation
    └── utils/        # Utility functions
```
This project draws inspiration from PyTorch Lightning's component design and the GPT-2 transformer architecture.
Contributions are welcome! Feel free to open issues or submit pull requests to improve the framework.
This project is licensed under the MIT License - see the LICENSE file for details.

