MiniGPT is a lightweight training framework designed for large language models (LLMs). It provides a clean, modular implementation of a GPT-style transformer model with all essential components needed for training and evaluation.
- Modular Design: Clear abstractions and interfaces for easy extension
- Full Training Pipeline: Includes dataloader, trainer, loss functions, and metrics logging
- Lightning-style Components: Implementation inspired by PyTorch Lightning patterns
- Flexible Configuration: Pydantic-based configuration with validation
- Monitoring: Integrated with Weights & Biases for experiment tracking
- Checkpointing: Automatic saving of model states during training
Set up a virtual environment and install the package:

```bash
uv venv
source .venv/bin/activate && uv pip install -e .
```

Verify the codebase is working correctly:

```bash
pytest src/minigpt/tests
```

Run a training experiment:

```bash
python examples/training.py --config src/minigpt/config/gpt_config.json
```

Additional options:

- `--no-wandb`: Disable Weights & Biases logging
- `--overfit-batches N`: Overfit on N batches (useful for debugging)
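For example, to debug a run by overfitting on a handful of batches with Weights & Biases logging disabled:

```bash
python examples/training.py --config src/minigpt/config/gpt_config.json --no-wandb --overfit-batches 4
```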
You can modify the configuration file (gpt_config.json) to change model parameters, training settings, and more.
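Because the configuration is Pydantic-based, out-of-range or mistyped values are rejected when the file is loaded. The sketch below shows how such a config class might be defined and populated from the JSON file; the class name and fields are illustrative assumptions, not MiniGPT's actual schema.

```python
# Illustrative sketch only -- the class name and fields are assumptions,
# not the actual schema shipped in src/minigpt/config.
import json
from pydantic import BaseModel, Field


class GPTConfig(BaseModel):
    # Model size (hypothetical field names)
    vocab_size: int = 50257
    n_layers: int = Field(12, gt=0)
    n_heads: int = Field(12, gt=0)
    d_model: int = Field(768, gt=0)
    dropout: float = Field(0.1, ge=0.0, le=1.0)

    # Training settings
    batch_size: int = 32
    learning_rate: float = 3e-4
    max_steps: int = 1000


# Mirrors loading via --config src/minigpt/config/gpt_config.json;
# invalid values raise a validation error here rather than mid-training.
with open("src/minigpt/config/gpt_config.json") as f:
    config = GPTConfig(**json.load(f))
```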
The framework follows a modular design pattern with clear separation of concerns.
Key components (a sketch of how they fit together follows this list):
- Dataloader: Handles data loading and preprocessing
- Model: Implements the transformer architecture
- LightningModule: Encapsulates training/validation logic
- Trainer: Orchestrates the training loop
- Callbacks: Provides hooks for custom behavior
- Loggers: Tracks metrics and experiment progress
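As referenced above, here is a hedged sketch of how these components might be wired together in a training script. The import paths, class names, and constructor arguments are assumptions inferred from the component list, not MiniGPT's verbatim API.

```python
# Illustrative wiring only -- import paths, class names, and arguments
# are assumptions, not MiniGPT's exact API.
from minigpt.config import GPTConfig              # Pydantic-based config (assumed path)
from minigpt.data import build_dataloader         # data loading/preprocessing (assumed)
from minigpt.model import GPT                     # transformer architecture (assumed)
from minigpt.lightning import GPTLightningModule  # training/validation logic (assumed)
from minigpt.callbacks import CheckpointCallback  # periodic model saving (assumed)
from minigpt.loggers import WandbLogger           # Weights & Biases metrics (assumed)
from minigpt.trainer import Trainer               # orchestrates the training loop (assumed)

config = GPTConfig()                              # defaults; normally loaded from JSON
train_loader = build_dataloader(config, split="train")
val_loader = build_dataloader(config, split="val")

# Wrap the model with the training/validation step logic, then hand
# everything to the Trainer along with callbacks and loggers.
module = GPTLightningModule(GPT(config), config)
trainer = Trainer(
    config,
    callbacks=[CheckpointCallback()],             # save model states during training
    loggers=[WandbLogger()],                      # track metrics and experiment progress
)
trainer.fit(module, train_loader, val_loader)     # run the training loop
```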
MiniGPT implements a GPT-2-style transformer architecture.
The model includes (a minimal block sketch follows this list):
- Token and positional embeddings
- Multi-head self-attention layers
- Feed-forward networks
- Layer normalization
- Dropout regularization
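As referenced above, here is a minimal PyTorch sketch of one such transformer block, combining pre-layer-norm, causal multi-head self-attention, a feed-forward network, dropout, and residual connections. It illustrates the GPT-2-style layout rather than MiniGPT's exact implementation.

```python
# Minimal GPT-2-style block for illustration; not MiniGPT's exact code.
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, dropout: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)          # pre-norm before attention
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.ln2 = nn.LayerNorm(d_model)          # pre-norm before the MLP
        self.mlp = nn.Sequential(                 # position-wise feed-forward network
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True entries are blocked, so each position only
        # attends to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                          # residual connection
        x = x + self.mlp(self.ln2(x))             # residual connection
        return x
```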
You can view example training runs on the RedPajama dataset:
- Run 1 (Wandb): RedPajama dataset, 100M samples, 124M-parameter transformer, 1,000 train steps
- Run 2 (Wandb): RedPajama dataset, 100M samples, 124M-parameter transformer, 8,000 train steps
- Run 3 (Wandb): RedPajama dataset, 100M samples, 124M-parameter transformer, 8,000 train steps
```
src/
└── minigpt/
    ├── callbacks/    # Training callbacks (checkpointing, etc.)
    ├── config/       # Configuration definitions
    ├── data/         # Data handling and datasets
    ├── lightning/    # Lightning-style modules
    ├── loggers/      # Logging implementations
    ├── model/        # Model architecture
    ├── tests/        # Unit tests
    ├── trainer.py    # Main trainer implementation
    └── utils/        # Utility functions
```
This project draws inspiration from PyTorch Lightning's component design and the GPT-2 transformer architecture.
Contributions are welcome! Feel free to open issues or submit pull requests to improve the framework.
This project is licensed under the MIT License - see the LICENSE file for details.

