Project Groot is a from-scratch implementation of a Transformer-based language model in PyTorch, designed to explore the space of Large Language Models (LLMs) through careful architectural choices, training stability experiments and new optimization techniques.
Note
Just like the Marvel character Groot appears in different forms and sizes, this project is designed to scale from Groot Tiny → Groot Small → Groot Medium → Groot Large, while keeping the core architecture and principles consistent.
The current models are GPT-2-style, decoder-only transformers. They have been pretrained on the TinyStories dataset and will subsequently be finetuned for general Question-Answering.
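For reference, the sketch below shows what a GPT-2-style decoder-only block typically looks like in PyTorch: causal multi-head self-attention followed by an MLP, each wrapped in a pre-norm residual connection. This is an illustrative sketch, not the project's actual source; the module names, the pre-norm layout, and the use of `F.scaled_dot_product_attention` are assumptions on my part.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (GPT-2 style). Illustrative only."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused query/key/value projection
        self.proj = nn.Linear(d_model, d_model)      # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (B, n_heads, T, head_dim) for attention
        q, k, v = (t.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2) for t in (q, k, v))
        # is_causal=True applies the upper-triangular mask needed for autoregressive decoding
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)


class DecoderBlock(nn.Module):
    """Pre-norm transformer block: attention and MLP, each with a residual connection."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = CausalSelfAttention(d_model, n_heads)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x
```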
- GrootTiny: As the name suggests, this is the smallest Groot LM, with just ~120M parameters (see the parameter-count sketch after this list).
  - Pretraining on the TinyStories dataset
  - Finetuning for general Question-Answering
  - Training on WikiText
- GrootSmall: This is the follow-up model to GrootTiny, with ~250M parameters.
  - Pretraining on the TinyStories dataset
  - Finetuning for general Question-Answering
  - Training on WikiText
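Parameter counts like the ones above can be estimated for any GPT-2-style configuration with a quick back-of-the-envelope count. The sketch below is purely illustrative: the hidden size, layer count, vocabulary size, and context length are assumed, GPT-2-small-like values chosen to land near the ~120M figure, not Groot's actual hyperparameters.

```python
def gpt2_param_count(vocab_size: int, d_model: int, n_layers: int, max_seq_len: int) -> int:
    """Rough parameter count for a GPT-2-style decoder-only model
    (tied input/output embeddings, learned positional embeddings)."""
    embeddings = vocab_size * d_model + max_seq_len * d_model
    # Per block: QKV + output projection (4 * d_model^2) plus a 4x MLP (8 * d_model^2),
    # ignoring the comparatively small bias and LayerNorm terms.
    per_block = 12 * d_model ** 2
    return embeddings + n_layers * per_block


# Assumed, GPT-2-small-like config (not Groot's actual hyperparameters):
print(gpt2_param_count(vocab_size=50257, d_model=768, n_layers=12, max_seq_len=1024))
# ~124M, in the same ballpark as GrootTiny's ~120M
```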