My implementation of GPT-2 (124M), inspired by Andrej Karpathy's "Let's Reproduce GPT-2 (124M)" video, with explanatory notes and my own additional implementations throughout.
What I gained from this project:
- Reinforced my understanding of the transformer architecture and how to implement it in PyTorch/Python (a minimal block sketch follows this list).
- Learned how to train on large-scale datasets by loading the Hugging Face dataset in Parquet, tokenizing it, and saving the tokenized arrays as .npy files (see the preprocessing sketch below).
- Incrementally loaded training data through a custom dataloader class, which avoids holding unreasonably long tensors in memory (see the dataloader sketch below).
- Created a custom cosine-decay LR scheduler (see the scheduler sketch below).
- Integrated wandb into training runs for clear and convenient logging (see the logging sketch below).
- TODO: add gradient-norm clipping + optimizations (bf16/tf32, torch.compile, flash attention, etc.). Decide whether to review DDP; I don't have multiple GPUs to make use of it, but it could be useful in the future. Also: view vs. reshape, handling multiple-choice validation data like HellaSwag, and fused AdamW. (A rough sketch of some of these items closes this section.)
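For reference, here is a minimal sketch of a GPT-2-style pre-norm transformer block in PyTorch. This is not the repo's exact code: the class names, the default sizes (n_embd=768, n_head=12 for the 124M config), and the plain masked-attention implementation are assumptions for illustration.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # joint query/key/value projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.c_attn(x).split(C, dim=2)
        # split heads: (B, T, C) -> (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # plain masked attention; swapping this for F.scaled_dot_product_attention
        # is the flash-attention item on the TODO list
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        att = att.masked_fill(~mask, float("-inf"))
        y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)  # merge heads back
        return self.c_proj(y)

class Block(nn.Module):
    """Pre-norm GPT-2 block: attention and MLP sublayers, each with a residual connection."""
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(approximate="tanh"),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))
        x = x + self.mlp(self.ln_2(x))
        return x

if __name__ == "__main__":
    out = Block()(torch.randn(2, 16, 768))
    print(out.shape)  # torch.Size([2, 16, 768])
```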
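A rough sketch of the preprocessing step, assuming tiktoken's GPT-2 tokenizer and Hugging Face `datasets` for reading the Parquet files; the data path, output directory, and shard size are placeholders rather than the repo's actual values.

```python
import numpy as np
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")
eot = enc.eot_token  # <|endoftext|>, used as a document separator

# placeholder path to the downloaded Parquet files
ds = load_dataset("parquet", data_files="data/*.parquet", split="train")

SHARD_SIZE = 100_000_000  # tokens per .npy shard (placeholder)
buf, shard_idx = [], 0
for doc in ds:
    buf.append(eot)
    buf.extend(enc.encode_ordinary(doc["text"]))
    while len(buf) >= SHARD_SIZE:
        np.save(f"tokens/shard_{shard_idx:04d}.npy",
                np.array(buf[:SHARD_SIZE], dtype=np.uint16))
        buf = buf[SHARD_SIZE:]
        shard_idx += 1
if buf:  # flush the remainder
    np.save(f"tokens/shard_{shard_idx:04d}.npy", np.array(buf, dtype=np.uint16))
```

Storing the token ids as uint16 is enough for GPT-2's 50257-token vocabulary and halves the size on disk compared to int32.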
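The custom dataloader works along these lines: one tokenized shard is resident at a time and batches are sliced out of it, so the full dataset never has to live in a single tensor. The class name and the shard-rotation policy here are assumptions, not the repo's exact implementation.

```python
import numpy as np
import torch

class ShardedDataLoader:
    """Streams (x, y) batches from tokenized .npy shards, one shard in memory at a time."""
    def __init__(self, shard_paths, B, T):
        self.shard_paths, self.B, self.T = shard_paths, B, T
        self.shard_idx = 0
        self._load_shard(0)

    def _load_shard(self, idx):
        # tokens are stored as uint16 on disk; widen to int64 for embedding lookups
        self.tokens = torch.from_numpy(np.load(self.shard_paths[idx]).astype(np.int64))
        self.pos = 0

    def next_batch(self):
        B, T = self.B, self.T
        chunk = self.tokens[self.pos : self.pos + B * T + 1]
        x = chunk[:-1].view(B, T)  # inputs
        y = chunk[1:].view(B, T)   # targets, shifted right by one token
        self.pos += B * T
        # advance to the next shard when the current one can't yield another batch
        if self.pos + B * T + 1 > len(self.tokens):
            self.shard_idx = (self.shard_idx + 1) % len(self.shard_paths)
            self._load_shard(self.shard_idx)
        return x, y
```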
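The cosine-decay scheduler can be expressed as a small function that the training loop queries every step; the maximum/minimum learning rates and step count below are placeholders.

```python
import math

def cosine_lr(step, max_steps, max_lr=6e-4, min_lr=6e-5):
    """Decay the learning rate from max_lr to min_lr over max_steps with a half-cosine."""
    progress = min(step / max_steps, 1.0)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    return min_lr + coeff * (max_lr - min_lr)

# usage inside the training loop: set the optimizer's lr each step
# for group in optimizer.param_groups:
#     group["lr"] = cosine_lr(step, max_steps)
```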
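The wandb integration boils down to an `init` at the start of the run and a `log` call per step; the project name, config keys, and the dummy loop below are illustrative only.

```python
import math
import wandb

# placeholder project name and config; the real values live in the training script
run = wandb.init(project="gpt2-124m", config={"max_lr": 6e-4, "batch": "16x1024"})

max_steps = 100
for step in range(max_steps):
    # in the real loop these come from the forward pass and the LR schedule
    loss = 4.0 * math.exp(-step / 50)  # dummy loss for illustration
    lr = 6e-4 * 0.5 * (1 + math.cos(math.pi * step / max_steps))
    wandb.log({"train/loss": loss, "lr": lr}, step=step)

run.finish()
```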
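None of the TODO items are implemented yet; the sketch below only shows how gradient-norm clipping, TF32, bf16 autocast, torch.compile, and fused AdamW are typically wired together in PyTorch, using a stand-in model and a dummy batch as placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.set_float32_matmul_precision("high")  # TF32 matmuls on Ampere+ GPUs

device = "cuda" if torch.cuda.is_available() else "cpu"
# tiny stand-in for the GPT model, just to make the loop runnable
model = nn.Sequential(nn.Embedding(50257, 64), nn.Linear(64, 50257)).to(device)
model = torch.compile(model)  # kernel fusion via torch.compile

use_fused = device == "cuda"  # fused AdamW is CUDA-only
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, fused=use_fused)

x = torch.randint(0, 50257, (4, 32), device=device)  # dummy input tokens
y = torch.randint(0, 50257, (4, 32), device=device)  # dummy targets

optimizer.zero_grad(set_to_none=True)
with torch.autocast(device_type=device, dtype=torch.bfloat16):  # bf16 mixed precision
    logits = model(x)  # (B, T, vocab)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
loss.backward()
norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient-norm clipping
optimizer.step()
```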