
🚀 Nano-GPT: Transformer-Based Language Model

An interactive Transformer language model trained on the Tiny Shakespeare dataset. 📜

Inspired by Andrej Karpathy, this project helped me understand the core concepts behind Transformers, like self-attention and sequence generation. I'm now working on scaling this up to create a GPT-2 clone! 🚀


📘 What I Built

  • Self-Attention Mechanism: Helps the model focus on relevant tokens in the sequence (see the sketch after this list).
  • Multi-Head Attention: Runs several attention heads in parallel to capture diverse relationships between tokens.
  • Positional Embeddings: Give the model a sense of token order, which is crucial for sequences.
  • Layer Normalization: Stabilizes and accelerates training for better convergence.
  • Feedforward Neural Network: Processes attention outputs to extract complex features.
  • Token and Positional Embeddings: Convert characters into meaningful vector representations.
  • Greedy & Multinomial Sampling: Enables flexible and creative text generation (see the generation sketch further down).
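
To make the core idea concrete, here is a minimal sketch of a single causal self-attention head in PyTorch, in the nanoGPT style this project follows. The class name and hyperparameters are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Head(nn.Module):
    """One head of causal self-attention (illustrative sketch, not the repo's code)."""

    def __init__(self, n_embd, head_size, block_size, dropout=0.1):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Causal mask: each position may attend only to itself and earlier tokens.
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)    # (B, T, head_size)
        q = self.query(x)  # (B, T, head_size)
        # Scaled dot-product attention scores.
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)
        wei = self.dropout(wei)
        v = self.value(x)  # (B, T, head_size)
        return wei @ v     # (B, T, head_size)
```

Multi-head attention simply runs several such heads in parallel and concatenates their outputs along the channel dimension before a final linear projection.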

๐Ÿ› ๏ธ Model Details

  • 🧠 Parameters: ~3 million
  • 📚 Dataset: Tiny Shakespeare (~1 MB of Shakespeare's works)
  • ⚡ Training Split: 90% training, 10% validation
  • 📊 Loss Estimation: Loss is averaged over many random batches from both the train and validation sets (see the sketch after this list)
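
A hedged sketch of that loss estimation, in the usual nanoGPT style: averaging the loss over a fixed number of random batches per split. The helper name, `get_batch` sampler, and the assumption that the model returns `(logits, loss)` are illustrative, not taken from the repo:

```python
import torch

@torch.no_grad()
def estimate_loss(model, get_batch, eval_iters=200):
    """Average loss over eval_iters random batches for each split (illustrative helper)."""
    model.eval()
    out = {}
    for split in ("train", "val"):
        losses = torch.zeros(eval_iters)
        for i in range(eval_iters):
            xb, yb = get_batch(split)  # assumed sampler returning (inputs, targets)
            _, loss = model(xb, yb)    # assumed: model returns (logits, loss)
            losses[i] = loss.item()
        out[split] = losses.mean().item()
    model.train()
    return out
```

Averaging over many batches smooths out the noise of any single batch, giving a more reliable picture of convergence on both splits.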

After training, the model can generate Shakespeare-like text; it's incredible to see how it captures the style and rhythm of classic literature! ✨
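
Here is a sketch of the generation loop, showing both sampling modes mentioned above. It assumes the same illustrative model interface as the loss sketch (the model returns `(logits, loss)`); none of these names are the repo's actual code:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size, greedy=False):
    """Autoregressive generation (illustrative; idx is a (B, T) tensor of token ids)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]   # crop to the model's context window
        logits, _ = model(idx_cond)       # assumed: model returns (logits, loss)
        logits = logits[:, -1, :]         # logits for the last position only
        if greedy:
            # Greedy sampling: always pick the single most likely next token.
            idx_next = logits.argmax(dim=-1, keepdim=True)
        else:
            # Multinomial sampling: draw from the distribution for more varied text.
            probs = F.softmax(logits, dim=-1)
            idx_next = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, idx_next), dim=1)
    return idx
```

Greedy decoding is deterministic and tends to loop on a small model, while multinomial sampling trades a little coherence for much more varied, Shakespeare-flavored output.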


🚀 Try It Out!


💡 What's Next?

I'm working on scaling this up into a GPT-2 clone, and I'm exploring optimizations like rotary embeddings and flash attention. If you have ideas or suggestions, I'd love to hear them! Let's keep learning and building together. 🚀
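
On the flash attention side, one low-effort path (an option, not necessarily what this repo will do): PyTorch 2.x exposes fused attention kernels through torch.nn.functional.scaled_dot_product_attention, which can replace the manual mask-softmax-matmul math in the head sketch above. The tensor shapes below are placeholders:

```python
import torch
import torch.nn.functional as F

# q, k, v: (batch, n_heads, seq_len, head_size). With is_causal=True, PyTorch
# applies the causal mask internally and dispatches to a fused (flash-style)
# kernel when the hardware and dtypes allow it.
q = torch.randn(4, 8, 256, 64)
k = torch.randn(4, 8, 256, 64)
v = torch.randn(4, 8, 256, 64)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # (4, 8, 256, 64)
```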


๐Ÿค Acknowledgments

A huge shoutout to Andrej Karpathy for making complex concepts like self-attention, multi-head attention, and layer normalization so understandable; your teachings inspired this project! 🙌

[Demo picture]
