An interactive Transformer language model trained on the Tiny Shakespeare dataset.
Inspired by Andrej Karpathy, this project helped me understand the core concepts behind Transformers, like self-attention and sequence generation. I'm now working on scaling this up to create a GPT-2 clone!
- Self-Attention Mechanism: Helps the model focus on relevant tokens in the sequence (see the sketch after this list).
- Multi-Head Attention: Captures diverse relationships between tokens for richer understanding.
- Positional Embeddings: Gives the model a sense of token order, crucial for sequences.
- Layer Normalization: Stabilizes and accelerates training for better convergence.
- Feedforward Neural Network: Processes attention outputs to extract complex features.
- Token Embeddings: Converts characters into meaningful vector representations.
- Greedy & Multinomial Sampling: Enables flexible and creative text generation.
- Parameters: ~3 million
- Dataset: Tiny Shakespeare (~1 MB of Shakespeare's works)
- Training Split: 90% training, 10% validation
- Loss Estimation: Evaluated on both the train and validation sets throughout training (a sketch of this loop follows the list).
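
Below is a hedged sketch of what the loss-estimation loop can look like. The names `estimate_loss`, `get_batch`, and `eval_iters`, and the assumption that the model returns `(logits, loss)`, are illustrative and may not match this repo exactly.

```python
import torch

@torch.no_grad()
def estimate_loss(model, get_batch, eval_iters=200):
    """Average the loss over several batches from each split for a stable estimate."""
    out = {}
    model.eval()
    for split in ("train", "val"):
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            xb, yb = get_batch(split)   # sample a random batch from this split
            _, loss = model(xb, yb)     # assumed to return (logits, loss)
            losses[k] = loss.item()
        out[split] = losses.mean().item()
    model.train()
    return out
```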
After training, the model can generate Shakespeare-like text, and it's incredible to see how it captures the style and rhythm of classic literature!
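
For reference, here is an illustrative generation loop supporting both greedy and multinomial sampling. The `generate` signature and the `(logits, loss)` model output are assumptions, not necessarily the exact API used here.

```python
import torch
from torch.nn import functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size, greedy=False):
    """Autoregressively extend the token tensor idx of shape (B, T)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]       # crop context to the model's block size
        logits, _ = model(idx_cond)           # (B, T, vocab_size)
        logits = logits[:, -1, :]             # keep only the last time step
        probs = F.softmax(logits, dim=-1)
        if greedy:
            idx_next = probs.argmax(dim=-1, keepdim=True)        # most likely token
        else:
            idx_next = torch.multinomial(probs, num_samples=1)   # sample for variety
        idx = torch.cat((idx, idx_next), dim=1)
    return idx
```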
I'm working on scaling this up into a GPT-2 clone, and I'm exploring optimizations like rotary embeddings and flash attention. If you have ideas or suggestions, I'd love to hear them! Let's keep learning and building together.
A huge shoutout to Andrej Karpathy for making complex concepts like self-attention, multi-head attention, and layer normalization so understandable; your teachings inspired this project!
