A minimal implementation of the Infini-gram language model using Go's built-in suffix array, plus a character-level GPT for comparison.
Instead of using a fixed n-gram size, infini-gram looks up suffixes of the current context of varying lengths in the training data and combines their next-token distributions with exponential decay weighting, so longer matches (higher n) count more heavily. The k parameter controls how many n-gram levels are combined (k=2 by default; k=-1 uses all levels).
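
To make the weighting concrete, here is a minimal Go sketch built on the standard `index/suffixarray` package mentioned above. The helper names (`nextCharDist`, `infiniGram`), the decay base of 0.5, and the exact level-selection rule are illustrative assumptions, not the repo's actual code.

```go
// Sketch of the infini-gram idea; constants and helpers are assumptions,
// not the repo's exact implementation.
package main

import (
	"fmt"
	"index/suffixarray"
	"math"
)

// nextCharDist tallies which byte follows each occurrence of suffix in text
// and normalizes the counts into a probability distribution.
func nextCharDist(idx *suffixarray.Index, text, suffix []byte) map[byte]float64 {
	counts := map[byte]float64{}
	total := 0.0
	for _, off := range idx.Lookup(suffix, -1) { // -1 = all occurrences
		if next := off + len(suffix); next < len(text) {
			counts[text[next]]++
			total++
		}
	}
	for b := range counts {
		counts[b] /= total
	}
	return counts
}

// infiniGram blends the k longest matching suffix levels, weighting level n
// by decay^(longest-n) so longer matches dominate (decay must be in (0,1));
// k <= 0 uses every level down to n = 1.
func infiniGram(idx *suffixarray.Index, text, ctx []byte, k int, decay float64) map[byte]float64 {
	// Find the longest suffix of ctx that still occurs in text.
	longest := 0
	for n := 1; n <= len(ctx); n++ {
		if len(idx.Lookup(ctx[len(ctx)-n:], 1)) == 0 {
			break
		}
		longest = n
	}
	lowest := 1
	if k > 0 && longest-k+1 > lowest {
		lowest = longest - k + 1
	}
	combined := map[byte]float64{}
	totalW := 0.0
	for n := lowest; n <= longest; n++ {
		w := math.Pow(decay, float64(longest-n))
		for b, p := range nextCharDist(idx, text, ctx[len(ctx)-n:]) {
			combined[b] += w * p
		}
		totalW += w
	}
	for b := range combined {
		combined[b] /= totalW
	}
	return combined
}

func main() {
	text := []byte("to be or not to be, that is the question")
	idx := suffixarray.New(text)
	fmt.Println(infiniGram(idx, text, []byte("to be"), 2, 0.5))
}
```

Weighting level n by decay^(longest-n) gives each shorter match a geometrically smaller share, which is one straightforward reading of exponential decay weighting.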
```bash
# Download the dataset (Shakespeare text)
wget https://github.com/nathan-barry/tiny-diffusion/releases/download/v2.0.0/data.txt

# Download the trained GPT weights (optional - can train from scratch)
mkdir -p weights && wget -P weights https://github.com/nathan-barry/tiny-diffusion/releases/download/v2.0.0/gpt.pt
```

```bash
# Run infini-gram
go run infini-gram.go

# Run GPT (uses pre-trained weights if available)
uv run gpt.py

# Train GPT from scratch
uv run gpt.py --train

# Run side-by-side visualization comparing both models
uv run visualization.py
```

Both models generate 1000 characters at temperature 0.8 by default. The visualization shows an animated comparison in which each model's generation speed is proportional to its actual inference time.
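
As a rough illustration of what the temperature parameter does, here is a hedged Go sketch of temperature sampling over a byte distribution like the one returned above. The function name and the renormalization scheme are assumptions for illustration, not code from either model.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// sampleWithTemperature draws a byte from dist after raising each probability
// to 1/T and renormalizing, which is equivalent to dividing logits by T.
// T = 0.8 < 1 sharpens the distribution (less random output); T > 1 flattens it.
func sampleWithTemperature(dist map[byte]float64, temp float64, rng *rand.Rand) byte {
	scaled := map[byte]float64{}
	total := 0.0
	for b, p := range dist {
		s := math.Pow(p, 1.0/temp)
		scaled[b] = s
		total += s
	}
	// Inverse-CDF sampling; iteration order does not change the probabilities.
	r := rng.Float64() * total
	for b, s := range scaled {
		if r < s {
			return b
		}
		r -= s
	}
	for b := range scaled { // guard against floating-point round-off
		return b
	}
	return 0
}

func main() {
	rng := rand.New(rand.NewSource(42))
	dist := map[byte]float64{' ': 0.5, ',': 0.5}
	fmt.Printf("%q\n", sampleWithTemperature(dist, 0.8, rng))
}
```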
