A mechanistic interpretability tool that detects and analyzes induction heads in GPT-2 Small using TransformerLens.
Detected 12 induction heads in GPT-2 Small (top 5 by score shown below):
| Layer | Head | Score |
|---|---|---|
| 5 | 5 | 0.932 |
| 6 | 9 | 0.913 |
| 7 | 10 | 0.910 |
| 5 | 1 | 0.908 |
| 7 | 2 | 0.833 |
Found the induction circuit: previous-token heads (layers 2-3) compose with induction heads (layers 5-7). Strongest composition pairs (a scoring sketch follows this list):
- L3H3 → L5H1: 30.68
- L2H2 → L5H1: 29.66
- L3H3 → L6H9: 27.07
- L3H3 → L5H5: 26.56
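The scores above appear to be unnormalized. For reference, a minimal sketch of the *normalized* K-composition score (as defined in the Mathematical Framework paper), using TransformerLens weight accessors, might look like this; the `k_composition` helper is illustrative, not the repo's code:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small

def k_composition(prev_layer: int, prev_head: int, ind_layer: int, ind_head: int) -> torch.Tensor:
    """Normalized K-composition between an earlier head's output and a later head's key input."""
    # Output circuit of the earlier (previous-token) head: d_model -> d_model
    W_OV = model.W_V[prev_layer, prev_head] @ model.W_O[prev_layer, prev_head]
    # Query-key circuit of the later (induction) head: d_model x d_model bilinear form
    W_QK = model.W_Q[ind_layer, ind_head] @ model.W_K[ind_layer, ind_head].T
    # How much of the induction head's key input is explained by what the earlier head wrote.
    return (W_QK @ W_OV.T).norm() / (W_QK.norm() * W_OV.norm())

print(f"L3H3 -> L5H1: {k_composition(3, 3, 5, 1).item():.3f}")
```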
Induction heads implement in-context learning by completing patterns:
[A][B] ... [A] → predicts [B]
Example: "Harry Potter... Harry" → predicts "Potter"
The mechanism requires two attention heads working together:
- Previous Token Head (Layer L): Writes "what came before me" into the residual stream
- Induction Head (Layer L+k): Finds previous occurrences of the current token and copies what came after (sketched below)
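A minimal sketch of checking this on real text with TransformerLens, assuming the layer/head indices in the table above are 0-based (as in TransformerLens) and that L5H5 behaves as an induction head:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small

# Both occurrences of "Harry" are preceded by a space, so they map to the same token.
text = "The boy Harry Potter lived under the stairs. Everyone knew Harry"
tokens = model.to_tokens(text)
_, cache = model.run_with_cache(tokens)

layer, head = 5, 5                              # top-scoring head from the detection table
pattern = cache["pattern", layer][0, head]      # [dest_pos, src_pos]
str_tokens = model.to_str_tokens(text)

# Where does the final " Harry" attend? An induction head should put most of its
# weight on the earlier " Potter", the token that followed " Harry" the first time.
final_attn = pattern[-1]
top_src = final_attn.argmax().item()
print(f"{str_tokens[-1]!r} attends most to {str_tokens[top_src]!r} "
      f"(weight {final_attn[top_src].item():.2f})")
```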
```bash
git clone https://github.com/designer-coderajay/induction-head-detector.git
cd induction-head-detector
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

```bash
python induction_detector.py
```

Outputs:
- `induction_heatmap.png` - 12×12 grid of induction scores (plotting sketch below)
- `top_induction_head_attention.png` - Attention pattern of the top-scoring head
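A minimal sketch of how a heatmap like `induction_heatmap.png` could be rendered; the placeholder data stands in for the real [n_layers, n_heads] score matrix (see the score formula further down), and the styling is not necessarily what `induction_detector.py` produces:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data; substitute the real [n_layers, n_heads] induction-score matrix.
scores = np.random.rand(12, 12)

plt.imshow(scores, cmap="viridis", origin="lower")
plt.xlabel("Head")
plt.ylabel("Layer")
plt.colorbar(label="Induction score")
plt.title("GPT-2 Small induction scores")
plt.savefig("induction_heatmap.png", dpi=150)
```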
```bash
python deeper_analysis.py
```

Runs:
- Real text visualization ("Harry Potter... Harry")
- Previous-token head detection (sketched after this list)
- K-composition scoring between head pairs
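A minimal sketch of the previous-token head detection step; the score definition (mean attention from each position to the one directly before it) is the standard one and an assumption about what `deeper_analysis.py` measures:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small

# Any natural text works: previous-token heads attend to position i-1 regardless of content.
tokens = model.to_tokens("Harry Potter and the Philosopher's Stone, by J. K. Rowling.")
_, cache = model.run_with_cache(tokens)

prev_scores = torch.zeros(model.cfg.n_layers, model.cfg.n_heads)
for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer][0]                        # [n_heads, dest, src]
    # Mean attention from each position to the position directly before it.
    prev_scores[layer] = pattern.diagonal(-1, dim1=-2, dim2=-1).mean(dim=-1)

idx = prev_scores.flatten().argmax().item()
layer, head = divmod(idx, model.cfg.n_heads)
print(f"Strongest previous-token head: L{layer}H{head} ({prev_scores[layer, head].item():.3f})")
```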
```
induction-head-detector/
├── induction_detector.py    # Main detection script
├── deeper_analysis.py       # Ablation & composition analysis
├── theory_deep_dive.py      # Educational explanations
├── test_detector.py         # Unit tests (7 passed)
├── requirements.txt
├── induction_heatmap.png
├── previous_token_heads.png
└── induction_real_text.png
```
Induction Score Calculation:
For a repeated sequence [r1, r2, ..., r50, r1, r2, ..., r50]:
At position i in the second half, measure attention to position i - seq_len + 1 (what followed the previous occurrence of the current token).
```
induction_score = mean(
    attention[i, i - seq_len + 1]
    for i in range(seq_len, 2 * seq_len)
)
```

- Score > 0.4 → Induction head
- Score < 0.2 → Not an induction head
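A minimal sketch of this calculation with TransformerLens; the random-token range and seed are arbitrary choices, and `induction_detector.py` may differ in details:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small
seq_len = 50

# Repeated random tokens: [r1 ... r50, r1 ... r50]
torch.manual_seed(0)
rand = torch.randint(1000, model.cfg.d_vocab, (1, seq_len))
tokens = torch.cat([rand, rand], dim=1)

_, cache = model.run_with_cache(tokens)

dest = torch.arange(seq_len, 2 * seq_len)  # positions in the second half
src = dest - seq_len + 1                   # what followed the previous occurrence

scores = torch.zeros(model.cfg.n_layers, model.cfg.n_heads)
for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer][0]   # [n_heads, dest_pos, src_pos]
    scores[layer] = pattern[:, dest, src].mean(dim=-1)

# Heads above the 0.4 threshold are flagged as induction heads.
for layer, head in (scores > 0.4).nonzero().tolist():
    print(f"L{layer}H{head}: {scores[layer, head].item():.3f}")
```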
- In-context Learning and Induction Heads - Anthropic, 2022
- A Mathematical Framework for Transformer Circuits - Anthropic, 2021
- TransformerLens
MIT
