This repo accompanies the paper “LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models via Sparse Auto-Encoder” (EMNLP 2025). It provides code to discover and manipulate linguistic features in LLMs using SAEs (PS/PN/FRC, cross-layer trends, EN–ZH overlap, and intervention/visualization utilities).
```bash
git clone https://github.com/THU-KEG/LinguaLens
cd LinguaLens
conda create -n lingualens python=3.10 -y && conda activate lingualens
pip install -r requirements.txt
pip install -e .
```

## Prereqs
- A base LLM (e.g., Llama-3.1-8B) available via `transformers`.
- Per-layer OpenSAE checkpoints, organized as:

  ```text
  /path/to/sae/layer_{:02d}   # layer_00 ... layer_31
  ```
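The `{:02d}` placeholder is a zero-padded layer index, so the template expands to one checkpoint directory per layer. A minimal sketch (the path itself is illustrative):

```python
# Expand the per-layer SAE checkpoint template from above.
sae_tpl = "/path/to/sae/layer_{:02d}"
paths = [sae_tpl.format(i) for i in range(32)]
print(paths[0])   # /path/to/sae/layer_00
print(paths[-1])  # /path/to/sae/layer_31
```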
- Hugging Face: `THU-KEG/LinguaLens-Data`
- Content: counterfactual sentence pairs for 100+ linguistic phenomena in English and Chinese. Each row supplies a positive instance (`sentence1`) and a minimally edited counterfactual (`sentence2`) under the same feature label.

Typical columns (you may see minor variations depending on export):

- `feature`: phenomenon name (e.g., `passive_voice`)
- `categories`: tags like `Morphology`, `Syntax`, `Semantics`, `Pragmatics`
- `pair_index`: 1..50 per feature
- `sentence1`, `sentence2`: positive vs. counterfactual
- `language`: `English` or `Chinese`
Load and prepare per-feature text files (alternating positive/negative lines) for the analyzers:

```python
from datasets import load_dataset
from pathlib import Path

ds = load_dataset("THU-KEG/LinguaLens-Data")["train"]

def export_feature_txt(feature, language, out_path):
    subset = ds.filter(lambda r: r["feature"] == feature and r["language"] == language)
    # Ensure stable ordering by pair_index
    subset = subset.sort("pair_index")
    lines = []
    for r in subset:
        lines.append(r["sentence1"].strip())  # positive (odd)
        lines.append(r["sentence2"].strip())  # counterfactual (even)
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    Path(out_path).write_text("\n".join(lines), encoding="utf-8")

export_feature_txt("passive_voice", "English", "data/features/passive_voice.txt")
```

The alternating line format (odd = phenomenon, even = counterfactual) is required by the PS/PN/FRC calculators.
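As a sanity check, an exported file can be read back into (positive, counterfactual) pairs. `read_pairs` below is a hypothetical helper for validation, not part of the package:

```python
import tempfile
from pathlib import Path

def read_pairs(path):
    """Read an alternating-format feature file into (positive, counterfactual) pairs."""
    lines = [ln for ln in Path(path).read_text(encoding="utf-8").splitlines() if ln.strip()]
    if len(lines) % 2 != 0:
        raise ValueError(f"{path}: expected an even number of lines, got {len(lines)}")
    # Lines 1, 3, 5, ... are positives; lines 2, 4, 6, ... are counterfactuals.
    return list(zip(lines[0::2], lines[1::2]))

# Demo with a throwaway file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
    f.write("The ball was kicked.\nSomeone kicked the ball.\n")
print(read_pairs(f.name))  # [('The ball was kicked.', 'Someone kicked the ball.')]
```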
If you plan to run the bilingual overlap analysis, prepare one `.txt` per feature in which each line corresponds to one layer and contains the space-separated SAE base-vector IDs discovered for that layer:

```text
# 32 lines for 32 layers; the first line is layer 00
12 77 901 1337
... (next line is layer 01)
```
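A loader for this per-layer format might look like the following. `load_layer_vectors` is a hypothetical helper sketched here for clarity, not a LinguaLens API:

```python
import tempfile
from pathlib import Path

def load_layer_vectors(path, num_layers=32):
    """Parse one line of space-separated SAE base-vector IDs per layer."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if len(lines) != num_layers:
        raise ValueError(f"{path}: expected {num_layers} lines, got {len(lines)}")
    return {layer: [int(tok) for tok in line.split()] for layer, line in enumerate(lines)}

# Demo with a synthetic file (same IDs repeated for every layer)
demo = Path(tempfile.mkdtemp()) / "passive_voice.txt"
demo.write_text("\n".join("12 77 901 1337" for _ in range(32)), encoding="utf-8")
vecs = load_layer_vectors(demo)
print(vecs[31])  # [12, 77, 901, 1337]
```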
```python
from lingualens.analyzer import LinguisticAnalyzer
from lingualens.utils import validate_layer_indices

model_path = "/path/to/llama-3.1-8b"
sae_tpl = "/path/to/sae/layer_{:02d}"
feature = "data/features/passive_voice.txt"

layers = validate_layer_indices([0, 7, 15, 19, 25, 31])
an = LinguisticAnalyzer(model_path, sae_tpl)  # device auto-detected
res = an.analyze_feature(feature_file=feature, layers=layers, top_k=10)
print(res["layer_results"][15]["top_features"][:5])  # [(base_id, frc_score), ...]
```

Batch:
```python
an.batch_analyze_features(
    feature_files=[
        "data/features/passive_voice.txt",
        "data/features/comparative.txt",
    ],
    layers=list(range(0, 32)),
    output_dir="out/analysis",
    top_k=10,
)
```

Intervene on selected SAE features during generation:

```python
from lingualens.intervener import Intervener

iv = Intervener(
    model_path="/path/to/llama-3.1-8b",
    sae_path="/path/to/sae/layer_15",
)
iv.run_intervention_experiment(
    input_prompt="Write a sentence using a simile.",
    intervention_indices=[1337, 77],  # pick from analysis top-K
    output_path="out/intervene/simile_L15.txt",
    num_generations=5,
    max_new_tokens=80,
    temperature=0.7,
    experiment_name="simile_L15",
)
```

Generate an HTML visualization report:

```python
from lingualens.visualizer import Visualizer

viz = Visualizer("/path/to/llama-3.1-8b", "/path/to/sae/layer_{:02d}")
viz.generate_html_report(
    feature_file="data/features/passive_voice.txt",
    layer_idx=15,
    output_html="out/html/passive_L15.html",
    top_k=10,             # visualize top-K by FRC
    analysis_mode="FRC",  # or "frequency"
)
```

Bilingual (EN–ZH) feature overlap:

```python
from lingualens.bilingual import BilingualAnalyzer

bi = BilingualAnalyzer(vector_data_dir="data/vectors")
pairs = bi.load_default_feature_pairs()  # or provide your own mapping
res = bi.analyze_bilingual_feature_similarity(pairs, "out/bilingual/passive.json")
bi.generate_similarity_heatmap(res, "out/bilingual/heatmap.png")
bi.export_similarity_report(res, "out/bilingual/report.md")
```

Notes:

- GPU strongly recommended; the device is auto-detected (override via each class's `device` argument).
- If the tokenizer lacks a pad token, it defaults to EOS automatically.
- Ensure feature files are non-empty and follow the alternating pair format; otherwise PS/PN scores will be degenerate.
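The pad-token fallback mentioned in the notes amounts to a check like the following; `DummyTokenizer` is a stand-in used purely for illustration, not a real `transformers` tokenizer:

```python
class DummyTokenizer:
    """Minimal stand-in for a tokenizer loaded without a pad token."""
    def __init__(self):
        self.pad_token = None
        self.eos_token = "</s>"

tok = DummyTokenizer()
if tok.pad_token is None:          # the fallback described in the notes
    tok.pad_token = tok.eos_token  # reuse EOS so batched generation can pad
print(tok.pad_token)  # </s>
```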