
LinguaLens — Quick Usage Guide

This repo accompanies the paper “LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models via Sparse Auto-Encoder” (EMNLP 2025). It provides code to discover and manipulate linguistic features in LLMs using SAEs (PS/PN/FRC, cross-layer trends, EN–ZH overlap, and intervention/visualization utilities).


Installation

git clone https://github.com/THU-KEG/LinguaLens
cd LinguaLens
conda create -n lingualens python=3.10 -y && conda activate lingualens
pip install -r requirements.txt
pip install -e .

Prereqs

  • A base LLM (e.g., Llama-3.1-8B) available via transformers.

  • Per-layer OpenSAE checkpoints, organized as:

    /path/to/sae/layer_{:02d}   # layer_00 ... layer_31
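
The {:02d} placeholder zero-pads the layer index to two digits, which is how the analyzers expand the template per layer. A quick check in plain Python:

sae_tpl = "/path/to/sae/layer_{:02d}"
print(sae_tpl.format(0))   # /path/to/sae/layer_00
print(sae_tpl.format(15))  # /path/to/sae/layer_15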
    

Data you’ll need

1) Official dataset (recommended starting point)

  • Hugging Face: THU-KEG/LinguaLens-Data
  • Content: Counterfactual sentence pairs for 100+ linguistic phenomena in English and Chinese. Each row supplies a positive instance (sentence1) and a minimally edited counterfactual (sentence2) under the same feature label.

Typical columns (you may see minor variations depending on export):

  • feature — phenomenon name (e.g., passive_voice)
  • categories — tags like Morphology, Syntax, Semantics, Pragmatics
  • pair_index — 1..50 per feature
  • sentence1, sentence2 — positive vs. counterfactual
  • language — English or Chinese
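
For orientation, a single row has roughly this shape (the values below are illustrative, not taken from the dataset):

{
    "feature": "passive_voice",
    "categories": ["Syntax"],
    "pair_index": 1,
    "sentence1": "The report was written by the committee.",  # positive
    "sentence2": "The committee wrote the report.",           # counterfactual
    "language": "English",
}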

Load and prepare per-feature text files (alternating positive/negative lines) for the analyzers:

from datasets import load_dataset
from pathlib import Path

ds = load_dataset("THU-KEG/LinguaLens-Data")["train"]

def export_feature_txt(feature, language, out_path):
    subset = ds.filter(lambda r: r["feature"] == feature and r["language"] == language)
    # Ensure stable ordering by pair_index
    subset = subset.sort("pair_index")
    lines = []
    for r in subset:
        lines.append(r["sentence1"].strip())  # positive (odd)
        lines.append(r["sentence2"].strip())  # counterfactual (even)
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    Path(out_path).write_text("\n".join(lines), encoding="utf-8")

export_feature_txt("passive_voice", "English", "data/features/passive_voice.txt")

The alternating line format (odd=phenomenon, even=counterfactual) is required by the PS/PN/FRC calculators.
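
Because the calculators assume this strict pairing, it is worth validating exported files up front. A minimal checker (not part of the repo):

from pathlib import Path

def check_feature_file(path):
    # Non-empty lines must come in positive/counterfactual pairs.
    lines = [l for l in Path(path).read_text(encoding="utf-8").splitlines() if l.strip()]
    assert lines and len(lines) % 2 == 0, f"{path}: expected a non-empty, even number of lines"
    return len(lines) // 2  # number of pairs

print(check_feature_file("data/features/passive_voice.txt"), "pairs")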

2) (Optional) Vector-list files for bilingual overlap

If you plan to run the bilingual overlap analysis, prepare one .txt per feature in which each line corresponds to one layer and lists the space-separated SAE base-vector IDs discovered for that layer:

# 32 lines, one per layer; the first line is layer 00
12 77 901 1337
... (next line is layer 01)
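
If you discover IDs yourself (e.g., with the analyzer below), a small writer can produce this format. This is a sketch: layer_ids and the 32-layer default are assumptions, not repo API:

from pathlib import Path

def write_vector_file(layer_ids, out_path, num_layers=32):
    # layer_ids: dict mapping layer index -> list of SAE base-vector IDs
    lines = [" ".join(str(i) for i in layer_ids.get(layer, [])) for layer in range(num_layers)]
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    Path(out_path).write_text("\n".join(lines), encoding="utf-8")

write_vector_file({0: [12, 77, 901, 1337]}, "data/vectors/passive_voice_en.txt")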

Minimal examples

A) Analyze a feature across layers (PS/PN/FRC)

from lingualens.analyzer import LinguisticAnalyzer
from lingualens.utils import validate_layer_indices

model_path = "/path/to/llama-3.1-8b"
sae_tpl    = "/path/to/sae/layer_{:02d}"
feature    = "data/features/passive_voice.txt"
layers     = validate_layer_indices([0,7,15,19,25,31])

an = LinguisticAnalyzer(model_path, sae_tpl)  # device auto-detected
res = an.analyze_feature(feature_file=feature, layers=layers, top_k=10)

print(res["layer_results"][15]["top_features"][:5])  # [(base_id, frc_score), ...]
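
Assuming layer_results is keyed by layer index and each top_features list is sorted by FRC in descending order (as the snippet above suggests), you can rank layers by their strongest feature:

# Hypothetical post-processing; field names follow the result shown above.
best_layer, best_res = max(
    res["layer_results"].items(),
    key=lambda kv: kv[1]["top_features"][0][1],  # FRC of each layer's top feature
)
print(best_layer, best_res["top_features"][0])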

Batch:

an.batch_analyze_features(
    feature_files=[
        "data/features/passive_voice.txt",
        "data/features/comparative.txt"
    ],
    layers=list(range(0,32)),
    output_dir="out/analysis",
    top_k=10
)

B) Intervention (ablation/enhancement)

from lingualens.intervener import Intervener

iv = Intervener(
    model_path="/path/to/llama-3.1-8b",
    sae_path="/path/to/sae/layer_15"
)
iv.run_intervention_experiment(
    input_prompt="Write a sentence using a simile.",
    intervention_indices=[1337,77],   # pick from analysis top-K
    output_path="out/intervene/simile_L15.txt",
    num_generations=5, max_new_tokens=80, temperature=0.7,
    experiment_name="simile_L15"
)
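
In practice the indices come straight out of section A's results. A sketch that reuses res from above (assuming the same layer-15 analysis):

# Take the base IDs of the five strongest layer-15 features and intervene on them.
top = res["layer_results"][15]["top_features"][:5]   # [(base_id, frc_score), ...]
iv.run_intervention_experiment(
    input_prompt="Write a sentence in the passive voice.",
    intervention_indices=[base_id for base_id, _ in top],
    output_path="out/intervene/passive_L15.txt",
    num_generations=5, max_new_tokens=80, temperature=0.7,
    experiment_name="passive_L15"
)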

C) Visualization (HTML)

from lingualens.visualizer import Visualizer

viz = Visualizer("/path/to/llama-3.1-8b", "/path/to/sae/layer_{:02d}")
viz.generate_html_report(
    feature_file="data/features/passive_voice.txt",
    layer_idx=15,
    output_html="out/html/passive_L15.html",
    top_k=10,              # visualize top-K by FRC
    analysis_mode="FRC"    # or "frequency"
)
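
To compare how a phenomenon localizes across depth, the same call loops cleanly over layers (the layer choice here is arbitrary):

for layer in (7, 15, 25):
    viz.generate_html_report(
        feature_file="data/features/passive_voice.txt",
        layer_idx=layer,
        output_html=f"out/html/passive_L{layer:02d}.html",
        top_k=10,
        analysis_mode="FRC"
    )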

D) Bilingual overlap (EN ↔ ZH)

from lingualens.bilingual import BilingualAnalyzer

bi = BilingualAnalyzer(vector_data_dir="data/vectors")
pairs = bi.load_default_feature_pairs()  # or provide your own mapping
res = bi.analyze_bilingual_feature_similarity(pairs, "out/bilingual/passive.json")
bi.generate_similarity_heatmap(res, "out/bilingual/heatmap.png")
bi.export_similarity_report(res, "out/bilingual/report.md")
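
For a quick, self-contained sense of what per-layer overlap means before running the full pipeline, Jaccard similarity over base-vector ID sets is one plausible metric (an illustration only, not necessarily the repo's internal measure):

def jaccard(a, b):
    # Overlap of two SAE base-vector ID sets for the same layer.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

en_ids = [12, 77, 901, 1337]  # English feature, one layer
zh_ids = [12, 77, 404]        # Chinese counterpart, same layer
print(jaccard(en_ids, zh_ids))  # 0.4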

Tips

  • GPU strongly recommended; device is auto-detected (override via each class's device argument).
  • If the tokenizer lacks a pad token, it defaults to EOS automatically (see the snippet below).
  • Ensure feature files are non-empty and follow the alternating pair format; otherwise PS/PN scores will be degenerate.
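
The pad-token fallback mentioned above is the standard transformers idiom; if you pre-load the tokenizer yourself, the equivalent is:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/llama-3.1-8b")
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # reuse EOS as padding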
