This repository accompanies the paper Beyond Linear Steering: Unified Multi-Attribute Control for Language Models and contains the code needed to replicate its results. The multi-layer evaluation from Section 5 of the paper is implemented in src/, and the code for the other experiments is in notebooks/.
- Python 3.10+
- Install dependencies:

      pip install -r requirements.txt

Some features (plot export, judge-based calibration) require optional dependencies and API keys:
- Static image export: kaleido
- OpenAI judge calibration and tone judge: openai plus an API key, and tiktoken
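To see which optional features are usable before starting a run, a small check along the following lines may help. It is a minimal sketch, not part of the repository: the helper names are hypothetical, it assumes the import names match the package names listed above, and it assumes the key may be supplied through the `OPENAI_API_KEY` environment variable (the variable the openai client library conventionally reads; whether this repository reads it is not confirmed).

```python
import importlib.util
import os

# Optional features and the modules they are assumed to need
# (import names assumed to match the package names listed above).
OPTIONAL_FEATURES = {
    "static image export": ["kaleido"],
    "OpenAI judge calibration and tone judge": ["openai", "tiktoken"],
}


def missing_modules(modules):
    """Return the names in `modules` that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report_optional_features():
    """Map each optional feature to the list of modules it is still missing."""
    return {
        feature: missing_modules(modules)
        for feature, modules in OPTIONAL_FEATURES.items()
    }


if __name__ == "__main__":
    for feature, missing in report_optional_features().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{feature}: {status}")
    # Assumption: the key may live in OPENAI_API_KEY; otherwise it has to be
    # passed explicitly with --judge-api-key.
    if not os.environ.get("OPENAI_API_KEY"):
        print("OPENAI_API_KEY is not set; pass --judge-api-key explicitly")
```

Using `importlib.util.find_spec` avoids importing the packages just to test for them, so the check stays fast and has no side effects.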
- src/steering/ – core steering primitives
  - models.py – activation classifier and steering model
  - hooks.py – gradient/additive forward hooks
  - caa.py – CAA vector computation
  - dct_vectors.py – DCT steering vectors
  - eval.py – activation-space evaluation helpers
- src/utils/ – utilities and helpers
  - data.py – task/dataset loading (tones)
  - models.py – HF model loader
  - features.py – hidden state extraction (get_hidden_cached)
  - generation.py – batched generation with hooks
  - paths.py – results/, data/, data/generations paths
  - config.py, tasks.py, deps.py, model_ops.py, state.py
- src/evals/ – evaluation flows
  - ood.py – judge-based OOD scoring & alpha calibration
  - dct_eval.py – DCT-to-label mapping and alpha sweep
  - plotting.py – bar chart plotting
  - samples.py – sampling steered responses and saving
- src/judges/ – judge integrations
  - tone.py – tone comparison judge (tiktoken-based first-token mapping)
- configs/ – example YAML configs
- data/ – local datasets and generated artifacts (created on first run)
  - generations/ – generated samples
- results/ – evaluation results (CSVs, figures)
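For readers unfamiliar with the CAA vectors computed in caa.py, the idea can be sketched in a few lines. This is an illustration of the commonly used definition (the difference between the mean activations of positive and negative examples, added to a hidden state with a scale alpha), not the repository's implementation; the function names are hypothetical and plain Python lists stand in for tensors.

```python
def mean_vector(rows):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(rows)
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / count for i in range(dim)]


def caa_vector(pos_acts, neg_acts):
    """Mean activation of positive examples minus mean of negative examples."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def steer(hidden, vector, alpha):
    """Additive steering: shift a hidden state by alpha times the vector."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


if __name__ == "__main__":
    # Toy activations for one layer with hidden size 2.
    pos = [[1.0, 2.0], [3.0, 4.0]]
    neg = [[0.0, 0.0], [2.0, 2.0]]
    vec = caa_vector(pos, neg)
    print(vec)                          # [1.0, 2.0]
    print(steer([0.0, 0.0], vec, 0.5))  # [0.5, 1.0]
```

In a real model the same addition would typically be applied inside a forward hook at the chosen layer, which is the role an additive hook such as the one listed in hooks.py would play.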
- Set a YAML config in configs/. A minimal example config that runs quickly is available.
- Run the full pipeline (calibrate alphas → find the optimal layer and alpha combination → save results/plots):

      python -m src --config configs/bench.example.yaml --full

- Enable LLM-judge OOD calibration (requires OpenAI key):

      python -m src --config configs/bench.example.yaml --full \
        --judge-enabled --judge-model gpt-4o-mini --judge-api-key YOUR_OPENAI_KEY

Outputs will be written to results/ (CSVs and figures), and generated samples (if any) to data/generations/.
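When launching the pipeline from a script, it can be convenient to assemble the command programmatically so the API key never has to be typed into the shell history. The sketch below only builds the argument list from the flags shown above; `build_command` is a hypothetical helper, and reading the key from `OPENAI_API_KEY` is an assumption rather than something the repository documents.

```python
import os
import shlex
import sys


def build_command(config_path, judge_model=None, api_key=None):
    """Build the argument list for a full pipeline run.

    Judge flags are appended only when `judge_model` is given, in which
    case an API key is required.
    """
    cmd = [sys.executable, "-m", "src", "--config", config_path, "--full"]
    if judge_model is not None:
        if not api_key:
            raise ValueError("judge calibration needs an API key")
        cmd += [
            "--judge-enabled",
            "--judge-model", judge_model,
            "--judge-api-key", api_key,
        ]
    return cmd


if __name__ == "__main__":
    # Assumption: the key is stored in the OPENAI_API_KEY environment variable.
    key = os.environ.get("OPENAI_API_KEY")
    cmd = build_command(
        "configs/bench.example.yaml",
        judge_model="gpt-4o-mini" if key else None,
        api_key=key,
    )
    # Print the command with the key masked instead of executing it.
    masked = ["<key>" if key and part == key else part for part in cmd]
    print(shlex.join(masked))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than a single shell string) sidesteps quoting problems if the config path contains spaces.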
- --config – path to YAML config (see example above)
- --model – HF model id (overrides YAML)
- --task – tones
- --methods – any of k-steering, caa, dct (quick benchmark mode)
- --layers – list of layer indices to benchmark
- --eval-layer – layer used for the activation classifier
- --num-attributes / --target-labels – target label selection
- --max-samples – cap on the prompts used in evaluation
- --full – run the full notebook-equivalent pipeline
- Judge flags: --judge-enabled, --judge-model, --judge-api-key
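As a reading aid, the flags above could be modelled with `argparse` roughly as follows. This is a hypothetical sketch, not the repository's parser: the argument types, the space-separated list syntax for `--methods`, `--layers`, and `--target-labels`, and the defaults are all assumptions, so check `python -m src --help` for the actual behavior.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (types are assumed)."""
    parser = argparse.ArgumentParser(prog="python -m src")
    parser.add_argument("--config", help="path to YAML config")
    parser.add_argument("--model", help="HF model id (overrides YAML)")
    parser.add_argument("--task", choices=["tones"])
    parser.add_argument("--methods", nargs="+", choices=["k-steering", "caa", "dct"])
    parser.add_argument("--layers", nargs="+", type=int,
                        help="layer indices to benchmark")
    parser.add_argument("--eval-layer", type=int,
                        help="layer used for the activation classifier")
    parser.add_argument("--num-attributes", type=int)
    parser.add_argument("--target-labels", nargs="+")
    parser.add_argument("--max-samples", type=int)
    parser.add_argument("--full", action="store_true")
    parser.add_argument("--judge-enabled", action="store_true")
    parser.add_argument("--judge-model")
    parser.add_argument("--judge-api-key")
    return parser


if __name__ == "__main__":
    # Example of a quick benchmark invocation under the assumed syntax.
    args = build_parser().parse_args([
        "--config", "configs/bench.example.yaml",
        "--methods", "caa", "dct",
        "--layers", "4", "8",
        "--eval-layer", "12",
    ])
    print(args.methods, args.layers, args.eval_layer, args.full)
```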