Add mech interp examples & utils by rsmith49 · Pull Request #31 · withmartian/ares

rsmith49 · 2026-01-16T01:29:27Z

User description

In progress - current implementation is mostly vibe coded, will clean up and merge

Generated description

Below is a concise technical summary of the changes proposed in this PR:

graph LR
main_("main"):::added
HookedTransformerLLMClient_("HookedTransformerLLMClient"):::added
ActivationCapture_("ActivationCapture"):::added
TrajectoryActivations_("TrajectoryActivations"):::added
InterventionManager_("InterventionManager"):::added
create_zero_ablation_hook_("create_zero_ablation_hook"):::added
main_ -- "Adds hooked-transformer LLM client enabling local inference with hooks." --> HookedTransformerLLMClient_
main_ -- "Captures activations per step via ActivationCapture context manager." --> ActivationCapture_
ActivationCapture_ -- "get_trajectory returns TrajectoryActivations aggregating step ActivationCaches." --> TrajectoryActivations_
main_ -- "Uses TrajectoryActivations to access/save per-step activation data." --> TrajectoryActivations_
main_ -- "Applies InterventionManager to add/remove hook-based interventions during runs." --> InterventionManager_
InterventionManager_ -- "Uses zero-ablation hook to zero specific attention heads." --> create_zero_ablation_hook_
classDef added stroke:#15AA7A
classDef removed stroke:#CD5270
classDef modified stroke:#EDAC4C
linkStyle default stroke:#CBD5E1,font-size:13px

Integrates TransformerLens into the ARES framework to enable trajectory-level mechanistic interpretability for agents. Introduces specialized components for capturing model activations and performing causal interventions during agent execution.
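As a rough illustration of the kind of causal intervention described here, the sketch below zeroes selected attention heads in an activation array. The PR's actual `create_zero_ablation_hook` signature is not shown on this page, so `make_head_zero_ablation_hook` is a hypothetical stand-in. The `(activation, hook)` argument order and the `[batch, position, head, d_head]` layout are assumed from the TransformerLens forward-hook convention for per-head attention outputs, and NumPy stands in for a PyTorch tensor so the example runs without loading a model.

```python
import numpy as np


def make_head_zero_ablation_hook(head_indices):
    """Return a hook function that zeroes the given attention heads.

    Hypothetical helper for illustration only; the PR's
    create_zero_ablation_hook may take different arguments.
    """
    heads = list(head_indices)

    def hook_fn(activation, hook=None):
        # Assumed layout: [batch, position, head, d_head].
        # Setting a head's slice to zero removes its contribution downstream.
        activation[:, :, heads, :] = 0.0
        return activation

    return hook_fn


# Toy activation: 1 batch item, 3 positions, 4 heads, d_head = 2.
z = np.ones((1, 3, 4, 2))
ablated = make_head_zero_ablation_hook([1, 3])(z.copy())

# Sum what is left per head; ablated heads should contribute nothing.
remaining_per_head = ablated.sum(axis=(0, 1, 3))
print(remaining_per_head)
```

With a real TransformerLens model, a function like this would typically be passed to `model.run_with_hooks(tokens, fwd_hooks=[("blocks.0.attn.hook_z", hook_fn)])`; how `InterventionManager` wraps that call is not visible from this page.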

Topic Details

Examples & Docs

Provides a comprehensive guide and a runnable example demonstrating how to use ActivationCapture and InterventionManager for analyzing agent behavior.

Modified files (3)

examples/07_mech_interp_hooked_transformer.py
pyproject.toml
src/ares/contrib/mech_interp/README.md

Latest Contributors (2)

User	Commit	Date
joshua.greaves@gmail.com	Bump-ARES-to-0.0.2-72	January 29, 2026
ryan@withmartian.com	Add-Tinker-Example-58	January 29, 2026

Interpretability Core

Implements the HookedTransformerLLMClient and ActivationCapture to facilitate activation tracking and hook-based interventions within ARES environments.
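The per-step capture pattern summarized above can be sketched with a plain context manager. Everything here is an assumption made for illustration: `CaptureSketch`, `TrajectorySketch`, and `record_step` are hypothetical names, and plain dictionaries stand in for TransformerLens `ActivationCache` objects. The real `ActivationCapture` and `TrajectoryActivations` classes in the PR may be structured quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class TrajectorySketch:
    """Per-step activation caches for one agent run (hypothetical stand-in)."""

    steps: list = field(default_factory=list)

    def __len__(self):
        return len(self.steps)


class CaptureSketch:
    """Context manager that records one cache per agent step (hypothetical)."""

    def __init__(self):
        self._trajectory = TrajectorySketch()
        self.active = False

    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.active = False
        return False  # do not swallow exceptions raised inside the block

    def record_step(self, cache):
        # Only record while the context is active; copy so later
        # mutation of the caller's dict does not alter the stored step.
        if self.active:
            self._trajectory.steps.append(dict(cache))

    def get_trajectory(self):
        return self._trajectory


with CaptureSketch() as capture:
    for step in range(3):
        # Placeholder cache: hook name -> activation values for this step.
        capture.record_step({"blocks.0.attn.hook_z": [float(step)]})

# Recording outside the with-block is ignored.
capture.record_step({"ignored": [0.0]})

trajectory = capture.get_trajectory()
print(len(trajectory))
```

Scoping capture to a `with` block keeps the recording window explicit, which matters when the same model client is reused across several agent episodes.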

Modified files (4)

src/ares/contrib/mech_interp/__init__.py
src/ares/contrib/mech_interp/activation_capture.py
src/ares/contrib/mech_interp/hook_utils.py
src/ares/contrib/mech_interp/hooked_transformer_client.py

Latest Contributors (0)

User	Commit	Date


rsmith49 · 2026-01-22T06:06:16Z

src/ares/contrib/mech_interp/README.md

+  title = {ARES Mechanistic Interpretability Module},
+  author = {Martian},
+  year = {2025},
+  url = {https://github.com/anthropics/ares}


LOL I see you claude code

rsmith49 force-pushed the contrib-mi/add-transformer-lens branch from d852b59 to 9de6668 Compare January 22, 2026 05:37


rsmith49 added 2 commits January 30, 2026 11:30

Initial vibe coded mech interp implementation

0490efd

In progress edits/improvements

90c5138

rsmith49 force-pushed the contrib-mi/add-transformer-lens branch from 5953af7 to 90c5138 Compare January 30, 2026 19:31

rsmith49 added 3 commits January 30, 2026 11:32

Moved to new example number

d9ae1ad

Merge conflict updates

c442c54

In progress updates

78396e5

rsmith49 wants to merge 5 commits into main from contrib-mi/add-transformer-lens
