Semantic Highlight ◦ Coding Agent Native ◦ Flexible Use ◦ Long Context Tailored
Save up to 40% of Claude tokens!
🔥 Releases:
- 1/28/2025: We released scripts for optimizing and visualizing results! See `./utils`
- 1/27/2025: We published our tech blog: Towards Real-World Software Agents: How We Bring the Semantic Highlight Feature to Agentic Coding
- 1/26/2025: Introduced SWE-Pruner
- 📖 paper: https://arxiv.org/abs/2601.16746
- ⚙️ code: https://github.com/Ayanami1314/swe-pruner
- 🐍 pip: https://pypi.org/project/swe-pruner/
- 🤗 huggingface: https://huggingface.co/ayanami-kitasan/code-pruner
Are you struggling with excessive token costs and latency when using LLM agents for software development? Traditional context compression often relies on fixed metrics like perplexity (PPL) and ignores task-specific code understanding. But generic compression ≠ relevant preservation — we need task-aware context pruning that retains critical implementation details.
Inspired by how human programmers "selectively skim" source code, SWE-Pruner enables agents to formulate explicit goals and uses a lightweight neural skimmer to dynamically select relevant code lines. It operates in two key steps:
- Formulate task-specific goals to guide the pruning process
- Dynamically select relevant code lines using a lightweight neural skimmer
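The two steps above can be sketched as follows. Note that `score_line` here is a hypothetical keyword-overlap stand-in for illustration only; in SWE-Pruner the scoring is done by the lightweight 0.6B neural skimmer, not keyword matching:

```python
def score_line(goal: str, line: str) -> float:
    """Hypothetical relevance scorer: fraction of goal keywords that appear
    in the line. (Stand-in for the 0.6B neural skimmer.)"""
    keywords = [w.lower() for w in goal.split()]
    return sum(kw in line.lower() for kw in keywords) / max(len(keywords), 1)

def prune(goal: str, source: str, threshold: float = 0.3) -> str:
    """Step 1: the agent formulates a task-specific goal (`goal`).
    Step 2: a skimmer keeps only the lines relevant to that goal."""
    return "\n".join(line for line in source.splitlines()
                     if score_line(goal, line) >= threshold)

code = """def connect(url):
    sock = open_socket(url)
    return sock
def handle_error(err):
    log(err)
    raise err"""
print(prune("error handling", code))  # keeps only the error-handling line(s)
```

The key design point is that the goal is explicit and task-specific, so different goals over the same file yield different pruned contexts.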
🧠 Task-Aware Pruning Understands the intent (e.g., "focus on error handling") and uses it to guide the context pruning process, going beyond generic metrics.
🤖 Coding Agent Native Built for multi-turn workflows and integrates seamlessly into agent decision loops, providing just-in-time context for complex software engineering tasks.
🎨 Semantic Highlight A lightweight 0.6B model identifies and preserves semantically critical lines of code, keeping logical structures intact.
⚡ Extreme Compression Delivers significant token savings without sacrificing performance: 23-54% token reduction on SWE-Bench Verified and up to 14.84x compression on LongCodeQA, cutting API costs and latency.
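To make the headline numbers concrete, here is the arithmetic relating a compression ratio to token savings, using the figures quoted above:

```python
def savings_from_ratio(ratio: float) -> float:
    """A k-x compression keeps 1/k of the tokens, i.e. saves 1 - 1/k."""
    return 1.0 - 1.0 / ratio

# The 14.84x compression reported on LongCodeQA keeps only ~6.7% of tokens:
print(f"{savings_from_ratio(14.84):.1%} tokens saved")  # → 93.3% tokens saved

# Conversely, the 23-54% reductions on SWE-Bench Verified correspond to
# ratios of roughly 1.30x to 2.17x:
print(f"{1 / (1 - 0.23):.2f}x")  # → 1.30x
print(f"{1 / (1 - 0.54):.2f}x")  # → 2.17x
```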
🔧 Flexible Use An adaptable framework for various LLMs and scenarios, from debugging to feature development.
.
├── data/              # Experiment trace archives and hyperparameter configurations
├── downstream_eval/   # Downstream evaluation benchmarks
│   ├── multi_turn/    # Includes: SWE-bench, SWE-QA (coming soon)
│   └── single_turn/   # Includes: LongCodeQA, LCC (LongCodeCompletion)
├── swe-pruner/        # Inference code and model utilities
│   └── model/         # Model files for SWE-Pruner
└── examples/          # Examples for integrating with other agents like Claude Code and OpenHands
This project uses uv for fast and efficient dependency management.
Go to the Inference Tutorial and give it a try!
Tips: For easier serving and reproduction, we upload our models in the `./swe-pruner/model` directory (tracked by Git LFS). This makes serving simpler, but it greatly increases the repo size if you run `git clone` directly without an LFS config (and the model download may fail due to the traffic limits of GitHub's LFS service). Alternatively, you can use the methods in the tutorial to download the model from HuggingFace.
Since different modules have different dependencies, please refer to the specific README file inside each subfolder for detailed installation instructions.
- For Users: see the Inference Tutorial to start a SWE-Pruner locally, then read the real-world examples for agent integration.
  - We now support OpenHands and the Claude Agent SDK!
- For Developers: see `./train` (coming soon) to train a pruner yourself!
- For Researchers: `./downstream_eval` has scripts for reproducing our results. We recommend using Slurm with at least 4 GPUs to reuse our scripts.
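As a rough illustration of where a pruner slots into an agent's decision loop, here is a minimal sketch. All names here are hypothetical and not the actual swe-pruner API; see the Inference Tutorial and `examples/` for the real integration:

```python
def agent_step(goal, file_contents, prune_fn, llm_call):
    """Hypothetical decision loop: each file is pruned against the current
    goal just before it enters the LLM context (just-in-time context)."""
    context = "\n\n".join(prune_fn(goal, src) for src in file_contents)
    return llm_call(f"Goal: {goal}\n\nContext:\n{context}")

# Stand-in pruner and LLM; in practice the pruner is the served SWE-Pruner
# model and llm_call hits the agent's backing LLM:
prune_fn = lambda goal, src: "\n".join(
    line for line in src.splitlines() if any(w in line for w in goal.split()))
llm_call = lambda prompt: f"[{len(prompt)} chars sent to model]"

print(agent_step("error", ["raise error\nreturn ok", "log error\npass"],
                 prune_fn, llm_call))
```

Because pruning happens per turn against the current goal, a multi-turn agent pays only for the lines relevant to each step rather than for the whole repository context.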
We provide utility scripts for continually improving SWE-Pruner in `./utils`; just see `utils/README.md`!
- 💻 Update Training Code of SWE-Pruner
- 📁 Upload full parameters and trajectory files & logs
- 📁 Upload Training Dataset of SWE-Pruner
- 📁 Upload SWE-QA evaluation code
- 🤗 Update HuggingFace model card
- 🤗 Update the HuggingFace blog to introduce our technical approach in detail
- 🎮 Update the agent integration demo
@misc{wang2026sweprunerselfadaptivecontextpruning,
title={SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents},
author={Yuhang Wang and Yuling Shi and Mo Yang and Rongrui Zhang and Shilin He and Heng Lian and Yuting Chen and Siyu Ye and Kai Cai and Xiaodong Gu},
year={2026},
eprint={2601.16746},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2601.16746},
}
- Bytedance Douyin Team for their advice.
- Alibaba Qwen Team for open-source models.

