[AAAI 2026] Official Code for the Paper "Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning"
We propose a novel Sensitivity-aware Task Vector insertion framework (STV) to figure out *where and what* to insert. Our key insight is that activation deltas across query-context pairs exhibit consistent structural patterns, providing a reliable cue for insertion. Based on the identified sensitivity-aware locations, we construct a pre-clustered activation bank for each location by clustering its activation values, and then apply reinforcement learning to choose the most suitable entry to insert.
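As a rough illustration of the activation-bank construction step, here is a minimal sketch (not the code shipped in this repo); `collect_head_activations` and `sensitive_locations` are hypothetical placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_activation_bank(head_activations: np.ndarray, num_clusters: int = 64) -> np.ndarray:
    """Cluster the per-example activations of one sensitive head
    ([n_examples, head_dim]) and return the centroids as candidate task vectors."""
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    kmeans.fit(head_activations)
    return kmeans.cluster_centers_  # [num_clusters, head_dim]

# One bank per sensitivity-aware location, e.g.:
# bank = {loc: build_activation_bank(collect_head_activations(loc)) for loc in sensitive_locations}
```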
More details can be found in our paper.
Overview of the STV framework. It has two stages: (1) identifying context-sensitive heads via activation deltas between query–context and query-only inputs; and (2) selecting task vectors from a pre-computed activation bank using reinforcement learning for those locations.
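For stage (1), the idea can be sketched roughly as follows. This is an illustrative sketch, not the implementation in stv_utils.py; `get_head_activations` stands in for a hook-based helper (the repo relies on baukit tracing):

```python
import torch

def rank_sensitive_heads(model, query_only_inputs, query_context_inputs, topk=96):
    """Return the (layer, head) indices whose activations shift most when
    in-context examples are prepended to the query."""
    deltas = []
    for q_inputs, qc_inputs in zip(query_only_inputs, query_context_inputs):
        a_q = get_head_activations(model, q_inputs)    # hypothetical helper: [layers, heads, head_dim]
        a_qc = get_head_activations(model, qc_inputs)  # same shape, query + context
        deltas.append((a_qc - a_q).norm(dim=-1))       # per-head delta magnitude [layers, heads]
    mean_delta = torch.stack(deltas).mean(dim=0)       # average over query-context pairs
    num_heads = mean_delta.shape[1]
    top = mean_delta.flatten().topk(topk).indices      # top-k most context-sensitive heads
    return [(int(i) // num_heads, int(i) % num_heads) for i in top]
```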
- Python 3.8+
- CUDA-capable GPU (recommended)
- PyTorch 1.12+
- Create a conda environment (recommended):

conda create -n stv python=3.8
conda activate stv
- Install required packages:

pip install torch torchvision
pip install transformers==4.32.0
pip install accelerate tiktoken einops transformers_stream_generator==0.0.4
pip install scipy scikit-learn matplotlib seaborn tqdm pillow numpy tensorboard
pip install git+https://github.com/davidbau/baukit@main#egg=baukit
- Install model-specific dependencies:
  - For Qwen-VL: Follow the installation instructions in the Qwen-VL repository
  - For Idefics2: The model will be automatically downloaded from HuggingFace when first used
- VizWiz & OKVQA: Please follow the instructions in the Qwen-VL repository to prepare the datasets.
- Flower, CUB, and DTD: Download the images from their respective official websites. We provide the 2-way 1-shot text annotations in the data files.
The expected data format is JSONL (for VizWiz and OKVQA) or JSON (for Flower, CUB, DTD) with the following structure:
{
  "image": "path/to/image.jpg",
  "question": "Your question here",
  "answer": "Expected answer",
  "question_id": "unique_id"
}

The framework currently supports:
- Qwen-VL: Qwen/Qwen-VL from HuggingFace
- Idefics2: HuggingFaceM4/Idefics3-8B-Llama3
To use custom models, please refer to models.py and implement a ModelHelper class following the interface defined there.
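As a starting point, a custom helper might look roughly like the skeleton below; the method names here are illustrative assumptions, and the authoritative interface is the one defined in models.py:

```python
class CustomModelHelper:
    """Illustrative skeleton only; mirror the actual interface in models.py."""

    def __init__(self, model, tokenizer, processor=None):
        self.model = model
        self.tokenizer = tokenizer
        self.processor = processor

    def format_prompt(self, question, image_path, context_examples=None):
        # Build the multimodal prompt, optionally prepending in-context examples.
        raise NotImplementedError

    def generate(self, inputs, max_new_tokens=32):
        # Run generation and return the decoded answer string.
        raise NotImplementedError
```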
Run evaluation with the STV framework:
python stv_eval.py \
--model_name Qwen-VL \
--data_name vizwiz \
--train_path /path/to/train.jsonl \
--val_path /path/to/val.jsonl \
--num_example 100 \
--num_shot 8 \
--topk 96 \
--num_clusters 64 \
--n_epochs 600 \
--cur_mode both \
--experiment_name my_experiment

We provide ready-to-use evaluation scripts in eval_scripts/:
# Example: Evaluate on VizWiz
bash eval_scripts/eval_vizwiz.sh
# Example: Evaluate on OKVQA
bash eval_scripts/eval_okvqav3.sh

Make sure to update the paths in the scripts to point to your dataset locations.
STV/
├── stv_eval.py # Main evaluation script
├── stv_utils.py # Core STV utilities and functions
├── models.py # Model helper classes
├── preprocess.py # Data preprocessing and formatting
├── eval_scripts/ # Evaluation scripts for different datasets
│ ├── eval_vizwiz.sh
│ ├── eval_okvqa.sh
│ ├── eval_okvqav3.sh
│ └── ...
└── llava/ # LLaVA model implementation (if needed)
If you find our work useful, please consider starring the repo and citing our paper. Thank you!
@article{ma2025and,
title={Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning},
author={Ma, Ziyu and Gou, Chenhui and Hu, Yiming and Wang, Yong and Chu, Xiangxiang and Zhuang, Bohan and Cai, Jianfei},
journal={arXiv preprint arXiv:2511.08246},
year={2025}
}