STV ⚡⚡

[AAAI 2026] Official Code for the Paper "Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning"

We propose a novel Sensitivity-aware Task Vector insertion framework (STV) to determine *where and what* to insert. Our key insight is that activation deltas across query-context pairs exhibit consistent structural patterns, providing a reliable cue for insertion locations. At the identified sensitivity-aware locations, we construct a pre-clustered activation bank by clustering the activation values, and then apply reinforcement learning to select the most suitable vector to insert.

More details can be found in our paper.

Method Description


Overview of the STV framework. It has two stages: (1) identifying context-sensitive heads via activation deltas between query-context and query-only inputs; and (2) selecting task vectors for those locations from a pre-computed activation bank using reinforcement learning.
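
For intuition, here is a minimal sketch of the two stages in plain NumPy/scikit-learn. The tensor shapes, the norm-based scoring rule, and the function names are illustrative assumptions; the actual implementation lives in stv_utils.py.

```python
# Conceptual sketch of the two STV stages (not the official implementation in
# stv_utils.py); shapes and the scoring rule are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def sensitive_locations(acts_with_context, acts_query_only, topk=96):
    """Stage 1: rank (layer, head) locations by how consistently their
    activations shift when in-context examples are added.

    Both inputs: arrays of shape (num_pairs, num_layers, num_heads, head_dim).
    """
    deltas = acts_with_context - acts_query_only             # per-pair activation deltas
    score = np.linalg.norm(deltas.mean(axis=0), axis=-1)     # (num_layers, num_heads)
    flat = np.argsort(score, axis=None)[::-1][:topk]
    return [np.unravel_index(i, score.shape) for i in flat]  # top-k (layer, head) pairs

def build_activation_bank(acts_with_context, locations, num_clusters=64):
    """Stage 2 (offline part): cluster context activations per location so the
    RL policy can later pick one centroid to insert as a task vector."""
    bank = {}
    for layer, head in locations:
        vecs = acts_with_context[:, layer, head, :]           # (num_pairs, head_dim)
        km = KMeans(n_clusters=min(num_clusters, len(vecs)), n_init=10).fit(vecs)
        bank[(layer, head)] = km.cluster_centers_
    return bank
```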

💻 Setup


Prerequisites

  • Python 3.8+
  • CUDA-capable GPU (recommended)
  • PyTorch 1.12+

Installation

  1. Create a conda environment (recommended)

    conda create -n stv python=3.8
    conda activate stv
  2. Install required packages

    pip install torch torchvision
    pip install transformers==4.32.0
    pip install accelerate tiktoken einops transformers_stream_generator==0.0.4
    pip install scipy scikit-learn matplotlib seaborn tqdm pillow numpy tensorboard
    pip install git+https://github.com/davidbau/baukit@main#egg=baukit
  3. Install model-specific dependencies

    • For Qwen-VL: Follow the installation instructions in the Qwen-VL repository
    • For Idefics2: The model will be automatically downloaded from HuggingFace when first used

Datasets

  • VizWiz & OKVQA: Please follow the instructions in the Qwen-VL repository to prepare the datasets.
  • Flower, CUB, and DTD: Download the images from their respective official websites. We provide the 2-way 1-shot text annotations in the data files.

The expected data format is JSONL (for VizWiz and OKVQA) or JSON (for Flower, CUB, DTD) with the following structure:

{
  "image": "path/to/image.jpg",
  "question": "Your question here",
  "answer": "Expected answer",
  "question_id": "unique_id"
}
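
A minimal loading/validation sketch for these files is shown below. The field names come from the example above, while the function name and error handling are illustrative and not part of the repository.

```python
# Minimal loader sketch for the annotation formats above (field names taken from
# the example; the function name and validation logic are illustrative only).
import json

REQUIRED_KEYS = {"image", "question", "answer", "question_id"}

def load_annotations(path):
    """Load a .jsonl file (one record per line, VizWiz/OKVQA) or a .json file
    (list of records, Flower/CUB/DTD) and check the expected keys."""
    if path.endswith(".jsonl"):
        with open(path) as f:
            records = [json.loads(line) for line in f if line.strip()]
    else:
        with open(path) as f:
            records = json.load(f)
    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
    return records
```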

Models

The framework currently supports:

  • Qwen-VL: Qwen/Qwen-VL from HuggingFace
  • Idefics2: HuggingFaceM4/Idefics3-8B-Llama3

To use custom models, please refer to models.py and implement a ModelHelper class following the interface defined there.
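
As a rough illustration, a custom helper might look like the hypothetical skeleton below. The class name, method names, and signatures here are assumptions for illustration; follow the actual ModelHelper interface defined in models.py.

```python
# Hypothetical skeleton only; the real required attributes and method signatures
# live in models.py and should be followed instead of this sketch.
class MyModelHelper:
    """Wraps a new vision-language model so the evaluation code can call it uniformly."""

    def __init__(self, model, processor, device="cuda"):
        self.model = model            # loaded HF model
        self.processor = processor    # matching tokenizer/processor
        self.device = device

    def format_prompt(self, question, image_path, context_examples=None):
        # Build the model-specific multimodal prompt (with optional in-context shots).
        raise NotImplementedError

    def generate(self, prompt, max_new_tokens=32):
        # Run generation and return the decoded answer string.
        raise NotImplementedError
```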

🚀 Quick Start


Basic Usage

Run evaluation with the STV framework:

python stv_eval.py \
    --model_name Qwen-VL \
    --data_name vizwiz \
    --train_path /path/to/train.jsonl \
    --val_path /path/to/val.jsonl \
    --num_example 100 \
    --num_shot 8 \
    --topk 96 \
    --num_clusters 64 \
    --n_epochs 600 \
    --cur_mode both \
    --experiment_name my_experiment

Using Evaluation Scripts

We provide ready-to-use evaluation scripts in eval_scripts/:

# Example: Evaluate on VizWiz
bash eval_scripts/eval_vizwiz.sh

# Example: Evaluate on OKVQA
bash eval_scripts/eval_okvqav3.sh

Make sure to update the paths in the scripts to point to your dataset locations.

📁 Project Structure


STV/
├── stv_eval.py          # Main evaluation script
├── stv_utils.py         # Core STV utilities and functions
├── models.py            # Model helper classes
├── preprocess.py        # Data preprocessing and formatting
├── eval_scripts/        # Evaluation scripts for different datasets
│   ├── eval_vizwiz.sh
│   ├── eval_okvqa.sh
│   ├── eval_okvqav3.sh
│   └── ...
└── llava/               # LLaVA model implementation (if needed)

📝 Citation


If you find our work useful, please consider starring the repository and citing our paper. Thank you!

@article{ma2025and,
  title={Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning},
  author={Ma, Ziyu and Gou, Chenhui and Hu, Yiming and Wang, Yong and Chu, Xiangxiang and Zhuang, Bohan and Cai, Jianfei},
  journal={arXiv preprint arXiv:2511.08246},
  year={2025}
}
