
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation


📰 News

🔥2025.11.08: Our paper got accepted to AAAI 2026! Thanks to all co-authors and the anonymous reviewers🎉🎉

🔥2025.11.01: Data, Code, and Checkpoints are released!

📄 Citation

If our work helps your research, feel free to give us a star ⭐ and cite us using:

@article{zhou2025think,
  title={Think before you segment: An object-aware reasoning agent for referring audio-visual segmentation},
  author={Zhou, Jinxing and Zhou, Yanghao and Han, Mingfei and Wang, Tong and Chang, Xiaojun and Cholakkal, Hisham and Anwer, Rao Muhammad},
  journal={arXiv preprint arXiv:2508.04418},
  year={2025}
}

⚙️ Installation

git clone https://github.com/jasongief/TGS-Agent.git
cd TGS-Agent

For Think Phase

conda env create -f think_environment.yml
conda activate think

Alternatively, you may refer to Crab for environment installation.

For Ground-Segment Phase

cd ground_segment_scripts

git clone https://github.com/IDEA-Research/Grounded-SAM-2.git
cd Grounded-SAM-2

conda env create -f dinosam2_environment.yml
conda activate dino

Alternatively, you may refer to Grounded-SAM2 for environment installation.

cd ../..  # back to the TGS-Agent repo root

🤗 Setup

Datasets

  • Download the official Ref-AVSBench dataset from here and put it in ./REFAVS. Its metadata (csv file) should also be copied to ./R2AVSBench.
  • Download our instruction-tuning data for Ref-Thinker training from here and put the json file into ./R2AVSBench.
  • Download the metadata of our R^2-AVSBench from here and put the csv file into ./R2AVSBench. The expected layout after these steps is sketched below.
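
For reference, a possible directory layout after these steps (a sketch only; the actual file names depend on the downloads, so treat the names below as placeholders):

TGS-Agent
├── REFAVS/                            # Ref-AVSBench videos, audio, and annotations
└── R2AVSBench/
    ├── refavs_meta.csv                # placeholder name: metadata copied from Ref-AVSBench
    ├── ref_thinker_instructions.json  # placeholder name: instruction-tuning data
    └── r2avs_meta.csv                 # placeholder name: R^2-AVSBench metadata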

Pretrained Backbones

Download the necessary pre-trained backbones and put them in ./pretrained_weights, including

  • Multimodal Encoder Weights
  • LLM Weights: download LLaMA-2-Chat-HF
  • Pretrained Multimodal Projector
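
A possible resulting layout (a sketch; the exact folder and file names depend on the released weights, so treat these as placeholders — the 7B model size in particular is an assumption):

pretrained_weights
├── multimodal_encoder/        # placeholder: audio/visual encoder weights
├── Llama-2-7b-chat-hf/        # LLaMA-2-Chat-HF (model size is an assumption)
└── mm_projector.bin           # placeholder: pretrained multimodal projector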

Checkpoints

Download the following checkpoints:

  • Download our pretrained Ref-Thinker and put it into ./results_real.
  • Run the following commands (from the repo root) to prepare GroundingDINO weights:

cd ./ground_segment_scripts/Grounded-SAM-2/gdino_checkpoints
bash download_ckpts.sh

  • Run the following commands (again starting from the repo root) to prepare SAM2 weights:

cd ./ground_segment_scripts/Grounded-SAM-2/checkpoints
bash download_ckpts.sh
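
A quick sanity check that everything landed where the later scripts look for it (paths taken from the steps above; the listed contents will vary with what download_ckpts.sh fetches):

ls ./results_real
ls ./ground_segment_scripts/Grounded-SAM-2/gdino_checkpoints
ls ./ground_segment_scripts/Grounded-SAM-2/checkpoints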

📌 Getting Started

Train Ref-Thinker

cd TGS-Agent
conda activate think
bash scripts/finetune/finetune_hyperlora.sh

Test Ref-Thinker

cd TGS-Agent
conda activate think
bash scripts/finetune/inference_hyper_lora.sh

This generates the object-aware reasoning chain for each given reference from the default Ref-AVSBench. To evaluate on our proposed R^2-AVSBench instead, change the test meta csv path (see the sketch below). After obtaining the fine-grained, simplified object description, we can start the subsequent Ground-Segment phase.
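
For example, switching the evaluation to R^2-AVSBench might look like the following (the argument name is an assumption; check scripts/finetune/inference_hyper_lora.sh for the actual flag, and the csv name is a placeholder):

# inside scripts/finetune/inference_hyper_lora.sh
--test_meta_csv ./R2AVSBench/r2avs_meta.csv   # hypothetical flag name and placeholder file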

Ground-Segment

cd ground_segment_scripts
conda activate dino
  • Inference on Ref-AVSBench prompted by Ref-Thinker:

python ground_segment_with_object_text_after_thinking_for_RefAVSBench.py

  • Inference on Ref-AVSBench prompted by the original raw reference:

python ground_segment_with_direct_reference_of_RefAVSBench.py

  • Inference on R^2-AVSBench prompted by Ref-Thinker:

python ground_segment_with_object_text_after_thinking_for_R2AVSBench.py

  • Inference on R^2-AVSBench prompted by the original raw reference:

python ground_segment_with_direct_reference_of_R2AVSBench.py
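
To reproduce all four settings in one go, a minimal convenience sketch (using only the script names listed above):

conda activate dino
cd ground_segment_scripts
for script in \
    ground_segment_with_object_text_after_thinking_for_RefAVSBench.py \
    ground_segment_with_direct_reference_of_RefAVSBench.py \
    ground_segment_with_object_text_after_thinking_for_R2AVSBench.py \
    ground_segment_with_direct_reference_of_R2AVSBench.py
do
    python "$script"   # run one Ground-Segment setting
done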

Acknowledgement

We thank Crab and Grounded-SAM2 for their open-source code, which helped a lot in this project.
