Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"

VideoTool

This repository is still being updated.

This repository is the official implementation of Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task (NeurIPS 2025 main track).

News and Todo 🗓️

  • Release all video tools and test scripts

  • Release toolchain algorithm (STAR)

  • Release evaluation scripts

Introduction

In this work, we equip a multimodal large language model (MLLM) with a comprehensive and extensible Video Toolkit to enhance its spatiotemporal reasoning capabilities while keeping the quantity and diversity of tools in balance. To better control the tool invocation sequence and avoid toolchain shortcut issues, we propose a Spatiotemporal Reasoning Framework (STAR) that strategically schedules temporal and spatial tools, progressively localizing the key area in the video. STAR enhances GPT-4o with lightweight tools, achieving gains of 8.2% on VideoMME and 4.6% on LongVideoBench.
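To make the idea of scheduling temporal tools before spatial ones more concrete, here is a toy sketch. It is not the STAR algorithm from the paper: the `Clip` type, the two tool functions, and the fixed "all temporal, then all spatial" order are hypothetical assumptions chosen only to show how a constrained toolchain narrows the region of interest step by step instead of jumping straight to an answer.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Clip:
    """Hypothetical region of interest: a frame window plus a bounding box."""
    start: int
    end: int
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


# A tool maps the current region of interest to a narrower one.
Tool = Callable[[Clip], Clip]


def keep_middle_half(clip: Clip) -> Clip:
    """Hypothetical temporal tool: keep the middle half of the frame window."""
    quarter = (clip.end - clip.start) // 4
    return replace(clip, start=clip.start + quarter, end=clip.end - quarter)


def crop_center(clip: Clip) -> Clip:
    """Hypothetical spatial tool: keep the central half of the bounding box."""
    x0, y0, x1, y1 = clip.box
    dx, dy = (x1 - x0) // 4, (y1 - y0) // 4
    return replace(clip, box=(x0 + dx, y0 + dy, x1 - dx, y1 - dy))


def run_toolchain(clip: Clip, temporal: List[Tool], spatial: List[Tool]) -> Clip:
    """Apply every temporal tool before any spatial tool, so the search is
    narrowed in time first and then in space (a simple ordering constraint)."""
    for tool in temporal:
        clip = tool(clip)
    for tool in spatial:
        clip = tool(clip)
    return clip


if __name__ == "__main__":
    video = Clip(start=0, end=800, box=(0, 0, 640, 480))
    roi = run_toolchain(video, temporal=[keep_middle_half], spatial=[crop_center])
    print(roi)  # Clip(start=200, end=600, box=(160, 120, 480, 360))
```

The fixed ordering is the simplest way to rule out a "shortcut" in which a spatial tool runs on the whole video. A real scheduler would choose tools adaptively.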

Setup and Configuration 🛠️

  1. Clone the repository 📦:

    git clone git@github.com:fansunqi/VideoTool.git
    cd VideoTool
  2. Create a virtual environment 🧹 and install the dependencies 🧑‍🍳:

    conda create -n videotool python=3.9
    conda activate videotool
    pip install -r requirements.txt
  3. Set up your API key 🗝️ in config/*.yaml:

    openai:
      GPT_API_KEY: "put your openai api key here"
      PROXY: "put your openai base url here"
  4. Build related projects 🧩:

    mkdir projects
    cd projects
    • Download Grounded-Video-LLM for temporal grounding and temporal QA

      git clone git@github.com:WHB139426/Grounded-Video-LLM.git
    • Build LLaVA for image QA

      git clone git@github.com:fansunqi/LLaVA.git
      cd LLaVA
      pip install -e .
      cd ..
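Before running anything, it can help to confirm that the placeholder strings from the YAML template in step 3 have been replaced. The helper below is hypothetical and not part of the repository. It assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`), and it treats both `openai.GPT_API_KEY` and `openai.PROXY` as required, which may be stricter than the code base actually needs.

```python
from typing import Any, Dict, List

# Prefix of the placeholder values in the config template.
PLACEHOLDER_PREFIX = "put your"


def missing_openai_settings(config: Dict[str, Any]) -> List[str]:
    """Return the openai.* keys that are absent, empty, or still placeholders.

    Hypothetical helper: the key names mirror the YAML template in step 3.
    """
    openai_cfg = config.get("openai") or {}
    missing = []
    for key in ("GPT_API_KEY", "PROXY"):
        value = openai_cfg.get(key)
        if (not isinstance(value, str)
                or not value.strip()
                or value.strip().startswith(PLACEHOLDER_PREFIX)):
            missing.append(key)
    return missing


if __name__ == "__main__":
    template = {"openai": {"GPT_API_KEY": "put your openai api key here",
                           "PROXY": "put your openai base url here"}}
    print(missing_openai_settings(template))  # ['GPT_API_KEY', 'PROXY']
```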
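After step 4, the `projects/` directory should contain both cloned repositories, and the editable LLaVA install should be importable. The following check is a hypothetical sketch, not part of the repository. It assumes it is run from the repository root, that the directory layout matches the `git clone` commands above, and that LLaVA installs a Python package named `llava`.

```python
import importlib.util
from pathlib import Path
from typing import List

# Directory names produced by the `git clone` commands in step 4.
EXPECTED_PROJECTS = ("Grounded-Video-LLM", "LLaVA")


def missing_projects(repo_root: Path) -> List[str]:
    """Return the expected project checkouts that are not present under projects/."""
    projects_dir = repo_root / "projects"
    return [name for name in EXPECTED_PROJECTS if not (projects_dir / name).is_dir()]


def llava_importable() -> bool:
    """Check whether `pip install -e .` made the (assumed) `llava` package importable."""
    return importlib.util.find_spec("llava") is not None


if __name__ == "__main__":
    print("missing projects:", missing_projects(Path.cwd()))
    print("llava importable:", llava_importable())
```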

Tools

We thank the authors of the following open-source projects for their excellent work.

Temporal Tools:

Spatial Tools:

Generalist Solution:

Download Datasets

  • NExT-QA:
    git clone git@github.com:doc-doc/NExT-QA.git

    Specify your data path in config/nextqa.yaml.
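For a quick look at the data, the sketch below turns rows of a NExT-QA multiple-choice annotation CSV into text prompts. It assumes the CSV has the columns `video`, `question`, `answer` (index of the correct option), and `a0` to `a4`. The sample row and the lettered prompt format are illustrative assumptions and are not taken from this repository's data loader.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class MCQuestion:
    video: str
    question: str
    options: List[str]
    answer: int  # index of the correct option


def load_mc_questions(f: TextIO) -> List[MCQuestion]:
    """Parse a NExT-QA-style multiple-choice CSV (column names are assumed)."""
    items = []
    for row in csv.DictReader(f):
        options = [row[f"a{i}"] for i in range(5)]
        items.append(MCQuestion(row["video"], row["question"], options, int(row["answer"])))
    return items


def to_prompt(q: MCQuestion) -> str:
    """Format one question with lettered options (illustrative format only)."""
    lines = [q.question]
    lines += [f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(q.options)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Synthetic example row, not real NExT-QA data.
    sample = io.StringIO(
        "video,question,answer,a0,a1,a2,a3,a4\n"
        "video_0001,what is the man holding,2,a cup,a phone,a ball,a book,a bag\n"
    )
    print(to_prompt(load_mc_questions(sample)[0]))
```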
