A Large Short-video Recommendation Dataset with Raw Text/Audio/Image/Videos (Talk Invited by DeepMind).
Updated Jan 27, 2025 - Python
A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency of long-video VLMs. (ICCV 2025)
[arXiv 2509.14199] Dense Video Understanding with Gated Residual Tokenization
Additional video data and QA pairs for balancing the original MUSIC-AVQA dataset
SCVBench: A Benchmark with Multi-Turn Dialogues for Story-Centric Video Understanding (IJCAI '25)
(Accepted: NeurIPS 2025 Workshop Mexico City 7HVU) AdCare-VLM: Leveraging Large Vision Language Models (LVLMs) to Monitor Long-Term Medication Adherence and Care
Computer vision that understands temporal relationships and causality in video sequences.
📹 Enhance computer vision with temporal reasoning for deeper understanding of video sequences, causal analysis, and event prediction.