Open
Description
Hello! 👋
I would like to add our latest work, Unified Video Editing with Temporal Reasoner (aka VideoCoF), to this awesome list.
VideoCoF introduces a Chain-of-Frames (CoF) mechanism that empowers video diffusion models with temporal reasoning capabilities.
- Unlike previous methods, it follows a "See -> Reason -> Edit" paradigm: the model first explicitly reasons about the editing region (generating a reasoning frame) before executing the edit. This allows for precise, mask-free video editing.
- Trained on only 50k samples (33 frames each), VideoCoF demonstrates robust multi-shot editing and length generalization (e.g., 4× length extrapolation).
- Diverse Editing Tasks: Supports fine-grained (instance-level and part-level, spatially aware) Object Removal, Object Addition, Object Swap, and Local Style Transfer.
- Thanks to a DMD LoRA, we have also released a 4-step fast inference script (~20-30s per video). The full model and code have been released.
We believe this work fits perfectly into the scope of Video Reasoning.
Here is the information:
Paper Title: Unified Video Editing with Temporal Reasoner
Authors: Xiangpeng Yang, Ji Xie, Yiyuan Yang, Yan Huang, Min Xu, Qiang Wu
Links:
[Paper] | [Project Page] | [Code]