
Share New Paper - VideoCoF #4


Description

@knightyxp

Hello! 👋

I would like to share our latest work, Unified Video Editing with Temporal Reasoner (aka VideoCoF), and propose adding it to this awesome list.

VideoCoF introduces a Chain-of-Frames (CoF) mechanism that empowers video diffusion models with temporal reasoning capabilities.

  1. Unlike previous methods, it follows a "See -> Reason -> Edit" paradigm: the model first explicitly reasons about the editing region (generating a reasoning frame) before executing the edit, which enables precise, mask-free video editing (see the illustrative sketch after this list).
  2. Trained on only 50k samples (33 frames), VideoCoF demonstrates robust multi-shot editing and length generalization (e.g., 4× length extrapolation).
  3. Diverse editing tasks: supports fine-grained (instance- and part-level, spatially aware) Object Removal, Object Addition, Object Swap, and Local Style Transfer.
  4. Thanks to a DMD LoRA, we also provide a 4-step fast-inference script (~20-30 s per video). The full model and code have been released.
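
To make the "See -> Reason -> Edit" flow in item 1 concrete, here is a minimal, purely illustrative Python sketch. The function names (`see`, `reason`, `edit`) and the toy region/dimming logic are placeholders for illustration only, not our actual implementation; please refer to the [Code] link below for the real pipeline.

```python
import numpy as np


def see(video: np.ndarray) -> np.ndarray:
    """'See': take in the source clip. In the real model this is the diffusion
    model conditioning on the input video; here we just pass the frames through."""
    return video


def reason(frames: np.ndarray, instruction: str) -> np.ndarray:
    """'Reason': localize the editing region *before* changing any pixels,
    standing in for the reasoning frame. This toy version ignores the
    instruction and marks a fixed central region as the mask-free edit area."""
    _, h, w, _ = frames.shape
    region = np.zeros((h, w), dtype=bool)
    region[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = True
    return region


def edit(frames: np.ndarray, region: np.ndarray, instruction: str) -> np.ndarray:
    """'Edit': apply the instruction only where the reasoning step localized it.
    The toy edit simply dims that region; the real model runs diffusion
    denoising conditioned on the reasoning frame."""
    out = frames.copy()
    out[:, region] = (out[:, region] * 0.5).astype(out.dtype)
    return out


if __name__ == "__main__":
    clip = np.random.randint(0, 256, size=(33, 64, 64, 3), dtype=np.uint8)  # 33 frames
    instruction = "remove the object in the center"
    frames = see(clip)
    region = reason(frames, instruction)
    result = edit(frames, region, instruction)
    print(result.shape)  # (33, 64, 64, 3)
```

The point of the sketch is only the ordering: the editing region is made explicit as an intermediate reasoning step between conditioning on the video and generating the edited frames.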

We believe this work fits perfectly into the scope of Video Reasoning.

Here is the information:

Paper Title: Unified Video Editing with Temporal Reasoner

Authors: Xiangpeng Yang, Ji Xie, Yiyuan Yang, Yan Huang, Min Xu, Qiang Wu

Links:
[Paper] | [Project Page] | [Code]
