Trajectory Attention For Fine-grained Video Motion Control

Zeqi Xiao¹ Wenqi Ouyang¹ Yifan Zhou¹ Shuai Yang² Lei Yang³ Jianlou Si³ Xingang Pan¹

¹ S-Lab, Nanyang Technological University,
² Wangxuan Institute of Computer Technology, Peking University,
³ SenseTime Research


Yellow boxes indicate reference contents. Green boxes indicate input frames. Blue boxes indicate output frames.

Trajectory attention supports tasks such as camera motion control on images and videos, as well as video editing.


Methods

Our method supports conditioning on trajectories from various sources, such as camera motion derived from a single image, as shown in the figure. We inject these conditions into the model through trajectory attention, which enables explicit, fine-grained control over the motion in the generated video.
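To make the idea of attending along trajectories concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes integer pixel coordinates, uses identity query/key/value projections instead of learned ones, and adds the result back as a simple residual. All function names (`gather_along_trajectory`, `attend`, `trajectory_attention`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gather_along_trajectory(features, trajectory):
    """Collect one feature vector per frame along a trajectory.

    features[t][y][x] is a feature vector (list of floats).
    trajectory is a list of (x, y) integer positions, one per frame.
    """
    return [features[t][y][x] for t, (x, y) in enumerate(trajectory)]


def attend(tokens):
    """Scaled dot-product self-attention with identity projections.

    Assumption: Q = K = V = tokens. A real model would use learned projections.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[c] for w, value in zip(weights, tokens)) for c in range(dim)]
        )
    return outputs


def trajectory_attention(features, trajectories):
    """Mix features across frames along each trajectory, as a residual update.

    Returns a new nested list; the input is left unchanged. If two
    trajectories pass through the same pixel, the later one overwrites.
    """
    updated = [[[list(vec) for vec in row] for row in frame] for frame in features]
    for trajectory in trajectories:
        tokens = gather_along_trajectory(features, trajectory)
        mixed = attend(tokens)
        for t, (x, y) in enumerate(trajectory):
            original = features[t][y][x]
            updated[t][y][x] = [a + b for a, b in zip(original, mixed[t])]
    return updated


if __name__ == "__main__":
    # Two frames, a 1x2 grid, 2-dimensional features.
    # A single point moves from x=0 in frame 0 to x=1 in frame 1.
    frames = [
        [[[1.0, 0.0], [0.0, 1.0]]],
        [[[0.0, 1.0], [1.0, 0.0]]],
    ]
    result = trajectory_attention(frames, [[(0, 0), (1, 0)]])
    print(result)
```

Only positions that lie on a trajectory are updated, so information flows between frames exclusively along the supplied motion paths. That is the sense in which the trajectories act as an explicit motion condition in this sketch.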

Comparison

Camera Motion Control on Image

First Frame

NVS Solver

CameraCtrl

Motion I2V

Ours

First Frame

NVS Solver

CameraCtrl

Motion I2V

Ours

Camera Motion Control on Video

Original Video

Reference Warp

NVS Solver

Ours

Ours w. NVS Solver

Reference Warp

NVS Solver

Ours

Ours w. NVS Solver

Video Editing

Original Video

Edited First Frame

I2VEdit

AnyV2V

Ours

Original Video

Edited First Frame

I2VEdit

AnyV2V

Ours

Gallery

Camera Motion Control on Image

First Frame

Camera Motion Control Results

First Frame

Camera Motion Control Results

First Frame

Camera Motion Control Results

First Frame

Camera Motion Control Results

First Frame

Camera Motion Control Results

First Frame

Camera Motion Control Results

Camera Motion Control on Video

Original Video

Camera Motion Control Results

Original Video

Camera Motion Control Results

Original Video

Camera Motion Control Results

Original Video

Camera Motion Control Results

Video Editing

Original Video

Edited First Frame

Edited Video

Original Video

Edited First Frame

Edited Video

Original Video

Edited First Frame

Edited Video

Original Video

Edited First Frame

Edited Video

Original Video

Reconstructed with 4x sparse trajectories

Reconstructed with 16x sparse trajectories

Reconstructed with 32x sparse trajectories

Original Video

Region Mask

Region Motion Control

Results on 3D DiT

Project page template is borrowed from DreamBooth.