My current research focuses on leveraging diffusion priors, which capture both photorealistic appearance and underlying 3D structure, for tasks including amodal segmentation, video understanding, depth estimation, and 4D reconstruction.
CogNVS is a video diffusion model for dynamic novel-view synthesis, trained in a self-supervised manner using only 2D videos! We reformulate novel-view synthesis as a structured inpainting task.
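To make this concrete, here is a minimal PyTorch sketch of one self-supervised training step, assuming supervision comes from warping a source clip into a pseudo-novel view and learning to inpaint the resulting holes; `VideoInpaintingDiffusion` and `pseudo_novel_view` are hypothetical stand-ins, not the actual CogNVS architecture or pipeline.

```python
import torch
import torch.nn as nn

class VideoInpaintingDiffusion(nn.Module):
    """Toy denoiser over (B, C, T, H, W) videos, conditioned on a masked
    render; a real model would be a large pretrained video diffusion net."""
    def __init__(self, channels=3):
        super().__init__()
        # Input: noisy video + masked novel-view render + validity mask,
        # concatenated along the channel axis (timestep omitted for brevity).
        self.net = nn.Conv3d(channels * 2 + 1, channels, kernel_size=3, padding=1)

    def forward(self, noisy, masked_render, mask):
        x = torch.cat([noisy, masked_render, mask], dim=1)  # (B, 2C+1, T, H, W)
        return self.net(x)  # predicted noise

def pseudo_novel_view(video):
    """Hypothetical stand-in for 'warp the source video into a sampled novel
    view': returns the warped video plus a validity mask whose zeros mark
    disoccluded pixels, i.e. exactly the holes the model learns to inpaint."""
    mask = (torch.rand_like(video[:, :1]) > 0.3).float()
    return video * mask, mask

model = VideoInpaintingDiffusion()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Self-supervised step: the source video is its own reconstruction target,
# so training needs only ordinary 2D videos (no multi-view ground truth).
video = torch.rand(1, 3, 8, 64, 64)               # (B, C, T, H, W) clip
masked_render, mask = pseudo_novel_view(video)    # partial render with holes
t = torch.rand(1).view(-1, 1, 1, 1, 1)            # diffusion time in [0, 1]
noise = torch.randn_like(video)
noisy = (1 - t) * video + t * noise               # simple linear noising

pred = model(noisy, masked_render, mask)
loss = ((pred - noise) ** 2).mean()               # epsilon-prediction loss
loss.backward()
opt.step()
```

The design point this illustrates is that the source video doubles as the reconstruction target, which is what lets training rely on 2D videos alone.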
Given a modal (visible) object sequence in a video, we develop a two-stage method that first generates the object's amodal (visible + occluded) masks and then completes its occluded RGB content, both via video diffusion.
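A hedged PyTorch skeleton of such a two-stage pipeline is sketched below; `ToyVideoDiffusion`, `mask_model`, and `content_model` are hypothetical placeholders for the conditional video diffusion models, and the single-pass `sample` stands in for a full denoising loop.

```python
import torch
import torch.nn as nn

class ToyVideoDiffusion(nn.Module):
    """Stand-in for a conditional video diffusion model that maps a
    conditioning video of c_in channels to an output of c_out channels."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Conv3d(c_in, c_out, kernel_size=3, padding=1)

    @torch.no_grad()
    def sample(self, cond):
        # A real model would run an iterative denoising loop; one forward
        # pass stands in for the whole sampler in this sketch.
        return torch.sigmoid(self.net(cond))

# Stage 1 (hypothetical): modal masks -> amodal (visible + occluded) masks.
mask_model = ToyVideoDiffusion(c_in=1, c_out=1)
# Stage 2 (hypothetical): amodal masks + visible RGB -> completed RGB content.
content_model = ToyVideoDiffusion(c_in=4, c_out=3)

modal_masks = (torch.rand(1, 1, 8, 64, 64) > 0.5).float()  # (B, 1, T, H, W)
rgb = torch.rand(1, 3, 8, 64, 64)                          # input video frames

amodal_masks = mask_model.sample(modal_masks)              # stage 1
cond = torch.cat([amodal_masks, rgb * modal_masks], dim=1) # visible pixels only
amodal_rgb = content_model.sample(cond)                    # stage 2
```

Splitting the problem this way lets the mask stage reason about object shape before the content stage commits to pixel values for the occluded region.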