I’m still exploring what the best generative formulation and 3D representation might be, and how they can be applied to virtual reality, robotics, and other practical scenarios.
We reformulate novel-view synthesis as a structured inpainting task.
CogNVS is a video diffusion model for dynamic novel-view synthesis, trained in a self-supervised manner using only 2D videos!
Given a modal (visible) object sequence in a video,
our two-stage method generates its amodal (visible + invisible) masks and RGB content via video diffusion.