Kaihua Chen

I’m a Master’s student in Computer Vision (MSCV) at the Robotics Institute, School of Computer Science, Carnegie Mellon University, advised by Prof. Deva Ramanan. My current research broadly focuses on diffusion generative models and 3D/4D vision, and I am also interested in exploring robotics and VLMs. Previously, I was fortunate to work as a research intern in the Lin Brain Lab at the University of Toronto during my undergraduate studies at China Agricultural University.

Email  /  CV  /  Scholar  /  Github

profile photo

Research

I keep wondering: is the diffusion/flow formulation alone already enough to build a virtual world as real as Ready Player One, with intelligent agents as alive as in Her? My work shows how current video diffusion models excel at photorealistic generation and exhibit emerging 3D structure. Yet I am still exploring what the best generative formulation and 3D representation might be, and how they can be applied to virtual reality, robotics, and other practical scenarios.

Reconstruct, Inpaint, Finetune: Dynamic Novel-view Synthesis from Monocular Videos
Kaihua Chen*, Tarasha Khurana*, Deva Ramanan
NeurIPS, 2025
project page / arXiv / code

We reformulate novel-view synthesis as a structured inpainting task. CogNVS is a video diffusion model for dynamic novel-view synthesis trained in a self-supervised manner using only 2D videos!

Using Diffusion Priors for Video Amodal Segmentation
Kaihua Chen, Deva Ramanan, Tarasha Khurana
CVPR, 2025
project page / arXiv / code

Given a modal (visible) object sequence in a video, we develop a two-stage method that generates its amodal (visible + invisible) masks and RGB content via video diffusion.

Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation
Yizhou Zhao, Hengwei Bian, Kaihua Chen, Pengliang Ji, Liao Qu, Shao-yu Lin, Weichen Yu, Haoran Li, Hao Chen, Jun Shen, Bhiksha Raj, Min Xu
NeurIPS, 2024
project page / paper / code

MfH converts relative depth estimation to metric depth estimation via generative painting and human mesh recovery.

TAO-Amodal: A Benchmark for Tracking Any Object Amodally
Cheng-Yen Hsieh, Kaihua Chen, Achal Dave, Tarasha Khurana, Deva Ramanan
arXiv, 2024
project page / arXiv / code / dataset

We introduce TAO-Amodal, an amodal tracking dataset featuring 833 diverse categories in thousands of video sequences.

Miscellaneous

In my spare time, I enjoy watching movies 🍿 and playing soccer ⚽️.


Template borrowed from Jon Barron.
Last updated: Oct 30, 2025