I keep wondering: is the diffusion/flow formulation alone already enough to build a virtual world as real as Ready Player One, with intelligent agents as alive as in Her?
From my work, you can see that current video diffusion models excel at photorealistic generation and exhibit emerging 3D structure.
Yet I’m still exploring what the best generative formulation and 3D representation might be, and how they can be applied to virtual reality, robotics, and other practical scenarios.
CogNVS is a video diffusion model for dynamic novel-view synthesis, trained in a self-supervised manner using only 2D videos! We reformulate novel-view synthesis as a structured inpainting task.
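To make that recipe concrete, here is a minimal, hypothetical sketch of this kind of self-supervised inpainting objective: holes are simulated on ordinary 2D videos, and a diffusion model is trained to fill them in. The `TinyDenoiser`, the mask simulator, and the cosine schedule are all illustrative stand-ins, not CogNVS's actual architecture or training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    """Toy stand-in for a video diffusion backbone (not the real model)."""
    def __init__(self, ch=3):
        super().__init__()
        # Input channels: noisy video + masked conditioning video + mask.
        self.net = nn.Conv3d(2 * ch + 1, ch, kernel_size=3, padding=1)

    def forward(self, noisy, cond, mask, t):
        # (b, t, c, h, w) -> (b, c, t, h, w) for Conv3d, then back.
        x = torch.cat([noisy, cond, mask], dim=2).transpose(1, 2)
        return self.net(x).transpose(1, 2)  # toy net ignores timestep t

def random_occlusion_mask(frames, hole_frac=0.4):
    """Simulate view-change holes on a plain 2D video: 1 = visible, 0 = hole."""
    b, t, c, h, w = frames.shape
    coarse = (torch.rand(b * t, 1, h // 8, w // 8) > hole_frac).float()
    return F.interpolate(coarse, size=(h, w), mode="nearest").view(b, t, 1, h, w)

def train_step(denoiser, frames, optimizer, num_timesteps=1000):
    """One epsilon-prediction diffusion step, conditioned on the masked clip."""
    mask = random_occlusion_mask(frames)
    cond = frames * mask                               # partially visible video
    noise = torch.randn_like(frames)
    t = torch.randint(0, num_timesteps, (frames.shape[0],), device=frames.device)
    a = (torch.cos(t.float() / num_timesteps * torch.pi / 2) ** 2).view(-1, 1, 1, 1, 1)
    noisy = a.sqrt() * frames + (1 - a).sqrt() * noise  # cosine noise schedule
    loss = F.mse_loss(denoiser(noisy, cond, mask, t), noise)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

denoiser = TinyDenoiser()
frames = torch.randn(1, 8, 3, 64, 64)                  # one fake 2D video clip
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
print(train_step(denoiser, frames, opt))
```

The point of the sketch is that supervision comes for free: every 2D video already contains the ground truth for its own simulated holes, so no multi-view or 3D data is required.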
Given a modal (visible) object sequence in a video, we develop a two-stage method that generates its amodal (visible + occluded) masks and RGB content via video diffusion.
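As a rough illustration of the two-stage idea (not the paper's actual models or interfaces), the sketch below wires together two placeholder diffusion callables: one that extends modal masks into amodal masks, and one that inpaints RGB in the newly revealed occluded region.

```python
import torch

def amodal_completion(frames, modal_masks, mask_diffusion, rgb_diffusion):
    """frames: (t, 3, h, w) video; modal_masks: (t, 1, h, w), 1 = visible pixel."""
    # Stage 1: extend each visible mask to the object's full
    # (visible + occluded) extent.
    amodal_masks = mask_diffusion(frames, modal_masks)  # (t, 1, h, w) in [0, 1]
    # Stage 2: hallucinate RGB only where the object is occluded;
    # visible pixels stay fixed.
    occluded = (amodal_masks > 0.5).float() * (1 - modal_masks)
    completed = rgb_diffusion(frames * modal_masks, occluded)
    return amodal_masks, completed

# Dummy stand-ins so the sketch runs end to end; the real versions would be
# video diffusion models finetuned for each stage.
mask_diffusion = lambda f, m: m.clamp(0, 1)     # placeholder: mask unchanged
rgb_diffusion = lambda visible, holes: visible  # placeholder: no hallucination
frames = torch.rand(8, 3, 64, 64)
modal = (torch.rand(8, 1, 64, 64) > 0.5).float()
masks, rgb = amodal_completion(frames, modal, mask_diffusion, rgb_diffusion)
print(masks.shape, rgb.shape)
```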