We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than matching detections across time, the model updates poses directly from a new input image, which enables online tracking through occlusions. We train on numerous image and video datasets, leveraging pseudo-labeled annotations, to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy while being faster and more accurate at tracking multiple people through time.
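To make the tracking strategy concrete, below is a minimal sketch of an online loop in this spirit: existing tracks are refined directly from each new image by a learned update, and a per-frame detector spawns tracks for newly visible people. The class names, functions, joint count, and association threshold are all hypothetical stand-ins for illustration; they are not the system's actual interfaces.

```python
# Minimal sketch of online multi-person 3D pose tracking, assuming a
# per-frame detector and a learned pose-update module (both stubbed here).
import numpy as np

NUM_JOINTS = 24  # assumed joint count; not specified in the text


class Track:
    """A tracked person: an identity plus the latest 3D pose estimate."""
    def __init__(self, track_id, pose):
        self.track_id = track_id
        self.pose = pose  # (NUM_JOINTS, 3) array of 3D joint positions


def detect_poses(image):
    """Hypothetical per-frame detector: returns 3D poses of visible people."""
    # Stand-in: a real model would regress poses from image features.
    return [np.zeros((NUM_JOINTS, 3)) for _ in range(2)]


def update_pose(prev_pose, image):
    """Hypothetical learned update: refines an existing track's pose
    directly from the new image rather than matching new detections."""
    # Stand-in: a real module would condition on prev_pose and the image.
    return prev_pose


def track_frame(tracks, image, next_id):
    """One online step: update existing tracks, then add new people."""
    # 1) Update every existing track directly from the image; this is what
    #    lets a track persist through a short occlusion.
    for t in tracks:
        t.pose = update_pose(t.pose, image)

    # 2) Detect people in the current frame and start tracks for anyone
    #    not already covered (this association heuristic is an assumption).
    for det in detect_poses(image):
        distances = [np.linalg.norm(det - t.pose) for t in tracks] or [np.inf]
        if min(distances) > 0.5:  # assumed distance threshold
            tracks.append(Track(next_id, det))
            next_id += 1
    return tracks, next_id


if __name__ == "__main__":
    tracks, next_id = [], 0
    for frame in [np.zeros((480, 640, 3)) for _ in range(3)]:  # dummy stream
        tracks, next_id = track_frame(tracks, frame, next_id)
    print(f"{len(tracks)} people tracked")
```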

Related readings and updates.

We revisit scene-level 3D object detection as the output of an object-centric framework capable of both localization and mapping, using 3D oriented boxes as the underlying geometric primitive. While existing 3D object detection approaches operate globally and implicitly rely on the a priori existence of metric camera poses, our method, Rooms from Motion (RfM), operates on a collection of un-posed images. By replacing the standard 2D keypoint-based…


Dense 3D reconstruction from RGB images traditionally assumes static camera pose estimates. This assumption has endured, even as recent works have increasingly focused on real-time methods for mobile devices. However, the assumption of one pose per image does not hold for online execution: poses from real-time SLAM are dynamic and may be updated following events such as bundle adjustment and loop closure. This has been addressed in the RGB-D…
