Neural Face Video Compression using Multiple Views
AuthorsAnna Volokitin, Stefan Brugger, Ali Benlalah, Sebastian Martin, Brian Amberg, Michael Tschannen
AuthorsAnna Volokitin, Stefan Brugger, Ali Benlalah, Sebastian Martin, Brian Amberg, Michael Tschannen
Recent advances in deep generative models led to the development of neural face video compression codecs that use an order of magnitude less bandwidth than engineered codecs. These neural codecs reconstruct the current frame by warping a source frame and using a generative model to compensate for imperfections in the warped source frame. Thereby, the warp is encoded and transmitted using a small number of keypoints rather than a dense flow field, which leads to massive savings compared to traditional codecs. However, by relying on a single source frame only, these methods lead to inaccurate reconstructions (e.g. one side of the head becomes unoccluded when turning the head and has to be synthesized). Here, we aim to tackle this issue by relying on multiple source frames (views of the face) and present encouraging results.
May 30, 2025research area Computer Vision, research area Methods and Algorithmsconference CVPR
As diffusion models dominating visual content generation, efforts have been made to adapt these models for multi-view image generation to create 3D content. Traditionally, these methods implicitly learn 3D consistency by generating only RGB frames, which can lead to artifacts and inefficiencies in training. In contrast, we propose generating Normalized Coordinate Space (NCS) frames alongside RGB frames. NCS frames capture each pixel's global...
June 13, 2022research area Computer Vision
Apple sponsored the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), which will be held in New Orleans, Louisiana from June 19 to 24. CVPR is an annual computer vision conference with several workshops and short courses.