Sharp Monocular View Synthesis in Less Than a Second

Authors: Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan R. Richter, Vladlen Koltun

We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25-34% and DISTS by 21-43% versus the best prior model, while lowering the synthesis time by three orders of magnitude.

Figure 1: SHARP synthesizes a photorealistic 3D representation from a single photograph in less than a second. Top: Input image; Bottom: Novel view synthesized by SHARP. The synthesized representation supports high-resolution rendering of nearby views, with sharp details and fine structures, at more than 100 frames per second on a standard GPU.
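To make the phrase "metric, with absolute scale" concrete, the following is a minimal sketch of what a metric 3D Gaussian representation permits. It is not the SHARP network, its output format, or its renderer. The `Gaussian3D` fields, the `project` and `parallax_px` helpers, and all numeric values are assumptions chosen for illustration, using a standard pinhole camera model. The sketch only projects Gaussian centers; a real splatting renderer also projects covariances and alpha-blends the splats.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian3D:
    """One 3D Gaussian. Field names are illustrative, not SHARP's actual format."""
    mean: Vec3                                   # center in camera coordinates, meters
    scale: Vec3                                  # per-axis standard deviations, meters
    rotation: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    opacity: float                               # in [0, 1]
    color: Vec3                                  # RGB in [0, 1]


def project(mean: Vec3, cam_pos: Vec3, focal_px: float,
            cx: float, cy: float) -> Optional[Tuple[float, float, float]]:
    """Pinhole projection of a Gaussian center for a translated camera.

    The camera keeps the input view's orientation (looking down +z) and only
    its position changes. Returns (u, v, depth), or None if the point is
    behind the camera.
    """
    x = mean[0] - cam_pos[0]
    y = mean[1] - cam_pos[1]
    z = mean[2] - cam_pos[2]
    if z <= 0.0:
        return None
    return (focal_px * x / z + cx, focal_px * y / z + cy, z)


def parallax_px(depth_m: float, shift_m: float, focal_px: float) -> float:
    """Magnitude of the image shift (pixels) caused by a sideways camera move."""
    return focal_px * shift_m / depth_m


# Hypothetical example: one Gaussian 2 m in front of the input camera.
g = Gaussian3D(mean=(0.0, 0.0, 2.0), scale=(0.01, 0.01, 0.01),
               rotation=(1.0, 0.0, 0.0, 0.0), opacity=0.9, color=(0.5, 0.5, 0.5))

input_view = project(g.mean, (0.0, 0.0, 0.0), focal_px=1000.0, cx=960.0, cy=540.0)
novel_view = project(g.mean, (0.1, 0.0, 0.0), focal_px=1000.0, cx=960.0, cy=540.0)
```

Because the Gaussian centers are expressed in meters, a camera translation can also be given in meters: in this hypothetical setup, moving the camera 0.1 m to the right shifts the projected center of a point 2 m away by `parallax_px(2.0, 0.1, 1000.0)` pixels to the left. With a representation defined only up to scale, the same translation would have no well-defined effect.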

Related readings and updates.

Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

May 8, 2026. Research area: Computer Vision. Conference: ECCV

We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians…

Fast and Explicit Neural View Synthesis

February 2, 2022. Research areas: Computer Vision, Methods and Algorithms. Conference: WACV

We study the problem of novel view synthesis from sparse source observations of a scene comprised of 3D objects. We propose a simple yet effective approach that is neither continuous nor implicit, challenging recent trends on view synthesis. Our approach explicitly encodes observations into a volumetric representation that enables amortized rendering. We demonstrate that although continuous radiance field representations have gained a lot of…
