Drop-In Perceptual Optimization for 3D Gaussian Splatting
Authors: Ezgi Özyılkan†‡, Zhiqi Chen‡, Oren Rippel, Jona Ballé†, Kedar Tatwawadi
Despite their output being ultimately consumed by human viewers, 3D Gaussian Splatting (3DGS) methods often rely on ad-hoc combinations of pixel-level losses, resulting in blurry renderings. To address this, we systematically explore perceptual optimization strategies for 3DGS by searching over a diverse set of distortion losses. We conduct a first-of-its-kind large-scale human subjective study on 3DGS, involving 39,320 pairwise ratings across several datasets and 3DGS frameworks. A regularized version of Wasserstein Distortion, which we call WD-R, emerges as the clear winner, excelling at recovering fine textures without incurring a higher splat count. WD-R is preferred by raters more than 2.3× over the original 3DGS loss, and 1.5× over the current best method, Perceptual-GS. WD-R also consistently achieves state-of-the-art LPIPS, DISTS, and FID scores across various datasets, and generalizes across recent frameworks such as Mip-Splatting and Scaffold-GS: replacing the original loss with WD-R consistently enhances perceptual quality within a similar resource budget (number of splats for Mip-Splatting, model size for Scaffold-GS), and leads to reconstructions being preferred by human raters 1.8× and 3.6×, respectively. We also find that this carries over to the task of 3DGS scene compression, with ≈50% bitrate savings at comparable perceptual metric performance.
Figure 1: 3DGS representation and compression frameworks optimized using 2D distortion and rate-distortion objectives, incorporating perceptual losses as part of the training framework.
Figure 2: Bayesian Elo scores for 3DGS representation methods across indoor scenes (Deep Blending, Mip-NeRF 360 indoor), outdoor scenes (Tanks & Temples, Mip-NeRF 360 outdoor, and BungeeNeRF), and all scenes combined. WD-R and WD achieve the highest scores in all settings (within the 95% confidence interval).
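Figure 2 aggregates the pairwise human ratings into per-method Bayesian Elo scores. As a rough illustration of how pairwise preferences map onto a single rating per method, here is a minimal plain (non-Bayesian) Elo sketch; the method names, initial ratings, K-factor, and toy outcomes below are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative Elo update from pairwise preference ratings.
# The paper fits Bayesian Elo scores; a plain Elo update conveys the idea:
# each pairwise "A preferred over B" rating nudges A's score up and B's down.

def expected_score(r_a: float, r_b: float) -> float:
    """Elo model probability that method A is preferred over method B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(ratings: dict, winner: str, loser: str, k: float = 16.0) -> None:
    """Shift both ratings toward the observed pairwise outcome (zero-sum)."""
    e_w = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - e_w)
    ratings[loser] -= k * (1.0 - e_w)

# Toy run (hypothetical outcomes): "WD-R" wins 7 of 10 comparisons
# against a baseline pixel-loss variant, so its rating ends up higher.
ratings = {"WD-R": 1000.0, "baseline-L1": 1000.0}
outcomes = [("WD-R", "baseline-L1")] * 7 + [("baseline-L1", "WD-R")] * 3
for winner, loser in outcomes:
    elo_update(ratings, winner, loser)
```

Since each update is zero-sum, the ratings only reorder methods relative to one another; the Bayesian variant used in the paper additionally yields the confidence intervals shown in the figure.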
DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models
January 18, 2025 · Research area: Computer Vision
Generating high-quality 3D content requires models capable of learning robust distributions of complex scenes and the real-world objects within them. Recent Gaussian-based 3D reconstruction techniques have achieved impressive results in recovering high-fidelity 3D assets from sparse input images by predicting 3D Gaussians in a feed-forward manner. However, these techniques often lack the extensive priors and expressiveness offered by Diffusion…
HUGS: Human Gaussian Splats
December 7, 2023 · Research area: Computer Vision
Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS) that represents an animatable human together with the scene using 3D Gaussian Splatting…