Drop-In Perceptual Optimization for 3D Gaussian Splatting
Authors: Ezgi Özyılkan†‡, Zhiqi Chen‡, Oren Rippel, Jona Ballé†, Kedar Tatwawadi
Although their output is ultimately consumed by human viewers, 3D Gaussian Splatting (3DGS) methods often rely on ad-hoc combinations of pixel-level losses, which results in blurry renderings. To address this, we systematically explore perceptual optimization strategies for 3DGS by searching over a diverse set of distortion losses. We conduct a first-of-its-kind large-scale human subjective study on 3DGS, involving 39,320 pairwise ratings across several datasets and 3DGS frameworks. A regularized version of Wasserstein Distortion, which we call WD-R, emerges as the clear winner, excelling at recovering fine textures without incurring a higher splat count. Raters prefer WD-R more than 2.3× over the original 3DGS loss and 1.5× over the current best method, Perceptual-GS. WD-R also consistently achieves state-of-the-art LPIPS, DISTS, and FID scores across various datasets. It generalizes across recent frameworks such as Mip-Splatting and Scaffold-GS: replacing the original loss with WD-R consistently enhances perceptual quality within a similar resource budget (number of splats for Mip-Splatting, model size for Scaffold-GS), and the resulting reconstructions are preferred by human raters 1.8× and 3.6×, respectively. These gains also carry over to 3DGS scene compression, with ≈50% bitrate savings at comparable perceptual metric performance.
Figure 1: 3DGS representation and compression frameworks optimized using 2D distortion and rate-distortion objectives, incorporating perceptual losses as part of the training framework.
Figure 2: Bayesian Elo scores for 3DGS representation methods across indoor scenes (Deep Blending, Mip-NeRF 360 indoor), outdoor scenes (Tanks & Temples, Mip-NeRF 360 outdoor, and BungeeNeRF), and all scenes combined. WD-R and WD achieve the highest scores in all settings (within the 95% confidence interval).
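To relate the preference ratios quoted in the abstract to an Elo-style scale, the following minimal sketch may help. It assumes that a statement such as "preferred 2.3× over" can be read as pairwise odds with ties ignored, and it uses the conventional logistic Elo model with a scale of 400. Neither assumption is confirmed by the text above, and the Bayesian Elo fit behind Figure 2 may differ, for example in how it handles ties and priors. The function names are hypothetical.

```python
import math


def odds_to_win_probability(odds: float) -> float:
    """Convert pairwise preference odds (e.g. 2.3 for "2.3x preferred")
    into the probability that the preferred method wins one comparison.
    Assumption: ties are ignored."""
    if odds <= 0:
        raise ValueError("odds must be positive")
    return odds / (1.0 + odds)


def odds_to_elo_gap(odds: float, scale: float = 400.0) -> float:
    """Rating gap implied by the odds under the conventional logistic
    Elo model, where odds = 10 ** (gap / scale).
    Assumption: scale = 400, the usual Elo convention."""
    if odds <= 0:
        raise ValueError("odds must be positive")
    return scale * math.log10(odds)


if __name__ == "__main__":
    # Preference ratios quoted in the abstract, read as odds (an assumption).
    quoted = {
        "WD-R vs. original 3DGS loss": 2.3,
        "WD-R vs. Perceptual-GS": 1.5,
        "Mip-Splatting with WD-R vs. original loss": 1.8,
        "Scaffold-GS with WD-R vs. original loss": 3.6,
    }
    for label, odds in quoted.items():
        p = odds_to_win_probability(odds)
        gap = odds_to_elo_gap(odds)
        print(f"{label}: win probability {p:.3f}, implied Elo gap {gap:.1f}")
```

Under this reading, equal preference (odds of 1) maps to a gap of zero, and every tenfold increase in odds adds 400 points. The values printed here are illustrative conversions only and are not the scores reported in Figure 2.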