In this paper, we approach the problem of uncertainty quantification in deep learning through a predictive framework, which captures uncertainty in model parameters by specifying our assumptions about the predictive distribution of unseen future data. Under this view, we show that deep ensembling (Lakshminarayanan et al., 2017) corresponds to a fundamentally misspecified model class, since it assumes that future data are supported only on existing observations, a situation rarely encountered in practice. To address this limitation, we propose MixupMP, a method that constructs a more realistic predictive distribution using popular data augmentation techniques. MixupMP operates as a drop-in replacement for deep ensembles, where each ensemble member is trained on a random simulation from this predictive distribution. Grounded in the recently proposed framework of Martingale posteriors (Fong et al., 2023), MixupMP returns samples from an implicitly defined Bayesian posterior. Our empirical analysis shows that MixupMP achieves superior predictive performance and uncertainty quantification on various image classification datasets when compared with existing Bayesian and non-Bayesian approaches.
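To make the procedure concrete, the sketch below illustrates one plausible reading of the training loop, assuming a mixup-style augmentation as the data-generating mechanism: each ensemble member is fit on its own independent stream of mixup-simulated "future" data, and predictions are averaged across members as in standard deep ensembles. All names and hyperparameters here (`make_model`, `alpha`, `n_members`) are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of a MixupMP-style ensemble, based only on the abstract above.
# Assumption: the predictive distribution over future data is simulated via
# mixup (convex combinations of random training pairs); the actual MixupMP
# construction may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mixup_batch(x, y_onehot, alpha=1.0):
    """Draw one mixup sample: convex combinations of randomly paired examples."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

def train_member(model, loader, n_classes, epochs=10, alpha=1.0, lr=1e-3):
    """Train one ensemble member on a random simulation of future data."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            y1h = F.one_hot(y, n_classes).float()
            x_mix, y_mix = mixup_batch(x, y1h, alpha)  # simulated future batch
            # Cross-entropy against the soft (mixed) labels.
            loss = (-y_mix * F.log_softmax(model(x_mix), dim=-1)).sum(-1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

def mixupmp_ensemble(make_model, loader, n_classes, n_members=5, **kw):
    # Each member sees an independent stream of mixup simulations, playing the
    # role of one draw from the implicitly defined posterior.
    return [train_member(make_model(), loader, n_classes, **kw)
            for _ in range(n_members)]

def predict(ensemble, x):
    """Average the members' predictive distributions, as in deep ensembles."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=-1) for m in ensemble])
    return probs.mean(0)
```

Under this reading, the only change relative to a standard deep ensemble is the per-batch mixup simulation inside `train_member`, which is what makes the method a drop-in replacement.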
