Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?
Authors: Michael Kirchhof, Luca Füger, Adam Golinski, Eeshan Gunesh Dhekane, Arno Blaas, Sinead Williamson
This paper was accepted at the Workshop on Reliable and Responsible Foundation Models (RRFMs) at ICML 2025.
Uncertainty quantification plays a pivotal role when bringing large language models (LLMs) to end-users. Its primary goal is that an LLM should indicate when it is unsure about an answer it gives. While this has traditionally been conveyed through numerical certainty scores, we propose to use the rich output space of LLMs, the space of all possible strings, to produce a string that describes the uncertainty. In particular, we seek a string that summarizes the distribution of LLM answers to a question. To measure this, we first propose the SelfReflect distance between a string and a distribution of strings. We verify that it works as intended, and then apply it to study how well modern LLMs can summarize their own answer distributions, either after sampling responses or even without sampling, purely through chain-of-thought reasoning.
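To make the setup concrete, the following Python sketch illustrates the general workflow described above: sample several answers from an LLM, form a candidate summary string, and score the summary against the sampled answers. The token-overlap score is only a toy placeholder, not the paper's SelfReflect metric, and the sample_answers helper and example data are hypothetical stand-ins for real LLM calls.

# Illustrative sketch only: a toy token-overlap score stands in for the
# SelfReflect distance, and sample_answers is a hypothetical stub for
# querying an LLM several times at non-zero temperature.

def sample_answers(question: str, n: int = 5) -> list[str]:
    """Hypothetical stub: in practice, call an LLM n times with temperature > 0."""
    return ["Paris", "Paris", "Lyon", "Paris", "Marseille"][:n]

def toy_distance(summary: str, answers: list[str]) -> float:
    """Toy placeholder: 1 minus the average token overlap between the
    summary string and each sampled answer (lower is better)."""
    s_tokens = set(summary.lower().split())
    overlaps = []
    for ans in answers:
        a_tokens = set(ans.lower().split())
        union = s_tokens | a_tokens
        overlaps.append(len(s_tokens & a_tokens) / max(len(union), 1))
    return 1.0 - sum(overlaps) / len(overlaps)

answers = sample_answers("Which city hosted the 1900 World's Fair?")
summary = "Most likely Paris, though Lyon or Marseille are possible."
print(f"toy distance: {toy_distance(summary, answers):.3f}")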
July 24, 2019. Research area: Speech and Natural Language Processing
Many language-related tasks, such as entering text on your iPhone, discovering news articles you might enjoy, or finding out answers to questions you may have, are powered by language-specific natural language processing (NLP) models. To decide which model to invoke at a particular point in time, we must perform language identification (LID), often on the basis of limited evidence, namely a short character string. Performing reliable LID is more critical than ever as multi-lingual input is becoming more and more common across all Apple platforms. In most writing scripts — like Latin and Cyrillic, but also including Hanzi, Arabic, and others — strings composed of a few characters are often present in more than one language, making reliable identification challenging. In this article, we explore how we can improve LID accuracy by treating it as a sequence labeling problem at the character level, and using bi-directional long short-term memory (bi-LSTM) neural networks trained on short character sequences. We observed reductions in error rates varying from 15% to 60%, depending on the language, while achieving reductions in model size between 40% and 80% compared to previously shipping solutions. Thus the LSTM LID approach helped us identify language more correctly in features such as QuickType keyboards and Smart Responses, thereby leading to better auto-corrections, completions, and predictions, and ultimately a more satisfying user experience. It also made public APIs like the Natural Language framework more robust to multi-lingual environments.
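As a rough illustration of treating language identification as character-level sequence labeling with a bi-directional LSTM, the following PyTorch sketch defines a tiny tagger that assigns a language to every character of a short string. The character vocabulary, language set, model sizes, and the pooling into a single string-level prediction are illustrative assumptions, not the model that shipped in Apple's products.

# A minimal sketch, assuming PyTorch: a character-level bi-LSTM that labels
# every character of a short string with a language. All hyperparameters,
# the character vocabulary, and the language set are illustrative.
import torch
import torch.nn as nn

LANGS = ["en", "fr", "de", "es"]          # assumed label set
CHARS = "abcdefghijklmnopqrstuvwxyz '"    # assumed character vocabulary
char_to_idx = {c: i + 1 for i, c in enumerate(CHARS)}  # 0 = padding/unknown

class CharBiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=len(CHARS) + 1, emb_dim=32, hidden=64, n_langs=len(LANGS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_langs)   # one language logit per character

    def forward(self, char_ids):                    # char_ids: (batch, seq_len)
        h, _ = self.bilstm(self.embed(char_ids))    # (batch, seq_len, 2 * hidden)
        return self.out(h)                          # per-character language logits

def encode(text):
    return torch.tensor([[char_to_idx.get(c, 0) for c in text.lower()]])

model = CharBiLSTMTagger()                          # untrained, for illustration only
logits = model(encode("bonjour"))                   # (1, 7, len(LANGS))
per_char = logits.argmax(dim=-1)[0]                 # predicted language per character
string_level = logits.mean(dim=1).argmax(dim=-1)    # pooled, string-level prediction
print([LANGS[i] for i in per_char.tolist()], LANGS[string_level.item()])

In practice such a model would be trained on short character sequences labeled with their language; per-character predictions can then be pooled, as in the last lines above, to produce a single language decision for the whole string.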