Flexible Routing via Uncertainty Decomposition
Authors: Charlotte Peale†**, Siddartha Devic‡**, Parikshit Gopalan, Udi Wieder, Aravind Gollakota
This paper was accepted at the Statistical Frameworks for Uncertainty in Agentic Systems Workshop at ICML 2026.
A key strategy for balancing performance and cost in modern machine learning systems is to dynamically route queries to either a low-cost model or a more expensive oracle (such as a large pretrained model or human expert), an approach known as model routing. In this work we present a new uncertainty-aware router that (1) avoids unnecessary oracle calls on inherently ambiguous queries, and (2) adapts dynamically to different loss functions and cost parameters through simple hyperparameter changes, without retraining. Our method, applicable to any classification setting where multiple independent annotations per input are available, is based on decomposing total uncertainty into irreducible and reducible components using higher-order predictors [Ahdritz et al., 2025]. This enables a unified approach to both routing and abstention: predict with the weak model when uncertainty is low, route to the oracle when reducible uncertainty is high, and abstain when irreducible uncertainty is high. Our router comes with strong theoretical guarantees bounding regret relative to optimal task-specific routers. We conduct experiments on both synthetic and real-world datasets that demonstrate the benefits of our approach in suitable regimes—in particular, whenever reducible and irreducible uncertainty are not too correlated.
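The predict / route / abstain rule described above can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a higher-order predictor has already produced, for one query, a small set of candidate label distributions, and it uses a common entropy-based split (total = entropy of the averaged distribution, irreducible = average entropy of the candidates, reducible = the difference). The function names, the threshold names `tau_reducible` and `tau_irreducible`, their default values, and the choice to check abstention first are all hypothetical.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def decompose(mixture):
    """Split total uncertainty for one query.

    mixture: (k, c) array of k candidate label distributions over c classes,
    assumed to come from a higher-order predictor (not implemented here).
    Returns (total, irreducible, reducible).
    """
    mixture = np.asarray(mixture, dtype=float)
    mean = mixture.mean(axis=0)                    # averaged prediction
    total = float(entropy(mean))                   # uncertainty of the average
    irreducible = float(entropy(mixture).mean())   # average per-candidate uncertainty
    reducible = max(total - irreducible, 0.0)      # disagreement between candidates
    return total, irreducible, reducible

def route(mixture, tau_reducible=0.1, tau_irreducible=0.5):
    """Return 'predict', 'oracle', or 'abstain' (thresholds are illustrative)."""
    _, irreducible, reducible = decompose(mixture)
    if irreducible > tau_irreducible:
        return "abstain"   # inherently ambiguous: an oracle call would not help
    if reducible > tau_reducible:
        return "oracle"    # uncertainty the oracle could resolve
    return "predict"       # weak model is confident enough

confident = [[0.99, 0.01], [0.98, 0.02]]   # candidates agree and are sharp
disputed = [[0.99, 0.01], [0.01, 0.99]]    # candidates are sharp but disagree
ambiguous = [[0.5, 0.5], [0.5, 0.5]]       # candidates agree the label is a coin flip

print(route(confident), route(disputed), route(ambiguous))
```

In this sketch, changing the cost trade-off only means changing the two thresholds, which mirrors the abstract's point that the router adapts through hyperparameters without retraining. For example, raising `tau_reducible` makes the router send fewer queries to the oracle.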
Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition
July 11, 2025; research areas: Methods and Algorithms, Speech and Natural Language Processing; conference: ASRU
Mixture-of-experts (MoE) architectures have expanded from language modeling to automatic speech recognition (ASR). Traditional MoE methods, such as the Switch Transformer, route experts independently within each layer. Our analysis reveals that routers in most layers make expert choices that are not strongly correlated with the choices of the routers in other layers. To increase the cooperation between experts in different layers and encourage…
Capsules with Inverted Dot-Product Attention Routing
February 12, 2020; research area: Methods and Algorithms; conference: ICLR
We introduce a new routing algorithm for capsule networks, in which a child capsule is routed to a parent based only on agreement between the parent’s state and the child’s vote. The new mechanism 1) designs routing via inverted dot-product attention; 2) imposes Layer Normalization as normalization; and 3) replaces sequential iterative routing with concurrent iterative routing. When compared to previously proposed routing algorithms, our method…