
Neural Network Language Models (NNLMs) for Virtual Assistants (VAs) are generally language-, region-, and in some cases device-dependent, which increases the effort required to scale and maintain them. Combining NNLMs across one or more of these categories is one way to improve scalability. In this work, we combine regional variants of English by building a “World English” NNLM. We examine three data sampling techniques, experiment with adding adapter bottlenecks to the existing production NNLMs to model dialect-specific characteristics, and investigate different strategies for training the adapters. We find that adapter modules are more effective at modeling dialects than specialized sub-networks consisting of a set of feedforward layers. Our experimental results show that adapter-based architectures achieve up to a 4.57% Word Error Rate (WER) reduction over single-dialect baselines on head-heavy test sets and up to 8.22% on tail entities.
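The page does not spell out the adapter internals, but the bottleneck-adapter idea is straightforward to illustrate. Below is a minimal PyTorch sketch of a standard residual bottleneck adapter attached per dialect; the dimensions (hidden_dim=512, bottleneck_dim=64) and the dialect labels are hypothetical choices for illustration, not values from the paper.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter: project down, apply a nonlinearity,
    project back up, and add the result to the input."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()
        # Zero-initialize the up-projection so each adapter starts as an
        # identity map and does not perturb the shared base model.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# One small adapter per dialect; the base NNLM weights stay shared.
adapters = nn.ModuleDict({
    d: BottleneckAdapter(hidden_dim=512, bottleneck_dim=64)
    for d in ["en_US", "en_GB", "en_IN"]  # hypothetical dialect set
})

hidden = torch.randn(8, 20, 512)       # (batch, sequence, hidden) activations
adapted = adapters["en_GB"](hidden)    # dialect-specific transformation
```

Because only the small down/up projections are dialect-specific, adding a new English variant costs a few percent of the base model's parameters rather than a full per-dialect copy.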

Related readings and updates.

Large language models (LLMs) are ubiquitous in modern-day natural language processing. However, previous work has shown degraded LLM performance for under-represented English dialects. We analyze the effects of typifying “standard” American English language questions as non-“standard” dialectal variants on multiple-choice question answering tasks and find up to a 20% reduction in accuracy. Additionally, we investigate the grammatical basis of…



Federated Learning (FL) is a technique for training models on data distributed across devices. Differential Privacy (DP) provides a formal privacy guarantee for sensitive data. Our goal is to train a large neural network language model (NNLM) on compute-constrained devices while preserving privacy using FL and DP. However, the DP noise introduced into the model increases as the model size grows, which often prevents…
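As a rough illustration of why DP noise grows with model size: in DP-SGD-style training, Gaussian noise calibrated to the per-example clipping bound is added to the aggregated update, and the expected norm of that noise scales with the square root of the parameter count. The sketch below is a generic Gaussian-mechanism example, not the method from the paper; clip_norm and noise_multiplier are hypothetical values.

```python
import torch

def privatize_update(per_example_grads, clip_norm=1.0, noise_multiplier=1.0):
    """Clip each example's gradient to clip_norm, sum the clipped gradients,
    and add Gaussian noise calibrated to the clipping bound."""
    clipped = []
    for g in per_example_grads:
        scale = (clip_norm / (g.norm() + 1e-12)).clamp(max=1.0)
        clipped.append(g * scale)
    summed = torch.stack(clipped).sum(dim=0)
    noise = torch.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_example_grads)

small = [torch.randn(10_000) for _ in range(32)]      # 10k-parameter model
large = [torch.randn(10_000_000) for _ in range(32)]  # 10M-parameter model
# The expected noise norm grows roughly as sqrt(d), so for a fixed privacy
# budget the larger model absorbs far more noise relative to its signal.
print(privatize_update(small).norm(), privatize_update(large).norm())
```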
