Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks

AuthorsMaureen de Seyssel*, Jie Chi*, Skyler Seto, Maartje ter Hoeve, Masha Fedzechkina, Natalie Schluter

We introduce a set of training-free ABX-style discrimination tasks to evaluate how multilingual language models represent language identity (form) and semantic content (meaning). Inspired from speech processing, these zero-shot tasks measure whether minimal differences in representation can be reliably detected. This offers a flexible and interpretable alternative to probing. Applied to XLM-R (Conneau et al, 2020) across pretraining checkpoints and layers, we find that language discrimination declines over training and becomes concentrated in lower layers, while meaning discrimination strengthens over time and stabilizes in deeper layers. We then explore probing tasks, showing some alignment between our metrics and linguistic learning performance. Our results position ABX tasks as a lightweight framework for analyzing the structure of multilingual representations.

*Equal Contributors

Related readings and updates.

Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs

May 16, 2025research area Speech and Natural Language Processingconference ACL

Current Large Language Models (LLMs) are predominantly designed with English as the primary language, and even the few that are multilingual tend to exhibit strong English-centric biases. Much like speakers who might produce awkward expressions when learning a second language, LLMs often generate unnatural outputs in non-English languages, reflecting English-centric patterns in both vocabulary and grammar. Despite the importance of this issue,…

Improving Language Identification for Multilingual Speakers

January 29, 2020research area Speech and Natural Language Processingconference ICASSP

Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language. One aspect that has been mostly neglected, however, is discrimination of languages for multilingual speakers, despite being a primary target audience of many systems that utilize LID technologies. As we show in this work, LID systems can…

Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks

Related readings and updates.

Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs

Improving Language Identification for Multilingual Speakers

Discover opportunities in Machine Learning.