Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Input Representations Matter

AuthorsVerena Blaschke†‡§, Masha Fedzechkina, Maartje ter Hoeve

Cross-lingual transfer is a popular approach to increase the amount of training data for NLP tasks in a low-resource context. However, the best strategy to decide which cross-lingual data to include is unclear. Prior research often focuses on a small set of languages from a few language families or a single task. It is still an open question how these findings extend to a wider variety of languages and tasks. In this work, we contribute to this question by analyzing cross-lingual transfer for 263 languages from a wide variety of language families. Moreover, we include three popular NLP tasks in our analysis: POS-tagging, dependency parsing, and topic classification. Our findings indicate that the effect of linguistic similarity on transfer performance depends on a range of factors: the NLP task, the (mono- or multilingual) input representations, and the definition of linguistic similarity.

† Work done while at Apple
‡ LMU Munich
§ Munich Center for Machine Learning

Related readings and updates.

Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models

July 25, 2025research area Methods and Algorithms, research area Speech and Natural Language Processingconference ACL

Aligned representations across languages is a desired property in multilingual large language models (mLLMs), as alignment can improve performance in cross-lingual tasks. Typically alignment requires fine-tuning a model, which is computationally expensive, and sizable language data, which often may not be available. A data-efficient alternative to fine-tuning is model interventions — a method for manipulating model activations to steer…

Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer

December 11, 2022research area Speech and Natural Language ProcessingWorkshop at NeurIPS

Multi-lingual language models (LM), such as mBERT, XLM-R, mT5, mBART, have been remarkably successful in enabling natural language tasks in low-resource languages through cross-lingual transfer from high-resource ones. In this work, we try to better understand how such models, specifically mT5, transfer any linguistic and semantic knowledge across languages, even though no explicit cross-lingual signals are provided during pre-training. Rather,…

Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Input Representations Matter

Related readings and updates.

Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models

Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer

Discover opportunities in Machine Learning.