Generating Gender Alternatives in Machine Translation
AuthorsSarthak Garg, Mozhdeh Gheini, Clara Emmanuel, Tatiana Likhomanenko, Qin Gao, Matthias Paulik
Generating Gender Alternatives in Machine Translation
AuthorsSarthak Garg, Mozhdeh Gheini, Clara Emmanuel, Tatiana Likhomanenko, Qin Gao, Matthias Paulik
This paper was accepted at the Workshop on Gender Bias in Natural Language Processing 2024.
Machine translation (MT) systems often translate terms with ambiguous gender (e.g., English term “the nurse”) into the gendered form that is most prevalent in the systems’ training data (e.g., “enfermera”, the Spanish term for a female nurse). This often reflects and perpetuates harmful stereotypes present in society. With MT user interfaces in mind that allow for resolving gender ambiguity in a frictionless manner, we study the problem of generating all grammatically correct gendered translation alternatives. We open source train and test datasets for five language pairs and establish benchmarks for this task. Our key technical contribution is a novel semi-supervised solution for generating alternatives that integrates seamlessly with standard MT models and maintains high performance without requiring additional components or increasing inference overhead.
ProText: A Benchmark Dataset for Measuring (Mis)gendering in Long-Form Texts
March 31, 2026research area Fairness, research area Speech and Natural Language Processing
We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English texts. ProText spans three dimensions: Theme nouns (names, occupations, titles, kinship terms), Theme category (stereotypically male, stereotypically female, gender-neutral/non-gendered), and Pronoun category (masculine, feminine, gender-neutral, none). The dataset is designed to probe (mis)gendering in text transformations such as…
Improving How Machine Translations Handle Grammatical Gender Ambiguity
October 7, 2024research area Speech and Natural Language Processing
Machine Translation (MT) enables people to connect with others and engage with content across language barriers. Grammatical gender presents a difficult challenge for these systems, as some languages require specificity for terms that can be ambiguous or neutral in other languages. For example, when translating the English word “nurse” into Spanish, one must decide whether the feminine “enfermera” or the masculine “enfermero” is appropriate…