Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments
Authors: Konstantin Dobler†**‡, Simon Lehnerer‡, Federico Scozzafava, Jonathan Janke, Mohamed Ali
We present the Multilingual Reasoning Gym, an extension of Reasoning Gym (Stojanovski et al., 2025) that procedurally generates verifiable reasoning problems across 14 languages. We translate templates for 94 tasks with native-speaker validation in 10 languages and targeted code or template adaptations to ensure linguistic naturalness. The Multilingual Reasoning Gym preserves the core benefits of the procedural generation approach used in the original Reasoning Gym, such as virtually unlimited problem instance generation and adjustable difficulty, and remains directly usable for Reinforcement Learning from Verifiable Rewards and evaluation settings. Because the environments are procedural, problems in the Multilingual Reasoning Gym are parallel across languages, which enables crosslingually parallel data generation at massive scale. We release our implementation to support research into multilingual reasoning models.
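The idea of seeded, crosslingually parallel generation with a verifiable answer can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `TEMPLATES` table, the `Problem` class, and the `generate` and `verify` functions are hypothetical names and do not reflect the released implementation or the Reasoning Gym API.

```python
import random
from dataclasses import dataclass

# Hypothetical per-language templates for one toy task (not from the paper).
TEMPLATES = {
    "en": "What is {a} + {b}?",
    "de": "Was ist {a} + {b}?",
    "es": "¿Cuánto es {a} + {b}?",
}


@dataclass
class Problem:
    language: str
    question: str
    answer: str


def generate(seed: int, language: str, difficulty: int = 1) -> Problem:
    """Build one problem; the seed fixes the operands, the language only picks the template."""
    rng = random.Random(seed)      # same seed -> same operands in every language
    upper = 10 ** difficulty       # difficulty widens the operand range
    a = rng.randint(0, upper)
    b = rng.randint(0, upper)
    question = TEMPLATES[language].format(a=a, b=b)
    return Problem(language, question, str(a + b))


def verify(problem: Problem, response: str) -> float:
    """Binary reward: 1.0 on an exact match after trimming whitespace, else 0.0."""
    return 1.0 if response.strip() == problem.answer else 0.0


# One seed yields the same underlying problem rendered in each language.
parallel = {lang: generate(seed=42, language=lang, difficulty=2) for lang in TEMPLATES}
```

Because the random draw depends only on the seed and the difficulty, every language version shares one ground-truth answer, so a single checker can score responses in any of the languages.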
mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR
March 13, 2026 | Research area: Speech and Natural Language Processing
Reinforcement Learning with Verifiable Rewards (RLVR) has been successfully applied to significantly boost the capabilities of pretrained large language models, especially in the math and logic problem domains. However, current research and available training datasets remain English-centric. While multilingual training data and benchmarks have been created in the past, they were not created with RLVR and current model capability in mind, and…
Training Software Engineering Agents and Verifiers with SWE-Gym
October 16, 2025 | Research area: Speech and Natural Language Processing | Conference: ICML
We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language model based SWE agents, achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets. We also…