Training Software Engineering Agents and Verifiers with SWE-Gym
Authors: Jiayi Pan*, Xingyao Wang*, Graham Neubig†, Navdeep Jaitly‡, Heng Ji§, Alane Suhr†, Yizhe Zhang‡
We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language-model-based SWE agents, achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets. We also experiment with inference-time scaling through verifiers trained on agent trajectories sampled from SWE-Gym. Combining these verifiers with our fine-tuned SWE agents, we achieve 32.0% and 26.0% on SWE-Bench Verified and Lite, respectively, a new state of the art for open-weight SWE agents. To facilitate further research, we publicly release SWE-Gym, models, and agent trajectories.
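To make the verifier-based inference-time scaling concrete, the following is a minimal sketch of best-of-n selection followed by a resolve-rate computation. It is an illustration under stated assumptions, not the released SWE-Gym code: the `Trajectory` record, the `select_best` and `resolve_rate` helpers, and the shape of the verifier (a function mapping a trajectory to a score) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Trajectory:
    """Hypothetical record of one agent attempt at a task instance."""
    instance_id: str
    patch: str
    resolved: bool  # assumed label: do the instance's unit tests pass with this patch?


def select_best(candidates: Sequence[Trajectory],
                verifier: Callable[[Trajectory], float]) -> Trajectory:
    """Best-of-n selection: return the candidate the verifier scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    return max(candidates, key=verifier)


def resolve_rate(selected: Sequence[Trajectory]) -> float:
    """Fraction of task instances whose selected trajectory is resolved."""
    if not selected:
        return 0.0
    return sum(t.resolved for t in selected) / len(selected)
```

In this sketch, the agent would be sampled several times per task instance, `select_best` would pick one trajectory per instance using the verifier's score, and `resolve_rate` would then be computed over the selected trajectories. The `resolved` flag is only read when computing the metric, never during selection.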
Multilingual Reasoning Gym: Multilingual Scaling of Procedural Reasoning Environments
March 13, 2026. Research area: Speech and Natural Language Processing
We present the Multilingual Reasoning Gym, an extension of Reasoning Gym (Stojanovski et al., 2025), that procedurally generates verifiable reasoning problems across 14 languages. We translate templates for 94 tasks with native-speaker validation in 10 languages and targeted code or template adaptations to ensure linguistic naturalness. The Multilingual Reasoning Gym preserves the core benefits of the procedural generation approach used in the…
AgentBuilder: Exploring Scaffolds for Prototyping User Experiences of Interface Agents
January 9, 2026. Research area: Human-Computer Interaction
Interface agents powered by generative AI models (referred to as “agents”) can automate actions based on user commands. An important aspect of developing agents is their user experience (i.e., agent experience). There is a growing need to provide scaffolds for a broader set of individuals beyond AI engineers to prototype agent experiences, since they can contribute valuable perspectives to designing agent experiences. In this work, we explore the…