An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering

Authors: Shayne Longpre, Yi Lu, Zhucheng Tu, Chris DuBois

This paper was accepted at the 2nd Workshop on Machine Reading for Question Answering at the EMNLP 2019 Conference.

To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, and query and context paraphrases generated by back-translation. We find a simple negative sampling technique to be particularly effective, even though it is typically used only for datasets that include unanswerable questions, such as SQuAD 2.0. When this technique is combined with per-domain sampling, our XLNet-based submission (Yang et al., 2019) achieved the second-best Exact Match (EM) and F1 scores on the MRQA leaderboard.
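The abstract names negative sampling and per-domain sampling without describing their mechanics. The Python sketch below shows one common way each idea is realized in extractive QA training pipelines; it is an illustration under stated assumptions, not the authors' implementation. Everything in it is hypothetical: the `Example` and `Window` record shapes, the functions `per_domain_sample` and `window_examples`, the choice to weight domains uniformly, and the reading of negative sampling as keeping answer-free context windows as "no answer" examples with probability `keep_neg_prob`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (domain, question, context). Not the MRQA file format.
Example = Tuple[str, str, str]


def per_domain_sample(examples: List[Example], n: int, seed: int = 0) -> List[Example]:
    """Draw n examples: pick a domain uniformly at random, then an example in it.

    Assumption: uniform weighting across domains, so small domains are seen
    about as often as large ones."""
    rng = random.Random(seed)
    by_domain: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_domain[ex[0]].append(ex)
    domains = sorted(by_domain)  # sorted so a fixed seed gives a reproducible draw
    return [rng.choice(by_domain[rng.choice(domains)]) for _ in range(n)]


# (window_start, window_end_exclusive, answer span relative to window or None)
Window = Tuple[int, int, Optional[Tuple[int, int]]]


def window_examples(num_tokens: int, answer: Tuple[int, int], window: int,
                    stride: int, keep_neg_prob: float, seed: int = 0) -> List[Window]:
    """Split a context into overlapping token windows.

    Windows containing the whole answer span (inclusive token indices) become
    positives; the rest become "no answer" negatives, each kept with
    probability keep_neg_prob (hypothetical negative-sampling rule)."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    rng = random.Random(seed)
    a_start, a_end = answer
    out: List[Window] = []
    start = 0
    while True:
        end = min(start + window, num_tokens)
        if start <= a_start and a_end < end:
            # Positive: re-express the answer span relative to this window.
            out.append((start, end, (a_start - start, a_end - start)))
        elif rng.random() < keep_neg_prob:
            # Negative: the window lacks the full answer, label it unanswerable.
            out.append((start, end, None))
        if start + window >= num_tokens:
            break  # this window already reaches the end of the context
        start += stride
    return out


# A 10-token context with the answer at tokens 7..8, keeping every negative.
print(window_examples(10, (7, 8), window=4, stride=3, keep_neg_prob=1.0))
# [(0, 4, None), (3, 7, None), (6, 10, (1, 2))]
```

Uniform domain weighting is only one option; size-proportional or temperature-scaled weighting are common alternatives, and lowering `keep_neg_prob` trades fewer "no answer" examples for a smaller training set.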

Related readings and updates.

Can Open Domain Question Answering Models Answer Visual Knowledge Questions?

February 28, 2022 · Research area: Speech and Natural Language Processing

The task of Outside Knowledge Visual Question Answering (OKVQA) requires an automatic system to answer natural language questions about pictures and images using external knowledge. We observe that many visual questions, which contain deictic referential phrases referring to entities in the image, can be rewritten as "non-grounded" questions and can be answered by existing text-based question answering systems. This allows for the reuse of...

MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering

July 30, 2020 · Research areas: Knowledge Bases and Search; Speech and Natural Language Processing

Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide...
