Improving Human Annotation Effectiveness for Fact Collection by Identifying the Most Relevant Answers
Authors: Pranav Kamath, Yiwen Sun, Thomas Semere, Adam Green, Scott Manley, Xiaoguang Qi, Kun Qian, Yunyao Li, and Mina Farid
This paper was accepted at the Workshops on Data Science with Human in the Loop at EMNLP 2022
Identifying and integrating missing facts is a crucial task in knowledge graph completion, ensuring robustness for downstream applications such as question answering. Adding new facts to a knowledge graph in a real-world system often requires human verification, where human annotators check candidate facts for accuracy. This process is labor-intensive, time-consuming, and inefficient, since only a small number of missing facts can be identified. This paper proposes a simple but effective human-in-the-loop framework for fact collection that searches for a diverse set of highly relevant candidate facts for human annotation. Empirical results presented in this work demonstrate that the proposed solution improves both (i) the quality of the candidate facts and (ii) the ability to discover more facts to grow the knowledge graph, without requiring additional human effort.
ODKE+: Ontology-Guided Open-Domain Knowledge Extraction with LLMs
October 27, 2025, research area Knowledge Bases and Search, research area Speech and Natural Language Processing
Knowledge graphs (KGs) are foundational to many AI applications, but maintaining their freshness and completeness remains costly. We present ODKE+, a production-grade system that automatically extracts and ingests millions of open-domain facts from web sources with high precision. ODKE+ combines modular components into a scalable pipeline: (1) the Extraction Initiator detects missing or stale facts, (2) the Evidence Retriever collects supporting…
Growing and Serving Large Open-domain Knowledge Graphs
June 2, 2023, research area Knowledge Bases and Search, research area Speech and Natural Language Processing, conference SIGMOD
*= Equal Contributors
Applications of large open-domain knowledge graphs (KGs) to real-world problems pose many unique challenges. In this paper, we present extensions to Saga, our platform for continuous construction and serving of knowledge at scale. In particular, we describe a pipeline for training knowledge graph embeddings that powers key capabilities such as fact ranking, fact verification, a related entities service, and support for…