FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge
Authors: Farima Fatahi Bayat, Kun Qian, Benjamin Han, Yisi Sang, Anton Belyi, Samira Khorshidi, Fei Wu, Ihab Ilyas, Yunyao Li
Large language models’ inability to attribute their claims to external knowledge and their tendency to hallucinate make it difficult to trust their responses. Even humans are prone to factual errors in their writing. Therefore, verifying the factual accuracy of textual information, whether generated by large language models or curated by humans, is an important task. However, manually validating and correcting factual errors tends to be a tedious and labor-intensive process. In this paper, we propose FLEEK for automatic fact verification and correction. FLEEK automatically extracts factual claims from the text, retrieves relevant evidence for each claim from various sources of external knowledge, and evaluates the factual status of each claim based on the retrieved evidence. The system also automatically corrects detected factual errors in claims based on the retrieved evidence. Experiments show that FLEEK is able to exhaustively extract factual claims, correctly determine their factual status, and propose meaningful corrections based on the retrieved evidence.
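The abstract describes FLEEK's pipeline only at a high level. The minimal Python sketch below illustrates one way a claim-extraction, evidence-retrieval, and verify-and-correct loop might be structured; every function name and the toy knowledge source are hypothetical placeholders for illustration, not FLEEK's actual implementation or API.

from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Claim:
    # An atomic factual claim expressed as a (subject, relation, object) triple.
    subject: str
    relation: str
    obj: str

# Toy stand-in for external knowledge: (subject, relation) -> supported object.
# A real system would query knowledge graphs and web search instead.
KNOWLEDGE = {
    ("Eiffel Tower", "completed_in"): "1889",
    ("Eiffel Tower", "located_in"): "Paris",
}

def extract_claims(text: str) -> list[Claim]:
    # Placeholder: a real system would decompose the input text into
    # atomic claims, e.g. with an LLM or an information-extraction model.
    return [Claim("Eiffel Tower", "completed_in", "1890")]

def retrieve_evidence(claim: Claim) -> str | None:
    # Placeholder retrieval step: look up evidence for the claim.
    return KNOWLEDGE.get((claim.subject, claim.relation))

def verify_and_correct(text: str) -> list[tuple[Claim, str, str | None]]:
    """Return (claim, verdict, suggested_correction) for each extracted claim."""
    results = []
    for claim in extract_claims(text):
        evidence = retrieve_evidence(claim)
        if evidence is None:
            results.append((claim, "unverifiable", None))
        elif evidence == claim.obj:
            results.append((claim, "supported", None))
        else:
            # The correction is proposed directly from the retrieved evidence.
            results.append((claim, "refuted", evidence))
    return results

if __name__ == "__main__":
    for claim, verdict, fix in verify_and_correct(
        "The Eiffel Tower was completed in 1890."
    ):
        print(claim, verdict, fix)

Running this toy example flags the fabricated completion year ("1890") as refuted and proposes "1889" from the mock knowledge source, mirroring the detect-then-correct behavior the abstract describes.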