Consistent Question Answering via Knowledge Graph Querying
Abstract
Large Language Models (LLMs) have demonstrated impressive generative capabilities, but their tendency to produce hallucinated or inconsistent outputs poses challenges in fact-based applications. To address this, Retrieval-Augmented Generation (RAG) systems enhance LLMs with external knowledge sources, such as Knowledge Graphs (KGs), to ground responses in structured facts. While prior work has largely focused on generation quality, this thesis investigates the reliability of the retriever component in KG-augmented RAG pipelines. Specifically, it examines whether LLMs can consistently translate paraphrased natural language questions into accurate structured queries in Cypher, a popular query language for KGs.
Using the ParaRel* benchmark, which provides multiple paraphrased variants of factual questions, we evaluate both factual accuracy and consistency, defined as the model’s ability to produce stable outputs across linguistic rephrasings. To support this analysis, we construct a property graph from the T-REx dataset and implement a retrieval pipeline using the Text2CypherRetriever framework. Multiple model configurations are tested, varying language model type, fine-tuning status, and provided few-shot prompt examples.
Our results highlight a significant decrease in performance for KG-based retrieval compared to text-based baselines, especially in paraphrase consistency, which drops by approximately 31 percentage points, and accuracy, which declines by around 11 percentage points, both with increased variability. However, certain configurations of the retriever show strong performance for specific relation types, indicating the potential for targeted improvements. This work contributes to a deeper understanding of the challenges and opportunities in aligning natural language questions with structured knowledge representations for robust factual retrieval. In particular, we find that retrieval quality hinges on precise alignment between the phrasing of the question and the structure of the KG, highlighting a key limitation: even minor variations in surface form can dramatically affect whether the correct relation is matched. These insights point to the need for schema-aware prompt engineering and more robust query translation in future KG-RAG systems.
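As a rough illustration of the two quantities evaluated above, the following minimal sketch scores the answers retrieved for one fact across several paraphrases of the same question. It assumes consistency is measured as pairwise agreement between paraphrase answers, which is one common choice; the abstract does not state the exact formula used in the thesis. The function names and the sample answers are hypothetical.

```python
from itertools import combinations


def pairwise_consistency(predictions):
    """Fraction of paraphrase pairs whose answers agree (assumed metric)."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        # Zero or one paraphrase: there is nothing to disagree with.
        return 1.0
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


def accuracy(predictions, gold):
    """Fraction of paraphrase answers that match the gold answer."""
    if not predictions:
        return 0.0
    return sum(1 for p in predictions if p == gold) / len(predictions)


# Hypothetical answers returned for four paraphrases of one question.
answers = ["Paris", "Paris", "Lyon", "Paris"]

print(pairwise_consistency(answers))  # 0.5  (3 of 6 pairs agree)
print(accuracy(answers, "Paris"))     # 0.75 (3 of 4 answers correct)
```

The two scores can diverge: a retriever that returns the same wrong answer for every paraphrase is fully consistent but has zero accuracy, which is why both are reported.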
Degree
Student essay
Date
2025-10-07
Author
J. Drexler, Martin
M. Helmstad, Johan
Keywords
Knowledge Graphs
Retrieval-Augmented Generation
Paraphrase Consistency
Cypher
Neo4j
Language Models
Structured Query Generation
Language
eng