Evaluation of In-Context Retrieval Augmented Language Models for Factual Consistency

Abstract

Pre-trained large language models (LLMs) have shown remarkable performance on natural language processing (NLP) tasks, especially question answering. However, these models face challenges such as limited memory expansion, interpretability issues, and susceptibility to hallucinations. To address these limitations, Retrieval-Augmented Language Models (RALMs), which integrate parametric and non-parametric memory, have been proposed. These models use a retriever to access external knowledge bases, enhancing memory flexibility and interpretability. Although RALMs have been shown to outperform pre-trained parametric-only models on various knowledge-intensive NLP tasks, one caveat of the RALMs studied in the majority of previous research is that they rely on fine-tuning the retrieval-augmented architecture for downstream NLP tasks, which can be costly and difficult. To address this challenge, Ram et al. (2023) recently introduced a simpler alternative called In-Context RALM, which simply prepends retrieved documents to the input and feeds the result to an existing pre-trained language model without any further fine-tuning. Considering the importance of predictions being not only accurate but also consistent, this study evaluates In-Context RALM's effectiveness in prediction consistency compared to a parametric-only model (Llama-2-7B) and a fine-tuned RALM (Atlas). Results show that In-Context RALM produces more consistent predictions than the parametric-only model, demonstrating its capability to enhance consistency. Although it is less effective than the fine-tuned RALM (Atlas) at improving consistency, In-Context RALM remains a viable alternative when fine-tuning is impractical, particularly when the retrieved contexts are relevant. However, its performance declines with irrelevant contexts, making it less robust in such scenarios than fine-tuned models.
These findings suggest that improving In-Context RALM's robustness to irrelevant contexts could make it a more competitive alternative to fine-tuned RALMs.
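The core mechanism evaluated here can be illustrated with a minimal sketch of In-Context RALM's document prepending, as described by Ram et al. (2023): retrieve documents relevant to the query and prepend them to the input of a frozen pre-trained LM, with no fine-tuning. The word-overlap retriever and function names below are illustrative assumptions, not the actual retriever used in the study.

```python
def retrieve(query, corpus, k=1):
    """Toy retriever: return the k documents with highest word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_ralm_prompt(question, corpus, k=1):
    """In-Context RALM: prepend retrieved documents to the question, then feed
    the combined string to an existing pre-trained LM (not shown) as-is."""
    docs = retrieve(question, corpus, k)
    return "\n".join(docs) + "\n\n" + question

corpus = [
    "Paris is the capital of France.",
    "Atlas fine-tunes the retriever and reader jointly.",
]
prompt = build_ralm_prompt("What is the capital of France?", corpus)
print(prompt.splitlines()[0])  # most relevant document appears first in the prompt
```

The appeal of this setup, as the abstract notes, is that the language model itself is untouched: only the input changes, so any pre-trained model can be used without retraining.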

Keywords

NLP, RALM, In-Context RALM, RAG, information retrieval, retrieval-augmented generation, LLM
