Evaluation of In-Context Retrieval Augmented Language Models for Factual Consistency
Abstract
Pre-trained large language models (LLMs) have shown remarkable performance in
natural language processing (NLP) tasks, especially in question-answering. However,
these models face challenges such as limited memory expansion, interpretability issues,
and susceptibility to hallucinations. To address these limitations, Retrieval-
Augmented Language Models (RALMs), which integrate parametric and non-parametric
memory, have been proposed. These models use a retriever to access external knowledge
bases, enhancing memory flexibility and interpretability. Although RALMs
have been shown to outperform pre-trained parametric-only models in various knowledge-intensive
NLP tasks, a caveat of the RALMs studied in most previous research is that they
rely on fine-tuning the retrieval-augmented architecture for downstream NLP tasks,
which can be costly and difficult. To address this challenge, Ram
et al. (2023) have recently introduced a simpler alternative called In-Context RALM,
which simply prepends retrieved documents to the input and feeds it to an existing
pre-trained language model without any further fine-tuning. Considering the
importance of predictions being not only accurate but also consistent, this study
evaluates In-Context RALM’s effectiveness in prediction consistency compared to a
parametric-only model (Llama-2-7B) and a fine-tuned RALM (Atlas). Results show
that In-Context RALM produces more consistent predictions than the parametric-only
model, demonstrating its capability to enhance consistency. Although it is less
effective than the fine-tuned RALM (Atlas) in improving consistency, In-Context
RALM remains a viable alternative when fine-tuning is impractical, particularly if
retrieved contexts are relevant. However, its performance declines with irrelevant
contexts, making it less robust in such scenarios compared to fine-tuned models.
These findings suggest that improving In-Context RALM's robustness to irrelevant
contexts could make it a more competitive alternative to fine-tuned RALMs.
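As a rough illustration of the mechanism evaluated in this study, the sketch below shows how In-Context RALM prepends retrieved documents to the input before querying a frozen pre-trained model. The retriever function, prompt template, and model checkpoint name are illustrative placeholders, not the exact setup used in the essay.

```python
# Minimal sketch of In-Context RALM (Ram et al., 2023): prepend retrieved
# documents to the question and query a frozen pre-trained LM.
# retrieve_documents and the checkpoint name are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer


def retrieve_documents(question: str, k: int = 2) -> list[str]:
    """Placeholder retriever; a real system would query BM25 or a dense index."""
    return ["<retrieved document 1>", "<retrieved document 2>"][:k]


def in_context_ralm_answer(question: str,
                           model_name: str = "meta-llama/Llama-2-7b-hf") -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)  # no fine-tuning

    # Prepend retrieved contexts to the input; the language model stays frozen.
    context = "\n".join(retrieve_documents(question))
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)

    # Decode only the newly generated tokens, i.e. the answer.
    answer_ids = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(answer_ids, skip_special_tokens=True)
```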
Degree
Student essay
Date
2024-10-16
Author
UENO, YURA
Keywords
NLP
RALM
In-Context RALM
RAG
information retrieval
retrieval-augmented generation
LLM