Enhancing NLU with Paraphrasing in Task-Oriented Dialogue Systems: Toward Data-Efficient Generalization in Low-Resource Scenarios
Abstract
In task-oriented dialogue systems (TODS), Natural Language Understanding (NLU) is
fundamental to interpreting user intent and extracting semantic information. However,
training robust NLU models often requires large amounts of annotated data, which is
costly and difficult to obtain in many real-world domains. While large language models
(LLMs) have demonstrated impressive zero- and few-shot performance, their deployment
in NLU pipelines remains constrained by high inference cost, cloud dependency, and lack
of controllability—especially in latency-sensitive or privacy-critical settings.
This thesis investigates an alternative approach: using LLMs as offline paraphrase generators
to augment training data for lightweight NLU models. We evaluate this method
across three core NLU tasks—domain classification, intent recognition, and slot labeling—
on two benchmark datasets: Banking77 and MultiWOZ 2.2. Results show that
paraphrasing improves model generalization under data-scarce conditions, particularly
for intent recognition. For slot labeling, however, performance gains are limited due to
span misalignment in paraphrased utterances.
Our slot-level analysis reveals that categorical slots (e.g., food type, price range) are more
robust to paraphrasing than span-sensitive slots (e.g., time, location). These findings
suggest that LLM-based paraphrasing is a practical and resource-efficient augmentation
strategy, but its success depends on task granularity, label structure, and alignment
fidelity.
Degree
Student essay
Collections
Date
2025-10-14
Author
Xue, Xiumei
Keywords
large language models (LLMs), natural language understanding (NLU), data augmentation, task-oriented dialogue systems
Language
eng