Exploring Moderation Consistency and Relaxed Think-Aloud in AI-Moderated Usability Studies
Abstract
This thesis explores how a large language model (GPT-4o) can be trained and evaluated as a professional moderator in usability testing sessions conducted using the relaxed think-aloud protocol. While large language models (LLMs) are often employed in assistive roles such as customer service or education, this study investigates their potential to adopt the non-directive, neutral tone that research moderation requires. Informed by established principles of usability study facilitation, GPT-4o was trained through prompt-based customization to emulate human moderation strategies, including open-ended questioning, tone neutrality, and sensitivity to user pacing. Nine usability sessions were conducted on an e-commerce website, and the AI–user dialogues were transcribed and coded using a reflexive thematic framework. The analysis focused on the style, timing, and tone of the AI's prompts, with user responses categorized into three verbalization levels (Level 1: real-time observations; Level 2: interpretations; Level 3: elaborations). The findings indicate that GPT-4o can sustain verbalizations at all three levels through non-leading moderation techniques, although it occasionally lapsed into directive or overly affirming tones. This study contributes to emerging research on AI-moderated usability testing and shows how prompt-engineered LLMs can approach human-like moderation, offering insights into both the design and the evaluation of conversational agents in research contexts.