  • Student essays / Studentuppsatser
  • Department of Philosophy, Linguistics and Theory of Science / Institutionen för filosofi, lingvistik och vetenskapsteori
  • Masteruppsatser / Master in Language Technology

Drawing with a social robot: Evaluating a vision-language model on spatial prepositions in an L2 drawing-based learning task


Abstract
This study develops and evaluates a multimodal pipeline for studying interactive drawing in a robot-assisted language learning scenario with human participants. A social robot is paired with a vision-language model (VLM) that interprets freehand drawings. The system processes the sketches as they are drawn by the participant, capturing an image every five seconds. At each capture it detects objects and spatial relations and integrates this information into the robot’s verbal feedback, which evaluates the drawing’s correctness. A Wizard of Oz (WoZ) manual evaluation guides this feedback in real time and guards the human-robot interaction against model errors. Correctness of the drawings is evaluated at two levels: the entire sketch and its individual components (objects and relations). The study also examines how the agent’s outward appearance and the type of feedback influence participants’ perceptions of the interaction during the drawing tasks. Findings indicate that the VLM detects objects and the spatial relations between them with moderate accuracy, while its behaviour remains unstable across drawing stages. In practice, the model could function well as an initial filter, but guiding participants still requires human judgement. The agent’s appearance seems to matter more than the type of feedback, but within the existing sample we cannot confirm a consistent advantage for either factor. With minor adaptations, the pipeline can be reused by researchers to explore other language pairs (beyond the current Greek-English setup), additional spatial prepositions and alternative object sets. Future work will focus on expanding the study and refining the model.
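The capture-and-feedback loop described in the abstract (an image every five seconds, VLM detection of objects and spatial relations, verbal feedback on correctness) can be sketched as follows. This is a minimal illustration only, not the thesis implementation: `detect_scene` stands in for the real VLM call, and the object names, relation format, and feedback strings are all hypothetical.

```python
import time

CAPTURE_INTERVAL_S = 5  # the abstract states one capture every five seconds


def detect_scene(image):
    # Placeholder for the VLM call. Hypothetical output format:
    # detected objects plus one spatial relation between them.
    return {"objects": ["sun", "house"],
            "relation": ("sun", "above", "house")}


def feedback_from(scene, expected_relation):
    # Compare the detected relation against the target preposition
    # and produce a short verbal-feedback string for the robot.
    obj_a, prep, obj_b = scene["relation"]
    if (obj_a, prep, obj_b) == expected_relation:
        return f"Well done, the {obj_a} is {prep} the {obj_b}."
    return f"Check again: where should the {expected_relation[0]} go?"


def run_session(capture_fn, n_captures, expected_relation,
                interval=CAPTURE_INTERVAL_S):
    # Main loop: capture an image, query the (stubbed) model,
    # and collect the feedback the robot would speak.
    # Pass interval=0 to run without real-time waiting.
    feedback = []
    for _ in range(n_captures):
        image = capture_fn()
        scene = detect_scene(image)
        feedback.append(feedback_from(scene, expected_relation))
        time.sleep(interval)
    return feedback


msgs = run_session(lambda: b"fake-image", 2,
                   ("sun", "above", "house"), interval=0)
print(msgs[0])
```

In the study itself a Wizard of Oz operator vets the model output before it reaches the participant; in this sketch that step would sit between `detect_scene` and `feedback_from`.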
Degree
Student essay
URI
https://hdl.handle.net/2077/90101
Collections
  • Masteruppsatser / Master in Language Technology
View/Open
Master thesis (3.764Mb)
Date
2025-11-06
Author
Daniilidou, Viktoria Paraskevi
Keywords
drawing, vision and language models, social robot, spatial prepositions, language learning
Language
eng

DSpace software copyright © 2002-2016 DuraSpace
Theme by Atmire NV
 

 
