FINDING MEANING IN A HAYSTACK: On How Vision and Language Models Process Figurative Language
Abstract
Figurative language is an integral part of human communication and everyday life. As a Natural Language
Processing task it has long been a focus of research, and it has recently been recast as
a vision and language task, in which multi-modal models seem to outperform uni-modal ones. This thesis
explores how a transformer-based vision and language model, specifically VisualBERT, understands figurative
language (idioms, metaphors, and similes) and examines whether its visual embeddings can be enhanced to
align better with figurative meaning. Understanding these alignments is critical for assessing whether such
models can truly grasp the abstract and symbolic layers of language, beyond surface-level pattern recognition.
Through a series of experiments and attention analysis, this research highlights both the potential and
the limitations of a vision and language model, illuminating the broader challenges in grounding language in
visual contexts.
Degree
Student essay
Date
2024-11-28
Author
Filippatou, Viktoria
Keywords
figurative language, vision, language, VisualBERT
Language
eng