Why the pond is not outside the frog? Grounding in contextual representations by neural language models
Abstract
In this thesis, we study grounded neural language models with the aim of building a multi-modal system for language generation and understanding.
Literature in psychology informs us that spatial cognition involves different aspects of knowledge that include visual perception and human interaction with the world. This makes spatial descriptions a compelling case for the study of how spatial language is grounded in different kinds of knowledge.
In seven studies, we investigate what spatial knowledge neural language models (NLMs) encode and how they encode it.
In the first study, we explore traces of the functional-geometric distinction between spatial relations in a uni-modal NLM.
This distinction is essential since the knowledge about object-specific relations is not grounded in the visible situation.
Following that, in the second study, we inspect representations of spatial relations in a uni-modal NLM to understand how they capture the concept of space from the corpus.
The predictability of grounding spatial relations from contextual embeddings is vital for the evaluation of grounding in multi-modal language models.
Turning to the argument for geometric meaning, in the third study we inspect the spectrum of bounding box annotations on image descriptions.
We show that less geometrically biased spatial relations are more likely to deviate from the norm of their bounding box features.
In the fourth study, we try to evaluate the degree of grounding in language and vision with adaptive attention.
In the fifth study, we use adaptive attention to understand if and how additional bounding box geometric information could improve the generation of relational image descriptions.
In the sixth study, we ask whether the language model is capable of systematic generalisation, that is, whether it can learn grounding for unseen compositions of representations.
Then, in the seventh study, we show the potential of using uni-modal knowledge for detecting metaphors in adjective-noun compositions.
The primary argument of the thesis is built on the fact that spatial expressions in natural language are not always grounded in direct interpretations of the locations.
We argue that distributional knowledge from corpora of language use, together with its association with visual features, constitutes grounding in neural language models.
Therefore, in a joint model of vision and language, the neural language model provides spatial knowledge that contextualises the visual representations of locations.
Parts of work
Dobnik, S., Ghanimifard, M., & Kelleher, J. (2018). Exploring the Functional and Geometric Bias of Spatial Relations Using Neural Language Models. In Proceedings of the First International Workshop on Spatial Language Understanding (pp. 1-11). doi: 10.18653/v1/W18-1401
Ghanimifard, M., & Dobnik, S. (2019). What a neural language model tells us about spatial relations. In Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) and Grounded Communication for Robotics (RoboNLP) (pp. 71-81). doi: 10.18653/v1/W19-1608
Dobnik, S., & Ghanimifard, M. (2020). Spatial descriptions on a functional-geometric spectrum: the location of objects. Accepted in Spatial Cognition XII, Papers from 12th International Conference, Spatial Cognition 2020/21, Riga, Latvia.
Ghanimifard, M., & Dobnik, S. (2018). Knowing When to Look for What and Where: Evaluating Generation of Spatial Descriptions with Adaptive Attention. In European Conference on Computer Vision (pp. 153-161). Springer, Cham. doi: 10.1007/978-3-030-11018-5_14
Ghanimifard, M., & Dobnik, S. (2019). What goes into a word: generating image descriptions with top-down spatial knowledge. In Proceedings of the 12th International Conference on Natural Language Generation (pp. 540-551). doi: 10.18653/v1/W19-8668
Ghanimifard, M., & Dobnik, S. (2017). Learning to Compose Spatial Relations with Grounded Neural Language Models. In IWCS 2017-12th International Conference on Computational Semantics-Long papers.
Bizzoni, Y., Chatzikyriakidis, S., & Ghanimifard, M. (2017, September). “Deep” Learning: Detecting Metaphoricity in Adjective-Noun Pairs. In Proceedings of the Workshop on Stylistic Variation (pp. 43-52). doi: 10.18653/v1/W17-4906
Degree
Doctor of Philosophy
University
Göteborgs universitet. Humanistiska fakulteten
University of Gothenburg. Faculty of Arts
Institution
Department of Philosophy, Linguistics and Theory of Science ; Institutionen för filosofi, lingvistik och vetenskapsteori
Disputation
27 May 2020, 15:15, Lilla Hörsalen, C350, Humanisten, Renströmsgatan 6. https://gu-se.zoom.us/j/63108152441?pwd=UDV1NytSM1RuNXE4ZWFieHlyOURxQT09
Date of defence
2020-05-27
mehdi.ghanimifard@gu.se
mehdi.ghanimifard@gmail.com
mmehdi.g@gmail.com
Date
2020-05-05
Author
Ghanimifard, Mehdi
Keywords
Computational linguistics
Language grounding
Spatial language
Distributional semantics
Computer vision
Language modelling
Vision and language
Neural language model
Grounded language model
Publication type
Doctoral thesis
ISBN
978-91-7833-917-4
978-91-7833-916-7
Language
eng