dc.contributor.author | Morger, Felix | |
dc.date.accessioned | 2024-11-18T10:28:08Z | |
dc.date.available | 2024-11-18T10:28:08Z | |
dc.date.issued | 2024-11-18 | |
dc.identifier.isbn | 978-91-8069-944-0 (PDF) | |
dc.identifier.isbn | 978-91-8069-943-3 (Print) | |
dc.identifier.uri | https://hdl.handle.net/2077/83731 | |
dc.description.abstract | The arrival of large language models (LLMs) in recent years has changed the landscape of natural language processing (NLP). Their impressive performance on popular benchmarks, their ability to solve a range of different tasks, and their human-like linguistic interactional abilities have prompted a debate about whether these are just "stochastic parrots" that cleverly repeat what humans say without understanding its meaning, or whether they are acquiring essential language capabilities, which would be an important stepping stone towards artificial general intelligence.
To tackle this question, developing analysis methods to measure and understand the language capabilities of LLMs has become a defining challenge. These include developing benchmarks to reliably measure their performance as well as interpretability methods to gauge their inner workings. This is especially relevant at a time when these models are already having a considerable impact on our society. An increasing number of users are affected by the technology, and calls are being made for transparent, regulated and thorough evaluation of AI. In these efforts, it is important to estimate the possibilities and limitations of these analysis methods, since they will play an important role in holding technologies in AI accountable.
In this compilation thesis, I expound on the components and processes involved in analyzing LLMs. The articles included in this compilation thesis use different approaches for analyzing LLMs, from introducing Superlim, a multi-task benchmark for Swedish NLU, to investigating LLMs' ability to predict language variation. To this end, I explore the possibilities and limitations of popular analysis methods and the implications these have for developing LLMs. I argue that integrating explanatory approaches from empirical linguistic research is important for understanding the role of both the data and the linguistic features used when analyzing LLMs. Doing so not only helps guide the development of LLMs, but also brings insights into linguistics. | sv |
dc.language.iso | eng | sv |
dc.relation.ispartofseries | 32 | sv |
dc.relation.haspart | Morger, Felix, Stephanie Brandl, Lisa Beinborn & Nora Hollenstein. 2022. A cross-lingual comparison of human and model relative word importance. In Simon Dobnik, Julian Grove & Asad Sayeed (eds.), Proceedings of the 2022 CLASP Conference on (Dis)embodiment, 11–23. Gothenburg: Association for Computational Linguistics. https://aclanthology.org/2022.clasp-1.2 | sv |
dc.relation.haspart | Morger, Felix. 2024. SweDiagnostics: A diagnostics natural language inference dataset for Swedish. In Pierre Zweigenbaum, Reinhard Rapp & Serge Sharoff (eds.), Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024, 118–124. Torino: ELRA & ICCL. https://aclanthology.org/2024.bucc-1.13/ | sv |
dc.relation.haspart | Morger, Felix. 2023. Are there any limits to English-Swedish language transfer? A fine-grained analysis using natural language inference. In Proceedings of the Second Workshop on Resources and Representations for Under-resourced Languages and Domains (RESOURCEFUL-2023), 30–41. Tórshavn: Association for Computational Linguistics. https://aclanthology.org/2023.resourceful-1.5/ | sv |
dc.relation.haspart | Berdicevskis, Aleksandrs, Gerlof Bouma, Robin Kurtz, Felix Morger, Joey Öhman, Yvonne Adesam, Lars Borin, Dana Dannélls, Markus Forsberg, Tim Isbister, Anna Lindahl, Martin Malmsten, Faton Rekathati, Magnus Sahlgren, Elena Volodina, Love Börjeson, Simon Hengchen & Nina Tahmasebi. 2023. Superlim: A Swedish language understanding evaluation benchmark. In Houda Bouamor, Juan Pino & Kalika Bali (eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 8137–8153. Singapore: Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.506 | sv |
dc.relation.haspart | Morger, Felix. 2024. When Sparv met Superlim…A Sparv plugin for natural language understanding analysis of Swedish. Tech. rep. University of Gothenburg. https://gupea.ub.gu.se/handle/2077/83664 | sv |
dc.relation.haspart | Morger, Felix & Aleksandrs Berdicevskis. 2024. Gauging linguistic variation using LLMs. Unpublished manuscript. | sv |
dc.subject | natural language processing, machine learning, machine learning interpretability, large language models, benchmarking | sv |
dc.title | In the minds of stochastic parrots: Benchmarking, evaluating and interpreting large language models | sv |
dc.type | Text | |
dc.type.svep | Doctoral thesis | eng |
dc.gup.mail | felix.morger@gu.se | sv |
dc.type.degree | Doctor of Philosophy | sv |
dc.gup.origin | Göteborgs universitet. Humanistiska fakulteten | swe |
dc.gup.origin | University of Gothenburg. Faculty of Humanities | eng |
dc.gup.department | Department of Swedish, Multilingualism, Language Technology ; Institutionen för svenska, flerspråkighet och språkteknologi | sv |
dc.gup.defenceplace | Friday 13 December, 13:15, room J330, Humanisten, Renströmsgatan 6 | sv |
dc.gup.defencedate | 2024-12-13 | |
dc.gup.dissdb-fakultet | HF | |