Exploring natural language processing for single-word and multi-word lexical complexity from a second language learner perspective

Alfter, David

dc.contributor.author	Alfter, David
dc.date.accessioned	2021-02-09T09:26:48Z
dc.date.available	2021-02-09T09:26:48Z
dc.date.issued	2021-02-09
dc.identifier.isbn	978-91-87850-79-0
dc.identifier.issn	0347-948X
dc.identifier.uri	http://hdl.handle.net/2077/66861
dc.description.abstract	In this thesis, we investigate how natural language processing (NLP) tools and techniques can be applied to vocabulary aimed at second language learners of Swedish in order to classify vocabulary items into proficiency levels suitable for learners at different stages. In the first part, we use feature engineering to represent words as vectors and feed these vectors into machine learning algorithms in order to (1) learn CEFR labels from the input data and (2) predict the CEFR level of unseen words. Our experiments corroborate the finding that feature-based classification models using 'traditional' machine learning still outperform deep learning architectures in deciding how complex a word is. In the second part, we use crowdsourcing to generate ranked lists of multi-word expressions, drawing on both experts and non-experts (i.e. language learners). Our experiment shows that non-expert and expert rankings are highly correlated, suggesting that non-expert intuition can be seen as on par with expert knowledge, at least in the chosen experimental configuration. The main practical output of this research comes in two forms: prototypes and resources. We have implemented various prototype applications for (1) the automatic prediction of the CEFR level of words based on the feature-engineering machine learning method, (2) language learning using graded word lists, and (3) the manual annotation of expressions across a variety of linguistic factors.	sv
dc.language.iso	eng	sv
dc.relation.ispartofseries	Data linguistica	sv
dc.relation.ispartofseries	31	sv
dc.relation.haspart	Alfter, David and Yuri Bizzoni and Anders Agebjörn and Elena Volodina and Ildikó Pilán 2016. From distributions to labels: A lexical proficiency analysis using learner corpora. Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016 (No. 130, pp. 1-7). Linköping University Electronic Press.	sv
dc.relation.haspart	Alfter, David and Elena Volodina 2018. Towards single word lexical complexity prediction. Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 79-88).	sv
dc.relation.haspart	Alfter, David and Ildikó Pilán 2018. SB@GU at the Complex Word Identification 2018 Shared Task. Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 315-321).	sv
dc.relation.haspart	Alfter, David and Therese Lindström Tiedemann and Elena Volodina 2020. Crowdsourcing Relative Rankings of Multi-Word Expressions: Experts versus Non-Experts. Northern European Journal of Language Technology.	sv
dc.relation.haspart	Alfter, David and Therese Lindström Tiedemann and Elena Volodina 2019. LEGATO: A flexible lexicographic annotation tool. In NEALT Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30-October 2, Turku, Finland (No. 167, pp. 382-388). Linköping University Electronic Press.	sv
dc.relation.haspart	Graën, Johannes and David Alfter and Gerold Schneider 2020. Using Multilingual Resources to Evaluate CEFRLex for Learner Applications. Proceedings of the 12th Language Resources and Evaluation Conference (pp. 346-355).	sv
dc.relation.haspart	Alfter, David and Lars Borin and Ildikó Pilán and Therese Lindström Tiedemann and Elena Volodina 2019. Lärka: From Language Learning Platform to Infrastructure for Research on Language Learning. In Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018 (No. 159, pp. 1-14). Linköping University Electronic Press.	sv
dc.relation.haspart	Alfter, David and Johannes Graën 2019. Interconnecting lexical resources and word alignment: How do learners get on with particle verbs? In Proceedings of the 22nd Nordic Conference on Computational Linguistics (pp. 321-326).	sv
dc.subject	natural language processing	sv
dc.subject	lexical complexity	sv
dc.subject	CEFR	sv
dc.subject	second language learning	sv
dc.subject	machine learning	sv
dc.subject	crowdsourcing	sv
dc.title	Exploring natural language processing for single-word and multi-word lexical complexity from a second language learner perspective	sv
dc.type	Text
dc.type.svep	Doctoral thesis	eng
dc.gup.mail	alfter.david@gmx.net	sv
dc.type.degree	Doctor of Philosophy	sv
dc.gup.origin	Göteborgs universitet. Humanistiska fakulteten	swe
dc.gup.origin	University of Gothenburg. Faculty of Humanities	eng
dc.gup.department	Department of Swedish ; Institutionen för svenska språket	sv
dc.gup.defenceplace	Tuesday 2 March 2021, 13:00, lecture hall J330, Humanisten, Renströmsgatan	sv
dc.gup.defencedate	2021-03-02
dc.gup.dissdb-fakultet	HF

Files in this item

Name:: gupea_2077_66861_2.pdf
Size:: 962.3Kb
Format:: PDF
Description:: Cover

Name:: gupea_2077_66861_3.pdf
Size:: 85.87Kb
Format:: PDF
Description:: Abstract

Name:: gupea_2077_66861_4.pdf
Size:: 1.645Mb
Format:: PDF
Description:: Thesis

This item appears in the following Collection(s)

Doctoral Theses / Doktorsavhandlingar Institutionen för svenska, flerspråkighet och språkteknologi
Doctoral Theses from University of Gothenburg / Doktorsavhandlingar från Göteborgs universitet