Natural Language Processing for Low-resourced Code-switched Colloquial Languages – The Case of Algerian Language


Please use this identifier to cite or link to this item: http://hdl.handle.net/2077/64548

Files in This Item:

File                      Description    Size      Format
gupea_2077_64548_1.pdf    Thesis frame   1698 Kb   Adobe PDF
gupea_2077_64548_2.pdf    Nailing sheet  39 Kb     Adobe PDF
gupea_2077_64548_3.pdf    Cover          3365 Kb   Adobe PDF

Title: Natural Language Processing for Low-resourced Code-switched Colloquial Languages – The Case of Algerian Language
Authors: Adouane, Wafia
E-mail: wafia.adouane@gu.se
wafia.gu@gmail.com
Issue Date: 9-Jun-2020
University: Göteborgs universitet. Humanistiska fakulteten
University of Gothenburg. Faculty of Humanities
Institution: Department of Philosophy, Linguistics and Theory of Science ; Institutionen för filosofi, lingvistik och vetenskapsteori
Parts of work: Wafia Adouane and Simon Dobnik. 2017. “Identification of Languages in Algerian Arabic Multilingual Documents”. In Proceedings of The 3rd Arabic Natural Language Processing Workshop (WANLP), pages 1–8. Association for Computational Linguistics.

Wafia Adouane, Simon Dobnik, Jean-Philippe Bernardy, and Nasredine Semmar. 2018. “A Comparison of Character Neural Language Model and Bootstrapping for Language Identification in Multilingual Noisy Texts”. In Proceedings of the 2nd Workshop on Subword and Character Level Models in NLP (SCLeM), pages 22–31. Association for Computational Linguistics.

Wafia Adouane, Jean-Philippe Bernardy, and Simon Dobnik. 2018. “Improving Neural Network Performance by Injecting Background Knowledge: Detecting Code-switching and Borrowing in Algerian texts”. In Proceedings of the 3rd Workshop on Computational Approaches to Linguistic Code-Switching, pages 20–28. Association for Computational Linguistics.

Wafia Adouane, Jean-Philippe Bernardy, and Simon Dobnik. 2019. “Neural Models for Detecting Binary Semantic Textual Similarity for Algerian and MSA”. In Proceedings of the 4th Arabic Natural Language Processing Workshop (WANLP), pages 78–87. Association for Computational Linguistics.

Wafia Adouane, Jean-Philippe Bernardy, and Simon Dobnik. 2019. “Normalising Non-standardised Orthography in Algerian Code-switched User-generated Data”. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT), pages 131–140. Association for Computational Linguistics.

Wafia Adouane, Samia Touileb, and Jean-Philippe Bernardy. 2020. “Identifying Sentiments in Algerian Code-switched User-generated Comments”. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pages 2691–2698. European Language Resources Association.

Wafia Adouane and Jean-Philippe Bernardy. 2020. “When is Multi-task Learning Beneficial for Low-Resource Noisy User-generated Algerian Texts?” In Proceedings of the 4th Workshop on Computational Approaches to Linguistic Code-Switching, pages 17–25. European Language Resources Association.

Date of Defence: 2020-09-02
Disputation: September 2, 2020 at 17:00 in C350, Humanisten, Renströmsgatan 6, Gothenburg
Degree: Doctor of Philosophy
Publication type: Doctoral thesis
Keywords: Natural language processing
Deep neural networks
Low-resourced language
Colloquial language
Code-switch
Dialectal Arabic
User-generated data
Non-standardised orthography
Algerian language
Abstract: In this thesis we explore to what extent deep neural networks (DNNs), trained end-to-end, can be used to perform natural language processing tasks for code-switched colloquial languages lacking both large automated data and processing tools such as tokenisers and morpho-syntactic and semantic parsers. We opt for an end-to-end learning approach because this kind of data is hard to control due to its high orthographic and linguistic variability. This variability makes it unrealistic to ei...
ISBN: 978-91-7833-958-7 (print)
978-91-7833-959-4 (pdf)
URI: http://hdl.handle.net/2077/64548
Appears in Collections: Doctoral Theses from University of Gothenburg / Doktorsavhandlingar från Göteborgs universitet
Doctoral Theses / Doktorsavhandlingar Institutionen för filosofi, lingvistik och vetenskapsteori