Show simple item record

dc.contributor.author: Masciolini, Arianna
dc.date.accessioned: 2023-03-30T08:54:12Z
dc.date.available: 2023-03-30T08:54:12Z
dc.date.issued: 2023-03-30
dc.identifier.uri: https://hdl.handle.net/2077/75889
dc.description.abstract: This thesis presents a syntax-based approach to Concept Alignment (CA), the task of finding semantic correspondences between parts of multilingual parallel texts, with a focus on Machine Translation (MT). Two variants of CA are considered: Concept Extraction (CE), which aims to identify new concepts by linguistic comparison alone, and Concept Propagation (CP), which consists of finding the translation equivalents of a set of known concepts in a new language. Unlike standard statistical alignment methods, our approach can simultaneously align individual words and multiword expressions, including discontinuous ones. Since phrase-level alignments are useful for translating idiomatic expressions correctly, this can benefit grammar-based translation pipelines, such as those based on Grammatical Framework (GF), which we use to test our system. This is possible because the alignments extracted by our CA model are correspondences not between strings but between grammatical objects. Another advantage of our system with respect to the solutions adopted in statistical MT is that, being essentially rule-based, it performs consistently well even on extremely small amounts of data. Our system does, however, rely on the quality of the analyses of the parallel corpora it is applied to. To mitigate the lack of robustness of existing GF parsers and of constituency parsers in general, alignment is performed on the Universal Dependencies (UD) trees generated by a neural dependency parser. The resulting concepts are then used, exploiting the similarities between UD and GF, as a starting point for automatically generating a GF lexicon to be used in translation.
The tangible outcome of this work is a Haskell library, accompanied by a number of executables that offer a user-friendly interface for performing both variants of CA (extraction and propagation), evaluating their results and using them in MT experiments. [en]
dc.language.iso: eng [en]
dc.subject: computational linguistic [en]
dc.subject: machine translation [en]
dc.subject: concept alignment [en]
dc.subject: syntax [en]
dc.subject: dependency parsing [en]
dc.subject: Universal Dependencies [en]
dc.subject: Grammatical Framework [en]
dc.title: Syntax-based Concept Alignment for Machine Translation [en]
dc.type: text
dc.setspec.uppsok: Technology
dc.type.uppsok: H2
dc.contributor.department: Göteborgs universitet/Institutionen för data- och informationsteknik [swe]
dc.contributor.department: University of Gothenburg/Department of Computer Science and Engineering [eng]
dc.type.degree: Student essay

