Syntax-based Concept Alignment for Machine Translation

Abstract
This thesis presents a syntax-based approach to Concept Alignment (CA), the task of finding semantic correspondences between parts of multilingual parallel texts, with a focus on Machine Translation (MT). Two variants of CA are considered: Concept Extraction (CE), which aims to identify new concepts purely by linguistic comparison, and Concept Propagation (CP), which consists in looking for the translation equivalents of a set of known concepts in a new language. Unlike standard statistical alignment methods, our approach aligns individual words and multiword expressions, including discontinuous ones, simultaneously. Since phrase-level alignments are useful for correctly translating idiomatic expressions, this can benefit grammar-based translation pipelines, such as those based on Grammatical Framework (GF), which we use to put our system to the test. This is possible because the alignments extracted by our CA model are correspondences not between strings but between grammatical objects. Another advantage of our system with respect to the solutions adopted in statistical MT is that, being essentially rule-based, it performs consistently well even on extremely small amounts of data. Our system does, however, rely on the quality of the analyses of the parallel corpora it is applied to. To mitigate the lack of robustness of existing GF parsers and of constituency parsers in general, alignment is performed on the Universal Dependencies (UD) trees generated by a neural dependency parser. The resulting concepts are then used, exploiting the similarities between UD and GF, as a starting point for automatically generating a GF lexicon to be used in translation. The tangible result of this work is a Haskell library, accompanied by a number of executables offering a user-friendly interface to perform both variants of CA (extraction and propagation), evaluate their results and use them in MT experiments.
Degree
Student essay
URI
https://hdl.handle.net/2077/75889
Collections
  • Masteruppsatser
View/Open
Thesis (2.540Mb)
Date
2023-03-30
Author
Masciolini, Arianna
Keywords
computational linguistics
machine translation
concept alignment
syntax
dependency parsing
Universal Dependencies
Grammatical Framework
Language
eng

DSpace software copyright © 2002-2016  DuraSpace