NEURAL MACHINE TRANSLATION FROM NORTH SÁMI TO SWEDISH
Neural machine translation is an approach to automatic translation based on artificial neural networks. A single model, trained on parallel data, takes an input sequence and predicts the most likely output sequence of words. In this master's thesis, a neural machine translation model for the language pair North Sámi - Swedish was developed and trained. Since no parallel corpus exists for the two languages, a Norwegian and North Sámi data set of about 225,000 sentences was translated into Swedish and used as training data. The model architecture is based on the Transformer of Vaswani et al. (2017), which is the state-of-the-art approach when enough parallel data is available. Following the techniques of Sennrich et al. (2016) for combining methods to reduce the amount of data required, a BLEU score of 44.11 was achieved. Because the amount of available parallel data is relatively small, techniques for incorporating monolingual text and creating additional synthetic data were also implemented, but they did not yield any further improvement.
Neural Machine Translation, low-resource language, North Sámi - Swedish