Corpus Christi: A corpus-linguistic study of the semantic development of Latin in the wake of Christianity

Abstract

The objective of this master's thesis is to measure the effect of Christianity on Latin semantics, using a specially trained large language model and a corpus-driven approach as the point of departure. First, I investigate whether the words singled out in the literature on Christian Latin have in fact undergone a measurable semantic shift in the Christian age, and whether this list can be enriched with previously unnoticed words. Next, I examine whether the results differ significantly depending on how Christian Latin is defined. The methodology is based on theories of distributional semantics, in particular the Distributional Hypothesis, and follows other works in the field. First, an existing BERT model (LatinBERT) is trained on the Patrologia Latina corpus, under the assumption that this corpus is representative of Christian Latin. An algorithm is then selected from a metastudy to perform Graded Change Detection, and three different tests are performed to evaluate the model’s performance. Finally, the results are computed and analyzed quantitatively and qualitatively, and inferential statistics are applied to the data. The results show that the new model, XPLatinBERT (XPL), outperforms the SemEval-2020 models for Latin on a benchmark based on a similar task. By and large, all the words in the literature on Christian Latin are confirmed, and further words are proposed using both a corpus-based approach (the third quantile) and a corpus-driven approach. Due to lemmatization issues in the corpora under investigation, some words are false positives, which calls for a deeper qualitative investigation of the results. Although a difference can be observed in the dataset as a whole, as well as for specific words, this difference is not strong enough to be statistically significant. It is therefore possible to consider Christian Latin as a register and to regard deviations as an effect of other factors such as Late and Medieval Latin, but more work has to be done.
XPL is available on GitHub (Lafage, 2025b).
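As a rough illustration of what a Graded Change Detection score can look like, the sketch below computes an average pairwise cosine distance (APD) between the contextual embeddings of one word in two periods. The abstract does not name the algorithm chosen from the metastudy, so APD is an assumption here. The function names and the toy two-dimensional vectors are hypothetical stand-ins for embeddings that a model such as XPL would produce.

```python
import numpy as np


def cosine_distance_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a_unit @ b_unit.T


def apd_change_score(period_1: np.ndarray, period_2: np.ndarray) -> float:
    """Average pairwise distance between usages of a word in two periods.

    Each row is one contextual embedding of the target word (one usage).
    A higher score suggests a stronger semantic shift. APD is assumed here
    for illustration; it is not confirmed as the thesis's algorithm.
    """
    return float(cosine_distance_matrix(period_1, period_2).mean())


# Toy vectors only (hypothetical data, not results from the thesis).
stable_before = np.array([[1.0, 0.0], [2.0, 0.0]])
stable_after = np.array([[3.0, 0.0]])
shifted_before = np.array([[1.0, 0.0]])
shifted_after = np.array([[0.0, 1.0], [0.0, 2.0]])

stable_score = apd_change_score(stable_before, stable_after)
shifted_score = apd_change_score(shifted_before, shifted_after)
print(f"stable word:  {stable_score:.2f}")
print(f"shifted word: {shifted_score:.2f}")
```

Ranking all target words by such a score gives a graded list rather than a binary changed/unchanged label, and that ranking is what a SemEval-style benchmark would compare against human judgments.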

Keywords

NLP, LatinBERT, XPLatinBERT, corpus linguistics, distributional semantics, Christian Latin, Sondersprache, word embeddings, MLM, machine learning
