Corpus Christi – A corpus-linguistic study of the semantic development of Latin in the wake of Christianity.
Abstract
The objective of this master's thesis is to measure the effect of Christianity on Latin semantics
with a specially trained Large Language Model, taking a corpus-driven approach as the starting point.
First, I investigate whether the words singled out in the literature on Christian
Latin have in fact undergone a measurable semantic shift in the Christian age, and whether this
list can be enriched with previously unnoticed words. Next, I examine whether the results differ
significantly depending on how Christian Latin is defined.
The methodology is based on theories of distributional semantics and the Distributional
Hypothesis, and follows other works in the field. First, an existing BERT model (LatinBERT)
is further trained on the Patrologia Latina corpus, under the assumption that this corpus is
representative of Christian Latin. An algorithm is then selected from a meta-study to perform
Graded Change Detection, and three different tests are performed to evaluate the
model’s performance. Finally, the results are computed and analyzed quantitatively and
qualitatively, and inferential statistics are applied to the data.
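
As an illustration of what a Graded Change Detection score can look like, the following is a minimal sketch, not the thesis's implementation. The abstract does not name the algorithm that was selected, so two measures commonly used in this field are shown as assumptions: the cosine distance between the mean ("prototype") embeddings of a word in two corpora, and the average pairwise cosine distance between its individual usages. The tiny two-dimensional vectors and the labels `stable` and `shifted` are made up for the example; in practice the vectors would be contextual embeddings produced by a model such as LatinBERT.

```python
import math
from itertools import product


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Average a list of vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def prototype_distance(usages_a, usages_b):
    """Cosine distance between the mean embeddings of the two corpora."""
    return cosine_distance(mean_vector(usages_a), mean_vector(usages_b))


def average_pairwise_distance(usages_a, usages_b):
    """Mean cosine distance over all cross-corpus pairs of usages."""
    pairs = list(product(usages_a, usages_b))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings: each inner list stands for one usage of a word.
stable_a = [[1.0, 0.0], [0.9, 0.1]]
stable_b = [[1.0, 0.1], [0.9, 0.0]]
shifted_a = [[1.0, 0.0], [0.9, 0.1]]
shifted_b = [[0.0, 1.0], [0.1, 0.9]]

scores = {
    "stable": prototype_distance(stable_a, stable_b),
    "shifted": prototype_distance(shifted_a, shifted_b),
}

# Graded change detection ranks words from most to least changed.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Either measure yields a continuous score per word, so words can be ranked by degree of change instead of being sorted into a binary changed/unchanged split.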
The results show that the new model, XPLatinBERT (XPL), outperforms the SemEval-2020
models for Latin on a benchmark based on a similar task. By and large, all the words in the
literature on Christian Latin are confirmed, and further words are proposed using both a corpus-based approach (the third quantile) and a corpus-driven approach. Due to lemmatization issues in the
corpora under investigation, some words are false positives, which calls for a deeper qualitative
investigation of the results.
Although a difference can be observed in the dataset as a whole, as well as for specific words,
this difference is not large enough to be statistically significant. It is therefore possible to
consider Christian Latin a register and to regard deviations as an effect of other factors, such
as Late and Medieval Latin, but more work remains to be done.
XPL is available on GitHub (Lafage, 2025b).
Keywords
NLP, LatinBERT, XPLatinBERT, corpus linguistics, distributional semantics, Christian Latin, Sondersprache, word embeddings, MLM, machine learning