A Comparative Study with LDA and BERTopic:
AI Policies Across Different Democracy
Indexes
Master’s thesis in Applied Data Science
Anne Söderwall & Gabija Telešova
Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
UNIVERSITY OF GOTHENBURG
Gothenburg, Sweden 2025

A Comparative Study with LDA and BERTopic: AI Policies Across Different Democracy Indexes
Anne Söderwall & Gabija Telešova
© Anne Söderwall & Gabija Telešova, 2025.
Supervisor: Denitsa Saynova, Department of Computer Science and Engineering
Examiner: Moa Johansson, Department of Computer Science and Engineering
Master’s Thesis 2025
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000
Typeset in LATEX
Gothenburg, Sweden 2025
A Comparative Study with LDA and BERTopic: AI Policies Across Different Democracy Indexes
Anne Söderwall & Gabija Telešova
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
Abstract
In times of global political instability, paired with an evolving and experimental
phase in artificial intelligence, the future of AI remains unclear. What is even less
defined is how governments around the world plan to use, regulate, or develop
it. Therefore, this thesis aims to evaluate how topic models perform on policy
documents and how different government types influence these policies. This was
done by scraping AI policies collected by the OECD’s AI Policy Observatory across
different countries, later categorized by government type – Full Democracy, Flawed
Democracy, Hybrid Regime, and Authoritarian Regime. Two topic models, LDA
and BERTopic, were applied to extract topics and keywords for each regime. The
results suggest that LDA's topics were more detailed but less interpretable, whereas
BERTopic's topics were easier for humans to interpret and understand. To varying
degrees, all government types focused on ethics and digital governance themes. On
a deeper level, Full Democracy emphasized regulations of already existing technology,
Flawed Democracy focused on military development, Hybrid Regime was centered
around the actual implementation, and Authoritarian Regime emphasized research
and a broader context of state control. The final OCTIS measurements suggested that
LDA performed better in quantitative and statistical evaluations, while BERTopic
was consistently preferred for human interpretation. This discrepancy
illustrates the methodological tension between how models are evaluated and how
understandable they are in practical application.
Keywords: data science, political science, thesis, AI, policies, government, BERTopic,
LDA, OCTIS, ethical framework

Acknowledgements
We would like to express our sincere gratitude to our supervisor for her continuous
support, guidance, and encouragement throughout this thesis. We are also thankful
to our families, friends, and peers for their moral support, helpful feedback, and
constructive suggestions during the writing process.
Anne Söderwall & Gabija Telešova, Gothenburg, 2025-06-10

Contents
List of Figures
List of Tables
1 Introduction
1.1 Research Question
1.2 Thesis Structure
2 Theory
2.1 Natural Language Processing
2.1.1 Latent Dirichlet Allocation
2.1.1.1 Structure and Parameters
2.1.1.2 Inference and Parameter Estimation
2.1.1.3 Bag-of-Words
2.1.2 BERTopic
2.1.2.1 Transformers
2.1.2.2 BERT
2.1.2.3 Model Architecture
2.2 Evaluation Metrics
2.2.1 Intrinsic Evaluation Metrics
2.2.2 OCTIS
2.3 Political and Ethical Context of AI Policy
3 Methods
3.1 Data Collection and Processing
3.1.1 Data Cleaning (Pre-Scraping Stage)
3.1.2 Scraping
3.1.3 Data Preprocessing (Post-Scraping Stage)
3.2 LDA
3.2.1 Text Processing and Chunking
3.2.2 LDA Modeling and Hyperparameter Tuning
3.3 BERTopic
3.4 Qualitative Topic Analysis
3.5 Quantitative Comparison Using OCTIS
3.6 Ethical Topic Variation Across Government Type
3.6.1 UNESCO Recommendation on the Ethics of Artificial Intelligence
3.6.2 Framework
3.6.3 Analytical Application
3.6.3.1 Government Types versus Framework
4 Results
4.1 Model Configuration and Setup
4.1.1 LDA
4.1.2 BERTopic
4.2 Qualitative Topic Analysis
4.2.1 LDA
4.2.2 BERTopic
4.3 Ethical Topic Variation Across Government Type
4.4 Quantitative Results Using OCTIS
5 Discussions
5.1 Model Comparison and Qualitative Analysis
5.1.1 Model Comparison per Government Type
5.1.2 Cross-Regime Comparison
5.2 Ethical Topic Variation Across Government Types
5.2.1 Topic-Level Comparison
5.2.2 Government-Level Comparison
5.3 Quantitative Analysis Using OCTIS
6 Conclusion
6.1 Conclusion
6.2 Limitations
6.3 Further Research
Bibliography
A Appendix 1
List of Figures
2.1 Plate Notation of LDA [8].
2.2 Variational Distribution Used to Approximate the Posterior in LDA.
2.3 The Transformer Model Architecture from the Attention Is All You Need Paper [7].
2.4 The Main Two Approaches Used When Constructing the BERT Model for Different Tasks [12].
2.5 BERTopic Sequence of Steps to Create Its Topic Representations [37].
3.1 Total English Words by Government Type.
3.2 The 11 Framework Categories Based on UNESCO’s Recommendation.
4.1 Stacked Bar Charts of Normalized Topic-Framework Scores from Both Topic Modeling Approaches.
A.1 Distributions of Overlap Scores by Topic Model.
A.2 Comparison of Topic Diversity by Model and Government Type.
A.3 Comparison of CV Coherence by Model and Government Type.
A.4 Comparison of IRBO by Model and Government Type.
A.5 Comparison of WECoherencePairwise by Model and Government Type.
List of Tables
2.1 Democracy Index Classifications [3].
3.1 Aggregated Metrics by Government Type.
3.2 Grid Search Parameters for LDA Model Optimization.
3.3 Hyperparameter Grid Search for BERTopic.
4.1 LDA Hyperparameters by Government Type.
4.2 Hyperparameters Used for Each Government Type.
4.3 LDA: Full Democracy Topics and Qualitative Interpretation.
4.4 LDA: Flawed Democracy Topics and Qualitative Interpretation.
4.5 LDA: Hybrid Regime Topics and Qualitative Interpretation.
4.6 LDA: Authoritarian Regime Topics and Qualitative Interpretation.
4.7 BERTopic: Full Democracy Topics and Qualitative Interpretation.
4.8 BERTopic: Flawed Democracy Topics and Qualitative Interpretation.
4.9 BERTopic: Hybrid Regime Topics and Qualitative Interpretation.
4.10 BERTopic: Authoritarian Regime Topics and Qualitative Interpretation.
4.11 The Number of Government Type Overlaps With the Created Ethical Framework’s Topics (Out of 11) (A.1) for LDA and BERTopic.
4.12 Model Topic Overlap with the Framework Categories, Normalized Scores. Green Color Indicates the Highest Score Overlaps, and Red Color Indicates the Lowest Scores. The Framework Categories That Both Models Match Are Also Highlighted Respectively.
4.13 Comparative OCTIS Metrics for LDA vs BERTopic by Government Type. Higher Values Are Bolded.
A.2 “Public access URL” Status Codes and Counts.
A.4 Grid Search Results for LDA Hyperparameters Across Government Types.
A.5 Top 5 (and Top 10 for Hybrid Regime) Hyperparameter Sets for Each Government Type.
A.6 Coherence Scores per Government Type: With and Without POS Tagging.
A.7 LDA: Raw scores for Topics by Government Type.
A.8 BERTopic: Raw scores for Topics by Government Type.
Acronyms
AI Artificial Intelligence
BERT Bidirectional Encoder Representations from Transformers
BoW Bag of Words
CNN Convolutional Neural Network
CSV Comma-Separated Values
HDBSCAN Hierarchical Density-Based Spatial Clustering of Applications with Noise
IRBO Inverted Rank-Biased Overlap
LDA Latent Dirichlet Allocation
MCMC Markov Chain Monte Carlo
MLM Masked Language Modeling
NER Named Entity Recognition
NLP Natural Language Processing
NSP Next Sentence Prediction
OCTIS Optimizing and Comparing Topic Models Is Simple
OECD Organization for Economic Co-operation and Development
POS Part-of-Speech
QA Question Answering
RNN Recurrent Neural Network
SOTA State-of-the-Art
UMAP Uniform Manifold Approximation and Projection
UNESCO United Nations Educational, Scientific and Cultural Organization
1
Introduction
Countries are experiencing a transformative period marked by drastic political
changes, while simultaneously navigating Artificial Intelligence (AI) as an expanding
field with a growing number of actors in the global market. This convergence
of political changes and AI advancement raises concerns and questions about the
interplay between government structures and AI policy development. Because AI is
developed internationally and its use transcends national jurisdictions, it is
important to recognize how approaches to AI differ across distinct political
contexts.
Recent work by the Center for AI and Democratic Values [1] has addressed similar
themes by qualitatively analyzing AI governance across political systems. Their
findings draw attention to the importance of understanding how regime types shape
ethical and strategic priorities in AI policies. However, their approach relies on
manual qualitative analysis, which offers valuable insights but remains limited in
terms of scalability and reproducibility.
This thesis applies data science, particularly Natural Language Processing (NLP)
techniques, to analyze AI policy initiative documents from different countries and
relate their characteristics to the countries' Democracy Index classifications. By comparing two topic
modeling approaches (Latent Dirichlet Allocation (LDA) and BERTopic), we aim
not only to understand the themes of AI policies globally but also to contribute
methodologically to how NLP can be applied to understand policy discourse and
societal priorities. By grounding our analysis in the ethical framework for AI
governance of the United Nations Educational, Scientific and Cultural Organization
(UNESCO), we systematically explore how political systems prioritize key aspects of
AI governance, such as ethical considerations, public accountability, and
technological innovation.
The Organization for Economic Co-operation and Development (OECD) [2] has
collected more than 1000 AI policies and initiatives from 70 countries. These
documents range from comprehensive National AI Strategies to countries’ projects
to implement AI in healthcare or education. The countries in this collection vary in
political governance structure, and the collected policies give insight into the diverse
implementation of global AI strategies.
To compare different political contexts, this thesis adopts the Economist Intelligence
Unit’s Democracy Index [3], which categorizes countries into four regime types: Full
Democracies, Flawed Democracies, Hybrid Regimes, and Authoritarian Regimes.
These classifications are based on criteria such as electoral process, civil liberties,
political participation, and functioning of government. Examining how these different
government types articulate their approaches to AI gives us a valuable understanding
of the respective priorities in shaping the field of AI.
However, the volume and complexity of AI policy documents make traditional
qualitative analysis impractical for examining large multilingual corpora produced
by diverse actors across various governance systems. Applying data science and NLP
allows us to examine the structure and meaning of unstructured text on a larger scale;
in particular, topic modeling can identify latent semantic structures and recurring
themes within documents. In a political context, topic models can help reveal how
states conceptualize issues like ethics, surveillance, or innovation. From a technical
perspective, this problem also presents an opportunity for interdisciplinary
adaptation and comparative analysis. By applying and evaluating two topic modeling
techniques, LDA and BERTopic, this thesis aims to identify which approach is more
effective in extracting coherent, interpretable, and politically relevant themes
from global AI policy documents.
1.1 Research Question
The aim of this thesis is twofold: first, to evaluate the technical performance
of different topic models in policy texts; and second, to derive insights on how
government structures shape the focus and framing of AI policies using topic models.
The primary research question guiding this study is:
How do LDA and BERTopic compare in their ability to extract meaningful and
interpretable topics from AI policy documents?
To answer the primary question, we will consider the following sub-questions:
• What are the dominant themes and keywords in AI policy documents across
different types of governance, as defined by the Democracy Index?
• Are there measurable differences in the thematic emphasis on ethical consid-
erations, public accountability, or economic priorities based on a country’s
governance structure?
1.2 Thesis Structure
This thesis includes the following chapters:
Chapter 2 introduces the background and theory needed to answer the research
questions. In particular, it covers key concepts in NLP, our two topic modeling
methods, LDA and BERTopic, and the Transformer architecture and Bidirectional
Encoder Representations from Transformers (BERT) that underlie BERTopic. It also
defines the chosen
evaluation metrics (coherence score and Optimizing and Comparing Topic Models Is
Simple (OCTIS) measurements) and includes the categorization of the documents
into different government types.
Chapter 3 introduces the methodology of the thesis, including data collection and
processing, BERTopic and LDA model setup and hyperparameter selection, and
qualitative interpretation of the output topics. The chapter closes with quantitative
and qualitative assessments using OCTIS and an ethical framework, respectively.
Chapter 4 presents the key results obtained from the modeling experiments.
Chapter 5 examines the trends, patterns, and observations made from these results,
linking them back to the research questions.
Chapter 6 concludes the thesis by summarizing the key findings of the discussion and
answering the research questions. It also discusses the limitations of the study
and outlines directions for further research.
2
Theory
This chapter starts by introducing the broad concept of NLP, with a focus on topic
modeling in political science research. It then details the LDA and BERTopic
models, outlining their structures, key parameters, and applications. The
Transformer and BERT architectures are covered to explain BERTopic's underlying
mechanisms. Finally, the chapter introduces the OCTIS metrics used for a quantitative
comparison of model coherence and diversity, and the ethical framework used as a
lens for a qualitative assessment of different regimes.
2.1 Natural Language Processing
NLP is the field of study concerning the interaction between computers and human
language. The field has rapidly evolved over the last decades from focusing on
syntax in the 1960s to today’s advanced machine learning and AI applications.
While early efforts in NLP were characterized by hand-crafted, rule-based systems
designed to encode linguistic knowledge explicitly, recent developments have shifted
toward statistical and data-driven approaches. As a result, NLP has moved beyond
its traditional linguistic focus to influence a wide range of everyday technologies,
including digital assistants, machine translation, and automated content analysis [4].
Newer NLP approaches rely heavily on machine learning, which allows systems
to learn patterns from examples rather than relying solely on hand-crafted rules,
as was previously done [5], [6]. Neural architectures, such as the Recurrent Neural
Network (RNN) and the Transformer, have performed well on NLP tasks such as
translation and summarization, and enable more context-aware modeling of language
[7].
Within this broader evolution, topic models have emerged as a family of unsuper-
vised machine learning methods designed to automatically identify latent semantic
structures in large corpora. Instead of manually reading and coding each text,
topic models infer groups of words that frequently occur together. They uncover
latent thematic patterns without previously labeled data, making them suitable for
exploratory analysis. Topic models also scale well: they summarize large text
datasets and support systematic, comparative studies in areas such as policy
analysis. Furthermore, because each topic is represented by an explicit list of
weighted words, topic models offer a degree of transparency and interpretability
and facilitate replicable thematic comparisons.
2.1.1 Latent Dirichlet Allocation
Among the first topic models is Latent Dirichlet Allocation (LDA), whose goal is
to uncover central topics and their distributions across documents and to generate
succinct representations of large datasets [8]. Its probabilistic framework
preserves important statistical patterns and supports a range of downstream tasks,
including classification, anomaly detection, summarization, and measuring similarity
or relevance.
2.1.1.1 Structure and Parameters
LDA models each document in a collection as a mixture of topics, where each topic
is represented by a probability distribution over words. This structure allows LDA
to capture the underlying thematic structure within a corpus. The core idea is that
documents express multiple topics to varying degrees, and each word is assumed to
be generated from one of these topics.
Before outlining the model’s formal structure, it is helpful to define the key variables
and parameters involved in LDA:
• N: the number of words in a document (observed).
• ξ: the Poisson parameter that governs the expected length of a document.
• k: the number of latent topics in the model (a fixed hyperparameter).
• V: the size of the vocabulary.
• θ: the topic distribution for a document, drawn from a Dirichlet prior with
parameter α.
• α: the hyperparameter that shapes the Dirichlet prior over topic distributions.
• w_n: the n-th word in a document.
• z_n: the latent topic assignment for the n-th word in a document.
• β: the topic-word distribution matrix, with dimensions k × V, where each row
β_i represents the word distribution for topic i.
Given these definitions, the generative process for LDA, as defined by Blei, Ng, and
Jordan [8], is as follows:
1. Choose N ∼ Poisson(ξ)
2. Choose θ ∼ Dir(α)
3. For each of the N words w_n:
(a) Choose a topic z_n ∼ Multinomial(θ)
(b) Choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned
on the topic z_n
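The generative process can be simulated directly, which makes the roles of ξ, α, θ, z and β concrete. The sketch below is a minimal NumPy illustration, not the estimation procedure used in this thesis; the function name `generate_document`, the toy sizes k = 3 and V = 8, ξ = 20, α = 0.5 and the randomly drawn β are all hypothetical values chosen for illustration.

```python
import numpy as np


def generate_document(alpha, beta, xi, rng):
    """Sample one document from the LDA generative process.

    alpha: Dirichlet parameter vector of length k
    beta:  k x V topic-word matrix whose rows are distributions over the vocabulary
    xi:    Poisson rate (expected document length)
    """
    k, V = beta.shape
    n_words = rng.poisson(xi)                       # 1. N ~ Poisson(xi)
    theta = rng.dirichlet(alpha)                    # 2. theta ~ Dir(alpha)
    topics = rng.choice(k, size=n_words, p=theta)   # 3(a). z_n ~ Multinomial(theta)
    # 3(b). w_n ~ p(w_n | z_n, beta): draw from the row of beta selected by z_n
    words = np.array([rng.choice(V, p=beta[z]) for z in topics], dtype=int)
    return theta, topics, words


rng = np.random.default_rng(seed=0)
k, V = 3, 8                                       # hypothetical toy sizes
alpha = np.full(k, 0.5)                           # hypothetical symmetric prior
beta = rng.dirichlet(np.full(V, 0.5), size=k)     # hypothetical topic-word matrix
theta, topics, words = generate_document(alpha, beta, xi=20, rng=rng)
print("theta:", theta.round(2))
print("topic assignments z:", topics)
print("word ids w:", words)
```

Inference in LDA runs this process in reverse: only `words` is observed, and θ, z and β must be recovered from it.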
Figure 2.1: Plate Notation of LDA [8].
The Poisson(ξ) distribution models the number of words N in a document, with ξ
denoting the expected document length, and thus accounts for variation in document
sizes. In practice, this component is often left out: N is observed and independent
of the other variables θ and z, so the focus is instead on the latent topic
structure.
Following the generative process, θ represents the topic distribution for a document
drawn from a Dirichlet prior parameterized by α. The Dirichlet distribution serves as
a prior over multinomial distributions and encodes assumptions about topic density
within a document. Smaller values of α create a more focused topic distribution, and
higher values distribute it more uniformly.
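The effect of α can be checked empirically by sampling many topic distributions from a symmetric Dirichlet and measuring how much weight the largest topic receives. This is a small sketch with a hypothetical k = 10 and hypothetical α values; it is not tied to the hyperparameters chosen later in this thesis.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
k = 10  # hypothetical number of topics

mean_max = {}
for alpha_value in (0.1, 1.0, 10.0):
    # Draw 1000 document-topic distributions theta ~ Dir(alpha, ..., alpha)
    thetas = rng.dirichlet(np.full(k, alpha_value), size=1000)
    # Average share of each document's most prominent topic
    mean_max[alpha_value] = thetas.max(axis=1).mean()
    print(f"alpha={alpha_value:>4}: mean largest topic share = {mean_max[alpha_value]:.2f}")
```

Small α values concentrate most of a document's probability mass on a few topics, while large values push every document toward the uniform share of 1/k.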
Each latent topic assignment z_n is drawn from a Multinomial distribution over a fixed
number of topics k, corresponding to the dimensionality of the Dirichlet distribution
θ ∼ Dir(α). In this context, the dimensionality k refers to the number of latent
topics assumed in the model, and determines the number of components in the topic
distribution θ.
In the basic model, the number of topics k is set prior to training and remains
constant throughout the modeling process. Each latent topic assignment z_n therefore
represents a discrete selection from the k topics, and this assignment selects the
topic-specific word distribution from which the observed word w_n is drawn.
The matrix β, estimated during training, defines the topic-word probabilities. It is
a k × V matrix, where k is the number of latent topics and V is the vocabulary size,
and each row is a multinomial distribution over the corpus vocabulary. Each entry
β_ij gives the probability of word w_j being generated by topic i. When the topic
assignment z_n is sampled for a word position, the observed word is drawn from the
corresponding row β_{z_n}.
2.1.1.2 Inference and Parameter Estimation
To uncover the latent topic structure in LDA, it is necessary to compute the posterior
distribution of the latent variables given the observed data. This is expressed as:
$$p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta) = \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)}$$
where θ denotes the hidden topic proportions of a document, z the topic assignments
of its words, w its observed words, α the Dirichlet hyperparameter, and β the
topic-word matrix defined above.
The posterior distribution expresses the conditional probability of the latent
variables θ and z after observing the actual data w. It combines the prior
assumptions about the document-topic and topic-word distributions, encoded by α and
β, with the tokens appearing in each document of the corpus, and indicates which
latent topic structures best explain the corpus.
However, computing the posterior exactly is intractable. The marginal likelihood in
the denominator, p(w | α, β), requires integrating over all possible topic
proportions and summing over all possible topic assignments:
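$$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \sum_{\mathbf{z}} \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta.$$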
This computation is expensive for large real-world datasets: the topic proportions θ
and the topic-word matrix β are coupled inside the sum over topic assignments, and
the number of possible assignment configurations grows exponentially with document
length (k^N configurations for a document of N words) [8].
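The size of the inner sum can be made concrete with a toy calculation. For a fixed θ, the sum over all k^N assignment vectors z factorizes into a product of N sums over k topics; the difficulty in LDA is that integrating this product over θ expands it back into k^N terms. The check below uses hypothetical values for θ, β and the word ids and simply confirms that brute-force enumeration and the factorized form agree.

```python
import itertools

import numpy as np

rng = np.random.default_rng(seed=2)
k, V = 3, 5                                  # hypothetical toy sizes
theta = rng.dirichlet(np.ones(k))            # one fixed topic-proportion vector
beta = rng.dirichlet(np.ones(V), size=k)     # k x V topic-word matrix
words = [0, 3, 3, 1, 4, 2]                   # hypothetical word ids, N = 6

# Brute force: sum over every assignment vector z in {0, ..., k-1}^N
brute_force = 0.0
n_terms = 0
for z in itertools.product(range(k), repeat=len(words)):
    term = 1.0
    for z_n, w_n in zip(z, words):
        term *= theta[z_n] * beta[z_n, w_n]  # p(z_n | theta) * p(w_n | z_n, beta)
    brute_force += term
    n_terms += 1

# Factorized form for fixed theta: product over words of a sum over k topics
factorized = np.prod([theta @ beta[:, w_n] for w_n in words])

print(n_terms, brute_force, factorized)
```

Even this six-word document requires 3^6 = 729 terms; a policy document with thousands of tokens makes enumeration hopeless, which motivates the approximate methods below.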
Approximate inference methods are therefore used instead. Two main families exist:
sampling-based methods, such as Markov Chain Monte Carlo (MCMC), and
optimization-based methods, such as variational inference. Variational inference
introduces a family of lower bounds on the log-likelihood of the data, indexed by a
set of variational parameters, and chooses the parameter values that give the
tightest (highest) lower bound. This is equivalent to minimizing the Kullback-Leibler
divergence between the approximate and the true posterior. Variational inference is
the approach proposed in the original LDA paper [8].
In the standard plate notation of LDA (see Figure 2.1), the model is defined in its
generative form. However, the coupling between θ and β through z and w creates a
computational bottleneck that makes exact inference difficult. To enable approximate
inference, a simplified model is used. Figure 2.2 shows the variational distribution,
in which the observed words w and the edges between θ, z, and w are removed, and θ
and z are instead governed by free variational parameters: a Dirichlet parameter γ
for θ and multinomial parameters ϕ for each z_n. These independence assumptions
decouple the variables and make the optimization of the posterior approximation
tractable.
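For a single document, the variational parameters can be fitted by coordinate ascent using the update equations given by Blei, Ng, and Jordan [8]: ϕ_ni ∝ β_{i,w_n} exp(Ψ(γ_i)) and γ_i = α_i + Σ_n ϕ_ni, where Ψ is the digamma function. The sketch below is a minimal NumPy version of this E-step, assuming β is already known; the helper names, the toy β and the word ids are hypothetical, and the digamma function is implemented locally (recurrence plus asymptotic series) to keep the example dependency-light.

```python
import numpy as np


def digamma(x):
    """Digamma function for positive arrays: recurrence, then asymptotic series (error < 1e-8)."""
    x = np.array(x, dtype=float)
    result = np.zeros_like(x)
    while np.any(x < 6.0):               # psi(x) = psi(x + 1) - 1/x shifts x upward
        small = x < 6.0
        result[small] -= 1.0 / x[small]
        x[small] += 1.0
    inv2 = 1.0 / (x * x)
    result += np.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def variational_e_step(words, alpha, beta, n_iter=100, tol=1e-8):
    """Coordinate ascent on (gamma, phi) for one document, given alpha and beta."""
    k = beta.shape[0]
    n_words = len(words)
    phi = np.full((n_words, k), 1.0 / k)          # q(z_n): start uniform over topics
    gamma = alpha + n_words / k                   # q(theta): Dirichlet parameter
    for _ in range(n_iter):
        # phi_ni proportional to beta_{i, w_n} * exp(digamma(gamma_i)), normalized per word
        phi = beta[:, words].T * np.exp(digamma(gamma))
        phi /= phi.sum(axis=1, keepdims=True)
        new_gamma = alpha + phi.sum(axis=0)       # gamma_i = alpha_i + sum_n phi_ni
        converged = np.max(np.abs(new_gamma - gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, phi


rng = np.random.default_rng(seed=3)
k, V = 3, 6                                       # hypothetical toy sizes
alpha = np.full(k, 0.5)
beta = rng.dirichlet(np.ones(V), size=k)          # stands in for a trained topic-word matrix
words = np.array([0, 1, 1, 4, 5, 5, 5])           # hypothetical word ids of one document
gamma, phi = variational_e_step(words, alpha, beta)
print("gamma:", gamma.round(3))
```

The normalized γ approximates the document's topic proportions θ; in full variational EM, β would then be re-estimated from the ϕ values of all documents.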
Even with approximate methods, inference can still be computationally expensive, so
online variants have been developed to scale it to large corpora. Online Variational
Bayes is an approximate technique based on stochastic optimization. It processes text
in mini-batches and does not require storing the entire corpus in memory, allowing
documents to be discarded after processing [9].
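The core of the online approach, as formulated by Hoffman, Blei, and Bach for LDA, is a decaying step size ρ_t = (τ_0 + t)^(-κ), with κ in (0.5, 1], used to blend the current global topic-word parameter with an estimate computed from the latest mini-batch. The sketch below shows only this blending step; the parameter names `tau0` and `kappa`, their default values, and the random stand-in for the mini-batch estimate are assumptions for illustration, since the real estimate comes from an E-step like the one shown for a single document.

```python
import numpy as np


def learning_rate(t, tau0=1.0, kappa=0.7):
    """Step size rho_t = (tau0 + t)^(-kappa); kappa in (0.5, 1] meets the Robbins-Monro conditions."""
    return (tau0 + t) ** (-kappa)


def online_update(lam, lam_batch_estimate, t, tau0=1.0, kappa=0.7):
    """Blend the global topic-word parameter with the estimate from one mini-batch."""
    rho = learning_rate(t, tau0, kappa)
    return (1.0 - rho) * lam + rho * lam_batch_estimate


rng = np.random.default_rng(seed=4)
k, V = 2, 5                                   # hypothetical toy sizes
lam = np.ones((k, V))                         # global variational parameter for beta
for t in range(5):
    # Hypothetical stand-in for the estimate from mini-batch t; in the real
    # algorithm it is derived from an E-step on that mini-batch only.
    lam_hat = rng.gamma(shape=2.0, scale=1.0, size=(k, V))
    lam = online_update(lam, lam_hat, t)
    print(t, round(learning_rate(t), 3))
```

Because each mini-batch is used once and then discarded, memory use depends on the batch size rather than on the size of the corpus.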
Figure 2.2: Variational Distribution Used to Approximate the Posterior in LDA.
2.1.1.3 Bag-of-Words
Within the field of NLP, various techniques exist for representing and processing text
data. One such foundational approach is Bag of Words (BoW), which represents
each document as a sparse vector in a high-dimensional space, where each dimension
corresponds to a unique term from the vocabulary, and the value typically encodes
the term frequency or a weighted variant. A corpus is therefore treated not as
flowing text but as a collection of isolated words without a specific order or
relation to each other, so the contexts in which the words appear are ignored.
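A minimal illustration of the BoW representation: a document becomes a vector of term counts over a fixed vocabulary, and reordering its words leaves the vector unchanged. The tokens, the vocabulary and the helper name `bag_of_words` are hypothetical; real pipelines usually store only the non-zero entries (for example, Gensim's `Dictionary.doc2bow` returns sparse `(token_id, count)` pairs), whereas this sketch uses a dense list for readability.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Map a token list to a term-frequency vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts.get(term, 0) for term in vocabulary]


doc = "ai policy supports ethical ai research".split()
vocabulary = sorted({"ai", "policy", "ethical", "research", "supports", "regulation"})
print(vocabulary)
print(bag_of_words(doc, vocabulary))
print(bag_of_words(list(reversed(doc)), vocabulary))  # same vector: word order is lost
```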
The BoW representation rests on the probabilistic concept of exchangeability, which
implies that the order of words within a document (and even of the documents within
a corpus) can be neglected [8]. This assumption makes it possible to model a sequence
of words as conditionally independent and identically distributed once a latent
parameter is introduced. This is the basis for probabilistic models like LDA, which
represents each document as a mixture of latent topics and each topic as a
distribution over words. While the original LDA assumes full exchangeability over
unigrams, later extensions relax this assumption by allowing sequences such as
bigrams or trigrams to be modeled. This captures limited word-order dependencies and
multi-word expressions, which can make the resulting topics more coherent and
semantically meaningful.
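The toy example below shows what unigram exchangeability discards and what a bigram representation recovers: two phrases with the same words in different orders are indistinguishable as unigram counts but not as bigram counts. The phrases and the helper name `ngram_counts` are hypothetical; this counts contiguous n-grams directly and is not the statistics-based phrase detection offered by some libraries.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams as tuples; n = 1 reproduces the plain bag of words."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


a = "data protection authority".split()
b = "authority protection data".split()   # same words, different order

print(ngram_counts(a, 1) == ngram_counts(b, 1))   # unigrams cannot tell them apart
print(ngram_counts(a, 2) == ngram_counts(b, 2))   # bigrams can
print(ngram_counts(a, 2))
```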
2.1.2 BERTopic
BERTopic, introduced by Grootendorst [10], is a semantic topic model that uses
Sentence-BERT (SBERT) embeddings. To capture complex semantic relationships, BERTopic
encodes each document as a fixed-length vector and then clusters these vectors into
topics [11].
2.1.2.1 Transformers
Different NLP tasks require distinct model architectures. The Transformer, introduced
by researchers at Google in 2017, is a modern, widely used architecture that combines
encoder and decoder components (explained later in this section) [7]. It enables
efficient parallelization and addresses limitations of the RNN and Convolutional
Neural Network (CNN) architectures previously used in sequence-to-sequence
translation tasks, namely sequential processing constraints and difficulty capturing
long-range dependencies. The Transformer is the foundation of key models used for
NLP applications, including BERT [12] and GPT [13], where BERT uses only the encoder
stack and GPT only the decoder.
The Transformer, unlike RNN and CNN, eliminates the need for recurrence and
convolution by relying entirely on self-attention.
Self-Attention
The self-attention mechanism allows each word in a sentence to be compared to every
other word, regardless of the distance between them. This enables the model to
capture context, such as connecting a noun at the start of a sentence to a pronoun
at the end, or relations across multiple sentences. In this respect it loosely
resembles how humans read, focusing on relevant parts of a text regardless of
position. Specifically, the model derives three vectors, Query, Key, and Value, for
each word from the input embeddings through learned linear projections. Self-attention
then compares each word's Query with the Keys of all words in the sequence, producing
scores that indicate how strongly the words are related; after normalization, these
scores determine how much weight each word's Value receives in the output.
The output of self-attention, known as scaled dot-product attention, is computed as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V,$$
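where Q, K, and V are matrices whose rows are the Query, Key, and Value vectors, and
d_k is the dimension of the Key vectors. Dividing by √d_k keeps the dot products from
growing too large, which would otherwise push the softmax into regions with very
small gradients.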
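The formula translates almost line by line into NumPy. In the sketch below the token embeddings `X` and the projection matrices `W_q`, `W_k` and `W_v` are random stand-ins for what a trained Transformer would learn, and the sizes are hypothetical; multi-head attention runs several such computations in parallel on lower-dimensional projections and concatenates the results.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for one sequence."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_tokens, n_tokens) query-key similarities
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ V, weights


rng = np.random.default_rng(seed=5)
n_tokens, d_model, d_k = 4, 8, 8          # hypothetical sizes
X = rng.normal(size=(n_tokens, d_model))  # stand-in for token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))  # hypothetical projections
output, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(weights.round(2))
```

Row i of `weights` shows how much each token contributes to the new representation of token i.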
Architecture: Encoder and Decoder
As described by the authors [7] and shown in Figure 2.3, the Transformer architecture
consists of two main components: the encoder and the decoder.
• Encoder: Each encoder takes an input sequence and processes it in parallel. It
starts by applying embedding and positional encoding, which adds additional
information about the position of each token in the sequence, as attention
alone lacks this information. The embedded inputs are then passed through N
identical layers, each containing:
– Multi-Head Self-Attention: Each self-attention (“head”) focuses on differ-
ent aspects of the input. For instance, studies have shown that different
heads can specialize in different tasks, such as handling infrequent words,
encoding syntactic dependencies, or positional information of words [14].
10
2. Theory
Figure 2.3: The Transformer Model Architecture from the Attention Is All You Need
Paper [7].
Moreover, the multiplicity of heads allows efficient computation of rela-
tionships among tokens obtained by the previous layer.
– Feed Forward Network: Applies two linear transformations with a ReLU activation in between. These are applied independently at each position to further process the features produced by the attention sub-layer.
• Decoder: The output from the encoder layers is passed to the decoder. Like the encoder, the decoder uses positional encoding and feed-forward network layers. In addition to multi-head attention over the encoder output, however, the decoder uses Masked Multi-Head Attention over its own inputs. This prevents the model from seeing future words, so that during training each position can only attend to already generated words. The final Add & Norm layer outputs the normalized vectors, which are then passed through linear and softmax layers to produce the final output.
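The masking used in the decoder's Masked Multi-Head Attention can be illustrated with a minimal single-head NumPy sketch. The projection matrices `W_q`, `W_k`, and `W_v` are hypothetical random stand-ins for learned weights. Future positions receive a score of negative infinity, so their softmax weight becomes zero.

```python
import numpy as np


def causal_mask(n: int) -> np.ndarray:
    # True where position i may attend to position j, i.e. j <= i.
    return np.tril(np.ones((n, n), dtype=bool))


def masked_self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention where each position only sees earlier positions."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Future positions get -inf, so their softmax weight becomes exactly 0.
    scores = np.where(causal_mask(len(X)), scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights


rng = np.random.default_rng(1)
seq_len, d_model = 5, 16
X = rng.normal(size=(seq_len, d_model))  # stand-in for embedded decoder inputs
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
output, weights = masked_self_attention(X, W_q, W_k, W_v)
print(np.round(weights, 2))
```

The printed weight matrix is lower-triangular. The first position can only attend to itself, and each later position spreads its attention over itself and the positions before it.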
Applications
The Transformer is the foundation for powerful models, including GPT and BERT [12]. Several surveys ([15]–[17]) have documented Transformer applications across real-world domains. The architecture's ability to handle long-range dependencies has had a significant impact on both NLP and the wider deep learning field, including computer vision and multimodal applications [15], [17]. Due to its attention mechanism and contextual awareness, the Transformer is widely used in NLP. It has shown outstanding performance in a variety of tasks, such as Question Answering (QA), machine translation, and sentiment analysis [15]. More specifically, Transformer encoders play a crucial role in modern topic modeling approaches. They enable models built on BERT embeddings, or the probabilistic TNTM (Transformer-Representation Neural Topic Model) [18], to capture nuanced language patterns across diverse texts and clusters.
2.1.2.2 BERT
BERT is a language model created by Google researchers [12]. Using the Transformer’s
encoder, it bidirectionally encodes text passages and thus considers context from
both directions, which, at the time, gave it State-of-the-Art (SOTA) performance on
many NLP tasks. Additionally, BERT has the advantage of being relatively small
and computationally lighter than other models.
BERT's architecture consists of the Transformer encoder (explained in detail in Section 2.1.2.1) [12]. During pre-training, BERT uses Masked Language Modeling (MLM): tokens in a document are randomly masked, and the model is trained to predict the original tokens. Unlike OpenAI GPT, which processes text left-to-right, BERT is bidirectional, meaning it considers both left and right context simultaneously to predict masked tokens. An example of MLM could be the sentence “This document discusses AI [MASK] from different government types”. Here, BERT's task is to predict the most likely word to replace the [MASK] token (e.g., “policies”, “laws”, “rules”) by considering context from both directions. Static word embedding models (e.g., Word2Vec and GloVe) assign each word a single vector regardless of its context. BERT, in contrast, produces token embeddings based on each token's role in a sentence [19]. For instance, the word “season” receives different encodings in the sentences “Spring is my favorite season.” and “Can you season the pasta?”.
Moreover, BERT uses special tokens. [CLS] (classification) is placed at the beginning of every input sequence. [SEP] (separation) marks the end of each sentence and separates the two sentences when a pair is given. These tokens are also used in the Next Sentence Prediction (NSP) task (explained later).
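The input format can be illustrated with a minimal sketch. The hypothetical helper below shows where [CLS] and [SEP] are placed for a single sentence and for a sentence pair (as used in NSP). It also produces the segment ids that tell BERT which sentence each token belongs to. It assumes the text is already split into tokens; real BERT uses a WordPiece tokenizer, which is not reproduced here.

```python
def build_bert_input(tokens_a, tokens_b=None):
    """Assemble BERT-style input tokens and segment (token type) ids.

    Hypothetical helper: assumes pre-split tokens instead of WordPiece output.
    """
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)  # segment A, including [CLS] and the first [SEP]
    if tokens_b:
        tokens += list(tokens_b) + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)  # segment B plus its closing [SEP]
    return tokens, segment_ids


tokens, segments = build_bert_input(["ai", "policies", "differ"], ["regimes", "matter"])
print(tokens)
print(segments)
```

In the original BERT setup, the NSP classifier reads the final hidden state of the [CLS] token to decide whether the second segment follows the first.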
Pre-training and Fine-tuning
As mentioned, the BERT model was built to handle a variety of NLP tasks. However, not every task can be tackled effectively with the same training setup. BERT therefore separates training into a general pre-training stage and a task-specific fine-tuning stage [12]. The two stages are as follows (shown in Figure 2.4):
• Pre-training: The model is trained on large-scale unlabeled text using two tasks: MLM and NSP. MLM enables BERT's bidirectional context understanding by masking tokens in a sequence and predicting them based on the surrounding text. For each training sequence, 15% of tokens are selected for prediction. Of these:
– 80% are replaced with the [MASK] token,
– 10% are replaced with a random token, and
– 10% are left unchanged.
NSP enables BERT to model relationships between sentences by predicting
whether one sentence logically follows another. During training, 50% of sentence
pairs are consecutive, while the remaining 50% are randomly selected from the
corpus [12].
• Fine-tuning: The pre-trained model is further trained on labeled data to adapt
it for specific NLP tasks. By including an extra output layer, the model can be
trained for different tasks, such as QA, Named Entity Recognition (NER), and
text classification, often leading to improved task performance. Research shows
that fine-tuning largely preserves the spatial structure of the original token
embeddings while adjusting task-relevant parameters, making classification
boundaries clearer [20]. Additionally, fine-tuning refines representations by
bringing tokens belonging to the same label closer together, enabling more
accurate classifications.
Figure 2.4: The Main Two Approaches Used When Constructing the BERT Model
for Different Tasks [12].
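The 80/10/10 masking rule described under Pre-training can be sketched in plain Python. As a simplification, this version selects each ordinary token independently with probability 0.15, so roughly (not exactly) 15% of a sequence is selected. The exact sampling procedure in the original implementation may differ in its details. The vocabulary of replacement tokens is hypothetical.

```python
import random

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Apply BERT-style 80/10/10 masking.

    Returns (inputs, labels): labels[i] holds the original token for positions
    selected for prediction and None elsewhere.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        # Select each ordinary token with probability mask_prob (about 15%).
        if token in SPECIAL_TOKENS or rng.random() >= mask_prob:
            continue
        labels[i] = token
        r = rng.random()
        if r < 0.8:
            inputs[i] = "[MASK]"           # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


vocab = ["ai", "rules", "state"]  # hypothetical replacement vocabulary
tokens = ["[CLS]"] + ["policy", "data", "ethics", "law"] * 5000 + ["[SEP]"]
inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
selected = [i for i, label in enumerate(labels) if label is not None]
print(len(selected) / (len(tokens) - 2))
```

The model is trained to predict `labels[i]` only at the selected positions. Because 10% of the selected tokens are left unchanged, the model cannot rely on [MASK] always marking the positions it must predict.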
Applications
At the time of its introduction, BERT outperformed previous models on 11 different NLP tasks [12]. These included well-established benchmarks such as GLUE (a suite of language understanding tasks, including natural language inference) and SQuAD (QA), and BERT also performed strongly on NER. Its contextually-aware embeddings position words or phrases with similar meanings close together in the embedding space, which makes BERT's architecture well-suited for tasks requiring deep semantic understanding. Consequently, with some additional tuning, the model can be particularly useful for topic modeling, as groups of semantically similar words or documents can be treated as topics.
As mentioned, BERT is important for various NLP applications. For QA tasks,
it significantly enhances performance by analyzing text bidirectionally, enabling
Conversational QA (ConvQA) [21], SQuAD-based fine-tuned models, and a first-
order pruning model (PAL-BERT) [22]. Moreover, BERT achieves SOTA results in
NER tasks via both fine-tuning and feature-based approaches, with language-specific
variants such as Chinese BERT for mineral NER [23], Persian MorphoBERT for
NER [24], and German BERT for legal NER [25]. It also supports commonsense
reasoning and NSP tasks, including NSP-BERT [26] (prompt-based BERT), Sense-
BERT [27] (predicts masked words with supersense categories), and KVL-BERT
[28] (applies BERT to visual reasoning). In particular, BERT has been used for Part-of-Speech (POS) tagging, the task of assigning grammatical tags (e.g., noun, verb, adjective) to the words of a given document [29]. Owing to its contextual awareness, BERT outperforms CountVectorizer, TF-IDF, FastText, and ELMo in POS tagging. POS-related applications include DA-BERT (BERT with a deep-attention mechanism that models relationships between target and emotional words) [30] and BERT-POS (BERT with POS for sentiment analysis) [31]. They also include BERT-based POS tagging for various languages, such as Arabic [32], Uzbek [33], and the Algerian dialect [34].
Although BERT has been widely adopted for numerous NLP applications, its use in topic modeling is still emerging. The most well-known BERT-based topic model, BERTopic, leverages BERT embeddings and c-TF-IDF, demonstrating its potential in generating
interpretable topics [10]. However, the full potential of BERT in topic modeling
remains relatively underutilized.
Variants and Adaptations
Originally, BERT was introduced in two sizes: BERT_base and BERT_large, with 12 and 24 encoder layers, respectively. Since then, numerous variants and compact versions have been developed to improve efficiency and performance. Some of the most well-known include ALBERT (reduced parameters), DistilBERT (lighter computational footprint), and RoBERTa (enhanced MLM and removal of the NSP task) [19]. The same work [19] discusses DistilBERT's suitability for computationally limited devices, noting that it does not surpass BERT_large in performance. DistilBERT has half the number of layers of BERT_base (6 instead of 12) and a more compact architecture.
Furthermore, BERT is frequently adapted to specific domains and settings. This has led to specialized models such as MobileBERT for mobile applications and to language-specific models (BERTino for Italian, ITALIAN-LEGAL-BERT for legal Italian texts, BERTje for Dutch). It has also led to topic modeling approaches like BERTopic. Another variant is Sentence-BERT (SBERT) [35], which uses Siamese and triplet network structures to produce fixed-length sentence embeddings and significantly outperforms BERT in semantic similarity tasks. These adaptations allow BERT to achieve better performance in specialized fields while maintaining its core architecture.
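SBERT obtains one fixed-length vector per sentence by pooling BERT's token embeddings. Mean pooling is the default strategy described for SBERT. The minimal NumPy sketch below shows only this pooling step and assumes the token embeddings and attention mask are already available. Random numbers stand in for real BERT outputs.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors into one fixed-length vector per sentence.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len),
    1 for real tokens and 0 for padding.
    """
    mask = attention_mask[..., np.newaxis].astype(float)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 3))  # 2 sentences, 4 token slots, dim 3
attention_mask = np.array([[1, 1, 0, 0],       # short sentence padded to length 4
                           [1, 1, 1, 1]])
sentence_embeddings = mean_pool(token_embeddings, attention_mask)
print(sentence_embeddings.shape)
```

Because padding is excluded from the average, sentences of different lengths still end up with vectors of the same size, which can then be compared directly.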
2.1.2.3 Model Architecture
Unlike LDA and other traditional topic models, BERTopic takes into consideration the semantics and context of the documents in question. Despite their popularity and success in topic modeling, older models often struggle with inconsistent data [36]. Due to their probabilistic nature, even minor details can affect their output, impeding accurate and coherent interpretation of the results. Additionally, traditional models generally output fewer topics than BERTopic. While this does not necessarily affect the results negatively, a higher number of topics can increase the interpretability of the documents.
The following sections discuss the key components of BERTopic. As seen in Figure 2.5, BERTopic includes Embedding Generation to create sentence- or paragraph-level vectors, usually using SBERT. These vectors then undergo Dimensionality Reduction to improve the model's performance. Next, Clustering (HDBSCAN) builds a hierarchy of density-based clusters, from which the topic groups are selected. Once the clusters are formed, CountVectorizer and c-TF-IDF extract the words and n-grams that best represent each cluster. Optionally, Fine-Tuning can be applied for task-specific requirements and further processing.
Figure 2.5: BERTopic Sequence of Steps to Create Its Topic Representations [37].
Embedding Generation
BERTopic uses sentence- or paragraph-level vector embeddings to make documents and topics comparable [10]. These embeddings are numerical representations that place semantically similar documents close together, which the subsequent dimensionality reduction and clustering steps rely on. Their size depends on the embedding model. For example, all-MiniLM-L6-v2 (BERTopic's default English model) produces 384-dimensional embeddings, and all-mpnet-base-v2 produces 768-dimensional ones. By default, BERTopic uses Sentence Transformers (SBERT), though the optimal embedding model depends on the application and goal. Other sentence embedding models include all-mpnet-base-v2, paraphrase-albert-small-v2, and multi-qa-mpnet-base-dot-v1, which are fine-tuned for various use cases (e.g., paraphrasing, semantic search, and multi-QA). Gensim word embeddings (e.g., GloVe, FastText, Word2Vec) are also widely used. Several multilingual models, such as distiluse-base-multilingual-cased-v2 and paraphrase-multilingual-MiniLM-L12-v2, support over 50 languages.
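Because documents and topics share one embedding space, they can be compared with cosine similarity. The sketch below uses tiny hypothetical 3-dimensional vectors in place of real sentence embeddings, which have hundreds of dimensions. As an assumption for illustration, each topic is represented by the mean of its documents' embeddings.

```python
import numpy as np


def normalize_rows(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def nearest_topic(doc_embeddings: np.ndarray, topic_embeddings: np.ndarray) -> np.ndarray:
    # Cosine similarity = dot product of L2-normalised vectors.
    similarity = normalize_rows(doc_embeddings) @ normalize_rows(topic_embeddings).T
    return similarity.argmax(axis=1)


# Hypothetical embeddings of documents already grouped into two topics.
topic_docs = {
    0: np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]),  # e.g. regulation-related
    1: np.array([[0.1, 0.9, 0.3], [0.0, 0.8, 0.4]]),  # e.g. research-related
}
topic_embeddings = np.vstack([docs.mean(axis=0) for docs in topic_docs.values()])

new_docs = np.array([[0.85, 0.15, 0.05], [0.05, 0.7, 0.5]])
print(nearest_topic(new_docs, topic_embeddings))
```

Cosine similarity ignores vector length and compares only direction. This suits embeddings, whose length carries little semantic information.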
Dimensionality Reduction with UMAP
The embeddings generated in the previous step are high-dimensional and can notably
slow down the topic modeling and downstream clustering process. While each embed-
ding’s dimension remains constant, the increase in dataset size results in an overall
larger embedding matrix. Therefore, to manage the embeddings and to improve
the performance of clustering models, topic models often require dimensionality
reduction, which involves reducing the number of features while preserving the structure of the data [38]. Other approaches perform dimensionality reduction with latent topic models (including LDA) or other machine learning techniques, whereas BERTopic uses Uniform Manifold Approximation and Projection (UMAP). This technique represents the data as a weighted graph in a high-dimensional space and reduces it to a low-dimensional form while preserving its underlying structure [39].
To control the formation of the clusters, UMAP's n_neighbors hyperparameter sets how many neighboring points are considered when constructing this graph. Decreasing the value emphasizes local structure and leads to smaller, more distinct clusters. Increasing it emphasizes global structure and results in larger, broader ones.
UMAP is applied before the clustering step.
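UMAP itself is provided by the separate umap-learn package, and the sketch below does not run it. Instead, it uses scikit-learn's `kneighbors_graph` to illustrate only the neighborhood-graph idea behind n_neighbors. With a small value, each point connects only to its immediate neighbors. With a larger value, edges start to link separate groups, so more global structure enters the graph. The two point groups are toy data.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph


def count_cross_group_edges(points, groups, n_neighbors):
    """Build a k-nearest-neighbour graph and count edges linking different groups."""
    graph = kneighbors_graph(points, n_neighbors=n_neighbors,
                             mode="connectivity", include_self=False)
    rows, cols = graph.nonzero()
    return int(np.sum(groups[rows] != groups[cols]))


rng = np.random.default_rng(0)
# Two tight, well-separated groups of five points each (toy "embeddings").
points = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
                    rng.normal(10.0, 0.1, size=(5, 2))])
groups = np.array([0] * 5 + [1] * 5)

for k in (3, 6):
    print(k, count_cross_group_edges(points, groups, k))
```

With k = 3, every neighbor lies in the same group. With k = 6, each point has only four same-group neighbors, so two of its edges must reach into the other group.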
Clustering (HDBSCAN)
Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) was introduced as an improvement to DBSCAN [40]. Unlike DBSCAN, HDBSCAN builds clusters using a hierarchical structure based on varying density levels. It can therefore find clusters of different densities and labels points that do not fit any cluster as noise. Alternatives like K-Means, OPTICS, and Gaussian Mixtures are commonly used, but HDBSCAN offers more flexibility and adaptability: it does not require the number of clusters to be specified in advance and is tuned through a few intuitive parameters. The key parameter, min_cluster_size, regulates the minimum number of documents required to form a cluster (i.e., a topic based on similar documents). Increasing this value results in fewer but larger clusters, whilst decreasing it can result in so-called miniclusters [41]. In addition, min_samples can be used to further influence the clustering after adjusting min_cluster_size. A higher min_samples value results in more conservative clustering, as more points are classified as noise.
Representation: CountVectorizer and c-TF-IDF
To generate human-interpretable keyword lists that represent the topics extracted during the clustering step, the documents must first be transformed into a machine-readable format. In particular, BERTopic uses CountVectorizer, which converts documents into a document-term matrix [42]. This BoW representation involves splitting strings into tokens (usually words) and counting the occurrences of each token in each document. The resulting matrix is sparse, as it typically contains many zero values for tokens that do not appear in a particular document.
CountVectorizer is commonly used alongside the term-frequency times inverse document-frequency (TF-IDF) measure. TF-IDF reweights the token counts so that tokens appearing in many documents receive less weight, reducing the importance of commonly used words [43]. TF-IDF is calculated by multiplying the term frequency by the inverse document frequency, using the following formula:
$$\text{tf-idf}(t, d) = \text{tf}(t, d) \times \left( \log \frac{1 + n}{1 + \text{df}(t)} + 1 \right),$$
where t is the term in a document d, n is the total number of documents, tf(t, d)
is the number of times term t appears in document d, and df(t) is the number of
documents in which t appears.
For topic modeling and clustering, class-based TF-IDF (c-TF-IDF) can be used instead. Rather than computing TF-IDF for individual documents, c-TF-IDF is applied after clustering to represent entire groups of documents (i.e., topics), treating all documents in a cluster as a single document. In BERTopic, CountVectorizer and c-TF-IDF together help build a topic representation model. c-TF-IDF is useful for identifying cohesive terms that characterize each topic and can be calculated using the following formula:
$$W_{x,c} = \lVert \text{tf}_{x,c} \rVert \times \log\left(1 + \frac{A}{f_x}\right),$$
where $\text{tf}_{x,c}$ is the frequency of term x within class (cluster) c, normalized within each class (indicated by $\lVert\cdot\rVert$), $f_x$ is the total frequency of term x across all classes, and A is the average number of words per class.
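The formula can be sketched in NumPy following the description in the BERTopic documentation. Each row of a class-by-term count matrix is L1-normalized, and A is the average number of words per class. The counts below are hypothetical. BERTopic's own `ClassTfidfTransformer` also offers options (such as BM25-style weighting) that are not reproduced here.

```python
import numpy as np


def c_tf_idf(class_term_counts) -> np.ndarray:
    """c-TF-IDF weights for a (n_classes, n_terms) count matrix.

    Assumes every term occurs at least once, so f_x > 0.
    """
    counts = np.asarray(class_term_counts, dtype=float)
    tf = counts / counts.sum(axis=1, keepdims=True)  # ||tf_{x,c}||: L1-normalised per class
    f_x = counts.sum(axis=0)                         # frequency of each term across all classes
    A = counts.sum(axis=1).mean()                    # average number of words per class
    return tf * np.log(1 + A / f_x)


terms = ["regulation", "research", "ai"]
# Hypothetical counts after concatenating all documents of each cluster.
counts = [[2, 0, 1],   # cluster 0
          [0, 3, 1]]   # cluster 1
W = c_tf_idf(counts)
for c, row in enumerate(W):
    print(c, terms[int(row.argmax())])
```

The shared term “ai” appears in both clusters, so it is outranked in each cluster by the term that is specific to that cluster.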
Additionally, CountVectorizer can be used alongside c-TF-IDF to remove stopwords
and overly frequent terms after the topics have been identified, ensuring that the
most informative content is retained.
Additional Preprocessing/Fine-Tuning
Whilst CountVectorizer and c-TF-IDF could be the final step in some topic mod-
eling tasks, further fine-tuning and model refinements might be necessary. Since
BERTopic with default parameters provides general results, additional adjustments
can improve the coherence, relevance, and interpretability of the extracted topics.
Representation models that can be used for this fine-tuning include LangChain (which leverages LLMs and QA chains to generate descriptive topic labels from documents), MaximalMarginalRelevance (which extracts diverse keywords by minimizing redundancy), and KeyBERT (which optimizes keyword extraction using BERT-based embeddings) [44], [45].
For multimodal data, the VisualRepresentation model can be used to associate
topics with corresponding images. Another important refinement technique is POS
tagging, which filters and refines topic representations based on parts of speech.
To optimize both computational efficiency and topic quality, the PartOfSpeech
model selects documents containing relevant keywords and applies predefined POS
filters to generate new refined keyword sets. Depending on the n-grams outputted,
these filters can include nouns, adjectives, verbs, etc., as well as combinations (e.g.,
adjective+noun, noun+noun, etc.).
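Maximal Marginal Relevance (MMR) can be sketched in a few lines. It starts from the keyword most similar to the topic. It then repeatedly adds the candidate that best balances similarity to the topic against similarity to the keywords already chosen. The diversity weighting follows the form used in KeyBERT-style MMR, but the function and the tiny hypothetical vectors are our own illustration, not BERTopic's implementation.

```python
import numpy as np


def _cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def mmr(topic_embedding, word_embeddings, words, top_n=5, diversity=0.5):
    """Select top_n keywords balancing relevance (to the topic) and redundancy."""
    word_topic = _cosine_matrix(word_embeddings, topic_embedding.reshape(1, -1)).ravel()
    word_word = _cosine_matrix(word_embeddings, word_embeddings)

    selected = [int(np.argmax(word_topic))]  # start with the most relevant word
    candidates = [i for i in range(len(words)) if i != selected[0]]
    while len(selected) < top_n and candidates:
        # Highest similarity of each candidate to any already selected keyword.
        redundancy = word_word[np.ix_(candidates, selected)].max(axis=1)
        scores = (1 - diversity) * word_topic[candidates] - diversity * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return [words[i] for i in selected]


words = ["regulation", "regulations", "privacy"]
word_embeddings = np.array([[1.0, 0.0, 0.0],
                            [0.99, 0.14, 0.0],   # near-duplicate of "regulation"
                            [0.7, 0.0, 0.71]])
topic_embedding = np.array([1.0, 0.0, 0.0])
print(mmr(topic_embedding, word_embeddings, words, top_n=2, diversity=0.0))
print(mmr(topic_embedding, word_embeddings, words, top_n=2, diversity=0.7))
```

With diversity set to 0, the near-duplicate “regulations” is kept. With a higher diversity value, it is replaced by the more distinct “privacy”.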
Performance
BERTopic is widely regarded as one of the highest-performing topic models today. Its rise in popularity has prompted a growing body of research comparing it with traditional topic models. Some studies ([11], [36]) report that BERTopic outperforms other SOTA topic models.
For instance, BERTopic outperforms traditional models in terms of higher coherence,
often producing a greater number of meaningful topics [36]. The model automatically
finds and categorizes topics, requiring little to no manual intervention, though
some fine-tuning or hyperparameter optimization might be needed for custom or
fine-grained tasks and applications.
Thanks to c-TF-IDF, BERTopic can distinguish distinct topics across different
clusters, even when they share overlapping vocabulary. This means the same words
can appear in different contexts and still contribute to clearly separated topic
representations.
Moreover, the model supports hierarchical topic modeling and topic reduction, organizing topics into higher- and lower-level groups. This feature helps users explore specific topics or view broader topic categories when needed. To visualize the topics, BERTopic includes intuitive and interactive built-in graphs, such as bar charts showing the c-TF-IDF scores of the top keywords within each topic.
Lastly, the BERTopic documentation explains every step and underlying logic in
detail, making the model transparent and easy to follow. This level of clarity reduces
room for ambiguity or misinterpretation, and helps minimize the risk of biases [11].
Whilst BERTopic excels in many aspects of topic modeling, it is important to
highlight its limitations. BERTopic encodes and processes text at the sentence or
paragraph level, meaning it benefits from clearly structured input. The model might
struggle with texts that have irregular sentence structures or missing punctuation.
In addition, to achieve high-quality results, the model often requires careful hyper-
parameter selection. Grid search or other optimization methods may be needed to
outperform traditional topic modeling approaches in some contexts. Similarly, some
SOTA models have simpler architectures, whereas BERTopic, even after dimensional-
ity reduction, requires more computational power due to its reliance on transformer
embeddings and clustering.
Finally, as BERTopic relies on pretrained language models, it may inherit the language
biases present in those models, affecting performance on non-English or low-resource
languages.
2.2 Evaluation Metrics
This section includes a description of the metrics used to evaluate the performance
of the two topic models. The evaluations include intrinsic metrics, such as coherence
and perplexity, and several OCTIS framework metrics.
2.2.1 Intrinsic Evaluation Metrics
Topic models generate sets of topics, each typically represented as a list of words, but
they can also be expressed as distributions over documents or other forms, depending
on the use case. However, the output of a topic model is not immediately measurable
in terms of quality. To assess whether a model has learned meaningful topics, a range
of evaluation metrics has been developed that quantify different aspects of topic
quality.
In topic modeling, coherence has become one of the most commonly used intrinsic
evaluation metrics [46], [47]. Its goal is to quantify how semantically consistent the
top words in a topic are by examining how frequently these words appear together
in reference corpora. A coherent topic, for instance, might contain the words doctor,
nurse, hospital, vaccination, as these words commonly co-occur in natural language
and are likely to be interpreted by humans as belonging to the same semantic theme.
A less coherent topic would be donut, house, europe, cat, dollar, where the words do
not have a clear connection and would be judged incoherent by human standards.
One commonly used coherence metric is the CV measure, which evaluates the semantic similarity among top-ranked topic words based on their co-occurrence within a sliding window over a reference corpus [46]. Given a set of top words W = {w1, ..., wN}, CV pairs each word wi with the full set W. Each word is represented by a context vector of normalized pointwise mutual information (NPMI) scores with all words in W, computed from sliding-window co-occurrence counts. Each pair is then scored by the cosine similarity of these vectors. These scores are aggregated using the arithmetic mean, resulting in a single coherence value for the topic. High coherence values generally suggest more interpretable topics that reflect real-world concepts.
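The full CV pipeline has several stages (segmentation, NPMI context vectors, and cosine similarity). As a simpler illustration of co-occurrence-based coherence, the sketch below computes the average pairwise NPMI of a topic's top words, treating each whole document as one co-occurrence window. This is not CV itself, and the toy corpus and topics only echo the examples above.

```python
import math
from itertools import combinations


def npmi(w1, w2, doc_sets):
    """Normalized PMI of two words, using whole documents as co-occurrence windows."""
    n = len(doc_sets)
    p1 = sum(w1 in d for d in doc_sets) / n
    p2 = sum(w2 in d for d in doc_sets) / n
    p12 = sum(w1 in d and w2 in d for d in doc_sets) / n
    if p12 == 0:
        return -1.0  # never co-occur: minimum NPMI by convention
    if p12 == 1:
        return 1.0   # co-occur in every document
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def npmi_coherence(top_words, documents):
    doc_sets = [set(doc) for doc in documents]
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, doc_sets) for a, b in pairs) / len(pairs)


corpus = [["doctor", "nurse", "hospital"],
          ["doctor", "nurse"],
          ["cat", "dollar"],
          ["house", "europe"]]
print(npmi_coherence(["doctor", "nurse", "hospital"], corpus))
print(npmi_coherence(["cat", "house", "dollar"], corpus))
```

The medical topic scores clearly higher than the mixed one because its words repeatedly appear together. The result depends entirely on the corpus the co-occurrences are counted in.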
However, coherence scores have shown limits in their actual comparison to human
interpretation [48]. For instance, a topic like data, database, dataset, datum, data
drive may achieve a high coherence score due to frequent word co-occurrence. From
a statistical perspective, the topic appears coherent. Yet semantically, it offers little
depth or distinction and revolves around slight variations of a single word rather than
capturing a broader or more informative concept. In such cases, the model may be
learning repetitive or overly narrow topics. This issue arises in part because coherence
measures tend to reward statistical regularity, which may not always correspond with
actual thematic relevance. Additionally, coherence is highly sensitive to preprocessing
decisions and the choice of reference corpus. This means that coherence scores are
not always comparable across datasets or experimental setups.
Perplexity is another metric used to evaluate topic models [48]. It measures how well the model predicts a held-out test set and is computed as the exponential of the negative average per-word log-likelihood of the unseen data. A lower perplexity indicates that the model assigns higher probability to new documents and therefore generalizes better. A higher perplexity indicates that the learned topics explain new documents poorly.
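Concretely, perplexity is the exponential of the negative average log-likelihood per word of the held-out text. The toy sketch below assumes a model has already produced the per-word probabilities; here they are made up. A model that spreads probability uniformly over 50 words has a perplexity of exactly 50.

```python
import math


def perplexity(word_log_probs):
    """exp(-(1/N) * sum of log p(w)) over the N held-out words."""
    return math.exp(-sum(word_log_probs) / len(word_log_probs))


uniform = [math.log(1 / 50)] * 10  # model is equally unsure about 50 words
confident = [math.log(0.5)] * 10   # model assigns each held-out word p = 0.5
print(perplexity(uniform), perplexity(confident))
```

Perplexity can be read as the effective number of equally likely words the model is choosing between at each position.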
Since perplexity is based on the likelihood of word distributions, it is suited for
probabilistic models like LDA, which explicitly model the probability of each word in a
document based on learned topic-word and document-topic distributions. In contrast,
BERTopic does not follow a generative probabilistic framework. Instead of modeling
word probabilities, it clusters document embeddings and extracts representative
keywords. As a result, its output does not provide the necessary probabilistic
structure for computing perplexity, making this metric inapplicable to BERTopic in
a meaningful way.
However, it is important to acknowledge that the extent to which perplexity and
coherence reflect human interpretability remains debated. Even when applied to
probabilistic models such as LDA, perplexity has known limitations. Prior work
has shown that perplexity does not always correlate with human judgment of topic
quality and may favor statistically optimal yet semantically weak topics [48]. Similarly,
while coherence, particularly the CV measure, is often used to approximate human
interpretability [46], [47], others have shown that statistically coherent topics can
result from repetitive or overly narrow topics that lack semantic meaning or thematic
clarity [48]. These inconsistencies reflect a broader issue in topic model evaluation,
where statistical metrics often reward internal regularity rather than meaningful
thematic structure. Given this tension, these metrics should not be treated as
absolute measures of model performance, but should be used as practical guides for
hyperparameter tuning [8].
2.2.2 OCTIS
To evaluate topic models effectively and under consistent conditions, it is important
to consider both the quality of individual topics and the usefulness of the mod-
els as a whole. The OCTIS framework offers a reproducible pipeline to facilitate
such evaluations [49]. OCTIS allows for standardized experimentation across various
datasets and models by providing a unified framework for metric-based evaluation. While many evaluation metrics exist, this thesis emphasizes four core measures: Coherence, Topic Diversity, Inverted Rank-Biased Overlap (IRBO), and WECoherencePairwise. Together, these metrics provide insights into the interpretability, diversity, and redundancy of the resulting topics.
In the context of OCTIS, the most commonly used coherence measure is CV Coherence.
As described in Section 2.2.1, it evaluates the semantic consistency of topic words
based on co-occurrence statistics extracted from the input corpus. However, coher-
ence alone does not fully characterize the quality of a topic model, particularly in
cases where multiple topics share overlapping vocabulary or only differ slightly in
emphasis.
While CV Coherence relies on co-occurrence within the evaluation corpus, OCTIS’s
metric WECoherencePairwise adopts a different strategy based on semantic similar-
ity using word embeddings [49]. It computes the average pairwise cosine similarity
between the top-k words in each topic within a high-dimensional vector space. The
embedding model used by default is built into OCTIS and consists of 300-dimensional
Word2Vec vectors trained on the Google News corpus. These embeddings are auto-
matically downloaded and handled by the framework. Because this metric leverages
distributional semantics rather than observed frequencies, WECoherencePairwise is
less sensitive to corpus size and noise. As a result, it can capture hidden semantic
relationships that may not be observed through local co-occurrence patterns.
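A minimal sketch of the idea behind WECoherencePairwise: average the cosine similarity over all pairs of a topic's top-k words, then average across topics. The two-dimensional vectors below are hypothetical stand-ins for the 300-dimensional Word2Vec vectors. The exact handling of out-of-vocabulary words in OCTIS may differ from this version.

```python
from itertools import combinations

import numpy as np


def cosine(a, b) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def we_coherence_pairwise(topics, embeddings, topk=10) -> float:
    topic_scores = []
    for topic in topics:
        # Skip words without a vector (out-of-vocabulary words).
        words = [w for w in topic[:topk] if w in embeddings]
        sims = [cosine(embeddings[a], embeddings[b]) for a, b in combinations(words, 2)]
        if sims:
            topic_scores.append(np.mean(sims))
    return float(np.mean(topic_scores))


embeddings = {  # hypothetical word vectors
    "doctor": np.array([1.0, 0.1]),
    "nurse": np.array([0.9, 0.2]),
    "hospital": np.array([0.8, 0.3]),
    "donut": np.array([-0.2, 1.0]),
    "europe": np.array([1.0, -0.9]),
    "cat": np.array([-1.0, -0.3]),
}
print(we_coherence_pairwise([["doctor", "nurse", "hospital"]], embeddings))
print(we_coherence_pairwise([["donut", "europe", "cat"]], embeddings))
```

Unlike co-occurrence-based coherence, this score depends only on the pretrained vectors, not on how often the words appear together in the evaluated corpus.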
Topic Diversity quantifies the uniqueness of words across topics, thus penalizing
models that produce near identical topics with only minor variations [49]. It is
calculated as the proportion of unique words in the top-k topic words across all
topics. The metric aligns with the idea of decomposability in interpretability: each topic should ideally contribute new, distinct information to the model's overall representation of the dataset. A high Topic Diversity score implies that the
model has learned a broad and varied set of themes rather than repeating the same
information across multiple topics.
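Topic Diversity is simple enough to compute directly. The sketch below follows the definition above: it divides the number of unique words among all topics' top-k words by the total number of top-k slots. The example topics are made up.

```python
def topic_diversity(topics, topk=10) -> float:
    """Share of unique words among the top-k words of all topics (1.0 = no overlap)."""
    if any(len(topic) < topk for topic in topics):
        raise ValueError("every topic needs at least topk words")
    unique_words = {word for topic in topics for word in topic[:topk]}
    return len(unique_words) / (topk * len(topics))


topics = [["ai", "data", "privacy"],
          ["ai", "research", "innovation"],
          ["ethics", "rights", "privacy"]]
print(topic_diversity(topics, topk=3))
```

Here “ai” and “privacy” each appear in two topics, so only 7 of the 9 top-word slots hold unique words.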
While Topic Diversity assesses the uniqueness by measuring how many distinct
words are used across topics, it does not account for how similar those topics are in
terms of their ranked word structure. To complement this, IRBO is used to evaluate
topic redundancy by comparing the ranked word lists of all topic pairs [50]. Rather than counting unique words alone, IRBO considers whether the same words appear at similar ranks across different topics, giving more weight to overlap among the top-ranked words. IRBO is computed as one minus the average rank-biased overlap between topic pairs. A high IRBO score therefore indicates that the model has produced clearly separated and structurally diverse topics, while a low score indicates that many topics share similar high-ranking words.
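A minimal sketch of IRBO. It assumes the extrapolated form of rank-biased overlap (RBO) for two ranked lists of equal length, with a persistence parameter p = 0.9 that controls how much weight deeper ranks receive. The OCTIS implementation relies on an RBO routine whose details may differ. IRBO is one minus the average RBO over all topic pairs.

```python
from itertools import combinations


def rbo_ext(list1, list2, p=0.9) -> float:
    """Extrapolated rank-biased overlap for two ranked lists of equal length."""
    if len(list1) != len(list2):
        raise ValueError("this sketch assumes lists of equal length")
    k = len(list1)
    seen1, seen2 = set(), set()
    weighted_sum = 0.0
    overlap = 0
    for d in range(1, k + 1):
        seen1.add(list1[d - 1])
        seen2.add(list2[d - 1])
        overlap = len(seen1 & seen2)  # X_d: shared items in the top-d prefixes
        weighted_sum += (overlap / d) * p ** d
    return (overlap / k) * p ** k + (1 - p) / p * weighted_sum


def inverted_rbo(topics, topk=10, p=0.9) -> float:
    pairs = list(combinations(topics, 2))
    mean_rbo = sum(rbo_ext(a[:topk], b[:topk], p) for a, b in pairs) / len(pairs)
    return 1 - mean_rbo  # high IRBO = little overlap between topics


similar = [["ai", "data", "privacy"], ["ai", "data", "security"]]
distinct = [["ai", "data", "privacy"], ["ethics", "rights", "law"]]
print(inverted_rbo(similar, topk=3), inverted_rbo(distinct, topk=3))
```

Identical rankings give an RBO of 1 and disjoint rankings give 0. Two topics that share their top two words therefore lower the IRBO much more than two topics that share only their last word.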
2.3 Political and Ethical Context of AI Policy
To understand how AI is evolving, it is important to understand the circumstances
under which it is developed and used. The evolution of AI is not a purely technical matter but as much a political and ethical one. Its development is shaped by who benefits from its use and who bears the risks [51]. National
AI strategies are therefore not only blueprints for innovation, but they also reflect
political systems, cultural norms, and regulatory philosophies.
AI is neither neutral nor isolated from existing power structures [52]. The choices
made in AI policy documents, about what to regulate, what to incentivize, and what
ethical principles to prioritize, reflect broader governance ideologies [53]. This means
that analyzing AI policies also provides insight into how governments understand
and frame their responsibilities towards citizens in the context of AI development
and use.
To enable comparative analysis of AI policies, this thesis draws on the Democracy
Index published by the Economist Intelligence Unit [3]. The Index score is derived from expert assessments and public opinion surveys, based on 60 indicators grouped into five categories: electoral process and pluralism, functioning of government, political participation, political culture, and civil liberties. A panel of country analysts and experts from the Economist Intelligence Unit reviews the data and assigns scores according to a standardized coding system (e.g., 0–1 or a three-point scale), ensuring consistency across countries. The resulting score places each country in one of four government types (see Table 2.1): Full Democracy, Flawed Democracy, Hybrid Regime, and Authoritarian Regime.
Government Type | Democracy Index Score
Full Democracies | 8.00–10.00
Flawed Democracies | 6.00–7.99
Hybrid Regimes | 4.00–5.99
Authoritarian Regimes | 0.00–3.99
Table 2.1: Democracy Index Classifications [3].
The UNESCO Recommendation on the Ethics of Artificial Intelligence [54] provides a useful normative framework for analyzing AI policies. This document was developed by researchers in collaboration with international stakeholders (e.g., governments, private companies, and academia). It builds on existing frameworks and international law, with a focus on human rights and dignity, equality, protection, and more. In particular, the recommendation identifies ethical principles and encourages countries to align their AI governance with these goals. Its policy actions are grouped into the following areas: Ethical Impact Assessment, Ethical Governance and Stewardship, Data Policy, Development and International Cooperation, Environment and Ecosystems, Gender, Culture, Education and Research, Communication and Information, Economy and Labour, and Health and Social Well-Being. However,
implementation remains voluntary, and the symbolic nature of many national AI
strategies means that they may function more as public relations tools than as
enforceable commitments [55]. Understanding AI strategies through these theoretical
lenses provides a more comprehensive picture of global AI development and shows
how data science intersects with governance, ethics, and public values.
3
Methods
This chapter details the steps taken to answer the research questions, building on the theory introduced in Chapter 2. The first section covers the data collection process, including data cleaning, web scraping, and the post-scraping processing steps needed to make the data usable for topic modeling. Next, the BERTopic and LDA modeling techniques are described, including chunking, hyperparameter optimization using grid search, and topic extraction. Lastly, the chapter presents the design of a policy analysis framework based on UNESCO's ethical recommendations. This framework is then used for the qualitative analysis of the models and the different regimes.
3.1 Data Collection and Processing
For this project, the dataset was constructed using data from the OECD AI Obser-
vatory, which provides a Comma-Separated Values (CSV) file containing countries
and their respective AI policies. The following subsections detail the data processing
steps, including pre-scraping cleaning, web scraping, and post-scraping preprocessing.
3.1.1 Data Cleaning (Pre-Scraping Stage)
To ensure data usability, we filtered the dataset to retain only the “Country” and
“Public access URL” columns, discarded rows with missing data, and saved the
resulting data into a new dataframe. We manually created a dictionary associating
each country in the OECD AI Observatory CSV file with its democracy index score.
The democracy index values were sourced from The Economist [3], as their
methodology is grounded in empirical observations of state governance. The
countries were then divided into four categories (see Table 2.1); the higher the
democracy index, the more democratic the country is considered to be. The resulting
government type was used in the subsequent analysis.
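The pre-scraping filtering and the mapping from democracy index to government type can be sketched in plain Python. This is a minimal sketch rather than the thesis code: it assumes the EIU score bands (Full Democracy above 8, Flawed Democracy above 6 up to 8, Hybrid Regime above 4 up to 6, Authoritarian Regime at 4 or below), reads the CSV with the standard library instead of a dataframe, and uses hypothetical country names and scores.

```python
import csv
import io


def government_type(score: float) -> str:
    """Map an EIU Democracy Index score (0-10) to its regime category (assumed EIU bands)."""
    if score > 8:
        return "Full Democracy"
    if score > 6:
        return "Flawed Democracy"
    if score > 4:
        return "Hybrid Regime"
    return "Authoritarian Regime"


def load_policy_links(csv_text: str, index_scores: dict) -> list:
    """Keep only the country and URL columns, drop incomplete rows, attach the regime type."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        country = (row.get("Country") or "").strip()
        url = (row.get("Public access URL") or "").strip()
        if not country or not url or country not in index_scores:
            continue  # missing data, or no index score recorded for this country
        rows.append({
            "country": country,
            "url": url,
            "government_type": government_type(index_scores[country]),
        })
    return rows
```

Usage with hypothetical data: `load_policy_links(csv_text, {"Country A": 8.5})` returns one dict per usable policy link, tagged with its government type.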
Additionally, we checked the status code of each “Public access URL”: URLs
with status code 400 or higher (indicating errors such as bad requests, forbidden
access, or not found) were flagged, whilst those with status codes below 400 were
considered operational. Since some websites use anti-bot protection to avoid
overloading their servers, basic measures were taken to retain as many links
as possible, including retry strategies using HEAD and GET requests and custom
user-agent headers. Table A.2 provides an overview of the status codes and the
number of times they were encountered.
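A hedged sketch of the link check using only the standard library is shown below. The user-agent string, the number of retries, and the backoff are hypothetical choices, since the thesis does not state them; the `opener` argument exists only so the logic can be exercised without network access.

```python
import time
import urllib.error
import urllib.request

# Hypothetical browser-like header; the exact user agent used in the thesis is not stated.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; policy-scraper/0.1)"}


def is_operational(status: int) -> bool:
    """Codes below 400 count as working; 400 and above are flagged."""
    return status < 400


def fetch_status(url: str, method: str, opener=urllib.request.urlopen, timeout: float = 10):
    """Return the HTTP status of a single request, or None if the server was unreachable."""
    request = urllib.request.Request(url, method=method, headers=HEADERS)
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses are raised as HTTPError
        return err.code
    except (urllib.error.URLError, TimeoutError):  # DNS failure, refused connection, timeout
        return None


def check_url(url: str, retries: int = 2, backoff: float = 1.0, opener=urllib.request.urlopen):
    """Try HEAD first (cheap), then GET (some servers reject HEAD), retrying with a growing pause."""
    status = None
    for attempt in range(retries + 1):
        for method in ("HEAD", "GET"):
            status = fetch_status(url, method, opener)
            if status is not None and is_operational(status):
                return status
        if attempt < retries:
            time.sleep(backoff * (attempt + 1))
    return status  # last observed code (or None) for links that never worked
```

Trying HEAD before GET keeps the check cheap for well-behaved servers, while the GET fallback rescues links whose servers reject HEAD requests.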
3.1.2 Scraping
Given that scraping strategies differ by document type, we categorized the links into
two groups: PDFs and non-PDFs. A link was considered a PDF if at least one of the
following conditions was satisfied: the Content-Type header was “application/pdf”,
the link ended with “.pdf” or “.docx”, or the Content-Disposition header indicated
an attachment.
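The PDF/non-PDF split can be expressed as a small predicate. The case-insensitive header handling, the stripping of query strings, and the treatment of the Word extension as “.docx” are assumptions made for this sketch.

```python
def is_document_link(url: str, headers: dict) -> bool:
    """Return True if a link should go to the PDF/document scraper (any one condition suffices)."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    content_type = lowered.get("content-type", "")
    disposition = lowered.get("content-disposition", "")
    # Ignore query strings and fragments when checking the file extension.
    path = url.lower().split("?", 1)[0].split("#", 1)[0]
    return (
        content_type.startswith("application/pdf")
        or path.endswith((".pdf", ".docx"))
        or "attachment" in disposition
    )
```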
Originally, the CSV file contained data for 70 countries. However, 3 countries were
dropped because none of their links worked, resulting in 67 countries. After the PDF
and non-PDF checks, 60 countries returned non-PDF content, whilst only 35 returned
PDFs. Combining the two lists resulted in a total of 66 countries with scraped PDF
and/or non-PDF content.
To scrape the text from the non-PDFs, BeautifulSoup, Cloudscraper, and Selenium
were used. The BeautifulSoup library [56] extracts visible text and removes information
irrelevant for further processing (e.g., script and style elements, headers, and
footers). Due to its efficiency, Cloudscraper [57] was used as the first approach
to overcome anti-bot protections. If it failed to access a link, Selenium [58] was
used as a fallback. Although Selenium is computationally more expensive, its
automated emulation of user interaction with websites provided satisfactory results
and increased the number of links scraped.
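The Cloudscraper-then-Selenium fallback and the BeautifulSoup cleanup can be sketched as follows. Third-party imports are placed inside the functions so that the fallback logic itself runs without them; the headless Chrome option and the exact list of removed tags are assumptions, not settings reported in the thesis.

```python
def fetch_with_cloudscraper(url: str, timeout: float = 30) -> str:
    """First attempt: Cloudscraper's requests-style session, which handles some anti-bot pages."""
    import cloudscraper
    scraper = cloudscraper.create_scraper()
    response = scraper.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def fetch_with_selenium(url: str) -> str:
    """Fallback: load the page in a headless browser and return the rendered HTML."""
    from selenium import webdriver
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def fetch_html(url: str, fetchers=(fetch_with_cloudscraper, fetch_with_selenium)) -> str:
    """Try each fetcher in order and return the first successful result."""
    errors = []
    for fetch in fetchers:
        try:
            return fetch(url)
        except Exception as exc:  # any failure moves on to the next, more expensive fetcher
            errors.append(f"{fetch.__name__}: {exc}")
    raise RuntimeError("all fetchers failed: " + "; ".join(errors))


def visible_text(html: str) -> str:
    """Strip scripts, styles, headers, footers and navigation, then return the visible text."""
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "header", "footer", "nav"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)
```

Ordering the fetchers from cheapest to most expensive means the browser is only started for links that the lightweight session could not retrieve.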
To scrape the PDF files, we used the PyPDF2, BytesIO, and EasyOCR modules [59]–
[61]. BytesIO was used to process the downloaded file contents in memory, without
saving separate files to disk. PyPDF2 was used to split the files into pages and
extract the text efficiently. However, some files resulted in noisy data and required
an OCR-based method instead. Therefore, for several countries, including those with
handwritten content and poorly formatted PDFs where standard text extraction failed,
EasyOCR was used. These tools were chosen for their complementary strengths, which
ensured a high retrieval rate across diverse document formats.
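A sketch of the PDF route, assuming PyPDF2's `PdfReader` for text extraction and EasyOCR as the fallback, is given below. Two parts are hypothetical: `looks_noisy` is an illustrative heuristic for deciding when extraction failed (the thesis does not specify how noisy output was detected), and rendering pages to images with pdf2image is an assumed step, since EasyOCR reads images rather than PDF files.

```python
from io import BytesIO


def extract_pdf_text(pdf_bytes: bytes) -> list:
    """Text layer of each page via PyPDF2, read from memory with BytesIO (no file on disk)."""
    from PyPDF2 import PdfReader
    reader = PdfReader(BytesIO(pdf_bytes))
    return [page.extract_text() or "" for page in reader.pages]


def looks_noisy(text: str, min_alpha_ratio: float = 0.6) -> bool:
    """Hypothetical heuristic: empty output, or mostly non-letter characters, suggests failure."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return True
    return sum(c.isalpha() for c in chars) / len(chars) < min_alpha_ratio


def ocr_pdf(pdf_bytes: bytes, languages=("en",)) -> list:
    """OCR fallback: render pages to images (assumed pdf2image step), then read them with EasyOCR."""
    import easyocr
    import numpy as np
    from pdf2image import convert_from_bytes
    reader = easyocr.Reader(list(languages))  # language codes must match the document
    images = convert_from_bytes(pdf_bytes, dpi=200)
    return [" ".join(reader.readtext(np.array(image), detail=0)) for image in images]


def pdf_to_text(pdf_bytes: bytes, languages=("en",)) -> str:
    """Use the PDF text layer when it looks usable, otherwise fall back to OCR."""
    pages = extract_pdf_text(pdf_bytes)
    if looks_noisy(" ".join(pages)):
        pages = ocr_pdf(pdf_bytes, languages)
    return "\n".join(pages)
```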
3.1.3 Data Preprocessing (Post-Scraping Stage)
To finalize the data gathered in the scraping stage, we applied several preprocessing
methods. To standardize the dataset language, we evaluated whether to exclude
non-English data or translate it into English. The need for translation was assessed
by comparing the words in the scraped text against a reference list of 479,000
English words [62]. This list was selected for its size, which approaches the
estimated 600,000 words in the English language, making it a reasonable proxy
for estimating the proportion of English content in the dataset. The analysis revealed
that approximately 1.7 million of the 4 million words did not match the reference list.
Most of the matching English content originated from English-speaking countries
such as the United Kingdom, Canada, and Australia. To maximize the size and
comparability of our corpus, we opted to translate non-English text into English.
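The English-coverage estimate reduces to a set-membership ratio. In this sketch the reference list is assumed to be loaded into a lower-cased set (for example from a plain `words.txt` file, a hypothetical file name), and the 0.80 threshold is the one used for the second translation pass. The tokenizer here skips digits, so dates do not count against the ratio, which may differ from the original check.

```python
import re


def english_ratio(text: str, vocabulary: set) -> float:
    """Share of word tokens found in a lower-cased reference English word list."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters, including accented ones
    if not words:
        return 0.0
    return sum(word in vocabulary for word in words) / len(words)


def needs_translation(text: str, vocabulary: set, threshold: float = 0.80) -> bool:
    """Flag files whose estimated English share falls below the threshold."""
    return english_ratio(text, vocabulary) < threshold


# Loading the reference list (hypothetical path):
#   with open("words.txt", encoding="utf-8") as f:
#       vocabulary = {line.strip().lower() for line in f if line.strip()}
```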
• Using Googletrans, an unofficial Python client for Google Translate, we split the
files into chunks and automatically detected the source language. After translation,
the translated chunks were reassembled to restore the original format of the files.
• After applying the same English word check, we selected country files containing
less than 80% English text. For these files, we manually identified the languages
present and used Googletrans again to translate from those languages to English.
• A similar approach was used for countries using Latin-based alphabets with
diacritical marks (e.g., Sweden, Poland, Lithuania). In these cases, certain
characters specific to the respective languages (such as “ö”, “ł”, or “š”) were
removed or distorted during later processing steps, resulting in unintelligible
or incorrect words. To prevent this, files from these countries were additionally
translated using Googletrans with the source language manually set
to the country’s native language.
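Chunked translation that preserves the original layout can be sketched with a pluggable `translate` function. The 4,500-character chunk size is an assumed safety margin below the translation service's request limits, and splitting on line boundaries is a design choice of this sketch. Leading and trailing whitespace is stripped before translation and restored afterwards, so the reassembled file keeps its line structure.

```python
def chunk_text(text: str, max_chars: int = 4500) -> list:
    """Pack whole lines (with their line breaks) into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # A single over-long line is cut into max_chars pieces as a last resort.
        for start in range(0, len(line), max_chars):
            piece = line[start:start + max_chars]
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks


def translate_preserving_layout(text: str, translate, max_chars: int = 4500) -> str:
    """Translate chunk by chunk, keeping each chunk's surrounding whitespace so the layout survives."""
    out = []
    for chunk in chunk_text(text, max_chars):
        core = chunk.strip()
        if not core:
            out.append(chunk)  # whitespace-only chunk: nothing to translate
            continue
        leading = chunk[: len(chunk) - len(chunk.lstrip())]
        trailing = chunk[len(chunk.rstrip()):]
        out.append(leading + translate(core) + trailing)
    return "".join(out)


# Possible translate function with googletrans 4.0.0rc1 (synchronous API; newer releases are async):
#   from googletrans import Translator
#   translator = Translator()
#   translate = lambda s: translator.translate(s, src="auto", dest="en").text
```

For the diacritics case, the same function can be used with `src` fixed to the country's language code instead of `"auto"`.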
Figure 3.1 displays the distribution of the English words in each government type
after translation and data processing.
[Chart: share of English words per government type (Full Democracy, Flawed Democracy, Hybrid Regime, Authoritarian Regime); the segment values shown are 57.6%, 36.2%, 3.6%, and 2.6%.]
Figure 3.1: Total English Words by Government Type.
However, these numbers are only a rough estimate based solely on the English
word list [62] and were used as a guide for deciding whether files needed additional
processing. The final test run also indicated that the reference list classifies some
English words as non-English (e.g., “cybersecurity” and “nanotechnology”), as well
as dates.
Further data cleaning was also applied. For instance, translation sometimes merged
sentences, resulting in paragraphs without proper spacing between them.
Additionally, some files contained noise from scraping, including web interface
residue such as “opens submenu items.” Moreover, there were instances of unigrams
and bigrams repeated multiple times in a row, further contributing to noise.
Removing this nonsensical text, along with unusual characters and extra spacing,
completed the general cleaning steps. Unlike these non-informative elements,
country references were intentionally retained because they are meaningful textual
features: they can signal the geographic focus of a policy or reflect differences in
discourse. These references are treated as relevant components of the topic model
output, contributing to the interpretability of themes related to the national context.
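The cleaning steps can be approximated with a few regular expressions. This is a sketch under stated assumptions: `INTERFACE_RESIDUE` contains only the example quoted above, and the rules for glued sentences, repeated n-grams, and unusual characters are illustrative rather than the exact patterns used in the thesis.

```python
import re

# Example residue string quoted in the text; a real list would come from inspecting the files.
INTERFACE_RESIDUE = ["opens submenu items"]


def clean_text(text: str) -> str:
    for phrase in INTERFACE_RESIDUE:
        text = re.sub(re.escape(phrase), " ", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text)
    # Restore the space lost when translation glued sentences together: "policy.The" -> "policy. The".
    text = re.sub(r"([a-z][.!?])([A-Z])", r"\1 \2", text)
    # Collapse back-to-back repeats of a bigram, then of a unigram (case-insensitive).
    text = re.sub(r"\b(\w+ \w+)(?: \1\b)+", r"\1", text, flags=re.IGNORECASE)
    text = re.sub(r"\b(\w+)(?: \1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Replace unusual symbols; \w keeps accented letters, so names such as "Türkiye" survive.
    text = re.sub(r"[^\w\s.,;:!?()'\"%/-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```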
Moreover, UTF-8 character encoding was used throughout the preprocessing
stages, including reading, writing, and saving files. UTF-8 encodes each character as
a sequence of one to four 8-bit bytes and is widely used for web pages and electronic
communication because it is backward compatible with ASCII and can represent any
Unicode character.
Lastly, model-specific processing was applied. Because BERT-based models have a token
limit, the text was segmented into uniform 512-token chunks. Additionally, we
generated a separate version of the dataset for the LDA model using spaCy’s
lemmatization. Lemmatization (e.g., converting “policies” to “policy” and “data” to
“datum”) consolidates inflected forms of the same word, which is crucial for accurately
capturing word frequencies in probabilistic models.
Following preprocessing, the country-level dataset was split into four documents, one
for each government type.
Table 3.1 summarizes the key metrics for each government type. Detailed metrics for
the countries obtained from the original CSV dataset are provided in Appendix A.
Government Type | Countries* | Working URLs | Total Tokens
Full Democracy | 23 | 307 | 1,050,225
Flawed Democracy | 28 | 329 | 1,879,726
Hybrid Regime | 10 | 60 | 78,080
Authoritarian Regime | 9 | 65 | 106,976
Total | 70 | 761 | 3,114,997
* The number of countries per government type includes the four countries that
resulted in 0 working links and 0 tokens.
Table 3.1: Aggregated Metrics by Government Type.
3.2 LDA
This section describes how LDA was used to extract topics from the policy documents,
highlighting the preprocessing, chunking, and hyperparameter search steps.
3.2.1 Text Processing and Chunking
Prior to LDA modeling, the data was preprocessed to build a Gensim dictionary and
corpus. In addition to standard English stopwords, a set of domain-specific stopwords
was compiled and removed. These included frequently occurring but semantically
uninformative tokens such as “artificial”, “intelligence”, “cookie”, and “http”. These
terms were overrepresented because the scraped documents discuss AI and contain
web interface text, but they did not contribute meaningfully to distinguishing
topics. Moreover, single-character tokens (e.g., “a”, “b”, “c”) were removed to
reduce noise in the data.
All policy text documents were tokenized using NLTK’s sentence and word tokenizers.
The tokenized data was segmented into overlapping chunks to accommodate LDA’s
assumptions regarding document size and topic distribution. Each chunk was capped
at 512 tokens, with a 40-token overlap between consecutive chunks to preserve
contextual continuity. Unlike BERTopic, which relies on semantic embeddings and
benefits from larger overlaps to maintain contextual flow, LDA treats documents as
bags-of-words and models topic distribution across the entire corpus. Therefore, a
smaller overlap was sufficient to maintain coherence while improving computational
efficiency. This approach ensured a balance between semantic completeness and
computational tractability.
Each chunk was then tokenized into lowercase words, and the custom stopwords were
filtered out. The processed chunks were then mapped to a Gensim Dictionary object,
which assigns each token a numerical ID to standardize the vocabulary across the
corpus, and converted into bag-of-words representations.
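The overlapping chunking and token filtering for LDA can be sketched as below. The window arithmetic follows the 512-token, 40-token-overlap setting; `DOMAIN_STOPWORDS` lists only the examples named in the text and would in practice be combined with a standard English list such as NLTK's. The Gensim step is imported lazily so the rest runs without Gensim installed.

```python
def chunk_tokens(tokens: list, size: int = 512, overlap: int = 40) -> list:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end; another one would be redundant
    return chunks


# Domain stopwords named in the text; the full list is not reproduced here.
DOMAIN_STOPWORDS = {"artificial", "intelligence", "cookie", "http"}


def filter_chunk(chunk: list, stopwords: set) -> list:
    """Lower-case, then drop stopwords, single characters and tokens without letters."""
    kept = []
    for token in chunk:
        token = token.lower()
        if len(token) > 1 and token not in stopwords and any(c.isalpha() for c in token):
            kept.append(token)
    return kept


def to_bow_corpus(chunks: list):
    """Build the Gensim dictionary and bag-of-words corpus (requires gensim)."""
    from gensim.corpora import Dictionary
    dictionary = Dictionary(chunks)
    return dictionary, [dictionary.doc2bow(chunk) for chunk in chunks]
```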
3.2.2 LDA Modeling and Hyperparameter Tuning
For topic modeling, this study employed LDA as implemented in the LdaModel class
of the Gensim library [63]. We selected it over scikit-learn’s implementation because it
allows a more in-depth analysis of the results; both libraries support online variational
Bayes inference, which is computationally efficient and scalable. Given the
objective of comparing the thematic structure of policy discourse across government
regime types, Gensim provides methodological reliability while enabling the analysis
to focus on the interpretation of cross-regime thematic patterns. Developing a custom
LDA implementation would be more appropriate in a study aimed at improving or
extending the algorithm itself, which falls outside the scope of the present research.
The LDA models were trained separately for each government category and tuned
on their own data to account for differences in structure and size. A grid search
was used to tune the hyperparameters for each government type: the number of
topics the model should identify, the number of passes through the training corpus,
alpha, and eta.
To identify the most effective configuration for each regime type, the grid search was
conducted over the predefined ranges of hyperparameters presented in Table 3.2.
Parameter Candidate Values
num_topics 5, 10, 15
passes 20, 30
alpha ’symmetric’, ’asymmetric’, 0.01, 0.1
eta ’symmetric’, ’asymmetric’, 0.01, 0.1, 0.5
Table 3.2: Grid Search Parameters for LDA Model Optimization.
Each model was evaluated using topic coherence and perplexity to assess both semantic
interpretability and statistical fit. For each government type, the five configurations
with the highest coherence scores from the grid search were shortlisted. If multiple
configurations had identical coherence scores, perplexity was used as a secondary
criterion to break ties. From these five candidates, the configuration with the highest
number of topics was selected to enable a more detailed thematic analysis and facilitate
cross-topic comparison in the subsequent qualitative analysis. The exception was
Flawed Democracy: because this procedure yielded a relatively low number of topics
for that subset, the number of topics was increased manually to allow a fairer
comparison with BERTopic’s output.
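The shortlist-then-maximize selection rule can be written as a small function. Each grid-search run is assumed to be a dict with coherence, perplexity, and topic-count entries (hypothetical key names), and lower perplexity is assumed to be better when breaking ties.

```python
def select_configuration(results: list, top_k: int = 5, topics_key: str = "num_topics") -> dict:
    """Shortlist the top_k runs by coherence (ties: lower perplexity first), then take the most topics."""
    ranked = sorted(results, key=lambda r: (-r["coherence"], r["perplexity"]))
    shortlist = ranked[:top_k]
    # max() returns the first maximum, so equal topic counts keep the higher-ranked run.
    return max(shortlist, key=lambda r: r[topics_key])
```

The same rule with `top_k=10` and `topics_key="nr_topics"` would describe the extended selection used for one of the BERTopic regimes.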
All models were trained with online learning enabled and a fixed random seed to
ensure reproducibility. Additionally, word-level topic distributions were enabled during
training so that individual terms in the produced topics could be traced. The final
set of hyperparameters selected for each government type is shown in Table 4.1
in Chapter 4.
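A sketch of the LDA grid search with Gensim, using the candidate values from Table 3.2, follows. The `c_v` coherence measure and the conversion of Gensim's per-word bound to perplexity are assumptions about how the two criteria were computed. Recent Gensim versions reject `eta="asymmetric"` with a `ValueError`, so this sketch skips those combinations by default.

```python
from itertools import product

# Candidate values from Table 3.2.
LDA_GRID = {
    "num_topics": [5, 10, 15],
    "passes": [20, 30],
    "alpha": ["symmetric", "asymmetric", 0.01, 0.1],
    "eta": ["symmetric", "asymmetric", 0.01, 0.1, 0.5],
}


def iter_configs(grid: dict = LDA_GRID, skip_asymmetric_eta: bool = True):
    """Yield every hyperparameter combination of the grid as a dict."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        config = dict(zip(keys, values))
        if skip_asymmetric_eta and config["eta"] == "asymmetric":
            continue  # Gensim raises ValueError for an asymmetric eta prior
        yield config


def train_and_score(config: dict, corpus, dictionary, texts, seed: int = 42) -> dict:
    """Fit one LdaModel and return its coherence and perplexity (requires gensim)."""
    from gensim.models import CoherenceModel, LdaModel
    lda = LdaModel(
        corpus=corpus,
        id2word=dictionary,
        random_state=seed,     # fixed seed for reproducibility
        update_every=1,        # online variational Bayes (mini-batch updates)
        per_word_topics=True,  # keep word-level topic assignments for traceability
        **config,
    )
    coherence = CoherenceModel(
        model=lda, texts=texts, dictionary=dictionary, coherence="c_v"
    ).get_coherence()
    bound = lda.log_perplexity(corpus)  # per-word bound; Gensim's own log reports 2**(-bound)
    return {**config, "coherence": coherence, "perplexity": 2 ** (-bound), "model": lda}
```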
3.3 BERTopic
As an alternative to LDA, BERTopic was used to process the policy documents and
generate topics shaped by its configured components and parameters. As mentioned in
Section 3.1.3, we split the original dataset into four separate subsets, each containing
policy texts for the respective government type. This was done to prevent the model
from interpreting a single large document containing policies from all regimes as
one topic. To account for the diverse and, in some cases, noisy nature of our data (e.g.,
the presence of non-English words and incomplete sentences), we modified and fine-tuned
several model components and hyperparameters after an initial run produced
overly vague topics. The adjusted components are described below:
• Document Chunking: As mentioned, BERT-based models have a token limit;
therefore, each government-type document was split into chunks of 512 tokens
with an overlap of 100 tokens. This overlap is important because some sentences
do not end with a full stop, so additional context from previous sentences is
necessary for correct processing.
• Sentence Embedding Model: Initially, as recommended by the BERTopic
creator, we used the all-MiniLM-L6-v2 sentence transformer to embed the input
documents. However, because some non-English words appeared in the output,
we switched to the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
model [64]. Like the original model, the multilingual sentence transformer
effectively captures the semantic similarities between texts and is well-suited for
tasks such as semantic search and clustering, including for non-English documents.
Note: BERTopic still received the translated documents as input. As a probabilistic
bag-of-words model, LDA cannot relate words with the same meaning across
languages, so to keep the input data as uniform as possible and the outputs
comparable, both models received the same dataset.
Example: A sentence such as “Policies discussing artificial intelligence” might
be embedded as a 384-dimensional vector like [0.11, 0.96, 0.45, ..., 0.21].
• UMAP Dimensionality Reduction: UMAP’s key hyperparameters, n_neighbors
(which sets the neighborhood size) and n_components (the target dimensionality),
were tuned via grid search, although UMAP’s inherent stochasticity means that
identical outputs are not guaranteed across runs.
Example: The high-dimensional vector for our earlier sentence might be reduced
to just 5 dimensions: [0.42, 0.73, 0.70, 0.64, 0.11].
• Clustering with HDBSCAN: HDBSCAN was employed as the clustering algorithm
to group similar documents. Its min_cluster_size parameter, which sets the minimum
number of documents a cluster must contain, was optimized through grid search.
This optimization helps prevent the creation of too many microclusters and
ensures that the output topics are meaningful.
Example: Suppose some chunks mainly mention countries such as “Spain”, “United
Kingdom”, “Brazil”, and “United States of America”, while others mainly contain
technology-related terms such as “processor”, “algorithm”, “computer”, and
“implementing”. Since HDBSCAN clusters documents rather than individual words,
one cluster might form around the geography-focused chunks and another around
the technology-related ones.
• Topic Extraction with CountVectorizer and c-TF-IDF: We employed a
CountVectorizer together with a class-based TF-IDF (c-TF-IDF) transformer, which
treats all documents of a topic as a single bag of words, to obtain distinct topics.
Custom stop words (e.g., “ai”, “artificial”, “intelligence”) were added since they
appeared in every topic and contributed little to the later discussion, and we
deliberately removed certain words (“not”, “no”) from the default list because they
provide additional context (e.g., distinguishing “important” from “not important”).
We configured the vectorizer to extract one- to two-word n-grams since single-word
topics might not accurately capture the themes. Additionally, the influence of words
that appear in most documents was reduced by setting reduce_frequent_words=True
in the c-TF-IDF model.
Example: Suppose that we have chunks that discuss some data-related poli-
cies. After computing c-TF-IDF weights, the top 3 n-gram keywords might
be: {(“data privacy”, 0.43), (“privacy cookies”, 0.33), (“request information”,
0.11)}. Based on these keywords, the label (topic) assigned for this group could
be “Data Privacy”.
• POS-Based Topic Representation: A representation model based on part-of-speech
(POS) tagging was used to refine the keywords that represent each topic. By focusing
on adjective+noun, noun+noun, verb+noun, and adjectives alone, the model outputs
topics that more effectively represent the document content. Although running the
model without POS tagging might result in higher coherence scores, incorporating it
generally improves topic interpretability.
Example: In the earlier technology-related cluster, POS filtering might re-
tain “computer”, “algorithm”, and “processor” but exclude all the verbs (e.g.,
“implementing”).
• Hyperparameter Grid Search: A grid search was performed to determine the
optimal hyperparameters for each government type. We varied the target number
of topics (nr_topics), UMAP’s n_neighbors, and HDBSCAN’s min_cluster_size.
The nr_topics range is wider than LDA’s because, unlike LDA’s num_topics, it sets
the maximum number of topics rather than the exact number (i.e., the two
parameters are not equivalent). Because the government-type datasets differ greatly
in size, the best hyperparameters were selected according to the coherence score
obtained for each type. In the final step, for each government type (except the Hybrid
Regime), we selected the top five hyperparameter sets based on their coherence scores
and then chose the set with the highest nr_topics for further detailed analysis.
Table 3.3 shows the parameters used for the grid search.
However, since the Hybrid Regime could not produce more than 3 topics for
any hyperparameter combination, we extended the hyperparameter range for
this regime. In particular, the grid search was run with nr_topics=16, and the
set that produced the most topics among the ten highest-coherence sets was
selected. Note: nr_topics was set to 16 since it defines the maximum number
of topics the model can reach, not the exact or minimum number.
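To make the c-TF-IDF weighting concrete, the following from-scratch sketch computes, for each topic, the L1-normalized term frequency multiplied by log(1 + A / f_t), where A is the average number of terms per topic and f_t is the frequency of term t across all topics. It is a simplified illustration of the formula described in the BERTopic documentation, not the library's implementation; the square root mirrors the effect of reduce_frequent_words=True, and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens: list, n_min: int = 1, n_max: int = 2) -> list:
    """All contiguous n-grams with n_min <= n <= n_max, mirroring ngram_range=(1, 2)."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def c_tf_idf(class_terms: dict, reduce_frequent_words: bool = False) -> dict:
    """Weight each term per topic: normalised term frequency x log(1 + A / f_t)."""
    counts = {label: Counter(terms) for label, terms in class_terms.items()}
    corpus_freq = Counter()
    for c in counts.values():
        corpus_freq.update(c)  # f_t: frequency of each term across all topics
    avg_terms = sum(sum(c.values()) for c in counts.values()) / len(counts)  # A
    weights = {}
    for label, c in counts.items():
        total = sum(c.values())
        weights[label] = {}
        for term, count in c.items():
            tf = count / total
            if reduce_frequent_words:
                tf = math.sqrt(tf)  # dampens the dominance of very frequent terms
            weights[label][term] = tf * math.log(1 + avg_terms / corpus_freq[term])
    return weights
```

Sorting each topic's weights in descending order and taking the first entries yields keyword lists such as the “data privacy” example above.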
Parameter Candidate Values
nr_topics 4, 6, 8, 10, 12, 14, 16
umap_n_neighbors 2*, 5, 10, 15, 20*
hdbscan_min_cluster_size 2*, 5, 10, 15, 20*
Table 3.3: Hyperparameter Grid Search for BERTopic.
* Values 2 and 20 were explicitly introduced for the Hybrid Regime, with
nr_topics=16.
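One possible assembly of the BERTopic pipeline described above, with the grid from Table 3.3, is sketched below. The component classes are those exposed by BERTopic, UMAP, HDBSCAN, scikit-learn, and sentence-transformers; n_components=5, metric="cosine", random_state, prediction_data=True, and the spaCy model name en_core_web_sm are assumptions rather than reported settings. Heavy imports are deferred to `make_topic_model` so the stop-word and grid helpers run on their own.

```python
from itertools import product

CUSTOM_STOPWORDS = {"ai", "artificial", "intelligence"}  # examples named in the text
KEPT_NEGATIONS = {"not", "no"}                            # removed from the default stop-word list

# spaCy Matcher-style patterns: adjective+noun, noun+noun, verb+noun, and single adjectives.
POS_PATTERNS = [
    [{"POS": "ADJ"}, {"POS": "NOUN"}],
    [{"POS": "NOUN"}, {"POS": "NOUN"}],
    [{"POS": "VERB"}, {"POS": "NOUN"}],
    [{"POS": "ADJ"}],
]

# Table 3.3; the values 2 and 20 are only used for the Hybrid Regime with nr_topics=16.
BERTOPIC_GRID = {"nr_topics": [4, 6, 8, 10, 12, 14, 16],
                 "n_neighbors": [5, 10, 15], "min_cluster_size": [5, 10, 15]}
HYBRID_GRID = {"nr_topics": [16],
               "n_neighbors": [2, 5, 10, 15, 20], "min_cluster_size": [2, 5, 10, 15, 20]}


def iter_grid(grid: dict):
    """Yield every combination of the grid as a dict of keyword arguments."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def build_stopwords(base) -> list:
    """Default list minus the kept negations, plus the domain-specific additions."""
    return sorted((set(base) - KEPT_NEGATIONS) | CUSTOM_STOPWORDS)


def make_topic_model(nr_topics: int, n_neighbors: int, min_cluster_size: int, seed: int = 42):
    from bertopic import BERTopic
    from bertopic.representation import PartOfSpeech
    from bertopic.vectorizers import ClassTfidfTransformer
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer
    from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer(
            "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"),
        umap_model=UMAP(n_neighbors=n_neighbors, n_components=5,
                        metric="cosine", random_state=seed),
        hdbscan_model=HDBSCAN(min_cluster_size=min_cluster_size, prediction_data=True),
        vectorizer_model=CountVectorizer(
            stop_words=build_stopwords(ENGLISH_STOP_WORDS), ngram_range=(1, 2)),
        ctfidf_model=ClassTfidfTransformer(reduce_frequent_words=True),
        representation_model=PartOfSpeech("en_core_web_sm", pos_patterns=POS_PATTERNS),
        nr_topics=nr_topics,
    )
```

A grid run would then loop over `iter_grid(BERTOPIC_GRID)` and call `make_topic_model(**config).fit_transform(chunks)` for each configuration.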
3.4 Qualitative Topic Analysis
To interpret the topics and keywords produced by the LDA and BERTopic models, a
qualitative analysis was performed.
The keywords of each topic, for each government type and topic model, were
investigated in detail. Specifically, the aim was to find the documents in which
all 10 keywords of a given topic were present.
However, this was not always possible, since topics are derived from patterns across
the whole corpus rather than from a single document. Thus, the second
approach was to find the documents (512-token chunks) that contained the largest
number of the 10 keywords. For example, if a document included the most
keywords (e.g., 9 out of 10), that document was selected.
Depending on the depth and distribution of the topic, several documents could
contain the same number of keywords; in that case, all of them were included
in the qualitative analysis. As a result, the number of documents varied across
topics, ranging from one 512-token document to roughly ten.
The analysis itself involved reading the extracted documents to understand the
broader context of each topic.
Example: Let’s say a topic model outputs Topic 0 with the following keywords:
weather, sun, heatwave, rain, storm, lightning. In a case where all the keywords
appear, the document might read:
Today, Texas is experiencing a heatwave, with a high-intensity sun. The weather
is forecast to take a turn, with the upcoming week expected to bring a storm with
heavy rain and lightning.
In other cases, some of the keywords might appear in separate documents:
The weather today in Sweden is moderate. Sun will not be visible due to the storm
and continuous rain.
or
No heatwave or sun to be expected due to the upcoming storm, which will include
torrential rain.
Each of these contains only four of the six keywords but still contributes to
understanding the topic.
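The document-retrieval step of the qualitative analysis can be sketched as counting whole-word keyword matches per chunk and keeping every chunk that ties for the maximum. The sketch assumes keywords and chunks share the same normalization (for LDA, lemmatized text), which is not spelled out above. Applied to the weather example, it returns only the first document, and the two partial documents tie once that one is removed.

```python
import re


def keyword_hits(document: str, keywords: list) -> int:
    """Count how many distinct topic keywords occur as whole words or phrases in a chunk."""
    return sum(
        1 for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", document, flags=re.IGNORECASE)
    )


def best_matching_documents(documents: list, keywords: list) -> list:
    """Return every chunk that ties for the highest keyword count (empty if none match)."""
    if not documents:
        return []
    counts = [keyword_hits(doc, keywords) for doc in documents]
    best = max(counts)
    return [doc for doc, c in zip(documents, counts) if c == best and best > 0]
```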
3.5 Quantitative Comparison Using OCTIS
Following the training and tuning of both the BERTopic and LDA models, the OCTIS
framework was applied to evaluate their technical performance. OCTIS enables fair
comparison across models with differing architectures by providing standardized
implementations of widely used topic modeling metrics. Since each model was trained
separately for each government regime type, evaluation was likewise performed
independently per category, using the corresponding topic-word distributions. Four
intrinsic evaluation metrics were computed using OCTIS: CV Coherence, Topic
Diversity, WECoherencePairwise, and IRBO. As outlined in Section 2.2.2, these metrics
provide quantitative proxies for semantic coherence, lexical diversity, and topical
redundancy, properties commonly associated with human interpretability. All metrics
were computed using their respective OCTIS classes with default settings. CV Coherence
relied on co-occurrence patterns within the corpus, IRBO compared the ranked top-word
lists of the topics, and WECoherencePairwise used pretrained 300-dimensional Word2Vec
embeddings from the Google News corpus. Topic Diversity was calculated based on the
top 10 words per topic.
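Topic Diversity is simple enough to write out directly, and the OCTIS calls can be wrapped in one helper. The helper is a sketch assuming OCTIS's model-output format (a dict with a "topics" list of word lists) and a tokenized corpus passed as `texts`; `measure="c_v"` is passed explicitly rather than relying on the class default.

```python
def topic_diversity(topics: list, topk: int = 10) -> float:
    """Proportion of unique words among the top-k words of all topics (1.0 = no shared words)."""
    top_words = [word for topic in topics for word in topic[:topk]]
    return len(set(top_words)) / (topk * len(topics))


def octis_scores(topics: list, texts: list, topk: int = 10) -> dict:
    """Compute the four OCTIS metrics, rounded to four decimals (requires octis)."""
    from octis.evaluation_metrics.coherence_metrics import Coherence, WECoherencePairwise
    from octis.evaluation_metrics.diversity_metrics import InvertedRBO, TopicDiversity
    model_output = {"topics": topics}
    metrics = {
        "CV Coherence": Coherence(texts=texts, topk=topk, measure="c_v"),
        "Topic Diversity": TopicDiversity(topk=topk),
        "WECoherencePairwise": WECoherencePairwise(topk=topk),
        "IRBO": InvertedRBO(topk=topk),
    }
    return {name: round(metric.score(model_output), 4) for name, metric in metrics.items()}
```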
Each metric was computed and stored per model and government type. Results were
rounded to four decimal places and compiled into a summary table for comparison
across models and governance categories. All evaluations were conducted using
fixed random seeds and consistent preprocessing to ensure comparability of the final
results.
It is important to note that although both models were evaluated within the same
OCTIS framework, each was assessed on its own preprocessed corpus.
This decision reflects the different architectural characteristics of LDA and BERTopic.
Rather than enforcing identical preprocessing across models, the evaluation aimed
to measure each model’s performance under conditions suited to its architecture. This
is in line with the exploratory aim of the thesis: to determine which model performs
more effectively in the context of AI policy discourse across different political regimes.
3.6 Ethical Topic Variation Across Government Type
This section describes the UNESCO ethical recommendations, which serve as the
basis for our analytical framework. The design of this framework is discussed in
detail, alongside its comparison with the keywords extracted by each topic model. The
goal of this framework (Table A.1) and its categories was to identify whether certain
ethical concerns and nuances are present and discussed within the AI policies, rather
than to extract or reproduce the 11 UNESCO categories.
3.6.1 UNESCO Recommendation on the Ethics of Artificial Intelligence
The worldwide adoption of AI has led to variations in how countries approach the
regulation of AI. To ensure a structured and thorough analysis of the topics
generated by LDA and BERTopic, we constructed an analytical framework
based on UNESCO’s Recommendation on the Ethics of Artificial Intelligence [54].
The UNESCO recommendation serves as a normative framework for evaluating and
guiding the ethical development and governance of AI. It defines AI systems as
those capable of processing data and information in ways that resemble intelligent
behavior, such as reasoning, learning, and prediction. The document itself is directed
at Member States, both as AI actors and as regulatory authorities. Additionally, it
provides ethical guidance to all AI stakeholders, including the public and private
sectors.
Unlike regional AI regulations such as the European Union’s AI Act, UNESCO’s
recommendation was adopted by all 193 Member States and establishes a universal set
of ethical guidelines. Based on these guidelines, AI policy topics can be systematically
assessed for their alignment with ethical AI principles.
By aligning the topics extracted by our models with these predefined areas, we
can compare the results across government types (democracy indexes), identify
dominant themes, and assess whether variations exist with regard to ethical
considerations, public accountability, or economic priorities. Moreover, this
framework reduces subjective bias and ensures a consistent interpretation throughout
the analysis.
3.6.2 Framework
To systematically assess AI policies, this framework classifies policy themes into
the 11 key AI governance areas outlined by UNESCO (see Figure 3.2). The document
includes several sections, each dedicated to a different AI principle or ethical risk
(e.g., Data Policy or Education and Research), with paragraphs on how Member
States should approach AI policies to ensure ethical use of AI in the respective
areas. These sections serve as the framework categories in our thesis for measuring
differences in thematic emphasis (see Table A.1 for more details).
[Figure: the 11 framework categories: (1) Ethical Impact Assessment, (2) Ethical Governance and Stewardship, (3) Data Policy, (4) Development and International Cooperation, (5) Environment and Ecosystems, (6) Gender, (7) Culture, (8) Education and Research, (9) Communication and Information, (10) Economy and Labour, (11) Health and Social Well-Being.]
Figure 3.2: The 11 Framework Categories Based on UNESCO’s Recommendation.
To ensure that the analysis remains grounded in established ethical AI principles,
a manual keyword extraction approach was employed. This method was chosen to
maintain conceptual accuracy while remaining unbiased and consistent with the
normative language used in UNESCO’s recommendation when analyzing the results.
Manual keyword extraction for the ethical policy areas also allows a context-aware
selection of terms that automated methods might not capture.
The process followed UNESCO’s own categorization of ethical AI principles and
policy areas to structure the analytical framework. Each section was read in full,
and key terms were identified based on their explicit relevance to AI governance.
Specifically, nouns, technical terms, and phrases describing AI impact assessments
were identified.
Example: The “Data Policy” section has a paragraph that says the following:
Member States should work to develop data governance strategies that ensure the
continual evaluation of the quality of training data for AI systems, including the
adequacy of the data collection and selection processes, proper data security and [...].
In this case, we would identify the following keywords: data governance, data security,
and data quality. The same procedure was applied to the remaining paragraphs, and
the identified terms were compiled into a list representing the Data Policy category.
3.6.3 Analytical Application
The results of the topic models were analyzed and compared to evaluate their utility
in interpreting AI policies and correlating themes with governance indicators using
the Economist Intelligence Unit’s Democracy Index [3].
The comparative analysis evaluated the outputs of two topic modeling approaches
applied to AI policy documents. The aim was to determine which model more
effectively extracts meaningful and interpretable themes, particularly concerning
the level of democracy in the countries producing these policies, as reflected in the
Democracy Index.
The evaluation examined the coherence and relevance of the topics generated by each
model, with a focus on how thematic priorities differed across the Democracy Index
spectrum. For example, democratic nations may emphasize themes like transparency,
ethics, and accountability, whereas authoritarian regimes might prioritize control,
innovation, or surveillance. This analysis also assessed how well the generated topics
reflected governance-related patterns and aligned with frameworks such as UNESCO’s
Recommendation on the Ethics of Artificial Intelligence.
3.6.3.1 Government Types versus Framework
This section describes the comparison between the government types and the
constructed framework. The topics and keywords generated by the LDA and BERTopic
models were compared to each of the 11 framework categories to determine which
ethical dimensions each regime emphasized and how these emphases varied across
regime types.
To compare the government types to each other and the framework, the following
steps were implemented:
To enable a consistent comparison, the keywords representing each topic from the LDA
and BERTopic models, as well as the framework keywords, were normalized. Since
both models included stopword removal, we applied it to the framework as well
for consistency. Moreover, NLTK’s word stemming was used, since morphological
variations of n-grams were not of interest for these comparisons. This step also
included converting the phrases to lowercase and splitting bigrams into individual
words. The normalization step was particularly useful for bigrams in the later stages.
For instance, the phrase “host organization” would be transformed into the list
[“host”, “organ”].
To compute the overlap scores for each framework category, two cases were considered:
• Option 1: Full Overlap. For each topic of a government type, each unigram or
bigram keyword was compared to each of the framework’s keywords. If the
keywords matched exactly, the framework category was returned with an assigned
score of 1 (e.g., Data Policy = 1).
• Option 2: Partial Overlap. If there was no exact match between the
keywords (which was more common with bigrams), the closest matching
framework category (if any) was returned. The score was calculated as:
$$\text{score} = \frac{\text{number of overlapping tokens}}{\text{total tokens in the framework keyword}}$$
For example, the keyword [“discrimination”] (length = 1) would return a score
of 0.5 if the bigram framework keyword was [“discrimination”, “policy”] (length
= 2). If there was a partial overlap with several framework keywords, the
category with the highest overlap score was selected.
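The normalization and the two overlap options can be sketched as follows. Using NLTK's `PorterStemmer` is an assumption (the text names only NLTK's stemming), and the framework keywords in the usage check are hypothetical tokens rather than entries from Table A.1.

```python
def normalize_phrase(phrase: str, stopwords: set, stem=None) -> list:
    """Lower-case, split a unigram/bigram into words, drop stopwords, and stem each word."""
    if stem is None:
        from nltk.stem import PorterStemmer  # assumed stemmer (requires nltk)
        stem = PorterStemmer().stem
    return [stem(word) for word in phrase.lower().split() if word not in stopwords]


def overlap_score(keyword: list, framework_keyword: list) -> float:
    """Option 1: 1.0 for an exact match; Option 2: shared tokens / tokens in the framework keyword."""
    if keyword == framework_keyword:
        return 1.0
    shared = len(set(keyword) & set(framework_keyword))
    return shared / len(framework_keyword)


def best_category(keyword: list, framework: dict):
    """Return (category, score) for the best-matching framework keyword, or None if nothing overlaps."""
    best = None
    for category, framework_keywords in framework.items():
        for fk in framework_keywords:
            score = overlap_score(keyword, fk)
            if score > 0 and (best is None or score > best[1]):
                best = (category, score)
    return best
```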
For each topic model, the output included different sets of framework categories
with their assigned scores. All overlap scores were retained to preserve a broad
overview of all topics for further discussion. Furthermore, the results were aggregated
by government type in two steps: first by summing the overlap scores of keywords
within each topic for the same framework category, and then by summing those
topic-level totals across all topics for each government type.
For instance, suppose two topics belong to the same regime. Topic 0 contains keyword_1 and keyword_2, whereas Topic 1 contains keyword_a, keyword_b, keyword_c, and keyword_d. Of these, keywords 1, 2, a, b, and c overlap with the “Data Policy” framework category, with scores of 0.50, 0.75, 0.33, 1.00, and 1.00, respectively, while keyword_d does not overlap with this category. The first step sums the scores within each topic (Data Policy for Topic 0: 0.50 + 0.75 = 1.25; Data Policy for Topic 1: 0.33 + 1.00 + 1.00 = 2.33). The second step combines these into an overall regime-level score for “Data Policy”: 1.25 + 2.33 = 3.58. This two-stage aggregation provided a clear overview of which ethical dimensions each government type discussed most.
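The two-stage aggregation can be sketched as below, reproducing the worked “Data Policy” example. The input layout (a mapping from topic ID to that topic's `(category, score)` matches) is an assumption made for illustration, not the thesis's data structure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Per topic: the (category, score) matches of its keywords; keywords without
# any overlap are simply absent.
TopicMatches = Dict[int, List[Tuple[str, float]]]


def aggregate(topics: TopicMatches) -> Tuple[Dict[int, Dict[str, float]], Dict[str, float]]:
    """Two-stage aggregation: sum within each topic, then across topics."""
    topic_totals: Dict[int, Dict[str, float]] = {}
    regime_totals: Dict[str, float] = defaultdict(float)
    for topic_id, matches in topics.items():
        per_topic: Dict[str, float] = defaultdict(float)
        for category, score in matches:  # stage 1: within the topic
            per_topic[category] += score
        topic_totals[topic_id] = dict(per_topic)
        for category, total in per_topic.items():  # stage 2: across topics
            regime_totals[category] += total
    return topic_totals, dict(regime_totals)


# The worked example above (keyword_d has no Data Policy overlap, so it is omitted).
example = {
    0: [("Data Policy", 0.50), ("Data Policy", 0.75)],
    1: [("Data Policy", 0.33), ("Data Policy", 1.00), ("Data Policy", 1.00)],
}
per_topic, regime = aggregate(example)
print({t: round(s["Data Policy"], 2) for t, s in per_topic.items()})  # {0: 1.25, 1: 2.33}
print(round(regime["Data Policy"], 2))                                 # 3.58
```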
Both raw and normalized scores were used for comparison. The raw scores (described above) allowed for a comprehensive analysis of topics within each government type. The normalized score for a category was calculated by dividing its raw overlap score by the total overlap score across all framework categories for that government type. Normalized scores supported comparisons across government types, especially since different models could produce different numbers of topics. These comparisons allowed us to identify the dominant themes emphasized by different regimes and to assess whether distinctive differences emerged between them.
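Normalization per government type amounts to dividing each category's raw score by that regime's total. The sketch below assumes raw regime-level totals are stored as nested dictionaries; the regime names and numbers are hypothetical, not results from this thesis. Mapping a regime with no overlap at all to zeros, instead of dividing by zero, is a design choice of this sketch.

```python
from typing import Dict


def normalize(regime_totals: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Divide each category's raw score by the regime's total over all categories."""
    normalized = {}
    for regime, scores in regime_totals.items():
        total = sum(scores.values())
        normalized[regime] = {c: (s / total if total else 0.0) for c, s in scores.items()}
    return normalized


# Hypothetical raw scores, for illustration only.
raw = {
    "Regime A": {"Data Policy": 3.0, "Fairness": 1.0},
    "Regime B": {"Data Policy": 1.0, "Fairness": 1.0},
}
print(normalize(raw))
# {'Regime A': {'Data Policy': 0.75, 'Fairness': 0.25}, 'Regime B': {'Data Policy': 0.5, 'Fairness': 0.5}}
```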
4 Results
This chapter presents the results obtained with the LDA and BERTopic models, including the hyperparameters selected through grid search and the extracted topics with their keywords and evaluation metrics. Moreover, Section 4.3 (Comparison) presents the results of the custom method used for the government type-level comparison and of OCTIS for the model comparison.
4.1 Model Configuration and Setup
This section presents the final hyperparameter settings used for the topic models
whose outputs form the basis of subsequent analysis.
4.1.1 LDA
We applied grid search for LDA separately for each regime type to identify the best-performing hyperparameter configuration. Specifically, we tuned the number of topics (num_topics), the number of training passes (passes), and the Dirichlet priors (alpha and eta). As detailed in Section 3.2.2, the models were evaluated using coherence and perplexity to ensure the interpretability of the topics. Although using different hyperparameters across regime types complicates direct comparison, this approach was necessary because of substantial differences in both the quantity and the content of the data. Tuning for each government type therefore ensured that the resulting topics were as meaningful and interpretable as possible within each context. For Flawed Democracy in particular, we chose a slightly suboptimal configuration with a lower coherence score to allow a more balanced comparison between LDA and BERTopic.
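The per-regime grid search can be sketched generically: enumerate every hyperparameter combination, score it, and keep the best five (matching the top-five listing in Appendix A.4). This is a hedged sketch, not the thesis code. The grid values below are assumed (taken from the values in Table 4.1), and ranking uses a single score such as coherence; perplexity, which the thesis also considered, is not modeled here. The gensim calls mentioned in the comment are one plausible way to implement `score_fn`.

```python
import itertools
from typing import Any, Callable, Dict, List, Sequence, Tuple


def grid_search(
    param_grid: Dict[str, Sequence[Any]],
    score_fn: Callable[..., float],
    top_k: int = 5,
) -> List[Tuple[Dict[str, Any], float]]:
    """Score every combination in `param_grid` and return the top_k, best first."""
    names = list(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, score_fn(**params)))
    # list.sort is stable, so tied scores keep their grid order
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:top_k]


# Illustrative grid built from the values in Table 4.1 (an assumption; the
# grid actually searched is described in Section 3.2.2).
lda_grid = {
    "num_topics": [10, 15],
    "passes": [20, 30],
    "alpha": ["auto", "asymmetric", 0.01],
    "eta": [0.01, 0.05],
}


# In practice, score_fn could train gensim.models.LdaModel(corpus=..., id2word=...,
# **params) and return CoherenceModel(model=lda, texts=..., dictionary=...,
# coherence="c_v").get_coherence(). A toy score keeps this sketch self-contained.
def toy_score(num_topics, passes, alpha, eta):
    return num_topics + passes / 100 - eta


for params, score in grid_search(lda_grid, toy_score):
    print(params, round(score, 2))
```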
Table 4.1 summarizes the final hyperparameters used for each government type. The top five grid search configurations per regime are provided in Appendix A.4.
Government Type num_topics passes alpha eta
Full Democracy 10 30 auto 0.05
Flawed Democracy 10 30 auto 0.01
Hybrid Regime 15 20 asymmetric 0.01
Authoritarian Regime 15 20 0.01 0.01
Table 4.1: LDA Hyperparameters by Government Type.
4.1.2 BERTopic
For BERTopic, we ran a grid search over combinations of the target number of topics (nr_topics), n_neighbors, and cluster_size, both with and without a representation model (POS tagging). Each combination yielded a set of topics and an associated coherence score. The model without the representation model generally produced higher coherence scores than the one with POS tagging (see Appendix A.6). However, POS tagging restricts which words can serve as topic keywords, which makes the resulting topics easier to interpret. Given this trade-off between coherence and interpretability, we retained the representation model to achieve a more meaningful analysis.
After the grid search, we applied the following strategy for choosing hyperparameters. For each government type (except Hybrid Regime; see Hyperparameter Grid Search in Section 3.3), we identified the five hyperparameter sets with the highest coherence scores (see Appendix A.5). From these, we chose the set with the highest nr_topics value. This decision was based on the observation that the coherence scores of these sets differed only slightly, while a higher nr_topics value produced a more detailed and interpretable topic structure, which is especially important given the imbalance in our dataset. Note, however, that nr_topics is the maximum number of topics BERTopic will produce; we therefore also report nr_output_topics, the number of topics the model actually returned with the respective hyperparameters. Moreover, these custom hyperparameters allowed for a wider range of topics for further analysis.
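The BERTopic selection rule (top five runs by coherence, then the largest nr_topics among them) reduces to a sort followed by a max. The run records below are hypothetical, not the thesis's grid-search results. In BERTopic, n_neighbors would usually be passed to the UMAP model and cluster_size presumably to HDBSCAN's min_cluster_size, but that mapping is an assumption here.

```python
from typing import Dict, List


def select_config(results: List[Dict], top_k: int = 5) -> Dict:
    """Among the top_k runs by coherence, pick the one with the largest nr_topics."""
    top = sorted(results, key=lambda r: r["coherence"], reverse=True)[:top_k]
    # max() returns the first maximal item, so ties on nr_topics go to the
    # higher-coherence run (the list is already sorted by coherence).
    return max(top, key=lambda r: r["nr_topics"])


# Hypothetical grid-search results (illustrative numbers only).
runs = [
    {"nr_topics": 10, "n_neighbors": 10, "cluster_size": 5, "coherence": 0.62},
    {"nr_topics": 8, "n_neighbors": 5, "cluster_size": 2, "coherence": 0.61},
    {"nr_topics": 16, "n_neighbors": 5, "cluster_size": 5, "coherence": 0.60},
    {"nr_topics": 12, "n_neighbors": 20, "cluster_size": 10, "coherence": 0.59},
    {"nr_topics": 16, "n_neighbors": 15, "cluster_size": 10, "coherence": 0.58},
    {"nr_topics": 20, "n_neighbors": 10, "cluster_size": 2, "coherence": 0.40},  # not in top five
]
print(select_config(runs))
# {'nr_topics': 16, 'n_neighbors': 5, 'cluster_size': 5, 'coherence': 0.6}
```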
Government Type nr_topics n_neighbors cluster_size nr_output_topics
Full Democracy 16 5 5 10
Flawed Democracy 16 15 10 10
Hybrid Regime 16 20 2 10
Authoritarian Regime 10 10 5 9
Table 4.2: BERTopic Hyperparameters Used for Each Government Type.
4.2 Qualitative Topic Analysis
To explore how LDA and BERTopic differ in their ability to extract meaningful and interpretable topics from AI policy documents, and to examine thematic variation across governance types, we present the results by model and regime classification. Tables 4.3 to 4.10 were constructed from the outputs of both models. Each table lists the topics and keywords produced with the configurations in Section 4.1, together with a qualitative interpretation based on the documents from which they were extracted.
4.2.1 LDA
As specified, LDA generated ten topics for Full Democracy (Table 4.3). Several topics relate to regulatory structure and data governance (Topics 0, 5, and 7), often emphasizing oversight, compliance, and coordination across institutions, but also within specific areas of society such as health care (Topic 8), and accountability and individual rights in algorithmic decision-making and law enforcement (Topic 7). Others reflect broader economic themes, such as public investment, digital transformation, and an EU-aligned recovery plan (Topics 4, 6, and 8).
The ten topics extracted by LDA for Flawed Democracy (Table 4.4) reflect a strong emphasis on U.S. federal documentation and digital governance. Topics 1, 5, and 8 focus on digital transformation in different contexts, such as education and infrastructure. Topics 3, 6, and 9 relate to research and development policy and legislative amendments, including grant procedures and military and cybersecurity provisions. Topic 2 addresses ethical oversight of medical research in India. The number of topics was increased to align with BERTopic for comparative purposes, which led to some thematic overlap across topics.
For the Hybrid Regime, LDA produced fifteen topics (Table 4.5). Several topics focus on education and digital literacy, particularly the integration of ICT into school systems and the development of local training ecosystems (Topics 0, 10, and 12). Broader national agendas aimed at preparing societies for digital transformation also appear (Topics 8 and 11). A second group focuses on data governance and security (Topics 3, 5, and 6), and a third group emphasizes economic and industrial transformation (Topics 2, 7, 9, and 13). Some topics share keywords (e.g., Topics 0 and 10 both contain “education”, “ICT”, and “ministry”), indicating closely related themes. Even so, overlapping topics often diverge in national context, policy focus, or implementation approach.
Lastly, LDA produced fifteen topics for the Authoritarian Regime (Table 4.6). As with the previous government types, the keywords include several geographic terms, such as “China”, “Kazakhstan”, “Uzbekistan”, “Egypt”, “Russian”, “Dubai”, and “Vietnam”. A number of topics focus on digital infrastructure and personal data governance, including national strategies for data protection and cybersecurity (Topics 0, 1, 9, and 11). Some topical overlap is present in this group, reflecting the trade-off between coherence-based model selection and topic distinctiveness. Additionally, three topics address the health and education sectors (Topics 2, 6, and 10), including standardization of systems, integration of machine learning, and building technical capacity. A small number of topics reflect administrative reporting, archival material, or metadata, such as Topic 13, which appears to capture monthly archival records.
Topic | Keywords | Qualitative Interpretation
0 | algorithm, knowledge, de, supervisor, european, dutch, van, company, en, million | A discourse focused on Dutch and European strategies for algorithm governance, including national supervision structures, public-private research partnerships, human capital initiatives, and regulatory frameworks within the Netherlands’ AI infrastructure development.
1 | learn, strategy, user, society, science, center, network, machine, company, principle | A national AI strategy anchored in Japan’s AIP network and supported by ministries and research centers, emphasizing explainability, user trust, and policy coordination. Highlights integration of scientific principles with public services, societal inclusion, and international research dissemination.
2 | uk, gov, cookie, university, we, council, page, lead, professor, help | UK government communications on national AI strategy, featuring academic leadership, research councils, and policy announcements. Includes standard website elements like cookie notices and user feedback prompts.
3 | article, law, aid, beneficiary, entity, subsidy, establish, provision, grant, activity | Legal and procedural framework outlining direct subsidy grants to eligible entities, detailing aid provisions, beneficiary obligations, and justifications based on public interest and regulatory compliance.
4 | investment, recovery, spain, promote, economic, component, spanish, transformation, european, reform | Spain’s recovery and transformation plan outlining investment and reform components aimed at promoting economic growth, digital transformation, and EU-aligned modernization across sectors.
5 | uk, regulatory, regulator, across, individual, organisation, approach, personal, guidance, worker | UK-wide approaches to data regulation, outlining organisational and individual responsibilities for lawful processing, personal data use, and regulatory coordination across sectors.
6 | fund, language, european, eu, program, initiative, energy, employment, investment, economy | EU-aligned public investment strategies, spanning energy, language, employment, and digital economy programs aimed at economic resilience, inclusion, and sustainability.
7 | algorithm, bias, organisation, algorithmic, individual, human, tool, could, police, group | Regulation and oversight of algorithmic systems, focusing on organisational practices for mitigating bias, ensuring human involvement, and protecting individual rights across contexts, including law enforcement.
8 | trial, health, solution, vehicle, cdaa, safety, automatic, science, training, analysis | Strategy for data-driven innovation in health and science, specifically trials, automatic systems, training, and vehicle-based safety monitoring, through collaboration between public and private entities.
9 | test, vehicle, standard, traffic, road, publication, de, automate, drive, driver | Austrian regulations for automated vehicle testing on public roads, safety standards, driver roles, legal requirements, and coordination between public authorities and industry.
Table 4.3: LDA: Full Democracy Topics and Qualitative Interpretation.
Topic | Keywords | Qualitative Interpretation
0 | document, federal, content, search, order, register, official, office, detail, https | U.S. federal documentation system, focusing on the structure, publication, and public access to official government content through the Federal Register.
1 | digital, solution, sector, administration, education, company, work, create, skill, area | Digital transformation in education, healthcare, and public administration, with emphasis on the workforce, institutional collaboration, and the development of digital services and platforms.
2 | india, health, ec, participant, study, risk, medical, review, ensure, must | Ethical oversight and risk management in medical and health research in India, with emphasis on participant protection, informed consent, and EC (Ethics Committee) review.
3 | rd, budget, nsf, federal, nist, nitrd, health, network, advance, nih | U.S. federal research and development strategy and budgeting for scientific advancement, including initiatives by NSF, NIH, NIST, and related agencies, with a focus on health and national infrastructure.
4 | explanation, sec, decision, paragraph, model, title, explainable, amend, principle, strike | Legal and policy adjustments promoting explainable models and decision-making through amendments and principles.
5 | digital, innovation, country, sector, industry, field, model, university, ministry, infrastructure | Public sector digital innovation, focusing on service design, institutional transformation, and building infrastructure and skills across ministries and administrations.
6 | nsf, proposal, gov, award, comment, nist, submit, fairness, organization, grant | U.S. federal policy on testing and approving innovative services, focusing on transparency in grant submission and organizational accountability.
7 | employer, employee, consortium, google, laboratory, website, computing, employment, person, ibm | Partnership between industry and academia, with partners such as Google and IBM, aimed at advancing AI research while addressing ethical and employment challenges in computing and public-private collaboration.
8 | al, standard, rd, human, federal, response, strategy, application, test, strategic | Federal digital transformation strategy focusing on infrastructure and emerging technologies to support innovation and public service modernization.
9 | title, sec, paragraph, force, strike, amend, inserting, code, military, insert | U.S. federal legislative text relating to amendments, code insertions, and military provisions, referencing titles, sections, and subsections and inserting clauses into the U.S. Code regarding defense authorization, cybersecurity, and government operations.
Table 4.4: LDA: Flawed Democracy Topics and Qualitative Interpretation.
Topic | Keywords | Qualitative Interpretation
0 | education, transformation, ict, ministry, plan, implementation, page, read, integrate, society | Kenya’s Ministry of Education’s plan to integrate ICT into the national education system through the Digital Literacy Programme and prepare learners for a digitally driven society.
1 | april, people, partner, user, hold, phase, responsible, facebook, train, transformation | Nigeria’s efforts in digital transformation through policy development and stakeholder training.
2 | april, people, say, general, reach, executive, facebook, email, standard, high | Armenian digital public communication and digitalization strategy, promoting high-standard digital adoption in governance and private sector transformation.
3 | datum, risk, page, impact, governance, current, agenda, responsible, challenge, future | Mexico’s national data governance strategy, addressing current institutional challenges, responsible innovation, and future societal impacts through coordinated policy and stakeholder engagement.
4 | industry, drive, potential, europe, education, every, read, aim, transition, find | Ukraine’s national strategy to drive industrial modernization and digital transformation by integrating technology and reforming education to enable future economic potential.
5 | datum, security, responsible, must, risk, express, impact, say, education, human | Responsible data governance, emphasizing human rights, education, and security in managing digital risks and ethical technology development.
6 | security, say, express, agency, seek, establish, training, cooperation, general, among | National security and technological innovation through military training and the establishment of advanced research centers, aiming to develop robotics and emerging technologies for public service and defense development.
7 | say, revolution, must, like, force, user, partner, people, many, economic | Uganda’s strategy to capitalize on the Fourth Industrial Revolution by establishing a task force, partnering with global tech leaders, and promoting inclusive economic development through local technology adoption and innovation.
8 | agenda, minister, implementation, photo, view, africa, industry, number, approve, revolution | Industrial and digital transformation strategy, driven by a formal agenda, ministers, task forces, and technological infrastructure.
9 | transformation, user, plan, state, change, internet, company, main, telecommunication, save | Peru’s digital transformation efforts, focused on telecommunication regulation, user empowerment, state planning, internet service transparency, and cultural change within public digital governance.
10 | education, ict, programme, ministry, phase, learn, primary, adopt, content, implementation | Implementation and promotion of ICT in primary and secondary schools to adopt digital learning tools, support competency-based curricula, and enhance teacher training, content development, and infrastructure across education phases.
11 | agenda, implementation, minister, republic, approve, official, present, issue, society, also | National reforms for digital transformation across society, emphasizing public service modernization and stakeholder cooperation.
12 | education, link, ict, startup, learn, need, local, ecosystem, knowledge, training | Development of local ICT education ecosystems linking training, startups, and knowledge to meet digital learning and innovation needs.
13 | research, official, center, minister, initiative, agenda, application, industry, include, council | Government-led innovation agendas fostering research-industry collaboration through strategic policy, funding programs, and institutional coordination.
14 | april, stakeholder, contribute, article, phase, facebook, like, partner, hold, seek | Policy development and public engagement in digital initiatives, highlighted through events and stakeholder contributions across platforms.
Table 4.5: LDA: Hybrid Regime Topics and Qualitative Interpretation.
Topic | Keywords | Qualitative Interpretation
0 | ng, viet nam, hc, hi, can, ch, state, personal, data, protection | Viet Nam’s strategies focused on personal data protection, digital infrastructures, and state-led innovation to support secure digital transformation.
1 | algorithm, security, governance, strengthen, supervision, social, china, enterprise, risk, right | A national framework to strengthen algorithmic governance in China, emphasizing enterprise responsibility, netizen oversight, social supervision, and regulatory mechanisms to ensure security and protect rights.
2 | china, improve, major, intelligent, college, set, generation, computing, increase, basic | Initiatives in China advancing intelligent technologies through major computing research, college reform, and next-generation talent development.
3 | grant, fund, report, phd, agreement, result, candidate, applicant, republic kazakhstan, environmental | Government-administered research funding initiatives in Kazakhstan supporting PhD candidates and innovation through structured grant programs with environmental oversight and performance-based reporting.
4 | city, smart, state, uzbekistan, institute, society, communication, open, high, electronic | Implementation and development of the digital economy and electronics to build smart cities and enhance public services.
5 | egypt, company, egyptian, level, model, one, student, number, phase, different | National AI and digital innovation strategy in Egypt, specifically education reform, local startup growth, and phased adoption of machine learning models across sectors to address economic and societal challenges.
6 | health, standardization, healthcare, read, group, personal, committee, access, patient, working | Russian initiatives to standardize healthcare and education through technical committees and data protection.
7 | lab, team, idea, user, solution, step, innovator, story, product, stakeholder | Innovation in Abu Dhabi’s health sector through design-thinking labs that engage stakeholders, iterate on user-centered ideas, and develop solutions.
8 | ministry, uzbekistan, document, also, vietnam, minister, republic, communication, foreign, unit | Digital development and international policy coordination across ministries and nations (e.g., Uzbekistan, Vietnam), with emphasis on communication, governance, and foreign affairs.
9 | ministry, agency, open, portal, digitalization, state, search, russian, procurement, email | Efforts by Russian state ministries and agencies to enhance transparency and efficiency in governance through digital portals, particularly for procurement, public access to information, and administrative services in national digitalization.
10 | egypt, also, course, student, economy, communication, many, improve, problem, level | Egypt’s efforts to improve its economy through education, digital skills, and innovation.
11 | dubai, law, vietnam, director, uae, head, minh, data, notice, group | National data sovereignty and digital infrastructure for innovation.
12 | oecd, governance, trustworthy, policy, website, risk, issue, privacy, approach, principle | International efforts to promote trustworthy and accountable data governance and AI policies, emphasizing privacy, risk management, and ethical principles.
13 | april, ncai, february, january, july, march, may, december, november, october | Monthly archival records and activities related to Saudi Arabia’s National Center for Artificial Intelligence (NCAI).
14 | call, april, online, privacy, problem, owner, cybersecurity, federal, foreign, free | Legal and regulatory updates around privacy, cybersecurity, and federal or international data governance, especially involving new calls, policy revisions, and problem-solving initiatives across jurisdictions.
Table 4.6: LDA: Authoritarian Regime Topics and Qualitative Interpretation.
4.2.2 BERTopic
For Full Democracy, BERTopic returned ten topics, presented in Table 4.7. Several topics relate to public-sector digital transformation, with particular attention to transparency, automation, and ethical oversight (Topics 0, 1, 2, and 6). These themes include human-centric regulation, algorithmic accountability, and the integration of national and international digital strategies. Other topics address challenges posed by emerging technologies such as deepfakes and surveillance systems (Topics 6 and 8), while some reflect metadata or navigational content from digital government documents (Topics 4, 5, and 9).
BERTopic identified ten topics for Flawed Democracy (Table 4.8), several of which mirror institutional and legal frameworks from the United States and India. Topics 0, 3, and 7 refer to U.S. federal law, national defense, and cybersecurity strategies, particularly concerning technical standards and military training. Other topics emphasize responsible AI development and implementation in public administration and healthcare contexts, such as India’s national efforts (Topic 1). Additional themes include explainable AI (Topic 4), technological collaboration (Topics 5 and 6), and international competitiveness in AI governance (Topic 8).
For the Hybrid Regime, BERTopic generated ten topics (Table 4.9). Half of these discuss policies from a national perspective, explicitly naming the country (e.g., Turkey or Kenya). Most of these (Topics 0, 1, 8, and 9) discuss action plans, programs, and preparations for advancing technology. Another set (Topics 2, 3, 4, and 6) highlights higher education and conferences, focusing on research projects, skill development, doctoral or academic programs, and technology-oriented events. Topics 0 and 4 also address human rights, notably the protection of rights and juvenile justice and judicial matters.
BERTopic generated nine topics for the Authoritarian Regime (Table 4.10). These topics reflect national digital governance strategies, with an emphasis on data protection law, algorithmic supervision, and digital infrastructure development. Topic 1 outlines advanced algorithm governance and citizen supervision in China, while Topics 2 and 6 describe revisions to data protection law. Education and research programs also feature in grant-related topics (Topics 5 and 8), alongside national strategy planning for AI deployment and economic modernization (Topics 3 and 4).
Topic | Keywords | Qualitative Interpretation
0 | digital, public, new, national, european, social, economic, international, strategic, human | A joint European AI strategy covering cooperation on AI competence centers, sovereign data infrastructure, national civilian plans, and ethical, legal, and transparent frameworks for human rights and democracy.
1 | decision making, automated, algorithmic, public, data protection, new, discrimination, personal, human, regulatory | Integration of automated decision-making into public and regulatory processes, including document digitization, ML tools, data-protection impact assessments, personal-data transparency, and bias risks in training and inference.
2 | public, automatic, new, automatic learning, national, digital, social, human, strategic, autonomous | Outlines a national data-science and automated-learning framework for government, emphasizing transparency in decision-making, funding mechanisms, strategic goals, and human involvement.
3 | beneficiary, general, concession resolution, subsidable, prior, following, corresponding, economic, beneficiary entities, technical | Defines eligible subsidy actions and expenses, sets limits on aid amounts, and specifies the documentation needed to justify payments.
4 | additional cookies, additional, cookie settings, obtain permission, copyright holders, party copyright, main content, visit nationalarchives, copyright information, ukdocopen | Repeated banners explaining why, where, and how website cookies (both essential and additional) are used, and how to set preferences. Likely from the same UK government site.
5 | startxref, adobe design, service centre, capable supercomputer, internal, data subjects, missing components, unchanged, ambassadors partners, systems mends | PDF-export metadata combined with a footer navigation list of digital-hub programs and webpage menu items.
6 | facial, live, facial identification, facial recognition, society groups, police forces, identification systems, facial verification, private, recognition technology | Facial recognition technology: what it is, facial matching uses (one-to-one vs. one-to-many), and legal safeguards (Data Protection, Human Rights and Equality Acts) around live police deployments.
7 | doctoral, doctoral training, quantum, menu group, international, centres bpost, bpost parcel, campus management, sports infrastructure, funded centres | An overview of the Vrije Universiteit Brussel (VUB) ecosystem: its doctoral and research groups and programs, and on-campus support and services (shops, parcel locker, etc.).
8 | deepfake, doctored, audio, visual, face replacement, enactment, disinformation, fake, speech synthesis, media platforms | Discussion of deepfakes and audio-visual content as forms of disinformation. Highlights the need for media transparency on both pros and cons and encourages public vigilance and fact-checking.
9 | visit today, card details, blank, credit card, new tab, wrong, financial information, useful, personal, financial | Standard disclaimer text appearing across different UK government webpages on AI regulation and strategies, advising users not to share personal information and inviting them to complete surveys.
Table 4.7: BERTopic: Full Democracy Topics and Qualitative Interpretation.
Topic | Keywords | Qualitative Interpretation
0 | military, national, federal, following, fiscal, general, subparagraph, appropriate, foreign, new | Sections outlining U.S. defense and national-security law, detailing the duties and reporting obligations of the Secretaries of Defense and State, and setting rules on international maritime law, military installations, defense, funding, and oversight by various congressional committees.
1 | indian, medical, national, clinical, new, responsible, ethical, potential, key, human | Challenges of AI and its development in agriculture and healthcare, with references to ethical policies by the Indian Council of Medical Research and the importance of responsible nationwide implementation.
2 | digital, public, public administration, digital transformation, new, national, country, industrial, industrial revolution, fourth | Potential of a new industrial revolution driven by AI and digital technologies, focusing on national strategies for digital transformation, U.S. digital security, public administration, and the role of education and research institutions.
3 | federal, human, american, technical standards, national, federal government, technical, strategic, regulatory, standards development | Given U.S. leadership in AI, federal agencies are responsible for fostering reliable AI development through technical standards, with active participation from the private sector and academia to support new industries.
4 | explainable, explanation accuracy, human, interpretable, neural, meaningful, counterfactual, decision accuracy, black box, knowledge limits | Self-explainable, interpretable ML models provide both global and per-decision explanations. When explanations are not meaningful, alternative algorithms are used for additional information, and some metrics can be used to assess explanation accuracy.
5 | digital, european, public, high, public sector, private, possible, digital government, action plan, main | Estonia’s Digital Agenda 2030 and National AI Strategy, which include the private and public sectors (i.e., a roadmap for development projects) and detail action plans for training, funding, and ongoing implementation updates. Also covers funding and plans for Italian companies in both sectors focused on AI research.
6 | technological development, technological, scientific, prestigious, international, summer school, korean, center, global, international conference | Lists research institutions and programs, including the Chinese Economic and Scientific Delegation visit, a prestigious Industry 4.0 conference, Splitech 2025 on sustainable and smart technologies, and the Eastern European Machine Learning School.
7 | national, public, military, digital, economic, new, armed, fiscal, strategic, subparagraph | Sections of U.S. law covering national defense management, cybersecurity policies, and training programs across the Navy, Army, Air Force, Marine Corps, and other branches.
8 | israeli, cloud, intelligent, human capital, national, human, technological, high, team, various | Technological revolution describing Israel’s advancements in AI technologies, human capital development, cloud services, and national funding plans.
9 | ordinary skill, fair use, ordinary, fair, natural person, trade secret, secret, claimed invention, sui generis, natural | Guidance for defense and intelligence agencies on handling personal and sensitive information and human-capital planning, alongside public debate on whether existing laws for intellectual-property systems should be updated for AI-generated content.
Table 4.8: BERTopic: Flawed Democracy Topics and Qualitative Interpretation.
Topic | Keywords | Qualitative Interpretation
0 | human rights, human, action plan, public, social, effective, doctoral, judicial, international, legal | Turkey’s action plan for the protection and promotion of human rights and freedoms, including alternative sanctions to short-term prison sentences, convict rights, rights to property, and victims of violence.
1 | digital, government, digital transformation, digital agenda, digital literacy, primary, local, key, appropriate, technical | Kenya’s three-phase program on technology integration in basic education through the Competency-Based Curriculum. It highlights a new policy guide (e.g., on integrating ICT and setting up smart classrooms) through which the government aims to lead this implementation to success.
2 | human, multimodal, multimodal perception, societal, responsible, delle ricerche, societal use, use cases, trustworthy, industrial | Global universities and their respective research projects, primarily focused on ethical and responsible AI development and applications (e.g., evidence-based chatbot interactions, segmentation in automatic captioning systems, and multimodal perception and modeling).
3 | personal data, personal, data controller, subject, data subject, data controllers, necessary, international, relevant, international organization | Policies on personal data collection and handling, particularly when data controllers and data processors can and should erase, destroy, or anonymize data. It also highlights conditions for domestic and international data transfers, different purposes, special data categories, and measures to be taken by data importers.
4 | scientific, right, human rights, professional, reference experiments, rights boards, indispensable, high education, management skills, professional skills | Human rights, higher education, and skills development in the context of the Juvenile Justice System. Themes include support for convicts to acquire professional skills and maintain contact with their families, and the needs of minorities.
5 doctoral, academic, human rights, A booklet of policies outlining academia, higher edu-
human, higher, judiciary, higher edu- cation requirements, and administrative affairs. Addi-
cation, doctoral programs, new, pub- tionally, it discusses the EU Action Plan on Human
lic Rights and Democracy, planning costs, and the exclu-
sion of discrimination for decisions made through the
e-Government gateway.
6 machine learning, high school, cover The Hybrid Human-AI Conference and its events focused
letter, great enthusiasm, diverse, on AI developments, human-AI collaboration, and re-
technical, high, relevant experience, search in machine learning, human-computer interaction,
human, advanced and psychology. Additionally, technology-oriented and
awareness-raising training sessions held to prepare young
people to gain skills for the job market.
7 political party, party groups, politi- Turkey’s rules and timeframes regarding the election of
cal, annexed, public, personal, mem- political party groups, Board members, and Authority
bership positions, vacant, personal personnel. It also discusses exemptions from the per-
data, total number sonal data law, particularly the conditions under which
personal data is processed.
8 information technology, digital econ- Nigeria National Information Technology Development
omy, new, emerging technologies, Agency’s approach to the exponential growth of technol-
digital technologies, exponential, ex- ogy and AI. It is responsible for developing frameworks
ponential growth, digital, corporate, and guidelines for the IT sector to support a sustainable
urged stakeholders digital economy.
9 task force, industrial revolution, Uganda’s preparation for the Industrial Revolution to
emerging technologies, harness op- drive economic development, including the creation of
portunities, national guidance, an- a task force of scientists, policymakers, and engineers.
nual report, advise government, na- The focus is on increasing agricultural production and
tional task, industrial, digital inno- adopting domesticated technologies, instead of foreign-
vation driven innovations (e.g., automated vehicles).
Table 4.9: BERTopic: Hybrid Regime Topics and Qualitative Interpretation.
Topic | Keywords | Qualitative Interpretation
0 | innovator teams, design thinking, safe, key, product owner, value testing, project, prototyping, portal, different | Documentation on the AI Lab: what it is, its web portal, and its design-thinking process. Includes team structure, prototypes, and benefits of user-value testing and delivery.
1 | algorithm, security governance, informative service, service algorithms, social, algorithm filing, orderly, social supervision, security risks, positive energy | A guide to advanced algorithm filing (how it works and how to use it), alongside continuous netized participation and the improvement and development of technologies, algorithms, and models focused on supervision, transparency, legality, security risks, and overall security governance.
2 | data protection, personal data, public, personal, federal, national registry, data transfers, trade zones, free trade, privacy framework | Amendments to Saudi Arabia’s Personal Data Protection Law: resolving stakeholder concerns, allowing controllers to collect and use third-party personal data (unless sensitive), and removing the national registry. Also, a discussion of the UAE’s privacy framework, including general privacy rights and prohibitions on data misuse.
3 | national, new, international, public, economic, human, level, major, social, smart | AI’s importance in modern society, and a national plan on how China will implement and promote it, including lawmaking, safety assessments, labor training, and academic scholarships for AI programs.
4 | national, national center, global, cloud, large, civil law, large data, trustworthy, economic, generative | Saudi Arabia’s National Center for AI (covering applications, risks, challenges, and data protection) and Vietnam’s recognition of gaps in AI development, leading to a Tactical Targeting Network Technology plan and development of broader AI strategies.
5 | trng, financial, financial report, working days, free, environmental, evaluation, implementation, grant funding, technical | Contract for an internship in Kazakhstan, outlining requirements for receiving financial grant funding and detailing rules, including financial, technical, and administrative details of the research implementation.
6 | information technologies, digital, coordination commission, working groups, international, digital transformation, state bodies, road maps, software products, digital technologies | Uzbekistan’s development of IT and digital transformation roadmaps, highlighting increases in broadband ports, communication lines, implementation of information systems and electronic services, and training workers in the IT sector.
7 | digital, personal data, personal, data protection, scientific, grant program, research institute, digital technologies, grant agreement, information technologies | Description of a research institute for digital technology and AI development in Uzbekistan, established by presidential decree, outlining its main goals and functions.
8 | grant program, scientific, financing, environmental, notarized, following, grant funds, financial, host organization, technical | Financial reporting requirements and documentation rules for a PhD research and training grant program under EMF, including required documents, forms, and conditions.
Table 4.10: BERTopic: Authoritarian Regime Topics and Qualitative Interpretation.
4.3 Ethical Topic Variation Across Government Type
Table 4.11 gives a quantitative summary of the LDA and BERTopic models: for each
government type, it reports how many of the 11 framework categories (Table A.1) the
type’s topics overlap with. A topic is said to overlap with a category if at least one of
its keywords partially or perfectly matches a keyword of that category. Across all
government types, LDA topics overlap with 10 of the ethical framework categories,
while BERTopic topics overlap with all 11.
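The counting rule can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used to build Table 4.11: it assumes that a “perfect” overlap is an exact keyword match and that a “partial” overlap is one keyword containing the other as a substring. The function names and the framework keyword lists are hypothetical placeholders and are not taken from Table A.1.

```python
from typing import Optional

# Hypothetical sketch: which framework categories do a model's topics overlap with?
# Assumed matching rule: exact match = "perfect", substring containment = "partial".


def keyword_overlap(topic_word: str, framework_word: str) -> Optional[str]:
    """Classify one keyword pair as 'perfect', 'partial', or no overlap (None)."""
    a, b = topic_word.lower().strip(), framework_word.lower().strip()
    if a == b:
        return "perfect"
    if a in b or b in a:
        return "partial"
    return None


def categories_covered(topics: list[list[str]],
                       framework: dict[str, list[str]]) -> set[str]:
    """Return every category matched by at least one keyword of at least one topic."""
    covered = set()
    for topic in topics:
        for category, framework_words in framework.items():
            if any(keyword_overlap(t, f) for t in topic for f in framework_words):
                covered.add(category)
    return covered


# Placeholder keyword lists (NOT the thesis's Table A.1).
framework = {
    "Data Policy": ["personal data", "privacy"],
    "Gender": ["gender", "women"],
    "Education and Research": ["education", "research"],
}
topics = [
    ["personal data", "data controller", "transfer"],  # perfect match: "personal data"
    ["higher education", "doctoral", "academic"],      # partial match: "education"
]

covered = categories_covered(topics, framework)
print(sorted(covered), f"{len(covered)} of {len(framework)} categories")
```

A naive substring rule like this over-matches short keywords (for example, “art” inside “party”), so a real implementation would need a stricter notion of partial overlap.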
Government Type | LDA Model | BERTopic Model
Full Democracy | 9 | 9
Flawed Democracy | 7 | 8
Hybrid Regime | 10 | 11
Authoritarian Regime | 9 | 9
Table 4.11: Number of Ethical Framework Categories (Out of 11, Table A.1) That Each
Government Type’s Topics Overlap With, for LDA and BERTopic.
Figure 4.1 shows the normalized topic-framework scores produced by the LDA
and BERTopic models. These scores provide a comparative overview of how much
emphasis each government type places on particular framework categories relative to
its total topic distribution. The higher the score, the stronger the overlap with the
framework category.
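The normalized scores in Table 4.12 are consistent with a per-regime normalization: the LDA column sums to 4.00 and the BERTopic column to 3.97 (4 up to rounding). This is what one obtains if each government type’s category overlap counts are divided by that type’s total, so that each type contributes shares summing to 1, and the four types’ shares are then stacked per category. The sketch below shows that computation under this assumption; the function names are hypothetical and the counts are invented placeholders, not the thesis data.

```python
import math
from collections import defaultdict

# Assumed normalization: within each government type, divide every category's overlap
# count by that type's total, then sum the shares over types (one stacked bar per category).


def normalize_per_regime(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Convert raw overlap counts into within-regime shares that sum to 1."""
    shares = {}
    for regime, per_category in counts.items():
        total = sum(per_category.values())
        shares[regime] = {cat: (n / total if total else 0.0)
                          for cat, n in per_category.items()}
    return shares


def stacked_scores(shares: dict[str, dict[str, float]]) -> dict[str, float]:
    """Height of each category's stacked bar: its shares summed over regimes."""
    stacked = defaultdict(float)
    for per_category in shares.values():
        for cat, share in per_category.items():
            stacked[cat] += share
    return dict(stacked)


# Placeholder counts for two regimes and two categories (NOT the thesis data).
counts = {
    "full democracy": {"Data Policy": 3, "Gender": 1},
    "hybrid": {"Data Policy": 1, "Gender": 1},
}
shares = normalize_per_regime(counts)
stacked = stacked_scores(shares)
print(stacked)  # Data Policy: 0.75 + 0.5, Gender: 0.25 + 0.5

# The stacked heights sum to the number of regimes, mirroring the ~4 column sums of Table 4.12.
assert math.isclose(sum(stacked.values()), len(counts))
```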
[Figure 4.1. Panel (a): LDA (normalized scores). Panel (b): BERTopic (normalized scores). Each panel is a stacked bar chart with Framework Topic (1 to 11) on the x-axis and Normalized Score (0.0 to 1.0) on the y-axis; bars are stacked by government type (full democracy, flawed democracy, hybrid, authoritarian).]
Framework topics: 1 Ethical Impact Assessment, 2 Ethical Governance and Stewardship, 3 Data Policy, 4 Development and International Cooperation, 5 Environment and Ecosystems, 6 Gender, 7 Culture, 8 Education and Research, 9 Communication and Information, 10 Economy and Labour, 11 Health and Social Well-Being.
Figure 4.1: Stacked Bar Charts of Normalized Topic-Framework Scores from Both
Topic Modeling Approaches.
Framework Category | LDA Overlap Score | BERTopic Overlap Score
1 - Ethical Impact Assessment | 0.51 | 0.95
2 - Ethical Governance and Stewardship | 0.94 | 0.79
3 - Data Policy | 0.37 | 0.46
4 - Development and International Cooperation | 0.93 | 0.70
5 - Environment and Ecosystems | 0.11 | 0.07
6 - Gender | 0.15 | 0.16
7 - Culture | 0.22 | 0.12
8 - Education and Research | 0 | 0.03
9 - Communication and Information | 0.17 | 0.04
10 - Economy and Labour | 0.19 | 0.34
11 - Health and Social Well-Being | 0.41 | 0.31
Table 4.12: Model Topic Overlap with the Framework Categories, Normalized Scores.
Green Color Indicates the Highest Score Overlaps, and Red Color Indicates the
Lowest Scores. The Framework Categories That Both Models Match Are Also
Highlighted Respectively.
4.4 Quantitative Results Using OCTIS
This section presents OCTIS evaluation results that complement the qualitative evaluation
and help answer the main research question of how BERTopic and LDA compare in their
ability to extract meaningful and interpretable topics. The metrics Coherence (C_V),
WECoherencePairwise, Topic Diversity, and IRBO are used to quantify various
dimensions of model quality. Coherence, specifically C_V, evaluates the semantic
consistency of top keywords based on their co-occurrence in the input corpus.
WECoherencePairwise computes the average pairwise cosine similarity between top
topic words using word embeddings. Topic Diversity quantifies the uniqueness
of words across topics and penalizes models that produce highly similar topics
with slight variations. Lastly, IRBO complements Topic Diversity by measuring
redundancy based on how often the same words appear in similar positions across
topics; because it is computed as one minus the rank-biased overlap (RBO) between
topics, higher values indicate less redundancy. More detailed descriptions of these
metrics are provided in Section 2.2.2.
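To make the Topic Diversity definition concrete, the sketch below follows its usual formulation, which we understand OCTIS’s TopicDiversity metric to implement: the number of distinct words among the top-k words of all topics, divided by k times the number of topics. The toy topics and the value k = 3 are illustrative assumptions, not the settings used in this thesis.

```python
def topic_diversity(topics: list[list[str]], topk: int) -> float:
    """Fraction of distinct words among the top-k words of all topics (1.0 = no repeats)."""
    if not topics or topk <= 0 or any(len(topic) < topk for topic in topics):
        raise ValueError("need topics, a positive topk, and at least topk words per topic")
    unique_words = set()
    for topic in topics:
        unique_words.update(topic[:topk])
    return len(unique_words) / (topk * len(topics))


# Toy topics (illustrative only).
distinct = [["privacy", "consent", "data"], ["school", "teacher", "curriculum"]]
overlapping = [["data", "privacy", "personal"], ["data", "privacy", "transfer"]]

print(topic_diversity(distinct, topk=3))     # 6 distinct words / 6 slots = 1.0
print(topic_diversity(overlapping, topk=3))  # 4 distinct words / 6 slots, about 0.667
```

The second pair of topics shares “data” and “privacy”, which is exactly the kind of near-duplicate topic the metric penalizes.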
Results are organized by evaluation metric, with comparisons shown across models
and government types to provide an overview of performance differences across
configurations.
Table 4.13 presents the OCTIS results by government type, with each cell giving the
LDA and BERTopic values as an LDA / BERTopic pair. The better-performing results are
highlighted in bold. The paragraphs following the table describe the results for each metric.
Government Type | Topic Diversity (LDA / BERTopic) | Coherence C_V (LDA / BERTopic) | WECoherencePairwise (LDA / BERTopic) | IRBO (LDA / BERTopic)
Full Democracy | 0.7933 / 0.8867 | 0.5660 / 0.5772 | 0.0344 / 0.0118 | 0.9690 / 0.9824
Flawed Democracy | 0.8600 / 0.8333 | 0.5385 / 0.5157 | 0.0370 / 0.0055 | 0.9671 / 0.9627
Hybrid Regime | 0.6333 / 0.9714 | 0.4766 / 0.4567 | 0.0183 / 0.0130 | 0.9300 / 0.9713
Authoritarian Regime | 0.8400 / 0.8778 | 0.5037 / 0.4713 | 0.0404 / 0.0105 | 0.9802 / 0.9715
Table 4.13: Comparative OCTIS Metrics for LDA vs BERTopic by Government
Type. Higher Values Are Bolded.
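The per-metric comparisons discussed below can be re-derived directly from Table 4.13. The short script copies the table’s values (LDA first, then BERTopic, as in each cell) and counts, for every metric, in how many government types each model scores higher; higher is better for all four metrics.

```python
# Values from Table 4.13, stored as (LDA, BERTopic) pairs.
TABLE_4_13 = {
    "Full Democracy": {"Topic Diversity": (0.7933, 0.8867), "C_V": (0.5660, 0.5772),
                       "WECoherencePairwise": (0.0344, 0.0118), "IRBO": (0.9690, 0.9824)},
    "Flawed Democracy": {"Topic Diversity": (0.8600, 0.8333), "C_V": (0.5385, 0.5157),
                         "WECoherencePairwise": (0.0370, 0.0055), "IRBO": (0.9671, 0.9627)},
    "Hybrid Regime": {"Topic Diversity": (0.6333, 0.9714), "C_V": (0.4766, 0.4567),
                      "WECoherencePairwise": (0.0183, 0.0130), "IRBO": (0.9300, 0.9713)},
    "Authoritarian Regime": {"Topic Diversity": (0.8400, 0.8778), "C_V": (0.5037, 0.4713),
                             "WECoherencePairwise": (0.0404, 0.0105), "IRBO": (0.9802, 0.9715)},
}


def count_wins(table: dict[str, dict[str, tuple[float, float]]]) -> dict[str, dict[str, int]]:
    """For each metric, count the government types in which each model scores higher."""
    wins: dict[str, dict[str, int]] = {}
    for metrics in table.values():
        for metric, (lda, bertopic) in metrics.items():
            tally = wins.setdefault(metric, {"LDA": 0, "BERTopic": 0, "tie": 0})
            if lda > bertopic:
                tally["LDA"] += 1
            elif bertopic > lda:
                tally["BERTopic"] += 1
            else:
                tally["tie"] += 1
    return wins


wins = count_wins(TABLE_4_13)
for metric, tally in wins.items():
    print(f"{metric}: {tally}")
```

This reproduces the tallies reported in the text: BERTopic leads on Topic Diversity in three of four government types, LDA leads on C_V in three and on WECoherencePairwise in all four, and IRBO is split two to two.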
For Coherence (C_V), LDA achieved better coherence in three out of four government
types (Flawed Democracy, Hybrid Regime, and Authoritarian Regime). This suggests
that LDA’s topics were, on average, more semantically consistent and interpretable
based on word co-occurrence patterns. The only exception was Full Democracies,
where BERTopic slightly outperformed LDA and achieved the highest coherence
score overall (0.5772). LDA thus produced coherent topics more reliably across the
different government types.
In the metric WECoherencePairwise, LDA again performed best, with a notable
margin across all regimes. LDA produced more internally coherent topics while
BERTopic, which often selects top words based on conceptual similarity rather than
lexical proximity, performed lower on this measure.
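The WECoherencePairwise definition given earlier in this section (the average pairwise cosine similarity between a topic’s top words in an embedding space, averaged over topics) can be sketched as follows. The two-dimensional vectors are toy placeholders standing in for pretrained word embeddings, and words missing from the vocabulary are simply skipped; OCTIS’s own implementation may differ in details such as the embedding model and how vectors are normalized.

```python
import math
from itertools import combinations


def cosine(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def we_pairwise_coherence(topics: list[list[str]],
                          embeddings: dict[str, tuple[float, ...]],
                          topk: int) -> float:
    """Mean over topics of the mean cosine similarity between all pairs of top-k words."""
    per_topic = []
    for topic in topics:
        vectors = [embeddings[w] for w in topic[:topk] if w in embeddings]  # skip OOV words
        pairs = list(combinations(vectors, 2))
        if pairs:
            per_topic.append(sum(cosine(u, v) for u, v in pairs) / len(pairs))
    return sum(per_topic) / len(per_topic) if per_topic else 0.0


# Toy 2-d "embeddings" (placeholders for pretrained word vectors).
embeddings = {"privacy": (1.0, 0.0), "data": (2.0, 0.0), "school": (0.0, 1.0)}

print(we_pairwise_coherence([["privacy", "data"]], embeddings, topk=2))            # parallel: 1.0
print(we_pairwise_coherence([["privacy", "school"]], embeddings, topk=2))          # orthogonal: 0.0
print(we_pairwise_coherence([["privacy", "data", "school"]], embeddings, topk=3))  # (1 + 0 + 0) / 3
```

Because the score depends entirely on the embedding space, values obtained with different pretrained vectors are not directly comparable.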
For Topic Diversity, BERTopic outperformed LDA in three out of four regime
types, indicating that it resulted in more varied and less repetitive topics. The
exception was Flawed Democracies, where LDA showed higher diversity. While
higher diversity can indicate broader thematic coverage, it does not necessarily
guarantee interpretability.
Lastly, IRBO showed mixed results: LDA scored higher for Flawed and Authoritarian
regimes, while BERTopic was stronger in Full and Hybrid regimes. However,
the differences in IRBO values were relatively small, and all values lie between 0.93
and 0.98, suggesting that both models produced topics with little overlap in their
top-ranked words.
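The position sensitivity that distinguishes IRBO from Topic Diversity can be illustrated with a sketch. It computes IRBO as one minus the mean rank-biased overlap (RBO) over all pairs of topics, using the standard extrapolated form of RBO for two equal-length rankings with a persistence parameter p (0.9 here, an illustrative choice). To our understanding, OCTIS’s InvertedRBO is based on the same quantity, but this is a re-implementation for illustration and may differ from OCTIS in detail; the toy topics are placeholders.

```python
from itertools import combinations


def rbo_ext(s: list[str], t: list[str], p: float = 0.9) -> float:
    """Extrapolated rank-biased overlap of two equal-length rankings without duplicates."""
    if not s or len(s) != len(t):
        raise ValueError("rankings must be non-empty and of equal length")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    k = len(s)
    seen_s: set[str] = set()
    seen_t: set[str] = set()
    overlap = 0         # X_d: size of the intersection of the two depth-d prefixes
    weighted_sum = 0.0  # sum over d of (X_d / d) * p**d
    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        weighted_sum += (overlap / d) * p ** d
    return (overlap / k) * p ** k + ((1 - p) / p) * weighted_sum


def inverted_rbo(topics: list[list[str]], topk: int, p: float = 0.9) -> float:
    """1 - mean RBO over all topic pairs; values near 1 mean little redundancy."""
    pairs = list(combinations(topics, 2))
    if not pairs:
        raise ValueError("at least two topics are required")
    mean_rbo = sum(rbo_ext(a[:topk], b[:topk], p) for a, b in pairs) / len(pairs)
    return 1.0 - mean_rbo


disjoint = [["privacy", "consent", "data"], ["school", "teacher", "curriculum"]]
identical = [["data", "privacy", "consent"], ["data", "privacy", "consent"]]
reordered = [["data", "privacy", "consent"], ["consent", "privacy", "data"]]

print(inverted_rbo(disjoint, topk=3))   # 1.0: no shared words at any depth
print(inverted_rbo(identical, topk=3))  # approximately 0.0: same words in the same order
print(inverted_rbo(reordered, topk=3))  # between 0 and 1: same words, different positions
```

The reordered pair contains no new words, so Topic Diversity treats it exactly like the identical pair, whereas IRBO gives it a higher (less redundant) score because the shared words occupy different ranks.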
These results provide an initial indication of model behavior across regimes in terms
of topic coherence, diversity, and redundancy. While LDA scored higher on the
coherence-based metrics in seven of eight comparisons, BERTopic showed strengths in topic diversity. These
patterns are discussed further in the following sections.
5
Discussions
The following sections thoroughly discuss the results obtained in the previous chapter.
Specifically, the LDA and BERTopic models are compared on both quantitative
and qualitative levels. Additionally, we perform an in-depth analysis of the topics
obtained by the models, particularly in relation to government types.
5.1 Model Comparison and Qualitative Analysis
This section discusses the thematic patterns identified in AI policy documents
across regime types, focusing on how key issues are framed differently depending
on governance context, and how BERTopic and LDA capture and represent these
variations.
5.1.1 Model Comparison per Government Type
The topics BERTopic extracted from the Full Democracy documents strongly
emphasize transparency in policymaking. In particular, the texts explore different
strategies for automation tools in government and decision-making, highlighting
democratic human rights and the need for transparent models. Additionally, the Full
Democracy topics address modern technologies such as facial recognition, deepfakes, and
audiovisual content, stressing the need for transparency in the media and for information
about disinformation risks. Moreover, about one-third of the topics consist of scraped
metadata, such as footers and banners, which do not contribute directly to the
policy analysis. This indicates that further data cleaning would have been needed to
reduce the amount of metadata; however, due to time constraints, more extensive
preprocessing was not feasible. LDA’s topics for Full Democracy show a strong focus on
the regulatory and ethical application of technology. Several topics include references to specific
national or regional contexts, often appearing as named entities (e.g., countries,
institutions, or programs). Even though the documents were not analyzed by
country, some topics show consistent signals from national strategies. While the
topics vary by national contexts, some recurring patterns are visible. Several topics
relate to regulatory structure and data governance, often emphasizing oversight,
compliance, and coordination across institutions, but also within specific areas of
society, such as health care, accountability, and individual rights within algorithmic
decision-making and law enforcement. Others reflect broader economic themes, such
as public investment, digital transformation, and an EU-aligned recovery plan. These
patterns suggest that Full Democracies are at a later stage
of digital governance, where regulatory frameworks, transparency, and
ethical oversight play a central role in shaping technological development.
Flawed Democracy topics from BERTopic focus on national AI policies and
defense. A substantial share of the topics, at least four out of ten, outline U.S.
leadership and military-related regulations. Moreover, these documents highlight the potential of
AI to drive a technological revolution, introducing the need for national strategy plans,
implementation roadmaps, and funding allocations. Similar to Full Democracy,
the documents also emphasize transparency and explainability of AI
models and the role of academic and research institutions in shaping AI
development. LDA’s 10 topics focus heavily on the U.S., particularly
on digital innovation, research infrastructure, and legislative processes across different
contexts. For example, Topics 5 and 6 both mention digital innovation, but Topic
5 centers on public-sector innovation, while Topic 6 highlights transparency and
accountability in federal grant submissions and organizational evaluation. The large
number of topics focusing on the U.S. is likely influenced by the large volume of
U.S.-based documents in the dataset. Because the U.S. is a global hegemon with a
considerable amount of policy documentation, its influence is captured in the prevalence
of topics where the country appears. This likely reflects not only data quantity but also
geopolitical influence, which shapes the visibility of U.S. policies in the global AI
discourse. Furthermore, “military” also appears in LDA’s topics, but only once, far less
often than in BERTopic’s. Additionally, because we
opted to use the lesser-performing hyperparameters in this regime type to allow for
more topics in LDA, we also observe some redundancy, particularly in topics related
to federal structures (e.g., Topics 0 and 8).
The Hybrid Regime for BERTopic had 10 topics, with a strong emphasis on
advancing or developing technologies across varying contexts, including education,
human rights, or economic development. Compared to BERTopic’s output in other
regime types, the topics under Hybrid Regime display a more distinctly national
focus. This may suggest that countries within this regime type exhibit less policy
overlap, making country references a stronger indicator of topic relevance. It is
worth noting that this contrast between regime types is more pronounced in BERTopic
than in LDA, because LDA shows a stronger country-specific signal across all regime types. The
difference is likely due to LDA’s probabilistic structure, which relies more heavily
on frequent word co-occurrence patterns and may therefore more often surface country
mentions as dominant topic features. LDA’s topics share an overarching
focus on transformation, such as integrating information and
communication technology into educational systems and modernizing industrial
capacity, showing a focus on building technological foundations. This suggests
that Hybrid Regimes are in an earlier phase of technological development,
with an emphasis on foundational implementation rather than developing
regulatory frameworks. Furthermore, one LDA topic includes keywords related
to training and security; while the term “military” is not explicitly present, further
qualitative analysis shows that the topic pertains to military training.
Authoritarian Regime countries acknowledge the growing importance of AI. In
particular, a Chinese document stresses this by establishing laws, training programs,
and scholarships to make AI more appealing and engaging to more people. Other
nations outline national plans and infrastructure roadmaps, covering broadband ports,
communication lines, IT systems, and workforce development, to support future digital
transformation. Some texts discuss the legal and transparent deployment of AI
models and websites. Moreover, the regime includes sections on research institutes,
grant programs, and technical contracts for AI development. These observations
suggest that Authoritarian Regimes prioritize AI as a strategic tool
for modernization while also underlining state-backed research
programs. LDA had similar topics, such as the implementation of digital infrastructure,
the development of national research initiatives and funding, and the promotion
of smart cities and digital governance. Additionally, LDA topics include references
to education reform, with several also addressing privacy. However, certain topics
show a distinctive character not found in the other government types. Notably, the
terms “supervision” and “social” appear, and qualitative examination shows that they point
to a discourse emphasizing enterprise responsibility, social supervision, and state
regulatory mechanisms. These findings suggest that while both models
highlight the innovative focus of AI, LDA reveals an additional layer
of governance through topics emphasizing social supervision, an aspect
less prominent in BERTopic, where “social supervision” appears only among the keywords
of the algorithm-filing topic (Topic 1 in Table 4.10) and the remaining topics center more
on innovation and digital infrastructure.
5.1.2 Cross-Regime Comparison
The thematic differences in the government types identified by both
models suggest a broader structural distinction in how regimes approach
technological development. Hybrid Regimes appear to prioritize foundational
implementation and capacity building, while Full Democracies more often emphasize
the governance and regulation of already deployed technologies. This contrast may
indicate different stages of technological adoption and institutional maturity across
regime types. However, this distinction may reflect not only political systems
but also economic factors. For instance, countries with higher
GDPs could be better positioned to focus on more advanced technology, whereas
regimes with limited economic resources might need to prioritize infrastructure
development and digital capacity. Recurring themes such as “military” and “health”
also reveal notable regime-specific variations. The term “military,” for instance,
appears explicitly in Flawed Democracy topics, where it is framed within legislative
and safety protocol contexts, emphasizing national strategy and oversight. In Hybrid
Regimes, however, military discourse centers on training, international cooperation,
and the development of emerging technologies, suggesting a more operational and
capability-building orientation. Similarly, the topic of “health” is addressed with
different emphasis. In Full Democracies, health is often linked to individual rights
and accountability in algorithmic decision-making, reflecting practical technological
regulations. By contrast, in Hybrid Regimes, BERTopic highlights health within the
legalistic context of data consent and human rights frameworks, focusing more on
structural safeguards than on individual-level protections.
Finally, while all regime types exhibit an interest in digital innovation
and modernization, the discursive framing of these developments varies
notably across the spectrum of democratic governance. In Authoritarian
Regimes, the focus on infrastructure, national AI strategies, and technological
advancement mirrors themes seen in other regime types. However, a distinctive
perspective appears in one of the topics, where terms such as “supervision” and
“social” indicate a concern with enterprise responsibility, citizen oversight, and
state-regulated monitoring. This element does not appear in the other government
types and suggests a more centralized and control-oriented interpretation of digital
governance. This contrasts with Full Democracies, where discussions of emerging
technologies, such as facial recognition, deepfakes, and audiovisual manipulation, are
situated within a discourse of transparency and public accountability. There, the
emphasis lies on mitigating disinformation and preserving democratic norms. Flawed
and Hybrid Regimes, positioned between these two ends of the spectrum, reveal
more mixed patterns: Hybrid Regimes emphasize infrastructure and transformation,
while Flawed Democracies show signs of both regulatory formality and strategic
assertiveness, particularly in relation to U.S. leadership.
5.2 Ethical Topic Variation Across Government Types
This section interprets how the extracted topics vary across the four regime types
defined by the Democracy Index: Full Democracies, Flawed Democracies, Hybrid
Regimes, and Authoritarian Regimes. In particular, the comparison was split into
two parts: topic-level and government-level comparisons. This allowed an in-depth
inspection of how the two models compare to each other in terms of overlap scores
with our created ethical framework (see Table A.1). Additionally, we examined
how much the ethical framework categories were discussed within the extracted
topics, especially how those categories were distributed within each government
type’s documents. Based on this analysis, we investigate whether there is evidence
that different regimes discuss different ethical aspects when implementing AI policies.
5.2.1 Topic-Level Comparison
This section includes a topic-level discussion, particularly about the framework
categories that the LDA and BERTopic models discuss and emphasize using the
normalized score plot (Figure 4.1). Additionally, Table 4.12 provides the normalized
overlap scores for both models, highlighting the top and bottom
categories. For each topic, we count how many of the 11 ethics framework categories
it overlaps with. Tables A.7 and A.8 show these overlaps and scores, while Figure
4.1 visualizes their distributions.
The two models share the three highest-score categories – “Ethical Impact
Assessment”, “Ethical Governance and Stewardship” and “Development and Inter-
national Cooperation”. This shows that the key topics discussed for each model
include similar issues and ethical concerns. Similarly, the models place the
least emphasis on the same three out of four categories – “Environment and
Ecosystems”, “Education and Research”, and “Communication and Information”.
The overlap in the top and bottom ranking categories indicates that both models
share a similar topic distribution, which is expected given their use of the same
dataset.
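The shared top and bottom categories can be checked directly against Table 4.12. The script below copies the normalized overlap scores, ranks the 11 categories for each model, and intersects the two models’ top three and bottom four categories.

```python
# Normalized overlap scores from Table 4.12, stored as (LDA, BERTopic).
TABLE_4_12 = {
    "Ethical Impact Assessment": (0.51, 0.95),
    "Ethical Governance and Stewardship": (0.94, 0.79),
    "Data Policy": (0.37, 0.46),
    "Development and International Cooperation": (0.93, 0.70),
    "Environment and Ecosystems": (0.11, 0.07),
    "Gender": (0.15, 0.16),
    "Culture": (0.22, 0.12),
    "Education and Research": (0.00, 0.03),
    "Communication and Information": (0.17, 0.04),
    "Economy and Labour": (0.19, 0.34),
    "Health and Social Well-Being": (0.41, 0.31),
}


def ranked_categories(model_index: int) -> list[str]:
    """Categories sorted from highest to lowest score for one model (0 = LDA, 1 = BERTopic)."""
    return sorted(TABLE_4_12, key=lambda cat: TABLE_4_12[cat][model_index], reverse=True)


lda_ranking, bertopic_ranking = ranked_categories(0), ranked_categories(1)
shared_top3 = set(lda_ranking[:3]) & set(bertopic_ranking[:3])
shared_bottom4 = set(lda_ranking[-4:]) & set(bertopic_ranking[-4:])
print("Shared top three:", sorted(shared_top3))
print("Shared within bottom four:", sorted(shared_bottom4))
```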
Next, we examine how these overlaps play out proportionally across regime types in
the Government-Level Comparison.
5.2.2 Government-Level Comparison
This part compares the four regimes to see whether there are identifiable differences
between them and which categories receive the largest share of the policy texts. As
mentioned, the normalized plots in Figure 4.1 provide an overview of how the topics are
distributed across the four government types. Normalization is necessary here because
the regimes differ in the number and size of their documents, so raw overlap counts
would not be comparable.
Full Democracy displays a broad policy focus, but the two models emphasize
different framework categories. In LDA (Figure 4.1 (a)), the two most discussed
categories, “Development and International Cooperation” (31%) and “Health and Social
Well-Being” (14%), account for 45% of the framework-aligned content. In contrast,
BERTopic (Figure 4.1 (b)) highlights “Ethical Impact Assessment” (23%) and
“Development and International Cooperation” (22%), again totaling 45% of the
overlaps. Both models thus rank “Development and International Cooperation” among
the two most emphasized categories, although LDA weights it more heavily (31% versus 22%).
Flawed Democracy places a strong emphasis on the ethics-themed categories. In
LDA, “Ethical Impact Assessment” (approximately 29%) and “Ethical Governance
and Stewardship” (23%) account for 52% of the framework-aligned content. In BERTopic,
the same categories make up 54% of the overlap (29% and 25%, respectively). The two
models therefore agree that Flawed Democracy focuses on ethical topics.
Hybrid Regime has a broader distribution of the overlapping framework categories.
In LDA, the most considered classifications – “Ethical Governance and Stewardship”
(32%) and “Development and International Cooperation” (29%) – make up 61% of the
total framework-aligned content. In BERTopic, the most common framework topics
covered are “Ethical Impact Assessment” (24%) and “Development and International
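Cooperation” (19%), totaling 43% of the content. Hybrid Regime’s topics also cover
more framework categories than those of any other regime (10 of 11 for LDA and all
11 for BERTopic; Table 4.11), indicating a more spread-out distribution, although in LDA
its two leading categories still account for a larger share (61%) than in any other regime.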
Authoritarian Regime, similarly to the Flawed Democracy, highlights ethics.
Besides the “Ethical Governance and Stewardship” (27%) category, topics in LDA also
emphasize “Data Policy” (20%); together, the two account for 47% of the framework-aligned content.
In BERTopic, “Ethical Governance and Stewardship” (21%) and “Ethical Impact
Assessment” (19%) make up 40% of the framework-aligned content. Again, these
results underline the regime’s consistent ethical focus across both models.
Overall, the most prominent framework categories across all regimes relate
to ethics. While Flawed Democracy and Authoritarian Regime place more emphasis
on ethical topics (i.e., “Ethical Governance and Stewardship”, “Ethical Impact
Assessment”), Full Democracy and Hybrid Regime prioritize “Development and
International Cooperation” among others. Despite the inconsistencies between the
models, almost all regimes (except Full Democracy in LDA) include at least one of the
two ethics categories among their two most emphasized categories.
5.3 Quantitative Analysis Using OCTIS
This section discusses the comparative performance of the two
topic models, LDA and BERTopic, based on quantitative evaluation metrics. This
analysis complements the previous qualitative evaluation and directly contributes to
answering the main research question by assessing how well each model performs in
extracting coherent and diverse topics from AI policy documents.
To assess the performance of LDA and BERTopic, it is useful to compare the models
directly across evaluation metrics, rather than by government type. This approach
provides a clearer picture of each model’s strengths and limitations.
In terms of Topic Diversity, BERTopic outperforms LDA in three out of the four
government categories. The only exception is in Flawed Democracies, which also
happen to contain the largest dataset. One plausible explanation for BERTopic’s
lower Topic Diversity score in Flawed Democracies is that LDA, when provided
with ample data, can learn many well-separated topics, reducing vocabulary overlap
and enhancing diversity scores. Conversely, BERTopic’s reliance on embeddings
may begin to cluster thematically similar content more tightly as corpus size grows,
resulting in broader but less lexically distinct topics.
When evaluating IRBO, BERTopic and LDA performed better in two government
types each. Overall, the differences in IRBO scores between the models are relatively
minor.
In Coherence (C_V), LDA outperforms BERTopic across all government
types, except for Full Democracies. This aligns with LDA’s modeling approach, which
favors internally coherent groupings of top-ranked terms based on co-occurrence
statistics. In contrast, BERTopic prioritizes semantic context through embeddings,
which can result in broader conceptual coverage at the expense of tight lexical
cohesion.
The metric where LDA clearly dominates is WECoherencePairwise, on which it
outperforms BERTopic in every government type. This result is expected: LDA groups
words that co-occur within documents, and such words also tend to lie close together
in the word-embedding space on which WECoherencePairwise is computed.
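As we understand the OCTIS metric, WECoherencePairwise scores a topic by the average pairwise cosine similarity between the word embeddings of its top words, using pre-trained word2vec vectors by default. The sketch below reproduces that idea with a tiny, hypothetical two-dimensional embedding table instead of real word2vec vectors; skipping out-of-vocabulary words is our assumption about how missing words are handled.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def we_pairwise_coherence(
    topics: list[list[str]], embeddings: dict[str, list[float]], topk: int = 10
) -> float:
    """Mean over topics of the average pairwise cosine similarity of top-k words."""
    topic_scores = []
    for topic in topics:
        words = [w for w in topic[:topk] if w in embeddings]  # skip OOV words
        pairs = list(combinations(words, 2))
        if pairs:
            topic_scores.append(
                sum(cosine(embeddings[a], embeddings[b]) for a, b in pairs) / len(pairs)
            )
    if not topic_scores:
        raise ValueError("no topic has two in-vocabulary words")
    return sum(topic_scores) / len(topic_scores)


# Hypothetical 2-d embeddings: "law" and "regulation" point the same way.
embeddings = {"law": [1.0, 0.0], "regulation": [1.0, 0.0], "health": [0.0, 1.0]}
print(we_pairwise_coherence([["law", "regulation"], ["law", "health"]], embeddings))  # -> 0.5
```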
However, while these quantitative metrics offer a structured means to
compare model performance, they do not always capture how meaningful
or interpretable the resulting topics are to human readers. In particular,
BERTopic’s lower WECoherencePairwise scores may reflect the model’s tendency
to include top words in a topic that are semantically related but lexically diverse.
While this leads to lower lexical coherence by standard measures, it can actually
enhance interpretability by capturing broader, real-world thematic groupings. For
example, the recurrence of domain-relevant tokens, such as terms related to law,
health, or regulation, across topics often reflects legitimate thematic intersections
rather than redundancy. This overlap can help human interpretation by revealing
nuanced variations and relationships between conceptually connected topics.
Likewise, shared regulatory terminology in both public health and environmental
topics may indicate a common conceptual framework rather than poor topic separation.
BERTopic’s embedding-based architecture enables it to capture
such semantic proximity and contextual overlap, offering a more nuanced view of
thematic content. In contrast, LDA’s stricter lexical boundaries between topics
contribute to its higher scores in WECoherencePairwise and Coherence (C_V), but
this comes at the cost of thematic flexibility. While such strict boundaries improve
quantitative evaluations, they may obscure meaningful connections between topics,
particularly when similar concepts are expressed in slightly different lexical forms or
embedded within different discursive contexts.
Ultimately, the gap between metric-based evaluation and human interpretability
points to a methodological tension. LDA performs best at generating tightly bound,
internally consistent topics, which standard coherence metrics reward. BERTopic,
while often penalized by these same metrics, may better align with how humans
understand and navigate complex, overlapping thematic landscapes. Our results
reflect this trade-off between metric performance and qualitative interpretability,
highlighting the current limitations in evaluating topic models in a way that reflects
human reasoning and understanding, particularly in domains like political science,
where semantic nuance and discourse structure are central to interpretation.
6
Conclusion
This chapter concludes the thesis by summarizing the key observations from the
previous chapters, discussing the limitations encountered throughout the thesis, and
outlining directions for further research.
6.1 Conclusion
The thesis aimed to determine whether BERTopic and LDA differ quantitatively and
qualitatively, and what their similarities and differences are when interpreting topics
generated from AI policy documents. To answer this, two sub-questions were
introduced: (1) which themes and keywords appear across the AI policies of different
government types, and (2) whether the government types differ notably in how they
discuss ethical, economic, and other considerations.
To answer the first sub-question, we examined the topics identified across different
government types. Some themes, such as digital transformation, were present across
all government types, while others were more government-specific. For instance,
Full Democracies emphasized ethical concerns such as transparent algorithmic
decision-making, facial recognition, deepfakes, and disinformation. Flawed
Democracies, heavily influenced by U.S. policy documents, showed a strong focus on
military perspectives of AI. Hybrid Regimes were found to prioritize foundational
capacity building. Meanwhile, Authoritarian Regimes uniquely addressed social
supervision and centralized regulatory mechanisms, framing AI development within
a broader context of state control and oversight.
The second sub-question asked whether different government regimes place special
emphasis on distinct ethical considerations. We addressed this on a topic and
government level by examining overlap scores with the ethical framework that we
created. Both LDA and BERTopic produced consistent results, with the top 3 and
bottom 3 topics aligning with the same framework categories. Flawed Democracy and
Authoritarian Regime had more topics aligning with the two ethics categories
(“Ethical Impact Assessment” and “Ethical Governance and Stewardship”), whereas
Full Democracy and Hybrid Regime also touched on other categories (e.g.,
“Development and International Cooperation”) alongside the ethics ones. Overall, all
regimes discussed ethics to some degree, but the differences in emphasis were not
large enough to support strong, regime-specific conclusions.
To complement the two previous subsections and fully address the main research
question regarding the differences between LDA and BERTopic, we employed OCTIS
to quantitatively compare their topic modeling performance. The OCTIS results
indicate that LDA outperforms BERTopic on two of the four evaluation metrics
(Coherence (C_V) and WECoherencePairwise), suggesting that it is more effective at
generating internally coherent topics. However, BERTopic achieved higher topic
diversity in three of the four government types, which supports the qualitative
observation that its topics were more distinct, with less overlap. This is consistent
with our qualitative finding that BERTopic produces themes that are easier for
humans to interpret and connect.
Finally, this study also aimed to reflect on topic modeling as a methodological tool
in political science. While LDA performed better on traditional coherence metrics,
our qualitative analysis found BERTopic’s output more meaningful to human
interpretation. Despite their methodological differences, both models often identified
similar topics and themes related to social science within the government types. This
overlap suggests that, while model choice affects the interpretability of the results,
the underlying topics remain consistent, which supports the validity of the findings
for the social science domain. This points to a broader issue within topic modeling:
there remains a gap between how models are evaluated and how their outputs are
used in practice, as standard metrics do not fully capture the value of coherence,
clarity, or relevance in human-centered research contexts.
6.2 Limitations
Even though we have successfully answered our primary questions, we encountered
several limitations along the way.
To begin with, data processing faced computational and time constraints. Finding,
collecting, and processing data is time-consuming and involves many practical
decisions. For instance, because the corpus was multilingual, the majority of
documents had to be translated into English. We used the Google Translate API,
since more accurate translation models are computationally expensive. However,
this service struggles with input containing more than one language, which required
thorough manual inspection. Additionally, automated translation risks altering or
rephrasing the original ideas of the corpus.
Moreover, a deeper understanding of websites’ anti-scraping mechanisms could have
increased the size and quality of the scraped dataset. Although we handled several
edge cases to get past website bot protection, how individual websites treat
automated access (i.e., by code rather than by a human user) could be examined
more closely.
In addition, while most of the topics are coherent and interpretable, the data could
have benefited from a more thorough cleaning process: when interpreting the topics,
noise from PDF scraping and translation was visible, limiting our full understanding
of those documents.
Furthermore, some country documents were notably larger than others. In particular,
the U.S. in the Flawed Democracy category contributed roughly six and a half times
as many tokens as the next-largest country within that regime, India (Table A.3).
Although the U.S. documents contributed valuable insights to our analysis and
discussion, the output topics of both models were heavily influenced by such large
documents, potentially underrepresenting other countries.
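The size of this imbalance can be checked directly against the per-country token counts in Table A.3. The short script below copies the Flawed Democracy rows (omitting the Slovak Republic, which had no working URLs) and reports the largest country’s share of all tokens and its ratio to the runner-up; the dictionary and function names are ours, for illustration only.

```python
# Total tokens per country in the Flawed Democracy category, copied from Table A.3
# (Slovak Republic omitted: it had no working URLs).
FLAWED_DEMOCRACY_TOKENS = {
    "Argentina": 2000, "Belgium": 4703, "Brazil": 34219, "Bulgaria": 575,
    "Chile": 22094, "Colombia": 141279, "Cyprus": 892, "Czechia": 5878,
    "Estonia": 51812, "Hungary": 1723, "India": 168613, "Indonesia": 346,
    "Israel": 79732, "Italy": 26522, "Latvia": 49817, "Lithuania": 2291,
    "Malta": 10025, "Poland": 72125, "Portugal": 28252, "Romania": 7,
    "Serbia": 35158, "Singapore": 16317, "Slovenia": 1863,
    "South Africa": 5026, "Thailand": 8213, "United States": 1107352,
}


def imbalance(tokens: dict[str, int]) -> tuple[str, float, str, float]:
    """Return (largest country, its token share, runner-up, largest/runner-up ratio)."""
    ranked = sorted(tokens.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_n), (second, second_n) = ranked[0], ranked[1]
    return top, top_n / sum(tokens.values()), second, top_n / second_n


top, share, runner_up, ratio = imbalance(FLAWED_DEMOCRACY_TOKENS)
print(f"{top}: {share:.0%} of all tokens, {ratio:.1f}x {runner_up}")
# United States: 59% of all tokens, 6.6x India
```

In other words, a single country supplies more than half of the text that both models see for this regime, which is consistent with the U.S.-driven military topics observed in Flawed Democracies.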
Finally, time constraints limited both the depth of qualitative analysis and the extent
of model optimization. In particular, a more thorough hyperparameter search and
fine-tuning process could have led to more robust topic-modeling results. This, in
turn, would have supported a more nuanced interpretation of thematic patterns
across regime types.
6.3 Further Research
Further research may be conducted to address the limitations discussed in the
previous section and explore other potential fields within AI policies across different
government types. Future work could consider the following areas:
• Exploration of Different Models: Only two topic models were explored in this
research. Initially, we planned to include a third model, LEGAL-BERT, to measure
whether a BERT model fine-tuned on legal data would outperform the other models.
Additionally, a hybrid of BERT and LDA models could be implemented to combine
the strengths of both. Moreover, BERT models accept at most 512 input tokens, so
models that can handle longer inputs may be worth exploring for this task.
• Parameter Selection: More attention could be paid to hyperparameter selection,
which can matter as much as the choice of model. Alternative approaches include
relying solely on quantitative metrics (e.g., coherence score) or varying the number
of output topics.
• Compare Policies across Different Countries or Over Time: Another direction is
a more in-depth comparison of the policies themselves. For instance, one could
examine whether a country’s geographical position influences its policies, in
particular by comparing China, the European Union, and the U.S. for distinct
differences. Moreover, following a single country over time could reveal how its
policies change under different administrations and what each prioritizes.
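One common workaround for the 512-token input limit is to split long documents into overlapping windows and embed each window separately, instead of letting text beyond the limit be truncated. The sketch below is a minimal version of that idea; it counts whitespace-separated words, whereas BERT-style tokenizers count subword tokens plus special tokens, so in practice the window should be set well below 512. The function name and the `max_len`/`overlap` values are illustrative assumptions, not settings from our pipeline.

```python
def chunk_tokens(tokens: list[str], max_len: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split a token list into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens, so text cut at one window
    boundary also appears, with some context, in the neighbouring window.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("require max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


document = "national strategy for artificial intelligence " * 300  # 1,500 words
windows = chunk_tokens(document.split(), max_len=400, overlap=50)
print(len(windows), [len(w) for w in windows])  # 5 [400, 400, 400, 400, 100]
```

Each window can then be embedded on its own; the resulting vectors can be topic-modeled as separate passages or averaged back into one document vector, a design choice that trades granularity for simplicity.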
Bibliography
[1] A. Yoder, M. Hickok, G. S. Thompson, and K. Caunes, Artificial Intelli-
gence and Democratic Values 2025, Volume I, M. Rotenberg, Ed. Washington,
D.C.: Center for AI and Digital Policy, 2025, isbn: 979-8218669669. [Online].
Available: https://www.caidp.org/reports/aidv-2025/.
[2] OECD.AI, powered by EC/OECD, Database of national ai policies, https:
//oecd.ai, Accessed: 2025-05-24, 2021.
[3] Economist Intelligence Unit, “Democracy index 2023: Age of conflict,” Feb. 2024,
Accessed: 2024-12-03. [Online]. Available: https://www.economistgroup.com/
press-centre/economist-intelligence/eius-2023-democracy-index-
conflict-and-polarisation-drive-a-new-low-for.
[4] K. Chowdhary, “Natural language processing,” Fundamentals of artificial intelli-
gence, pp. 603–649, 2020.
[5] D. Khurana, A. Koli, K. Khatter, and S. Singh, “Natural language process-
ing: State of the art, current trends and challenges,” Multimedia tools and
applications, vol. 82, no. 3, pp. 3713–3744, 2023.
[6] J. Eisenstein, Introduction to natural language processing. MIT press, 2019.
[7] A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” in
Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S.
Bengio, et al., Eds., vol. 30, Curran Associates, Inc., 2017. [Online]. Available:
https://proceedings.neurips.cc/paper_files/paper/2017/file/
3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
[8] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” Journal
of machine Learning research, vol. 3, no. Jan, pp. 993–1022, 2003.
[9] M. Hoffman, F. Bach, and D. Blei, “Online learning for latent dirichlet alloca-
tion,” advances in neural information processing systems, vol. 23, 2010.
[10] M. Grootendorst, “Bertopic: Neural topic modeling with a class-based tf-idf
procedure,” arXiv preprint arXiv:2203.05794, 2022.
[11] L. Gan, T. Yang, Y. Huang, et al., “Experimental comparison of three topic
modeling methods with lda, top2vec and bertopic,” in Artificial Intelligence
and Robotics, H. Lu and J. Cai, Eds., Singapore: Springer Nature Singapore,
2024, pp. 376–391.
[12] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training
of deep bidirectional transformers for language understanding,” in Proceed-
ings of the 2019 Conference of the North American Chapter of the Associa-
tion for Computational Linguistics: Human Language Technologies, Volume
1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds., Minneapolis,
Minnesota: Association for Computational Linguistics, Jun. 2019,
pp. 4171–4186. doi: 10.18653/v1/N19-1423. [Online]. Available: https:
//aclanthology.org/N19-1423/.
[13] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., “Improving
language understanding by generative pre-training,” 2018. [Online]. Available:
https://cdn.openai.com/research-covers/language-unsupervised/
language_understanding_paper.pdf.
[14] E. Voita, D. Talbot, F. Moiseev, R. Sennrich, and I. Titov, “Analyzing multi-
head self-attention: Specialized heads do the heavy lifting, the rest can be
pruned,” in Proceedings of the 57th Annual Meeting of the Association for Com-
putational Linguistics, A. Korhonen, D. Traum, and L. Màrquez, Eds., Florence,
Italy: Association for Computational Linguistics, Jul. 2019, pp. 5797–5808.
doi: 10.18653/v1/P19-1580. [Online]. Available: https://aclanthology.
org/P19-1580/.
[15] N. Patwardhan, S. Marrone, and C. Sansone, “Transformers in the real world:
A survey on nlp applications,” Information, vol. 14, no. 4, p. 242, 2023. doi:
10.3390/info14040242. [Online]. Available: https://doi.org/10.3390/
info14040242.
[16] T. Lin, Y. Wang, X. Liu, and X. Qiu, “A survey of transformers,” AI Open,
vol. 3, pp. 111–132, 2022, issn: 2666-6510. doi: https://doi.org/10.1016/
j.aiopen.2022.10.001. [Online]. Available: https://www.sciencedirect.
com/science/article/pii/S2666651022000146.
[17] S. Islam, H. Elmekki, A. Elsebai, et al., “A comprehensive survey on applications
of transformers for deep learning tasks,” Expert Systems with Applications,
vol. 241, p. 122 666, 2024, issn: 0957-4174. doi: https://doi.org/10.1016/j.
eswa.2023.122666. [Online]. Available: https://www.sciencedirect.com/
science/article/pii/S0957417423031688.
[18] A. Reuter, A. Thielmann, C. Weisser, B. Säfken, and T. Kneib, “Probabilistic
topic modeling with transformer representations,” IEEE Transactions on Neural
Networks and Learning Systems, pp. 1–15, 2025. doi: 10.1109/TNNLS.2025.
3538262.
[19] N. M. Gardazi, A. Daud, M. K. Malik, A. Bukhari, T. Alsahfi, and B. Alshe-
maimri, “Bert applications in natural language processing: A review,” Artificial
Intelligence Review, vol. 58, no. 6, p. 166, 2025. doi: 10.1007/s10462-025-11162-5.
[Online]. Available: https://doi.org/10.1007/s10462-025-11162-5.
[20] Y. Zhou and V. Srikumar, “A closer look at how fine-tuning changes BERT,”
in Proceedings of the 60th Annual Meeting of the Association for Computa-
tional Linguistics (Volume 1: Long Papers), S. Muresan, P. Nakov, and A.
Villavicencio, Eds., Dublin, Ireland: Association for Computational Linguistics,
May 2022, pp. 1046–1061. doi: 10.18653/v1/2022.acl-long.75. [Online].
Available: https://aclanthology.org/2022.acl-long.75/.
[21] C. Qu, L. Yang, M. Qiu, W. B. Croft, Y. Zhang, and M. Iyyer, “Bert with history
answer embedding for conversational question answering,” in Proceedings of the
42nd International ACM SIGIR Conference on Research and Development in
Information Retrieval, ser. SIGIR’19, Paris, France: Association for Computing
Machinery, 2019, pp. 1133–1136, isbn: 9781450361729. doi: 10.1145/3331184.
3331341. [Online]. Available: https://doi.org/10.1145/3331184.3331341.
[22] W. Zheng, S. Lu, Z. Cai, R. Wang, L. Wang, and L. Yin, “Pal-bert: An improved
question answering model,” Computer Modeling in Engineering & Sciences,
vol. 10, 2023.
[23] Y. Yu, Y. Wang, J. Mu, et al., “Chinese mineral named entity recognition based
on bert model,” Expert Systems with Applications, vol. 206, p. 117 727, 2022,
issn: 0957-4174. doi: https://doi.org/10.1016/j.eswa.2022.117727.
[Online]. Available: https://www.sciencedirect.com/science/article/
pii/S0957417422010090.
[24] M. Mohseni and A. Tebbifakhr, “MorphoBERT: A Persian NER system with
BERT and morphological analysis,” in Proceedings of the First International
Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019)
co-located with ICNLSP 2019 - Short Papers, A. A. Freihat and M. Abbas, Eds.,
Trento, Italy: Association for Computational Linguistics, Nov. 2019, pp. 23–30.
[Online]. Available: https://aclanthology.org/2019.nsurl-1.4/.
[25] H. Darji, J. Mitrović, and M. Granitzer, “German bert model for legal named
entity recognition,” in Proceedings of the 15th International Conference on
Agents and Artificial Intelligence, SCITEPRESS - Science and Technology
Publications, 2023, pp. 723–728. doi: 10.5220/0011749400003393. [Online].
Available: http://dx.doi.org/10.5220/0011749400003393.
[26] Y. Sun, Y. Zheng, C. Hao, and H. Qiu, “NSP-BERT: A prompt-based few-shot
learner through an original pre-training task —— next sentence prediction,” in
Proceedings of the 29th International Conference on Computational Linguistics,
N. Calzolari, C.-R. Huang, H. Kim, et al., Eds., Gyeongju, Republic of Korea:
International Committee on Computational Linguistics, Oct. 2022, pp. 3233–
3250. [Online]. Available: https://aclanthology.org/2022.coling-1.286/.
[27] Y. Levine, B. Lenz, O. Dagan, et al., “SenseBERT: Driving some sense into
BERT,” in Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics, D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault,
Eds., Online: Association for Computational Linguistics, Jul. 2020, pp. 4656–
4667. doi: 10.18653/v1/2020.acl-main.423. [Online]. Available: https:
//aclanthology.org/2020.acl-main.423/.
[28] D. Song, S. Ma, Z. Sun, S. Yang, and L. Liao, “Kvl-bert: Knowledge enhanced
visual-and-linguistic bert for visual commonsense reasoning,” Knowledge-Based
Systems, vol. 230, p. 107 408, 2021, issn: 0950-7051. doi: https://doi.
org/10.1016/j.knosys.2021.107408. [Online]. Available: https://www.
sciencedirect.com/science/article/pii/S0950705121006705.
[29] A. Chiche and B. Yitagesu, “Part of speech tagging: A systematic review of
deep learning and machine learning approaches,” Journal of Big Data, vol. 9,
no. 1, p. 10, 2022.
[30] S. Pei, L. Wang, T. Shen, and Z. Ning, “Da-bert: Enhancing part-of-speech
tagging of aspect sentiment analysis using bert,” in Advanced Parallel Processing
Technologies, P.-C. Yew, P. Stenström, J. Wu, X. Gong, and T. Li, Eds., Cham:
Springer International Publishing, 2019, pp. 86–95, isbn: 978-3-030-29611-7.
[31] W. Liu, S. Lin, B. Gao, et al., “Bert-pos: Sentiment analysis of mooc reviews
based on bert with part-of-speech information,” in Artificial Intelligence in
Education. Posters and Late Breaking Results, Workshops and Tutorials, In-
dustry and Innovation Tracks, Practitioners’ and Doctoral Consortium, M. M.
Rodrigo, N. Matsuda, A. I. Cristea, and V. Dimitrova, Eds., Cham: Springer
International Publishing, 2022, pp. 371–374, isbn: 978-3-031-11647-6.
[32] R. Saidi, F. Jarray, and M. Mansour, “A bert based approach for arabic pos
tagging,” in International Work-Conference on Artificial Neural Networks,
Springer, 2021, pp. 311–321.
[33] L. Bobojonova, A. Akhundjanova, P. S. Ostheimer, and S. Fellenz, “BBPOS:
BERT-based part-of-speech tagging for Uzbek,” in Proceedings of the First
Workshop on Language Models for Low-Resource Languages, H. Hettiarachchi,
T. Ranasinghe, P. Rayson, et al., Eds., Abu Dhabi, United Arab Emirates:
Association for Computational Linguistics, Jan. 2025, pp. 287–293. [Online].
Available: https://aclanthology.org/2025.loreslm-1.23/.
[34] M. A. Cheragui, A. H. Dahou, and A. Abdedaiem, “Exploring bert models
for part-of-speech tagging in the algerian dialect: A comprehensive study,”
in Proceedings of the 6th International Conference on Natural Language and
Speech Processing (ICNLSP 2023), 2023, pp. 140–150.
[35] N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using
siamese bert-networks,” in Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing, Association for Computational Lin-
guistics, Nov. 2019. [Online]. Available: https://arxiv.org/abs/1908.10084.
[36] N. Khodeir and F. Elghannam, “Efficient topic identification for urgent mooc
forum posts using bertopic and traditional topic modeling techniques,” Ed-
ucation and Information Technologies, vol. 30, pp. 5501–5527, 2025. doi:
10.1007/s10639-024-13003-4. [Online]. Available: https://doi.org/
10.1007/s10639-024-13003-4.
[37] M. Grootendorst. “Bertopic - the algorithm.” Accessed: 2025-04-22. (2024),
[Online]. Available: https://maartengr.github.io/BERTopic/algorithm/
algorithm.html.
[38] S. P. Crain, K. Zhou, S.-H. Yang, and H. Zha, “Dimensionality reduction and
topic modeling: From latent semantic indexing to latent dirichlet allocation and
beyond,” in Mining Text Data, C. C. Aggarwal and C. Zhai, Eds. Boston, MA:
Springer US, 2012, pp. 129–161, isbn: 978-1-4614-3223-4. doi: 10.1007/978-
1-4614-3223-4_5. [Online]. Available: https://doi.org/10.1007/978-1-
4614-3223-4_5.
[39] M. Allaoui, M. L. Kherfi, and A. Cheriet, “Considerably improving clustering
algorithms using umap dimensionality reduction technique: A comparative
study,” in Jul. 2020, pp. 317–325, isbn: 978-3-030-51934-6. doi: 10.1007/978-
3-030-51935-3_34.
[40] L. McInnes, J. Healy, S. Astels, et al., “Hdbscan: Hierarchical density based
clustering.,” J. Open Source Softw., vol. 2, no. 11, p. 205, 2017.
[41] L. McInnes, J. Healy, and S. Astels, Hdbscan documentation: Parameter se-
lection, Accessed: 2025-04-21, 2016. [Online]. Available: https://hdbscan.
readthedocs.io/en/latest/parameter_selection.html.
[42] scikit-learn developers, Feature extraction, Accessed: 2025-04-21, 2025. [Online].
Available: https://scikit-learn.org/stable/modules/feature_extraction.html#text-
feature-extraction.
[43] scikit-learn developers, TfidfTransformer, Accessed: 2025-04-21, 2025. [Online].
Available: https://scikit-learn.org/stable/modules/generated/sklearn.feature_
extraction.text.TfidfTransformer.html#sklearn.feature_extraction.text.TfidfTransformer.
[44] M. Grootendorst. “Bertopic - fine-tune topic representation.” Accessed: 2025-
04-21. (2024), [Online]. Available: https://maartengr.github.io/BERTopic/
api/representations.html.
[45] M. Grootendorst. “Bertopic - representation models.” Accessed: 2025-04-21.
(2024), [Online]. Available: https://maartengr.github.io/BERTopic/getting_started/
representation/representation.html.
[46] M. Röder, A. Both, and A. Hinneburg, “Exploring the space of topic coherence
measures,” in Proceedings of the eighth ACM international conference on Web
search and data mining, 2015, pp. 399–408.
[47] J. H. Lau, D. Newman, and T. Baldwin, “Machine reading tea leaves: Automat-
ically evaluating topic coherence and topic model quality,” in Proceedings of the
14th Conference of the European Chapter of the Association for Computational
Linguistics, 2014, pp. 530–539.
[48] J. Chang, S. Gerrish, C. Wang, J. Boyd-Graber, and D. Blei, “Reading tea
leaves: How humans interpret topic models,” Advances in neural information
processing systems, vol. 22, 2009.
[49] S. Terragni, E. Fersini, B. G. Galuzzi, P. Tropeano, and A. Candelieri, “Octis:
Comparing and optimizing topic models is simple!” In Proceedings of the 16th
Conference of the European Chapter of the Association for Computational
Linguistics: System Demonstrations, 2021, pp. 263–270.
[50] W. Webber, A. Moffat, and J. Zobel, “A similarity measure for indefinite
rankings,” ACM Transactions on Information Systems (TOIS), vol. 28, no. 4,
pp. 1–38, 2010.
[51] K. Manheim and L. Kaplan, “Artificial intelligence: Risks to privacy and
democracy,” Yale JL & Tech., vol. 21, p. 106, 2019.
[52] K. Crawford, The atlas of AI: Power, politics, and the planetary costs of
artificial intelligence. Yale University Press, 2021.
[53] M. Veale and I. Brass, “Administration by algorithm? public management
meets public sector machine learning,” in Oxford University Press, 2019.
[54] UNESCO, Recommendation on the ethics of artificial intelligence, Programme
and Meeting Document, 2022. [Online]. Available: https://unesdoc.unesco.
org/ark:/48223/pf0000381137.
[55] B. Wagner, “Ethics as an escape from regulation. from “ethics-washing” to
ethics-shopping?,” 2018.
[56] L. Richardson, Beautiful soup documentation, Online, Accessed: March 5,
2025, 2023. [Online]. Available: https://www.crummy.com/software/BeautifulSoup/
bs4/doc/.
[57] VeNoMouS, Cloudscraper github repository, GitHub repository, Accessed: March
5, 2025, 2023. [Online]. Available: https://github.com/VeNoMouS/cloudscraper.
[58] SeleniumHQ, Selenium documentation, Online, Accessed: March 5, 2025, 2023.
[Online]. Available: https://www.selenium.dev/documentation/.
[59] py-pdf, Pypdf2 documentation, Online, Accessed: March 5, 2025, 2023. [Online].
Available: https://pypdf2.readthedocs.io/en/3.x/.
[60] Python Software Foundation, Python io.bytesio documentation, Online, Ac-
cessed: March 5, 2025, 2023. [Online]. Available: https://docs.python.org/
3/library/io.html#io.BytesIO.
[61] JaidedAI, Easyocr: Ready-to-use ocr with 80+ languages supported, https:
//github.com/JaidedAI/EasyOCR, Accessed: 2025-05-01, 2020.
[62] dwyl, English words github repository, GitHub repository, Accessed: March 5,
2025, 2023. [Online]. Available: https://github.com/dwyl/english-words.
[63] R. Řehůřek and P. Sojka, “Software framework for topic modelling with large
corpora,” 2010.
[64] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence embeddings using
Siamese BERT-networks,” in Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th International Joint Con-
ference on Natural Language Processing (EMNLP-IJCNLP), K. Inui, J. Jiang,
V. Ng, and X. Wan, Eds., Hong Kong, China: Association for Computational
Linguistics, Nov. 2019, pp. 3982–3992. doi: 10.18653/v1/D19-1410. [Online].
Available: https://aclanthology.org/D19-1410/.
A
Appendix 1
Framework Overview
The following framework outlines key dimensions for analyzing ethical and societal
impacts of AI systems. Each category includes relevant focus areas and guiding
principles for analysis.
Category: Focus Areas and Considerations
Ethical Impact Assessment: ethical review, impact assessment, risk prevention, human rights impact, fundamental freedom, due diligence, oversight mechanisms, impact evaluation, socioeconomic assessment, digital divide, transparency protocols, access to information, decision-making autonomy, regulatory framework, auditability, traceability, explainability, inclusion, public authorities, citizen participation
Ethical Governance and Stewardship: AI governance, inclusive, transparent, multidisciplinary, human rights law obligation, remediation mechanisms, enforcement mechanisms, accountability, responsibility, liability frameworks, auditability, system robustness, safety and security risks, explainability, inclusive development, innovation, SMEs, civil society organizations, fundamental freedoms, cultural and social diversities, disinformation, misinformation, algorithmic stereotyping, access to AI, freedom of expression, policy prototypes, strategic research, global collaboration, public oversight
Data Policy: data governance, privacy by design, privacy impact assessments, right to privacy, data security, personal and sensitive data, data quality, gold standard datasets, annotating, disaggregated data, surveillance concerns, data protection legislation, transparency mechanisms, fair data sharing, consent, data trust, open data, interoperability, cross-border data flow, responsible AI development
Development and International Cooperation: AI ethics, ethical frameworks, international collaboration, platforms for cooperation, AI for development, education, science, healthcare, agriculture, environment, natural resources, infrastructure, economy, Global AI research, data sharing, geo-technical divide, international law, tech exchange, funding, policy consulting
Environment and Ecosystems: environmental impact assessments, AI system lifecycle, carbon footprint, energy consumption, raw material extraction, sustainability, ecosystem monitoring, disaster resilience, circular economy, sustainable finance, climate mitigation, pollution detection and prevention, energy, resource-efficient AI, safeguards and justification for AI use
Gender: gender equality, AI system lifecycle, transversal gender perspective, gender action plans, dedicated public funds, digital gender gaps, STEM education for girls and women, career development, online violence prevention, AI bias and stereotyping, economic incentives, best practice transfer
Culture: cultural heritage preservation, accessibility, endangered languages, indigenous languages, cultural programs, AI’s cultural impact, automated translation, language reduction, promoting diversity in algorithms, local content, visibility, AI, arts, IP rights
Education and Research: AI literacy, public education, digital divides, critical thinking, media literacy, ethics in AI curricula, children’s rights, gender inclusion, accessibility for disabilities and minorities, ethical design, interdisciplinary research, AI risks and limitations, AI in policy and academia
Communication and Information: access to knowledge, freedom of expression, information disclosure, automated content, communication regulation, diverse viewpoints, disinformation, misinformation, journalism, transparency, media recommendations
Economy and Labour: labor markets, skill requirements, reskilling, job transitions, AI unemployment protection, social protection, fair competition, monopoly prevention, market exploitation, compliance, trade, labour-intensive sector support
Health and Social Well-Being: healthcare, mental health, physical health, disease mitigation, privacy, informed consent, human oversight, diagnostics, treatment, AI safety, validation, psychological impact, youth, social isolation, addiction, elderly care, disability support, human dignity in health-AI interactions
Table A.1
Data Collection and Processing
Table A.2 provides an overview of the status codes returned by the URLs in the
initial dataset. The links were split into two categories, Working Links and Broken
Links, and only the Working Links were used for further data extraction and
processing.
Status Code Count
Working Links 200 761
202 4
Total 765
Broken Links 400 2
403 95
404 76
471 1
520 1
Exception 105
Total 280
Overall Total 1045
Table A.2: “Public access URL” Status Codes and Counts.
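The split into Working Links and Broken Links can be sketched as follows. This is an illustrative sketch, not our scraping code: the bibliography lists tools such as Cloudscraper and Selenium, whereas this version uses only Python’s standard-library `urllib`, so it will not get past bot protection. Treating 200 and 202 as working and everything else, including requests that raise an exception, as broken mirrors Table A.2; the function names and the User-Agent string are hypothetical.

```python
import http.client
from collections import Counter
from typing import Iterable, Optional
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

WORKING_CODES = {200, 202}  # status codes counted as Working Links in Table A.2


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for `url`, or None if the request failed."""
    try:
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:  # 4xx/5xx responses are raised as HTTPError
        return err.code
    except (URLError, OSError, ValueError, http.client.HTTPException):
        return None  # DNS failure, timeout, malformed URL, broken connection, ...


def categorize(status: Optional[int]) -> str:
    return "Working Links" if status in WORKING_CODES else "Broken Links"


def tally(statuses: Iterable[Optional[int]]) -> tuple[Counter, Counter]:
    """Count results per status code ("Exception" for None) and per category."""
    statuses = list(statuses)
    by_code = Counter("Exception" if s is None else s for s in statuses)
    by_category = Counter(categorize(s) for s in statuses)
    return by_code, by_category


# Offline usage with made-up results; in practice: [fetch_status(u) for u in urls]
by_code, by_category = tally([200, 200, 202, 403, 404, None])
print(by_code, by_category)
```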
Table A.3 provides details for each government type: the countries that belong to
each regime, the number of working links per country, and the total number of
tokens obtained from the scraped websites.
Government Type Country Working URLs Total Tokens
Australia 13 11052
Austria 9 15973
Canada 17 16628
Costa Rica 1 1152
Denmark 8 4228
Finland 7 10893
France 40 24490
Germany 39 76876
Greece 1 34
Iceland 0 -
Ireland 4 2869
Full Democracies Japan 19 23807
Korea 4 1456
Luxembourg 8 5570
Mauritius 1 442
Netherlands 13 62478
New Zealand 9 22345
Norway 21 38373
Spain 25 331675
Sweden 10 9688
Switzerland 5 2327
United Kingdom 49 360136
Uruguay 4 26297
Argentina 4 2000
Belgium 19 4703
Brazil 10 34219
Bulgaria 2 575
Chile 6 22094
Colombia 29 141279
Cyprus 1 892
Czechia 14 5878
Estonia 13 51812
Hungary 4 1723
India 26 168613
Indonesia 1 346
Israel 12 79732
Flawed Democracies Italy 12 26522
Latvia 4 49817
Lithuania 3 2291
Malta 7 10025
Poland 5 72125
Portugal 20 28252
Romania 1 7
Serbia 22 35158
Singapore 26 16317
Slovak Republic 0 -
Slovenia 6 1863
South Africa 4 5026
Thailand 5 8213
United States 73 1107352
Armenia 4 2624
Kenya 1 2572
Mexico 11 1856
Morocco 0 -
Hybrid Regimes Nigeria 3 2209Peru 11 3129
Tunisia 3 511
Continued on next page
IV
A. Appendix 1
Government Type Country Working URLs Total Tokens
Türkiye 25 63371
Uganda 1 1182
Ukraine 1 1288
China 20 22248
Egypt 7 28497
Kazakhstan 6 17904
Russian Federation 6 3565
Authoritarian Regimes Rwanda 4 2065
Saudi Arabia 4 1783
United Arab Emirates 6 12072
Uzbekistan 5 5431
Viet Nam 7 12992
Table A.3: Detailed Metrics by Government Type and Country.
LDA Hyperparameters
Table A.4 shows, for each government type, the five hyperparameter sets (six for Flawed Democracy) that resulted in the highest coherence and lowest perplexity scores.
num_topics | passes | alpha | eta | Coherence | Perplexity
Full Democracy
10 | 20 | 0.01 | 0.05 | 0.4395 | -7.6850
10 | 30 | auto | 0.05 | 0.4409 | -7.6617
10 | 30 | symmetric | auto | 0.4400 | -7.5846
10 | 30 | auto | auto | 0.4400 | -7.5846
10 | 30 | auto | symmetric | 0.4400 | -7.6171
Flawed Democracy
5 | 30 | auto | 0.05 | 0.4484 | -7.8093
5 | 30 | 0.01 | 0.05 | 0.4282 | -7.8090
5 | 30 | asymmetric | 0.05 | 0.4491 | -7.8087
5 | 20 | auto | 0.05 | 0.4491 | -7.8087
5 | 20 | 0.01 | 0.05 | 0.4491 | -7.8092
10 | 30 | auto | 0.01 | 0.3671 | -8.5689
Hybrid Regime
15 | 20 | asymmetric | auto | 0.4686 | -7.3301
15 | 20 | asymmetric | symmetric | 0.4686 | -7.3667
15 | 20 | asymmetric | 0.01 | 0.4650 | -7.9421
15 | 30 | asymmetric | symmetric | 0.4686 | -7.3577
15 | 30 | asymmetric | auto | 0.4684 | -7.3248
Authoritarian Regime
15 | 20 | 0.01 | 0.01 | 0.4507 | -7.8995
5 | 30 | auto | auto | 0.4416 | -7.4900
5 | 30 | 0.01 | auto | 0.4416 | -7.4952
5 | 30 | 0.01 | symmetric | 0.4416 | -7.5757
5 | 30 | symmetric | auto | 0.4416 | -7.4998
Table A.4: Grid Search Results for LDA Hyperparameters Across Government Types.
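The hyperparameter names in Table A.4 (passes, alpha and eta, with values such as "auto", "symmetric" and "asymmetric") match those of gensim's LdaModel. Assuming that the Perplexity column holds the per-word likelihood bound returned by gensim's log_perplexity, which is negative and from which gensim derives perplexity as 2 to the power of minus the bound, a value closer to zero means a lower perplexity. The sketch below shows one hypothetical way of ranking runs under that assumption: highest coherence first, with ties broken by lower perplexity. The ranking rule and all names are illustrative rather than the procedure used in the thesis; the rows are the Full Democracy entries of Table A.4.

```python
from typing import List, NamedTuple, Union


class LdaRun(NamedTuple):
    num_topics: int
    passes: int
    alpha: Union[str, float]
    eta: Union[str, float]
    coherence: float
    bound: float  # assumed: gensim per-word likelihood bound (the "Perplexity" column)


def bound_to_perplexity(bound: float) -> float:
    """gensim reports perplexity as 2 ** (-per-word bound)."""
    return 2.0 ** (-bound)


def rank_runs(runs: List[LdaRun]) -> List[LdaRun]:
    """Hypothetical rule: highest coherence first, ties broken by lower perplexity."""
    return sorted(runs, key=lambda run: (-run.coherence, bound_to_perplexity(run.bound)))


# Full Democracy rows of Table A.4
FULL_DEMOCRACY = [
    LdaRun(10, 20, 0.01, 0.05, 0.4395, -7.6850),
    LdaRun(10, 30, "auto", 0.05, 0.4409, -7.6617),
    LdaRun(10, 30, "symmetric", "auto", 0.4400, -7.5846),
    LdaRun(10, 30, "auto", "auto", 0.4400, -7.5846),
    LdaRun(10, 30, "auto", "symmetric", 0.4400, -7.6171),
]

ranked = rank_runs(FULL_DEMOCRACY)
```

Under this reading, the best Full Democracy run is the one with alpha="auto" and eta=0.05 (coherence 0.4409), and among the three runs tied at 0.4400 the one with bound -7.6171 ranks last because it has the highest perplexity.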
BERTopic Hyperparameters
Table A.5 shows, for each government type, the five hyperparameter sets (ten for Hybrid Regime) that resulted in the highest coherence scores. For Hybrid Regime, the range of the hyperparameter grid search was widened because the model produced only a limited number of topics.
nr_topics | n_neighbors | cluster_size | Coherence Score
Full Democracy
4 | 10 | 10 | 0.5349
4 | 15 | 5 | 0.5349
4 | 15 | 10 | 0.5349
16 | 5 | 5 | 0.5219
10 | 5 | 10 | 0.5150
Flawed Democracy
16 | 15 | 10 | 0.5449
6 | 15 | 5 | 0.5434
14 | 15 | 10 | 0.5386
16 | 10 | 15 | 0.5298
8 | 10 | 10 | 0.5251
Hybrid Regime
16 | 20 | 10 | 0.7126
16 | 15 | 10 | 0.6760
16 | 2 | 15 | 0.6429
16 | 5 | 20 | 0.5942
16 | 5 | 5 | 0.5900
16 | 5 | 10 | 0.5815
16 | 20 | 5 | 0.5664
16 | 5 | 15 | 0.5444
16 | 20 | 2 | 0.5091
16 | 10 | 2 | 0.4948
Authoritarian Regime
6 | 10 | 5 | 0.5539
8 | 10 | 5 | 0.5473
10 | 10 | 5 | 0.5406
14 | 10 | 5 | 0.5406
16 | 10 | 5 | 0.5406
Table A.5: Top 5 (and Top 10 for Hybrid Regime) Hyperparameter Sets for Each
Government Type.
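The three hyperparameters in Table A.5 belong to different components of the BERTopic pipeline. Assuming that n_neighbors is passed to UMAP, that cluster_size is HDBSCAN's min_cluster_size, and that nr_topics is BERTopic's own topic-reduction target, the sketch below builds a grid of such settings and keeps the best-scoring ones. The grid values are hypothetical, chosen only to resemble the values that appear in Table A.5; the coherence score for each setting would come from fitting and evaluating one model per setting.

```python
from itertools import product


def bertopic_settings(nr_topics, n_neighbors, cluster_size):
    """Route one grid point to the component it configures (assumed mapping)."""
    return {
        "nr_topics": nr_topics,                         # BERTopic: reduce topics to this number
        "umap": {"n_neighbors": n_neighbors},           # UMAP: local neighbourhood size
        "hdbscan": {"min_cluster_size": cluster_size},  # HDBSCAN: smallest allowed cluster
    }


def build_grid(nr_topics_values, n_neighbors_values, cluster_size_values):
    """Every combination of the three hyperparameter ranges."""
    return [
        bertopic_settings(t, n, c)
        for t, n, c in product(nr_topics_values, n_neighbors_values, cluster_size_values)
    ]


def top_k(results, k=5):
    """Keep the k (settings, coherence) pairs with the highest coherence."""
    return sorted(results, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical ranges resembling the values seen in Table A.5
NR_TOPICS = range(4, 17, 2)  # 4, 6, ..., 16
default_grid = build_grid(NR_TOPICS, [5, 10, 15], [5, 10, 15])
widened_grid = build_grid(NR_TOPICS, [2, 5, 10, 15, 20], [2, 5, 10, 15, 20])
```

Each settings dictionary s could then be turned into a model as BERTopic(umap_model=UMAP(**s["umap"]), hdbscan_model=HDBSCAN(**s["hdbscan"]), nr_topics=s["nr_topics"]). With these hypothetical ranges, widening two of the three ranges from three to five values, as was done for Hybrid Regime, grows the grid from 63 to 175 settings, so the extra coverage comes at a clear computational cost.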
Table A.6 reports the coherence scores obtained by BERTopic for each government type, measured with and without a representation model based on part-of-speech (POS) tagging.
Government Type | Coherence Without POS | Coherence With POS
Full Democracy | 0.7548 | 0.5349
Flawed Democracy | 0.9139 | 0.5449
Hybrid Regime | 0.7107 | 0.7126
Authoritarian Regime | 0.7225 | 0.5539
Table A.6: Coherence Scores per Government Type: With and Without POS Tagging.
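To give an intuition for what the POS representation model changes, the sketch below is a simplified, hypothetical illustration rather than BERTopic's implementation: it keeps only those ranked topic keywords whose part-of-speech sequence matches an allowed pattern, here a noun, an adjective, or an adjective followed by a noun. As far as we are aware, these are close to the default patterns of BERTopic's PartOfSpeech representation, which relies on spaCy for tagging; in the sketch, the tags are supplied by hand.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Allowed POS sequences (universal POS tags), assumed to resemble BERTopic's default patterns
DEFAULT_PATTERNS: FrozenSet[Tuple[str, ...]] = frozenset({("NOUN",), ("ADJ",), ("ADJ", "NOUN")})


def pos_filter(
    ranked_keywords: Sequence[str],
    pos_tags: Dict[str, str],
    patterns: FrozenSet[Tuple[str, ...]] = DEFAULT_PATTERNS,
    top_n: int = 10,
) -> List[str]:
    """Keep keywords whose tag sequence matches a pattern, preserving the original ranking."""
    kept: List[str] = []
    for keyword in ranked_keywords:
        # Words missing from pos_tags get the placeholder tag "X" and never match
        tags = tuple(pos_tags.get(word, "X") for word in keyword.split())
        if tags in patterns:
            kept.append(keyword)
            if len(kept) == top_n:
                break
    return kept


# Invented example: keywords ranked by a topic model, with hand-assigned tags
example_keywords = ["regulate", "artificial intelligence", "data", "ethical", "quickly", "ensure"]
example_tags = {
    "regulate": "VERB", "artificial": "ADJ", "intelligence": "NOUN",
    "data": "NOUN", "ethical": "ADJ", "quickly": "ADV", "ensure": "VERB",
}
```

Here pos_filter(example_keywords, example_tags) keeps "artificial intelligence", "data" and "ethical". Dropping verbs and adverbs tends to make topic labels easier to read, but it also changes which words are scored, which is why coherence can differ between the two settings in Table A.6.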
Framework Overlap Evaluation
Tables A.7 and A.8 show how the topics produced by the LDA and BERTopic models, respectively, overlap with the framework in Table A.1. The “Framework Topics with Raw Scores” column lists the framework categories that each topic overlaps with, together with the overlap score; the higher the score, the stronger the overlap. A dash (-) means that none of the topic's keywords matched any framework category.
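The overlap score is described here only through its values. As an illustration, the sketch below implements one hypothetical matching rule that yields values of the same kind: an exact match between a topic keyword and a framework term scores 1.00, and a keyword equal to one word of an n-word framework term scores 1/n (0.50 for two words, 0.33 for three). Each keyword contributes its best match within a category, and contributions are summed per category. This rule is an assumption, not necessarily the one used to produce Tables A.7 and A.8, and FRAMEWORK_EXCERPT contains only the three Table A.1 categories reproduced in this appendix.

```python
from typing import Dict, List

# Excerpt of the framework: only the three Table A.1 rows reproduced in this appendix
FRAMEWORK_EXCERPT: Dict[str, List[str]] = {
    "Communication and Information": [
        "access to knowledge", "freedom of expression", "information disclosure",
        "automated content", "communication regulation", "diverse viewpoints",
        "disinformation", "misinformation", "journalism", "transparency",
        "media recommendations",
    ],
    "Economy and Labour": [
        "labor markets", "skill requirements", "reskilling", "job transitions",
        "AI unemployment protection", "social protection", "fair competition",
        "monopoly prevention", "market exploitation", "compliance", "trade",
        "labour-intensive sector support",
    ],
    "Health and Social Well-Being": [
        "healthcare", "mental health", "physical health", "disease mitigation",
        "privacy", "informed consent", "human oversight", "diagnostics", "treatment",
        "AI safety", "validation", "psychological impact", "youth", "social isolation",
        "addiction", "elderly care", "disability support",
        "human dignity in health-AI interactions",
    ],
}


def keyword_match(keyword: str, term: str) -> float:
    """Hypothetical rule: 1.0 for an exact match, 1/n if the keyword is one word of an n-word term."""
    keyword, term = keyword.lower(), term.lower()
    if keyword == term:
        return 1.0
    words = term.split()
    if keyword in words:
        return 1.0 / len(words)
    return 0.0


def category_score(topic_keywords: List[str], terms: List[str]) -> float:
    """Sum, over the topic's keywords, of each keyword's best match within one category."""
    total = sum(max((keyword_match(kw, t) for t in terms), default=0.0) for kw in topic_keywords)
    return round(total, 2)


def score_topic(topic_keywords: List[str], framework: Dict[str, List[str]]) -> Dict[str, float]:
    """Non-zero category scores for one topic, highest first; an empty dict corresponds to '-'."""
    scores = {cat: category_score(topic_keywords, terms) for cat, terms in framework.items()}
    matched = {cat: s for cat, s in scores.items() if s > 0}
    return dict(sorted(matched.items(), key=lambda item: item[1], reverse=True))
```

For an invented topic with the keywords health, privacy, trade, knowledge and data, this rule gives Health and Social Well-Being (1.50: "privacy" exactly, "health" as half of "mental health"), Economy and Labour (1.00) and Communication and Information (0.33: "knowledge" as one third of "access to knowledge").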
Topic ID | Framework Topics with Raw Scores
LDA Model - Full Democracy
Topic 0 | Communication and Information (0.50), Ethical Governance and Stewardship (0.50)
Topic 1 | Development and International Cooperation (1.00), Ethical Governance and Stewardship (0.33)
Topic 2 | -
Topic 3 | Development and International Cooperation (0.50)
Topic 4 | Gender (0.50), Culture (0.33)
Topic 5 | Communication and Information (0.50), Ethical Impact Assessment (0.50), Data Policy (0.33)
Topic 6 | Development and International Cooperation (2.00), Culture (1.00), Environment and Ecosystems (1.00)
Topic 7 | Ethical Governance and Stewardship (1.00), Gender (0.50), Health and Social Well-Being (0.50)
Topic 8 | Health and Social Well-Being (1.50), Development and International Cooperation (1.00)
Topic 9 | Culture (0.50), Data Policy (0.33)
LDA Model - Flawed Democracy
Topic 0 | Culture (0.50)
Topic 1 | Development and International Cooperation (1.00), Economy and Labour (0.75), Ethical Impact Assessment (0.50)
Topic 2 | Ethical Impact Assessment (1.50), Health and Social Well-Being (0.50)
Topic 3 | Health and Social Well-Being (0.50)
Topic 4 | Ethical Impact Assessment (1.33)
Topic 5 | Development and International Cooperation (1.00), Ethical Governance and Stewardship (1.00), Ethical Impact Assessment (0.50), Economy and Labour (0.25)
Topic 6 | Economy and Labour (0.50), Ethical Governance and Stewardship (0.33)
Topic 7 | Data Policy (0.33)
Topic 8 | Ethical Governance and Stewardship (1.50), Health and Social Well-Being (0.50), Data Policy (0.33)
Topic 9 | -
LDA Model - Hybrid Regime
Topic 0 | Development and International Cooperation (1.00), Ethical Governance and Stewardship (0.33), Gender (0.33)
Topic 1 | Ethical Governance and Stewardship (1.00)
Topic 2 | Data Policy (0.33)
Topic 3 | Ethical Governance and Stewardship (2.00), Ethical Impact Assessment (1.00)
Topic 4 | Development and International Cooperation (1.00), Economy and Labour (0.50)
Topic 5 | Ethical Governance and Stewardship (1.50), Ethical Impact Assessment (1.00), Development and International Cooperation (1.00), Data Policy (0.50), Health and Social Well-Being (0.50)
Topic 6 | Data Policy (0.50), Development and International Cooperation (0.50), Ethical Governance and Stewardship (0.50)
Topic 7 | Gender (0.50)
Topic 8 | -
Topic 9 | Gender (0.33)
Topic 10 | Development and International Cooperation (1.00), Culture (0.50)
Topic 11 | Ethical Governance and Stewardship (0.33)
Topic 12 | Development and International Cooperation (1.00), Communication and Information (0.50), Culture (0.50), Environment and Ecosystems (0.50)
Topic 13 | Ethical Governance and Stewardship (0.50)
Topic 14 | -
LDA Model - Authoritarian Regime
Topic 0 | Data Policy (0.83), Economy and Labour (0.50)
Topic 1 | Ethical Governance and Stewardship (1.50), Data Policy (1.00), Economy and Labour (0.50), Ethical Impact Assessment (0.50)
Topic 2 | -
Topic 3 | Development and International Cooperation (1.00), Environment and Ecosystems (0.33)
Topic 4 | Communication and Information (0.50), Data Policy (0.50), Ethical Governance and Stewardship (0.33)
Topic 5 | -
Topic 6 | Development and International Cooperation (1.00), Ethical Governance and Stewardship (1.00), Data Policy (0.67), Health and Social Well-Being (0.50)
Topic 7 | Ethical Governance and Stewardship (1.00)
Topic 8 | Communication and Information (0.50)
Topic 9 | Data Policy (0.50), Ethical Impact Assessment (0.50)
Topic 10 | Development and International Cooperation (1.00), Communication and Information (0.50)
Topic 11 | Data Policy (0.50), Development and International Cooperation (0.50)
Topic 12 | Ethical Governance and Stewardship (1.50), Health and Social Well-Being (1.00), Ethical Impact Assessment (0.50)
Topic 13 | -
Topic 14 | Health and Social Well-Being (1.00), Gender (0.33)
Table A.7: LDA: Raw scores for Topics by Government Type.
Topic ID | Framework Topics with Raw Scores
BERTopic Model - Full Democracy
Topic 0 | Ethical Impact Assessment (1.00), Development and International Cooperation (0.50), Economy and Labour (0.50), Ethical Governance and Stewardship (0.50), Gender (0.50), Health and Social Well-Being (0.50)
Topic 1 | Ethical Impact Assessment (1.67), Data Policy (1.00), Culture (0.50), Ethical Governance and Stewardship (0.50), Health and Social Well-Being (0.50)
Topic 2 | Ethical Impact Assessment (1.00), Economy and Labour (0.50), Ethical Governance and Stewardship (0.50), Health and Social Well-Being (0.50)
Topic 3 | Gender (0.50), Development and International Cooperation (0.33)
Topic 4 | Culture (0.50), Ethical Impact Assessment (0.50)
Topic 5 | Data Policy (1.00), Development and International Cooperation (0.50), Ethical Governance and Stewardship (0.50)
Topic 6 | Ethical Governance and Stewardship (0.83)
Topic 7 | Development and International Cooperation (2.50)
Topic 8 | Ethical Governance and Stewardship (1.00), Development and International Cooperation (0.50)
Topic 9 | Ethical Impact Assessment (0.50), Data Policy (0.33), Environment and Ecosystems (0.25)
BERTopic Model - Flawed Democracy
Topic 0 | -
Topic 1 | Development and International Cooperation (1.00), Ethical Governance and Stewardship (1.00), Health and Social Well-Being (0.50)
Topic 2 | Ethical Impact Assessment (2.00)
Topic 3 | Ethical Governance and Stewardship (2.00), Health and Social Well-Being (0.50), Ethical Impact Assessment (0.50), Data Policy (0.33), Development and International Cooperation (0.33)
Topic 4 | Ethical Impact Assessment (1.33), Communication and Information (0.50), Health and Social Well-Being (0.50)
Topic 5 | Ethical Impact Assessment (1.50), Ethical Governance and Stewardship (1.00), Gender (0.67)
Topic 6 | Development and International Cooperation (1.00), Ethical Governance and Stewardship (1.00)
Topic 7 | Ethical Impact Assessment (1.00), Ethical Governance and Stewardship (0.50), Gender (0.50)
Topic 8 | Health and Social Well-Being (1.00)
Topic 9 | Economy and Labour (2.50), Development and International Cooperation (1.00)
BERTopic Model - Hybrid Regime
Topic 0 | Ethical Impact Assessment (1.17), Gender (0.67), Development and International Cooperation (0.50), Economy and Labour (0.50), Health and Social Well-Being (0.50)
Topic 1 | Ethical Impact Assessment (1.50), Ethical Governance and Stewardship (1.00), Education and Research (1.00), Culture (0.50), Development and International Cooperation (0.33)
Topic 2 | Ethical Governance and Stewardship (1.00), Environment and Ecosystems (0.50), Health and Social Well-Being (0.50)
Topic 3 | Data Policy (2.50), Development and International Cooperation (1.00)
Topic 4 | Data Policy (1.00), Development and International Cooperation (1.00), Economy and Labour (1.00), Ethical Impact Assessment (0.67)
Topic 5 | Ethical Impact Assessment (1.17), Development and International Cooperation (1.00), Culture (0.50), Health and Social Well-Being (0.50)
Topic 6 | Communication and Information (0.50), Health and Social Well-Being (0.50), Development and International Cooperation (0.33)
Topic 7 | Data Policy (1.00), Ethical Impact Assessment (0.50)
Topic 8 | Ethical Impact Assessment (1.50), Development and International Cooperation (1.00)
Topic 9 | Ethical Governance and Stewardship (2.00)
BERTopic Model - Authoritarian Regime
Topic 0 | Ethical Governance and Stewardship (1.50), Data Policy (0.50)
Topic 1 | Ethical Governance and Stewardship (3.17), Economy and Labour (1.00), Ethical Impact Assessment (0.50)
Topic 2 | Data Policy (2.17), Economy and Labour (2.00), Health and Social Well-Being (1.00), Ethical Impact Assessment (0.50)
Topic 3 | Development and International Cooperation (0.50), Economy and Labour (0.50), Ethical Impact Assessment (0.50), Gender (0.50), Health and Social Well-Being (0.50)
Topic 4 | Data Policy (0.50), Development and International Cooperation (0.50), Ethical Governance and Stewardship (0.50), Gender (0.50)
Topic 5 | Development and International Cooperation (1.33), Ethical Impact Assessment (0.50), Environment and Ecosystems (0.33)
Topic 6 | Ethical Impact Assessment (2.00), Development and International Cooperation (0.50)
Topic 7 | Data Policy (1.67), Ethical Impact Assessment (1.50), Culture (0.50), Ethical Governance and Stewardship (0.50)
Topic 8 | Development and International Cooperation (1.33), Environment and Ecosystems (0.83), Culture (0.50), Ethical Governance and Stewardship (0.33)
Table A.8: BERTopic: Raw scores for Topics by Government Type.
Overlap Score Distribution
Figure A.1 shows the distribution of all raw overlap scores that the topics obtained against the framework. The values on the x-axis can be read as follows:
• Score < 1: a topic keyword overlaps only partially with a framework category.
• Score = 1: either one topic keyword overlaps perfectly with a framework category, or two or more partial overlaps add up to 1 (e.g., two partial overlaps with a score of 0.5 each).
• Score > 1: more than one perfect overlap, or a combination of overlaps (perfect or partial) that adds up to more than 1, was found.
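The three score ranges above can be tallied directly from the tables. As a small worked example restricted to the LDA Full Democracy rows of Table A.7 (whereas Figure A.1 pools all government types for each model), the sketch below counts how often each score value occurs and how the scores fall into the three ranges.

```python
from collections import Counter
from typing import Dict, List

# Raw scores from the "LDA Model - Full Democracy" rows of Table A.7 (topic -> scores)
LDA_FULL_DEMOCRACY: Dict[int, List[float]] = {
    0: [0.50, 0.50],
    1: [1.00, 0.33],
    2: [],  # "-": no keyword matched any framework category
    3: [0.50],
    4: [0.50, 0.33],
    5: [0.50, 0.50, 0.33],
    6: [2.00, 1.00, 1.00],
    7: [1.00, 0.50, 0.50],
    8: [1.50, 1.00],
    9: [0.50, 0.33],
}


def score_band(score: float) -> str:
    """Map a raw overlap score to one of the three ranges described above."""
    if score < 1:
        return "< 1"
    if score == 1:
        return "= 1"
    return "> 1"


scores = [s for topic_scores in LDA_FULL_DEMOCRACY.values() for s in topic_scores]
value_counts = Counter(scores)  # bar heights along the x-axis
band_counts = Counter(score_band(s) for s in scores)
```

For these rows, 13 of the 20 scores are partial overlaps below 1, five are exactly 1, and two exceed 1.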
Plot of the frequency (y-axis, 0 to 50) of each overlap score value (x-axis “Score”, from 0.25 to 3.17) for LDA and BERTopic.
Figure A.1: Distributions of Overlap Scores by Topic Model.
OCTIS Comparison
Figure A.2: Comparison of Topic Diversity by Model and Government Type.
Figure A.3: Comparison of CV Coherence by Model and Government Type.
Figure A.4: Comparison of IRBO by Model and Government Type.
Figure A.5: Comparison of WECoherencePairwise by Model and Government Type.
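Figure A.2 compares topic diversity across models and government types. To make the metric concrete, the sketch below implements its usual definition: the number of unique words among the top-k words of all topics, divided by k times the number of topics, so that 1.0 means no word is repeated across topics. As far as we are aware, OCTIS's TopicDiversity metric computes the same quantity, but this function is an independent sketch and the example topics are invented.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], topk: int = 10) -> float:
    """Unique words among each topic's top-k words, divided by topk * number of topics."""
    if not topics:
        raise ValueError("at least one topic is required")
    if any(len(topic) < topk for topic in topics):
        raise ValueError(f"every topic needs at least {topk} words")
    unique_words = {word for topic in topics for word in topic[:topk]}
    return len(unique_words) / (topk * len(topics))


# Invented example: "data" appears in both topics, so diversity drops below 1.0
example_topics = [["data", "privacy", "health"], ["data", "trade", "labour"]]
```

With topk=3, the example gives 5 unique words out of 6, a diversity of about 0.83; a model whose topics share many top words scores lower on this metric even if each topic is coherent on its own.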