A Comparative Study with LDA and BERTopic: AI Policies Across Different Democracy Indexes

Master’s thesis in Applied Data Science

Anne Söderwall & Gabija Telešova

Department of Computer Science and Engineering
Chalmers University of Technology
University of Gothenburg
Gothenburg, Sweden 2025

© Anne Söderwall & Gabija Telešova, 2025.

Supervisor: Denitsa Saynova, Department of Computer Science and Engineering
Examiner: Moa Johansson, Department of Computer Science and Engineering

Master’s Thesis 2025
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Typeset in LaTeX
Gothenburg, Sweden 2025

Abstract

In times of global political instability, paired with an evolving and experimental phase in artificial intelligence, the future of AI remains unclear. What is even less defined is how governments around the world plan to use, regulate, or develop it. Therefore, this thesis aims to evaluate how topic models perform on policy documents and how different government types influence these policies. This was done by scraping AI policies collected by the OECD’s AI Policy Observatory across different countries, later categorized by government type – Full Democracy, Flawed Democracy, Hybrid Regime, and Authoritarian Regime.
Two topic models, LDA and BERTopic, were applied to extract topics and keywords for each regime. The results suggest that LDA’s topics were more detailed but less interpretable, whilst BERTopic was better suited for human interpretation and understanding. All government types focused, to varying degrees, on ethics and digital governance themes. On a deeper level, Full Democracy emphasized regulations of already existing technology, Flawed Democracy focused on military development, Hybrid Regime was centered around the actual implementation, and Authoritarian Regime emphasized research and a broader context of state control. The OCTIS measurements indicated that LDA performed better in quantitative and statistical evaluations, while BERTopic was consistently preferred for human interpretation. This discrepancy illustrates the methodological tension between how models are evaluated and how understandable they are in practical application.

Keywords: data science, political science, thesis, AI, policies, government, BERTopic, LDA, OCTIS, ethical framework

Acknowledgements

We would like to express our sincere gratitude to our supervisor for her continuous support, guidance, and encouragement throughout this thesis. We are also thankful to our families, friends, and peers for their moral support, helpful feedback, and constructive suggestions during the writing process.

Anne Söderwall & Gabija Telešova, Gothenburg, 2025-06-10

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Research Question
  1.2 Thesis Structure
2 Theory
  2.1 Natural Language Processing
    2.1.1 Latent Dirichlet Allocation
      2.1.1.1 Structure and Parameters
      2.1.1.2 Inference and Parameter Estimation
      2.1.1.3 Bag-of-Words
    2.1.2 BERTopic
      2.1.2.1 Transformers
      2.1.2.2 BERT
      2.1.2.3 Model Architecture
  2.2 Evaluation Metrics
    2.2.1 Intrinsic Evaluation Metrics
    2.2.2 OCTIS
  2.3 Political and Ethical Context of AI Policy
3 Methods
  3.1 Data Collection and Processing
    3.1.1 Data Cleaning (Pre-Scraping Stage)
    3.1.2 Scraping
    3.1.3 Data Preprocessing (Post-Scraping Stage)
  3.2 LDA
    3.2.1 Text Processing and Chunking
    3.2.2 LDA Modeling and Hyperparameter Tuning
  3.3 BERTopic
  3.4 Qualitative Topic Analysis
  3.5 Quantitative Comparison Using OCTIS
  3.6 Ethical Topic Variation Across Government Type
    3.6.1 UNESCO Recommendation on the Ethics of Artificial Intelligence
    3.6.2 Framework
    3.6.3 Analytical Application
      3.6.3.1 Government Types versus Framework
4 Results
  4.1 Model Configuration and Setup
    4.1.1 LDA
    4.1.2 BERTopic
  4.2 Qualitative Topic Analysis
    4.2.1 LDA
    4.2.2 BERTopic
  4.3 Ethical Topic Variation Across Government Type
  4.4 Quantitative Results Using OCTIS
5 Discussions
  5.1 Model Comparison and Qualitative Analysis
    5.1.1 Model Comparison per Government Type
    5.1.2 Cross-Regime Comparison
  5.2 Ethical Topic Variation Across Government Types
    5.2.1 Topic-Level Comparison
    5.2.2 Government-Level Comparison
  5.3 Quantitative Analysis Using OCTIS
6 Conclusion
  6.1 Conclusion
  6.2 Limitations
  6.3 Further Research
Bibliography
A Appendix 1

List of Figures

2.1 Plate Notation of LDA [8].
2.2 Variational Distribution Used to Approximate the Posterior in LDA.
2.3 The Transformer Model Architecture from the Attention Is All You Need Paper [7].
2.4 The Main Two Approaches Used When Constructing the BERT Model for Different Tasks [12].
2.5 BERTopic Sequence of Steps to Create Its Topic Representations [37].
3.1 Total English Words by Government Type.
3.2 The 11 Framework Categories Based on UNESCO’s Recommendation.
4.1 Stacked Bar Charts of Normalized Topic-Framework Scores from Both Topic Modeling Approaches.
A.1 Distributions of Overlap Scores by Topic Model.
A.2 Comparison of Topic Diversity by Model and Government Type.
A.3 Comparison of CV Coherence by Model and Government Type.
A.4 Comparison of IRBO by Model and Government Type.
A.5 Comparison of WECoherencePairwise by Model and Government Type.

List of Tables

2.1 Democracy Index Classifications [3].
3.1 Aggregated Metrics by Government Type.
3.2 Grid Search Parameters for LDA Model Optimization.
3.3 Hyperparameter Grid Search for BERTopic.
4.1 LDA Hyperparameters by Government Type.
4.2 Hyperparameters Used for Each Government Type.
4.3 LDA: Full Democracy Topics and Qualitative Interpretation.
4.4 LDA: Flawed Democracy Topics and Qualitative Interpretation.
4.5 LDA: Hybrid Regime Topics and Qualitative Interpretation.
4.6 LDA: Authoritarian Regime Topics and Qualitative Interpretation.
4.7 BERTopic: Full Democracy Topics and Qualitative Interpretation.
4.8 BERTopic: Flawed Democracy Topics and Qualitative Interpretation.
4.9 BERTopic: Hybrid Regime Topics and Qualitative Interpretation.
4.10 BERTopic: Authoritarian Regime Topics and Qualitative Interpretation.
4.11 The Number of Government Type Overlaps With the Created Ethical Framework’s Topics (Out of 11) (A.1) for LDA and BERTopic.
4.12 Model Topic Overlap with the Framework Categories, Normalized Scores. Green Color Indicates the Highest Score Overlaps, and Red Color Indicates the Lowest Scores. The Framework Categories That Both Models Match Are Also Highlighted Respectively.
4.13 Comparative OCTIS Metrics for LDA vs BERTopic by Government Type. Higher Values Are Bolded.
A.2 “Public access URL” Status Codes and Counts.
A.4 Grid Search Results for LDA Hyperparameters Across Government Types.
A.5 Top 5 (and Top 10 for Hybrid Regime) Hyperparameter Sets for Each Government Type.
A.6 Coherence Scores per Government Type: With and Without POS Tagging.
A.7 LDA: Raw Scores for Topics by Government Type.
A.8 BERTopic: Raw Scores for Topics by Government Type.

Acronyms

AI Artificial Intelligence
BERT Bidirectional Encoder Representations from Transformers
BoW Bag of Words
CNN Convolutional Neural Network
CSV Comma-Separated Values
HDBSCAN Hierarchical Density-Based Spatial Clustering of Applications with Noise
IRBO Inverted Rank-Biased Overlap
LDA Latent Dirichlet Allocation
MCMC Markov Chain Monte Carlo
MLM Masked Language Modeling
NER Named Entity Recognition
NLP Natural Language Processing
NSP Next Sentence Prediction
OCTIS Optimizing and Comparing Topic Models Is Simple
OECD Organization for Economic Co-operation and Development
POS Part-of-Speech
QA Question Answering
RNN Recurrent Neural Network
SOTA State-of-the-Art
UMAP Uniform Manifold Approximation and Projection
UNESCO United Nations Educational, Scientific and Cultural Organization

1 Introduction

Countries are experiencing a transformative period marked by drastic political changes, while simultaneously navigating Artificial Intelligence (AI) as an expanding field with a growing number of actors in the global market.
This convergence of political changes and AI advancement raises concerns and questions about the interplay between government structures and AI policy development. Given the international nature of the development and its use transcending national jurisdictions, it is important to recognize the different approaches to AI within distinct political contexts. Recent work by the Center for AI and Democratic Values [1] has addressed similar themes by qualitatively analyzing AI governance across political systems. Their findings draw attention to the importance of understanding how regime types shape ethical and strategic priorities in AI policies. However, their approach relies on manual qualitative analysis, which offers valuable insights but remains limited in terms of scalability and reproducibility. This thesis applies data science, particularly Natural Language Processing (NLP) techniques, to analyze AI policy initiative documents from different countries and correlate their characteristics with democratic indexes. By comparing different topic modeling approaches (Latent Dirichlet Allocation (LDA) and BERTopic), we aim not only to understand the themes of AI policies globally but also to contribute methodologically to how NLP can be applied to understand policy discourse and societal priorities. By grounding our analysis in United Nations Educational, Scientific and Cultural Organization (UNESCO)’s ethical framework for AI governance, we will systematically explore how political systems prioritize key aspects of AI governance, such as ethical considerations, public accountability, and technological innovation. The Organization for Economic Co-operation and Development (OECD) [2] has collected more than 1000 AI policies and initiatives from 70 countries. These documents range from comprehensive National AI Strategies to countries’ projects to implement AI in healthcare or education. 
The countries in this collection vary in political governance structure, and the collected policies give insight into the diverse implementation of global AI strategies. To compare different political contexts, this thesis adopts the Economist Intelligence Unit’s Democracy Index [3], which categorizes countries into four regime types: Full Democracies, Flawed Democracies, Hybrid Regimes, and Authoritarian Regimes. These classifications are based on criteria such as electoral process, civil liberties, political participation, and functioning of government. Examining how these different government types articulate their approaches to AI gives us a valuable understanding of the respective priorities in shaping the field of AI.

However, the volume and complexity of AI policy documents make traditional qualitative analysis impractical for examining large multilingual corpora produced by diverse actors across various governance systems. Applying data science and NLP allows us to examine the structure and meaning of unstructured text on a larger scale; in particular, topic modeling can identify latent semantic structures and recurring themes within documents. In a political context, topic models can help reveal how states conceptualize issues like ethics, surveillance, or innovation.

From a technical perspective, this problem also presents an opportunity to explore interdisciplinary adaptation and comparative analysis. By applying and evaluating two topic modeling techniques, LDA and BERTopic, this thesis aims to identify which approaches are most effective in extracting coherent, interpretable, and politically relevant themes from global AI policy documents.

1.1 Research Question

The aim of this thesis is twofold: first, to evaluate the technical performance of different topic models on policy texts; and second, to derive insights into how government structures shape the focus and framing of AI policies using topic models.
The primary research question guiding this study is:

How do LDA and BERTopic compare in their ability to extract meaningful and interpretable topics from AI policy documents?

To answer the primary question, we will consider the following sub-questions:

• What are the dominant themes and keywords in AI policy documents across different types of governance, as defined by the Democracy Index?
• Are there measurable differences in the thematic emphasis on ethical considerations, public accountability, or economic priorities based on a country’s governance structure?

1.2 Thesis Structure

This thesis includes the following chapters:

Chapter 2 introduces the background and theory needed to answer the research questions. In particular, it covers key concepts in NLP, the Transformer architecture, and Bidirectional Encoder Representations from Transformers (BERT) as the basis for our topic modeling methods – LDA and BERTopic. It also defines the chosen evaluation metrics (coherence score and Optimizing and Comparing Topic Models Is Simple (OCTIS) measurements) and includes the categorization of the documents into different government types.

Chapter 3 introduces the methodology of the thesis, including data collection and processing, BERTopic and LDA model setup and hyperparameter selection, and qualitative interpretation of the output topics. The chapter closes with quantitative and qualitative assessments using OCTIS and an ethical framework, respectively.

Chapter 4 presents the key results obtained from the modeling experiments.

Chapter 5 examines the trends, patterns, and observations made from these results, linking them back to the research questions.

Chapter 6 concludes the thesis by summarizing the key findings of the discussion and forming the answers to the research questions. It also discusses limitations and suggests directions for further research.

2 Theory

This chapter starts by introducing the broad concept of NLP, with a focus on topic modeling in political science research. It then goes into the details of the LDA and BERTopic models, outlining their structures, key parameters, and applications. The Transformer and BERT architectures are emphasized to understand BERTopic’s underlying mechanisms. Finally, OCTIS metrics are used for a quantitative comparison of model coherence and diversity, whilst an ethical framework is used as a lens for a qualitative measurement of different regimes.

2.1 Natural Language Processing

NLP is the field of study concerning the interaction between computers and human language. The field has rapidly evolved over the last decades from focusing on syntax in the 1960s to today’s advanced machine learning and AI applications. While early efforts in NLP were characterized by hand-crafted, rule-based systems designed to encode linguistic knowledge explicitly, recent developments have shifted toward statistical and data-driven approaches. As a result, NLP has moved beyond its traditional linguistic focus to influence a wide range of everyday technologies, including digital assistants, machine translation, and automated content analysis [4].

Newer NLP approaches rely heavily on machine learning, which allows systems to learn patterns from examples rather than, as previously done, relying solely on rule-based calculations [5], [6]. Neural architectures, such as Recurrent Neural Networks (RNNs) and Transformers, have performed well on different NLP tasks such as translation and summarization, and enable more context-aware modeling of language [7].

Within this broader evolution, topic models have emerged as a family of unsupervised machine learning methods designed to automatically identify latent semantic structures in large corpora. Instead of manually reading and coding each text, topic models infer groups of words that frequently occur together.
They uncover latent thematic patterns without previously labeled data, making them suitable for exploratory analysis. Topic models also support scalability by summarizing large text datasets and supporting systematic, comparative studies in areas such as policy analysis. Furthermore, topic modeling offers transparency and interpretability and facilitates replicable thematic comparisons.

2.1.1 Latent Dirichlet Allocation

Among the first models in topic modeling is Latent Dirichlet Allocation (LDA), where the goal is to uncover central topics and their distributions across documents and generate succinct representations of large datasets [8]. Its probabilistic framework preserves important statistical patterns and supports a range of downstream tasks, including classification, anomaly detection, summarization, and measuring similarity or relevance.

2.1.1.1 Structure and Parameters

LDA models each document in a collection as a mixture of topics, where each topic is represented by a probability distribution over words. This structure allows LDA to capture the underlying thematic structure within a corpus. The core idea is that documents express multiple topics to varying degrees, and each word is assumed to be generated from one of these topics.

Before outlining the model’s formal structure, it is helpful to define the key variables and parameters involved in LDA:

• N: the number of words in a document (often observed).
• ξ: the Poisson parameter that governs the expected length of a document.
• k: the number of latent topics in the model (a fixed hyperparameter).
• V: the size of the vocabulary.
• θ: the topic distribution for a document, drawn from a Dirichlet prior with parameter α.
• α: the hyperparameter that shapes the Dirichlet prior over topic distributions.
• w_n: the n-th word in a document.
• z_n: the latent topic assignment for the n-th word in a document.
• β: the topic-word distribution matrix, with dimensions k × V, where each row β_i represents the word distribution for topic i.

Given these definitions, the generative process for LDA, as defined by Blei, Ng, and Jordan [8], is as follows:

1. Choose N ∼ Poisson(ξ)
2. Choose θ ∼ Dir(α)
3. For each of the N words w_n:
   (a) Choose a topic z_n ∼ Multinomial(θ)
   (b) Choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n

Figure 2.1: Plate Notation of LDA [8].

The Poisson(ξ) distribution is used to model the number of words N in a document, with ξ denoting the expected document length, which accounts for the variation in document sizes. However, the ξ component is often omitted, since the document length can be observed directly. Instead, the focus is primarily on the latent topic structure.

Following the generative process, θ represents the topic distribution for a document drawn from a Dirichlet prior parameterized by α. The Dirichlet distribution serves as a prior over multinomial distributions and encodes assumptions about topic density within a document. Smaller values of α concentrate a document’s probability mass on a few topics, whereas larger values spread it more uniformly across topics.

Each latent topic assignment z_n is drawn from a Multinomial distribution over a fixed number of topics k, corresponding to the dimensionality of the Dirichlet distribution θ ∼ Dir(α). In this context, the dimensionality k refers to the number of latent topics assumed in the model, and determines the number of components in the topic distribution θ. In the basic model, the number of topics k is set prior to training and remains constant throughout the modeling process. Therefore, each latent topic assignment z_n represents a discrete selection of one of the k topics.
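The generative process described above can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the function name `generate_document`, the values of k, V, α, and ξ, and the randomly drawn β are hypothetical and chosen only for demonstration. They are not the settings used in this thesis.

```python
import numpy as np

def generate_document(alpha, beta, xi, rng):
    """Sample one synthetic document following the LDA generative process [8]."""
    k, V = beta.shape
    n_words = rng.poisson(xi)                       # 1. N ~ Poisson(xi)
    theta = rng.dirichlet(alpha)                    # 2. theta ~ Dir(alpha)
    # 3a. one topic assignment z_n ~ Multinomial(theta) per word position
    topics = rng.choice(k, size=n_words, p=theta)
    # 3b. each word w_n is drawn from the row of beta selected by z_n
    words = np.array([rng.choice(V, p=beta[z]) for z in topics], dtype=int)
    return theta, topics, words

# Hypothetical toy setup: 3 topics over a vocabulary of 8 word ids.
rng = np.random.default_rng(0)
k, V = 3, 8
alpha = np.full(k, 0.1)                   # small alpha: few dominant topics per document
beta = rng.dirichlet(np.ones(V), size=k)  # random k x V topic-word matrix
theta, topics, words = generate_document(alpha, beta, xi=20, rng=rng)
```

Training LDA runs in the opposite direction: only the word ids are observed, and θ, z, and β have to be inferred from them.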
The topic assignment z_n determines the topic-specific word distribution from which the observed word w_n is drawn. The matrix β, estimated during training, defines the topic-word probabilities. It is structured as a k × V matrix, where k is the number of latent topics and V is the vocabulary size, and each row is a multinomial distribution over the corpus vocabulary. Each entry β_ij gives the probability of the word w_j being generated by topic i. When the topic assignment z_n is sampled for a word position, the observed word is drawn from the corresponding distribution β_{z_n}.

2.1.1.2 Inference and Parameter Estimation

To uncover the latent topic structure in LDA, it is necessary to compute the posterior distribution of the latent variables given the observed data. This is expressed as:

$$p(\theta, z \mid w, \alpha, \beta) = \frac{p(\theta, z, w \mid \alpha, \beta)}{p(w \mid \alpha, \beta)},$$

where θ denotes the topic proportions of a document, z the topic assignments of its words, w the observed words, α the Dirichlet hyperparameter, and β the topic-word distributions. The posterior distribution expresses the conditional probability of the latent variables θ and z after observing the actual data w. It combines the prior assumptions about the document-topic and topic-word distributions, encoded by α and β, with the tokens appearing in each document of the corpus, and thereby characterizes the most likely latent structure in the corpus.

However, the posterior is intractable. Computing the marginal likelihood p(w | α, β) requires integrating over all possible topic proportions and summing over all possible topic assignments:

$$p(w \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \sum_{z} \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta.$$

This computation is expensive for large real-world datasets, as the number of possible combinations of latent variables, topic proportions θ for each document and topic assignments z, grows exponentially with the number of documents, topics, and words [8].

Approximate inference methods can be used to obtain a family of lower bounds on the log-likelihood of the data, indexed by a set of variational parameters. The variational parameters are chosen by an optimization procedure that finds the tightest lower bound, which is equivalent to minimizing the Kullback-Leibler divergence between the approximate and the true posterior. Two main approaches to approximate inference exist: sampling-based methods, such as Markov Chain Monte Carlo (MCMC), and optimization-based methods, such as variational inference. The latter has become the standard approach for LDA.

In the standard plate notation of LDA (see Figure 2.1), the model is defined in its generative form. However, the interaction between θ and β through z and w creates a problematic coupling, which makes exact inference difficult. To enable approximate inference, a modified representation is used. Figure 2.2 shows the variational distribution, in which the latent variables θ and z are governed by free variational parameters γ and ϕ. The edges in the diagram are decoupled to reflect the independence assumptions introduced by the variational approximation, allowing for a tractable optimization.

Even with approximate methods, inference can still be computationally expensive, so online variants have been developed to scale it. Online Variational Bayes is an approximate technique based on stochastic optimization. It processes text in mini-batches and does not require storing the entire corpus in memory, allowing documents to be discarded after processing [9].

Figure 2.2: Variational Distribution Used to Approximate the Posterior in LDA.
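For reference, the variational distribution and the bound it optimizes can be written out as follows. This is a sketch of the mean-field formulation of Blei, Ng, and Jordan [8]; the symbols $q$ (the approximating distribution), $\phi_n$ (the per-word variational parameter), and $\mathcal{L}$ (the lower bound) are notation introduced here for illustration and are not used elsewhere in this chapter.

```latex
% Factorized (mean-field) variational distribution with free parameters gamma and phi
q(\theta, z \mid \gamma, \phi) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n)

% Decomposition of the log-likelihood into a lower bound and a KL term
\log p(w \mid \alpha, \beta)
  = \mathcal{L}(\gamma, \phi; \alpha, \beta)
  + \mathrm{KL}\big( q(\theta, z \mid \gamma, \phi) \,\|\, p(\theta, z \mid w, \alpha, \beta) \big)
```

Because the KL term is non-negative, $\mathcal{L}$ is a lower bound on the log-likelihood, and since the left-hand side does not depend on γ or ϕ, maximizing $\mathcal{L}$ over them is the same as minimizing the KL divergence to the true posterior.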
2.1.1.3 Bag-of-Words

Within the field of NLP, various techniques exist for representing and processing text data. One such foundational approach is Bag of Words (BoW), which represents each document as a sparse vector in a high-dimensional space, where each dimension corresponds to a unique term from the vocabulary, and the value typically encodes the term frequency or a weighted variant. A document is therefore treated not as running text but as a collection of isolated words without a specific order or relation to each other, so the contexts in which the words appear are not considered.

The BoW representation rests on the probabilistic concept of exchangeability, which implies that the order of words within a document (and even of the documents within a corpus) can be neglected without loss of generality [8]. This assumption makes it possible to model a sequence of words as conditionally independent and identically distributed once a latent parameter is introduced. This is the basis for probabilistic models like LDA, which represents each document as a mixture of latent topics, and each topic as a distribution over words.

While the original LDA assumes full exchangeability over unigrams, more recent extensions relax this assumption by allowing sequences such as bigrams or trigrams to be modeled. This enables the capture of limited word-order dependencies and multi-word expressions, making the resulting topics more coherent and semantically meaningful.

2.1.2 BERTopic

BERTopic, introduced by Grootendorst [10], is a semantic topic model that uses Sentence-BERT (SBERT) embeddings. To handle complex semantic topic representations, BERTopic encodes each document as a fixed-length vector and then clusters these vectors into topics [11].

2.1.2.1 Transformers

Different NLP tasks require distinct model architectures.
The Transformer, introduced by scientists at Google in 2017, is a modern, widely used architecture that combines encoder and decoder components (explained later in this section) and eliminates the need for recurrence by using the attention mechanism [7]. It enables efficient parallelization and long-term memory, addressing limitations inherent to RNNs and Convolutional Neural Networks (CNNs) used in sequence-to-sequence translation tasks, such as sequential processing constraints and difficulty capturing long-range dependencies. The Transformer is the foundation of key models used for NLP applications, including BERT [12] and GPT [13], where BERT utilizes only the encoder stack and GPT uses only the decoder. Unlike RNNs and CNNs, the Transformer relies entirely on self-attention and thus needs neither recurrence nor convolution.

Self-Attention

The self-attention mechanism allows each word in a sentence to be compared to every other word, regardless of the distance between them. This capability enables the model to understand context, like connecting a noun at the start of a sentence to a pronoun at the end, or across multiple sentences. In doing so, it mimics how humans understand text by focusing on relevant parts regardless of position.

Specifically, the model generates three vectors – Query, Key, and Value – for each word. These vectors are derived from the input embeddings. Self-attention then compares each word (Query) to all others (Keys) in a sentence and outputs a score that indicates how strongly the words are related and what weight to assign to each word. The attention output is calculated as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V,$$

where Q, K, and V are the matrices of Query, Key, and Value vectors, and d_k is the dimension of the Key vectors.
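The attention formula above can be illustrated with a short NumPy sketch. This is a minimal single-head example under stated assumptions: the sequence length, embedding size, and the random projection matrices `W_q`, `W_k`, and `W_v` are hypothetical stand-ins for learned weights, and the function name is chosen here for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise Query-Key similarity
    scores = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

# Hypothetical input: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
output, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
```

Each row of `weights` sums to one and states how much every token contributes to the new representation of the token in that row; `output` holds those weighted combinations of the Value vectors.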
Architecture: Encoder and Decoder

As described by the authors [7] and shown in Figure 2.3, the Transformer architecture consists of two main components: the encoder and the decoder.

Figure 2.3: The Transformer Model Architecture from the Attention Is All You Need Paper [7].

• Encoder: The encoder takes an input sequence and processes it in parallel. It starts by applying embedding and positional encoding, which adds information about the position of each token in the sequence, as attention alone lacks this information. The embedded inputs are then passed through N identical layers, each containing:
  – Multi-Head Self-Attention: Each self-attention “head” focuses on different aspects of the input. For instance, studies have shown that different heads can specialize in different tasks, such as handling infrequent words, encoding syntactic dependencies, or positional information of words [14]. Moreover, the multiplicity of heads allows efficient computation of relationships among tokens obtained by the previous layer.
  – Feed Forward Network: Applies, to each position separately, two linear transformations with a ReLU activation in between, further processing the attention output into richer features.
• Decoder: The output from the encoder layers is passed to the decoder. Similarly, the decoder includes positional encoding and feed-forward network layers. However, its first attention sub-layer is Masked Multi-Head Attention, which prevents the model from seeing future words so that it focuses only on already generated words during training; a second attention sub-layer attends over the encoder output. The final Add & Norm layer outputs the normalized vectors that are then passed through linear and softmax layers to produce the final output.

Applications

The Transformer is the foundation for powerful models, including GPT and BERT [12]. Several surveys ([15]–[17]) have documented the Transformer’s applications across real-world domains.
The architecture's improved handling of long-term dependencies has had a significant impact on both NLP and the wider deep learning field, including computer vision and multimodal applications [15], [17]. Due to its attention mechanism and contextual awareness, the Transformer is widely used in many NLP applications. It has shown outstanding performance in a variety of subtasks, such as Question Answering (QA), machine translation, and sentiment analysis [15]. More specifically, the Transformer architecture plays a crucial role in modern topic modeling approaches, enabling models such as BERT or the probabilistic TNTM (Transformer-Representation Neural Topic Model) [18] to capture nuanced language patterns across diverse texts and clusters.

2.1.2.2 BERT

BERT is a language model created by Google researchers [12]. Using the Transformer's encoder, it encodes text passages bidirectionally and thus considers context from both directions, which, at the time, gave it State-of-the-Art (SOTA) performance on many NLP tasks. Additionally, BERT has the advantage of being relatively small and computationally lighter than other models.

BERT's architecture consists of the Transformer encoder (its parts are explained in detail in Section 2.1.2.1) [12]. During pre-training, BERT uses Masked Language Modeling (MLM): tokens in a document are randomly masked, and the model learns to predict the original tokens. Unlike OpenAI GPT, which processes text left-to-right, BERT is bidirectional, meaning it considers both left and right context simultaneously to predict masked tokens. An example of MLM could be the sentence “This document discusses AI [MASK] from different government types”, where BERT's task is to predict the most likely word that could replace the [MASK] token (e.g., “policies”, “laws”, “rules”) by considering context from both directions.
In contrast to static word embedding models such as Word2Vec and GloVe, which assign each word a single vector regardless of context, BERT produces token embeddings based on the token's role in a sentence [19]. For instance, the word season will have different encodings in the sentences Spring is my favorite season. and Can you season the pasta?. Moreover, BERT uses the special tokens [CLS] (classification) and [SEP] (separation) to indicate the beginning and end of a sequence, respectively. These tokens are then used in the Next Sentence Prediction (NSP) task (explained later).

Pre-training and Fine-tuning

As mentioned, the BERT model was built to handle a variety of NLP tasks. However, not every task can be effectively tackled using the same training settings, which introduces the need for pre-training followed by fine-tuning steps tailored to specific tasks [12]. The following two approaches are used (shown in Figure 2.4):

• Pre-training: The model is trained on large-scale unlabeled text using two tasks: MLM and NSP. MLM enables BERT's bidirectional context understanding by masking tokens and predicting them from the surrounding text. For each training sequence, 15% of tokens are randomly selected for prediction. Of these:
  – 80% are replaced with the [MASK] token,
  – 10% are replaced with a random token, and
  – 10% are left unchanged.
  NSP enables BERT to model relationships between sentences by predicting whether one sentence logically follows another. During training, 50% of sentence pairs are consecutive, while the remaining 50% are randomly selected from the corpus [12].
• Fine-tuning: The pre-trained model is further trained on labeled data to adapt it for specific NLP tasks. By including an extra output layer, the model can be trained for different tasks, such as QA, Named Entity Recognition (NER), and text classification, often leading to improved task performance.
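The 80/10/10 masking rule used in MLM pre-training can be sketched as follows. This is a simplified illustration rather than BERT's actual preprocessing code: each token is selected independently with probability 0.15, the toy vocabulary is an assumption made for the example, and the function name `mask_tokens` is hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, select_prob=0.15, seed=0):
    """Apply BERT-style MLM corruption to a list of tokens.

    Returns the corrupted tokens and a dict {position: original token}
    for the positions the model would be asked to predict.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # token not selected for prediction
        targets[i] = tok
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, targets


sentence = "this document discusses ai policies from different government types".split()
vocab = ["laws", "rules", "ethics", "data", "models"]  # toy vocabulary (assumption)
tokens = sentence * 200  # repeat the sentence so the proportions become visible
corrupted, targets = mask_tokens(tokens, vocab)
```

Keeping some selected tokens unchanged or replacing them with random tokens means the model cannot rely on seeing [MASK] at every position it has to predict, which matters because [MASK] never appears during fine-tuning.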
Research shows that fine-tuning largely preserves the spatial structure of the original token embeddings while adjusting task-relevant parameters, making classification boundaries clearer [20]. Additionally, fine-tuning refines representations by bringing tokens belonging to the same label closer together, enabling more accurate classifications.

Figure 2.4: The Main Two Approaches Used When Constructing the BERT Model for Different Tasks [12].

Applications

At the time of its introduction, BERT had superior performance compared to previous models on 11 different NLP tasks [12], including well-established benchmarks such as language inference (GLUE), QA (SQuAD), and NER. With its contextually aware embeddings, BERT's architecture makes it well-suited for tasks requiring deep semantic understanding, as words or phrases with similar meanings are positioned closely in the embedding space. Consequently, with some additional tuning, the model can be particularly useful for topic modeling, since semantically similar words can be modeled as a collection of topics.

As mentioned, BERT is important for various NLP applications. For QA tasks, it significantly enhances performance by analyzing text bidirectionally, enabling Conversational QA (ConvQA) [21], SQuAD-based fine-tuned models, and a first-order pruning model (PAL-BERT) [22]. Moreover, BERT achieves SOTA results in NER tasks via both fine-tuning and feature-based approaches, with language-specific variants such as Chinese BERT for mineral NER [23], Persian MorphoBERT for NER [24], and German BERT for legal NER [25]. It also supports commonsense reasoning and NSP tasks, including NSP-BERT [26] (prompt-based BERT), SenseBERT [27] (predicts masked words with supersense categories), and KVL-BERT [28] (applies BERT to visual reasoning). In particular, BERT has been used for Part-of-Speech (POS) tagging tasks, that is, assigning grammatical tags (e.g., noun, verb, adjective) to the words of a given document [29].
Due to its contextual awareness, BERT outperforms CountVectorizer, TF-IDF, FastText, and ELMo in POS tagging. This has led to extensions such as DA-BERT (BERT with a deep-attention mechanism that models relationships between target and emotional words) [30] and BERT-POS (BERT with POS for sentiment analysis) [31], as well as BERT-based POS tagging for various languages, including Arabic [32], Uzbek [33], and the Algerian dialect [34].

Although BERT has been widely adopted for numerous NLP applications, its use in topic modeling is still emerging. The most well-known topic model, BERTopic, leverages BERT embeddings and c-TF-IDF, demonstrating its potential in generating interpretable topics [10]. However, the full potential of BERT in topic modeling remains relatively underutilized.

Variants and Adaptations

Originally, BERT was introduced in two sizes: BERT_base and BERT_large, with 12 and 24 encoder layers, respectively. Since then, numerous variations and compact versions have been developed to improve efficiency and performance. Some of the most well-known include ALBERT (reduced parameters), DistilBERT (lighter computational footprint), and RoBERTa (enhanced MLM and removal of the NSP task) [19]. The same paper discusses DistilBERT's suitability for computationally limited devices, noting that it does not surpass BERT_large in performance. Similar to the standard models, DistilBERT comes in base and large versions, with half the number of layers and a more compact architecture compared to BERT_base. Furthermore, BERT is frequently fine-tuned for domain-specific tasks, leading to specialized models such as MobileBERT for mobile applications, language-specific models (BERTino for Italian, ITALIAN-LEGAL-BERT for legal Italian texts, BERTje for Dutch), and topic modeling approaches like BERTopic.
Another variation of BERT is Sentence BERT (SBERT) [35], a sentence-specific Siamese (or triplet) transformer that produces fixed-length sentence embeddings, significantly outperforming BERT in semantic tasks. These adaptations allow BERT to achieve better performance in specialized fields while maintaining its core architecture.

2.1.2.3 Model Architecture

Unlike LDA and other traditional topic models, BERTopic takes into consideration the semantics and context of the documents in question. Despite their popularity and success in topic modeling, older models often struggle with somewhat inconsistent data [36]. Due to their probabilistic nature, even minor details can affect the output of the models, impeding accurate and coherent result interpretations. Additionally, traditional models generally output fewer topics compared to BERTopic. While this does not necessarily negatively affect the results, having a higher number of topics can increase the interpretability of the documents. The following sections discuss the key components of BERTopic.

As seen in Figure 2.5, BERTopic includes Embedding Generation to create sentence- or paragraph-level vectors, usually using SBERT. These vectors then undergo Dimensionality Reduction to improve the model's performance. Furthermore, Clustering (HDBSCAN) is applied to construct hierarchical topic groups based on density. Once the clusters are formed, CountVectorizer and c-TF-IDF are used to extract word sequences based on their occurrence in a document. Optionally, Fine-Tuning can be utilized for task-specific requirements and further processing.

Figure 2.5: BERTopic Sequence of Steps to Create Its Topic Representations [37].

Embedding Generation

BERTopic uses sentence- or paragraph-level vector embeddings to make documents and topics comparable [10]. These 768-dimensional embeddings provide numerical representations for clusters based on semantic similarity for further dimensionality reduction and clustering.
By default, BERTopic uses Sentence Transformers (SBERT), though the optimal embedding model depends on the application and goal. Other sentence embedding models include all-mpnet-base-v2, paraphrase-albert-small-v2, and multi-qa-mpnet-base-dot-v1, fine-tuned for various use-cases (e.g., paraphrasing, semantic search, and multi-QA). Moreover, Gensim word embeddings (e.g., GloVe, FastText, Word2Vec) are also widely used. Several multilingual versions, such as distiluse-base-multilingual-cased-v2 and paraphrase-multilingual-MiniLM-L12-v2, support over 50 languages.

Dimensionality Reduction with UMAP

The embeddings generated in the previous step are high-dimensional and can notably slow down the topic modeling and downstream clustering process. While each embedding's dimension remains constant, an increase in dataset size results in an overall larger embedding matrix. Therefore, to manage the embeddings and to improve the performance of clustering models, topic models often require dimensionality reduction, which involves reducing the number of features while preserving the structure of the data [38]. While other models use latent topic models (including LDA) combined with machine learning techniques for dimensionality reduction, BERTopic uses Uniform Manifold Approximation and Projection (UMAP). This technique represents the data as a weighted graph in a high-dimensional space and reduces it to a low-dimensional form while preserving its underlying structure [39].

Specifically, to control the formation of the clusters, UMAP's n_neighbors hyperparameter is used, where decreasing the value leads to smaller, more distinct clusters, and increasing it results in larger, broader ones. UMAP is applied before the clustering algorithm.

Clustering (HDBSCAN)

Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) was introduced as an improvement to DBSCAN [40].
Unlike the latter, HDBSCAN builds clusters using a hierarchical structure based on varying density levels. While alternatives like K-Means, OPTICS, and Gaussian Mixtures are commonly used, HDBSCAN offers more flexibility and adaptability due to its tunable parameters. The key parameter, min_cluster_size, regulates the minimum number of documents required to form a cluster (i.e., a topic based on similar documents). Increasing this value results in fewer but larger clusters, whilst decreasing it can result in so-called miniclusters [41]. In parallel, min_samples can be used to influence the clustering after adjusting min_cluster_size. A higher min_samples value results in stricter clustering, as it increases the likelihood of classifying points as noise.

Representation: CountVectorizer and c-TF-IDF

To generate human-interpretable keyword lists that represent the topics extracted during the clustering step, the documents must first be transformed into a machine-readable format. In particular, BERTopic uses CountVectorizer, which converts document tokens into matrix form [42]. This BoW representation involves converting strings into tokens (usually words) and counting the occurrences of each token in each document. The resulting matrix is sparse, as it typically contains many zero values for tokens that do not appear in a particular document.

CountVectorizer is commonly used alongside the term-frequency times inverse document-frequency (TF-IDF) measure. TF-IDF reweights token counts so that words that are common across many documents receive less importance [43].
TF-IDF can be calculated as the product of the term frequency and the inverse document frequency by using the following formula:

\[
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \times \left( \log\frac{1 + n}{1 + \mathrm{df}(t)} + 1 \right),
\]

where $t$ is a term in a document $d$, $n$ is the total number of documents, $\mathrm{tf}(t, d)$ is the number of times term $t$ appears in document $d$, and $\mathrm{df}(t)$ is the number of documents in which $t$ appears.

For topic modeling and clustering, class-based TF-IDF (c-TF-IDF) can be used instead. Rather than computing TF-IDF for individual documents, c-TF-IDF is applied after clustering to represent entire groups of documents (i.e., topics). In BERTopic, CountVectorizer and c-TF-IDF together help build a topic representation model. c-TF-IDF is useful for identifying cohesive terms that characterize each topic and can be calculated using the following formula:

\[
W_{x,c} = \lVert \mathrm{tf}_{x,c} \rVert \times \log\left(1 + \frac{A}{f_x}\right),
\]

where $\mathrm{tf}_{x,c}$ is the frequency of the term $x$ within class (cluster) $c$, $f_x$ is the total frequency of term $x$ across all classes, and $A$ is the average number of words per class [10]. Additionally, CountVectorizer can be used alongside c-TF-IDF to remove stopwords and overly frequent terms after the topics have been identified, ensuring that the most informative content is retained.

Additional Preprocessing/Fine-Tuning

Whilst CountVectorizer and c-TF-IDF could be the final step in some topic modeling tasks, further fine-tuning and model refinements might be necessary. Since BERTopic with default parameters provides general results, additional adjustments can improve the coherence, relevance, and interpretability of the extracted topics. Some fine-tuned representation models include LangChain (which leverages LLMs and QA chains to generate descriptive topic labels from documents), MaximalMarginalRelevance (which extracts diverse keywords by minimizing redundancy), and KeyBERT (which optimizes keyword extraction using BERT-based embeddings) [44], [45].
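Returning to the c-TF-IDF weighting, the calculation can be illustrated with a small sketch in plain Python. It is a simplified illustration, not BERTopic's implementation: the term frequency is normalized by the number of words in the class, A is taken as the average number of words per class as in the BERTopic paper [10], and the two toy classes and the helper names `c_tf_idf` and `top_terms` are assumptions made for the example.

```python
import math
from collections import Counter


def c_tf_idf(classes):
    """Compute c-TF-IDF weights for a dict {class_label: list of tokens}.

    Sketch of W_{x,c} = tf_{x,c} * log(1 + A / f_x), with tf_{x,c} taken as
    the count of term x in class c divided by the class length (assumption).
    """
    counts = {c: Counter(tokens) for c, tokens in classes.items()}
    # f_x: frequency of each term across all classes.
    f = Counter()
    for cnt in counts.values():
        f.update(cnt)
    # A: average number of words per class.
    A = sum(f.values()) / len(classes)
    weights = {}
    for c, cnt in counts.items():
        n_c = sum(cnt.values())
        weights[c] = {x: (tf / n_c) * math.log(1 + A / f[x]) for x, tf in cnt.items()}
    return weights


def top_terms(weights, k=3):
    """Return the k highest-weighted terms per class (ties broken alphabetically)."""
    return {
        c: [x for x, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for c, w in weights.items()
    }


# Two toy clusters of already tokenized documents (illustrative only).
toy = {
    "ethics": "ai ethics fairness ai transparency ethics".split(),
    "defence": "ai military defence ai security defence".split(),
}
weights = c_tf_idf(toy)
keywords = top_terms(weights, k=2)
```

In this toy example the word ai is as frequent as ethics within the first cluster, but because it appears equally often in the other cluster, its weight falls below that of the cluster-specific terms.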
For multimodal data, the VisualRepresentation model can be used to associate topics with corresponding images. Another important refinement technique is POS tagging, which filters and refines topic representations based on parts of speech. To optimize both computational efficiency and topic quality, the PartOfSpeech model selects documents containing relevant keywords and applies predefined POS filters to generate new refined keyword sets. Depending on the n-grams outputted, these filters can include nouns, adjectives, verbs, etc., as well as combinations (e.g., adjective+noun, noun+noun, etc.).

Performance

BERTopic is a solid candidate among the highest-performing topic models today. The model's rise in popularity has prompted a growing body of research comparing it against traditional topic models. Additionally, some researchers ([11], [36]) identified BERTopic's superiority over the SOTA topic models. For instance, BERTopic outperforms traditional models in terms of coherence, often producing a greater number of meaningful topics [36]. The model automatically finds and categorizes topics, requiring little to no manual intervention, though some fine-tuning or hyperparameter optimization might be needed for custom or fine-grained tasks and applications.

Thanks to c-TF-IDF, BERTopic can distinguish distinct topics across different clusters, even when they share overlapping vocabulary. This means the same words can appear in different contexts and still contribute to clearly separated topic representations. Moreover, the model supports hierarchical topic reduction, dividing the topics into higher and lower sublayers. This feature helps users explore specific topics or view broader topic categories when needed. To visualize the topics, BERTopic includes intuitive and interactive built-in graphs, which highlight the probabilities of the most frequent keywords within a topic.
Lastly, the BERTopic documentation explains every step and underlying logic in detail, making the model transparent and easy to follow. This level of clarity reduces room for ambiguity or misinterpretation, and helps minimize the risk of biases [11].

Whilst BERTopic excels in many aspects of topic modeling, it is important to highlight its limitations. BERTopic encodes and processes text at the sentence or paragraph level, meaning it benefits from clearly structured input. The model might struggle with texts that have irregular sentence structures or missing punctuation. In addition, to achieve high-quality results, the model often requires careful hyperparameter selection. Grid search or other optimization methods may be needed to outperform traditional topic modeling approaches in some contexts. Similarly, some SOTA models have simpler architectures, whereas BERTopic, even after dimensionality reduction, requires more computational power due to its reliance on transformer embeddings and clustering. Finally, as BERTopic relies on pretrained language models, it may inherit the language biases present in those models, affecting performance on non-English or low-resource languages.

2.2 Evaluation Metrics

This section describes the metrics used to evaluate the performance of the two topic models. The evaluations include intrinsic metrics, such as coherence and perplexity, and several OCTIS framework metrics.

2.2.1 Intrinsic Evaluation Metrics

Topic models generate sets of topics, each typically represented as a list of words, but they can also be expressed as distributions over documents or other forms, depending on the use case. However, the output of a topic model is not immediately measurable in terms of quality. To assess whether a model has learned meaningful topics, a range of evaluation metrics has been developed that quantify different aspects of topic quality.
In topic modeling, coherence has become one of the most commonly used intrinsic evaluation metrics [46], [47]. Its goal is to quantify how semantically consistent the top words in a topic are by examining how frequently these words appear together in reference corpora. A coherent topic, for instance, might contain the words doctor, nurse, hospital, vaccination, as these words commonly co-occur in natural language and are likely to be interpreted by humans as belonging to the same semantic theme. A less coherent topic would be donut, house, europe, cat, dollar, where the words do not have a clear connection and would be judged incoherent by human standards. One commonly used coherence metric is the CV measure, which evaluates semantic similarity among top-ranked topic words based on their co-occurrence within a sliding window and a sliding segmentation of a reference corpus [46]. For example, given a set of top words W = {w1, ..., wN}, CV coherence evaluates all pairwise combinations (wi, wj), scoring each pair based on how frequently they appear together in the context of the corpus. These pairwise scores are aggregated using the arithmetic mean, resulting in a single coherence value for the topic. High coherence values generally suggest more interpretable topics that reflect real-world concepts. However, coherence scores have shown limits in their actual comparison to human interpretation [48]. For instance, a topic like data, database, dataset, datum, data drive may achieve a high coherence score due to frequent word co-occurrence. From a statistical perspective, the topic appears coherent. Yet semantically, it offers little depth or distinction and revolves around slight variations of a single word rather than capturing a broader or more informative concept. In such cases, the model may be learning repetitive or overly narrow topics. 
This issue arises in part because coherence measures tend to reward statistical regularity, which may not always correspond with actual thematic relevance. Additionally, coherence is highly sensitive to preprocessing decisions and the choice of reference corpus. This means that coherence scores are not always comparable across datasets or experimental setups.

Perplexity is another metric used to evaluate topic models [48]. It measures how well the model predicts a held-out test set and is computed as the exponential of the negative average per-word log-likelihood of the unseen data. A lower perplexity indicates better generalization to new documents, while a higher perplexity indicates that the model assigns low probability to unseen text.

Since perplexity is based on the likelihood of word distributions, it is suited for probabilistic models like LDA, which explicitly model the probability of each word in a document based on learned topic-word and document-topic distributions. In contrast, BERTopic does not follow a generative probabilistic framework. Instead of modeling word probabilities, it clusters document embeddings and extracts representative keywords. As a result, its output does not provide the probabilistic structure needed to compute perplexity, making this metric inapplicable to BERTopic in a meaningful way.

However, it is important to acknowledge that the extent to which perplexity and coherence reflect human interpretability remains debated. Even when applied to probabilistic models such as LDA, perplexity has known limitations. Prior work has shown that perplexity does not always correlate with human judgment of topic quality and may favor statistically optimal yet semantically weak topics [48].
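As a brief illustration of the perplexity computation, the sketch below assumes that the model already provides a probability for every held-out token, which for LDA would be derived from the document-topic and topic-word distributions. The probability values are made up, and the function name `perplexity` is hypothetical.

```python
import math


def perplexity(token_probs):
    """Perplexity of held-out tokens given the model's probability for each token.

    Computed as exp(-(1/N) * sum(log p_i)); lower values mean better predictions.
    """
    n = len(token_probs)
    avg_log_likelihood = sum(math.log(p) for p in token_probs) / n
    return math.exp(-avg_log_likelihood)


# Illustrative probabilities for four held-out tokens under two hypothetical models.
confident = perplexity([0.5, 0.5, 0.5, 0.5])
uncertain = perplexity([0.1, 0.1, 0.1, 0.1])
```

The model that assigns higher probability to the held-out tokens obtains the lower perplexity. The calculation says nothing about whether the underlying topics are meaningful to a human reader.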
Similarly, while coherence, particularly the CV measure, is often used to approximate human interpretability [46], [47], others have shown that statistically coherent topics can result from repetitive or overly narrow topics that lack semantic meaning or thematic clarity [48]. These inconsistencies reflect a broader issue in topic model evaluation, where statistical metrics often reward internal regularity rather than meaningful thematic structure. Given this tension, these metrics should not be treated as absolute measures of model performance, but should be used as practical guides for hyperparameter tuning [8].

2.2.2 OCTIS

To evaluate topic models effectively and under consistent conditions, it is important to consider both the quality of individual topics and the usefulness of the models as a whole. The OCTIS framework offers a reproducible pipeline to facilitate such evaluations [49]. OCTIS allows for standardized experimentation across various datasets and models by providing a unified framework for metric-based evaluation. While many evaluation metrics exist, this thesis emphasizes four core measures: Coherence, Topic Diversity, Inverted Rank-Biased Overlap (IRBO), and WECoherencePairwise. Together, these metrics provide insights into the interpretability, diversity, and redundancy of the resulting topics.

In the context of OCTIS, the most commonly used coherence measure is CV Coherence. As described in Section 2.2.1, it evaluates the semantic consistency of topic words based on co-occurrence statistics extracted from the input corpus. However, coherence alone does not fully characterize the quality of a topic model, particularly in cases where multiple topics share overlapping vocabulary or only differ slightly in emphasis.

While CV Coherence relies on co-occurrence within the evaluation corpus, OCTIS's metric WECoherencePairwise adopts a different strategy based on semantic similarity using word embeddings [49].
It computes the average pairwise cosine similarity between the top-k words in each topic within a high-dimensional vector space. The embedding model used by default is built into OCTIS and consists of 300-dimensional Word2Vec vectors trained on the Google News corpus. These embeddings are automatically downloaded and handled by the framework. Because this metric leverages distributional semantics rather than observed frequencies, WECoherencePairwise is less sensitive to corpus size and noise. As a result, it can capture hidden semantic relationships that may not be observed through local co-occurrence patterns.

Topic Diversity quantifies the uniqueness of words across topics, thus penalizing models that produce near-identical topics with only minor variations [49]. It is calculated as the proportion of unique words in the top-k topic words across all topics. The metric aligns with the idea of decomposability in interpretability: each topic should ideally contribute new, distinct information to the model's overall representation of the dataset. A high Topic Diversity score implies that the model has learned a broad and varied set of themes rather than repeating the same information across multiple topics.

While Topic Diversity assesses uniqueness by measuring how many distinct words are used across topics, it does not account for how similar those topics are in terms of their ranked word structure. To complement this, IRBO is used to evaluate topic redundancy by comparing the ranked word lists of all topic pairs [50]. Rather than counting unique words alone, IRBO takes into account how often the same words appear in similar positions across different topics. Because the overlap is inverted, a low IRBO score indicates that many topics share similar high-ranking words, while a high score (close to 1) suggests that the model has produced more clearly separated and structurally diverse topics.
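The two diversity measures can be illustrated with a short sketch in plain Python. It is a simplified stand-in rather than the OCTIS implementation: the rank-biased overlap is truncated at the list length and rescaled so that identical lists score 1, whereas OCTIS uses its own RBO variant; the weight p = 0.9 and the toy topics are assumptions. The sketch follows the convention that an IRBO of 0 means identical topics and 1 means fully distinct topics.

```python
from itertools import combinations


def topic_diversity(topics, k=10):
    """Share of unique words among the top-k words of all topics."""
    top = [t[:k] for t in topics]
    unique = {w for t in top for w in t}
    return len(unique) / sum(len(t) for t in top)


def rbo(a, b, p=0.9):
    """Truncated rank-biased overlap, rescaled so identical lists score 1."""
    k = min(len(a), len(b))
    # Overlap of the two prefixes at each depth d, weighted by p^(d-1).
    score = sum(p ** (d - 1) * len(set(a[:d]) & set(b[:d])) / d for d in range(1, k + 1))
    norm = sum(p ** (d - 1) for d in range(1, k + 1))
    return score / norm


def inverted_rbo(topics, p=0.9):
    """One minus the mean RBO over all topic pairs (higher = more diverse)."""
    pairs = list(combinations(topics, 2))
    return 1 - sum(rbo(a, b, p) for a, b in pairs) / len(pairs)


# Toy topics (illustrative only).
distinct = [["ethics", "rights", "privacy"], ["military", "defence", "security"]]
redundant = [["data", "dataset", "database"], ["data", "database", "dataset"]]

diversity_distinct = topic_diversity(distinct)
diversity_redundant = topic_diversity(redundant)
irbo_distinct = inverted_rbo(distinct)
irbo_redundant = inverted_rbo(redundant)
```

The redundant pair uses only three distinct words across six slots, so its Topic Diversity is 0.5, and because the shared words also sit at similar ranks, its IRBO is close to 0. The distinct pair reaches the maximum on both measures.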
2.3 Political and Ethical Context of AI Policy To understand how AI is evolving, it is important to understand the circumstances under which it is developed and used. The evolution of AI is not purely isolated to a technical context, but is as much a political and ethical one. The development is affected by who benefits from its use and who bears the risks [51]. National AI strategies are therefore not only blueprints for innovation, but they also reflect political systems, cultural norms, and regulatory philosophies. AI is neither neutral nor isolated from existing power structures [52]. The choices made in AI policy documents, about what to regulate, what to incentivize, and what ethical principles to prioritize, reflect broader governance ideologies [53]. This means that analyzing AI policies also provides insight into how governments understand and frame their responsibilities towards citizens in the context of AI development and use. To enable comparative analysis of AI policies, this thesis draws on the Democracy Index published by the Economist Intelligence Unit [3]. The Index score is derived from expert assessment and public opinion survey, based on 60 indicators grouped into 5 categories: electoral process and pluralism, functioning of government, political participation, political culture, and civil liberties. A panel of country analysts and experts from the Economist Intelligence Unit reviews the data and assigns scores according to a standardized coding system (e.g., 0-1 or a three-point scale), ensuring consistency across countries. This, in turn, categorized the countries into one of four government types (see Table 2.1): Full Democracy, Flawed Democracy, Hybrid Regime, and Authoritarian Regime. The UNESCO Recommendation on the Ethics of Artificial Intelligence [54] provides a useful normative framework for analyzing AI policies. 
This document was constructed by researchers collaborating with international stakeholders (e.g., government, private companies, academia, etc.). It builds on already existing frameworks, implementing international law that is focused on human rights and dignity, equality, protection, and more. In particular, the recommendation identifies ethical principles and encourages countries to align their AI governance with these goals. These principles are split into the following themes: Ethical Impact Assessment, Ethical Governance and Stewardship, Data Policy, Development and International Cooperation, Environment and Ecosystems, Gender, Culture, Education and Research, Communication and Information, Economy and Labour, and Health and Social Well-Being. However, implementation remains voluntary, and the symbolic nature of many national AI strategies means that they may function more as public relations tools than as enforceable commitments [55]. Understanding AI strategies through these theoretical lenses provides a more comprehensive picture of global AI development and shows how data science intersects with governance, ethics, and public values.

Table 2.1: Democracy Index Classifications [3].

Government Type          Democracy Index Score
Full Democracies         8.00–10.00
Flawed Democracies       6.00–7.99
Hybrid Regimes           4.00–5.99
Authoritarian Regimes    0.00–3.99

3 Methods

This chapter details the steps taken to answer the research questions based on the introduced theory. The first section covers the data collection process, highlighting the data cleaning, web scraping, and post-scraping processing steps that make the data usable for topic modeling. Furthermore, the BERTopic and LDA modeling techniques are described, including chunking, hyperparameter optimization using GridSearch, and topic extraction.
Lastly, the chapter concludes with a policy analysis framework design based on UNESCO's ethical recommendations, which will then be used for qualitative analysis of the models and different regimes.

3.1 Data Collection and Processing

For this project, the dataset was constructed using data from the OECD AI Observatory, which provides a Comma-Separated Values (CSV) file containing countries and their respective AI policies. The following subsections detail the data processing steps, including pre-scraping cleaning, web scraping, and post-scraping preprocessing.

3.1.1 Data Cleaning (Pre-Scraping Stage)

To ensure data usability, we filtered the dataset to retain only the “Country” and “Public access URL” columns, discarded rows with missing data, and saved the resulting data into a new dataframe. We manually created a dictionary associating each country from the OECD's AI Observatory CSV file with its respective democracy index. The democracy index values were sourced from The Economist [3], as their methodology is grounded in empirical observations of state governance. Thus, the countries were divided into four classifications (see Table 2.1), where the higher the democracy index, the more democratic the country is said to be. The government type was used for further analysis.

Additionally, we checked the status code of each “Public access URL”. URLs with status code 400 or higher (indicating errors such as bad requests, forbidden access, or not found) were flagged, whilst those with status codes below 400 were considered operational. Since some websites have specific anti-bot protection to avoid overloading the server, basic measures were taken to retain as many links as possible, including retry strategies using HEAD and GET requests, and custom user-agent headers. Table A.2 provides an overview of the status codes with the number of times they were encountered.
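The two rules described in this subsection, mapping a democracy index score to a government type according to Table 2.1 and flagging links by status code, can be sketched as follows. The function names and the placeholder country scores are hypothetical and serve only as an illustration; the actual dictionary was compiled manually from [3], and the status codes were obtained through HTTP requests that are not reproduced here.

```python
def government_type(score):
    """Map a Democracy Index score (0-10) to a government type (Table 2.1)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 8.0:
        return "Full Democracy"
    if score >= 6.0:
        return "Flawed Democracy"
    if score >= 4.0:
        return "Hybrid Regime"
    return "Authoritarian Regime"


def is_operational(status_code):
    """A link counts as operational when its HTTP status code is below 400."""
    return status_code < 400


# Placeholder scores for illustration only (not real Democracy Index values).
democracy_index = {"Country A": 9.1, "Country B": 7.2, "Country C": 5.0, "Country D": 2.3}
regimes = {country: government_type(score) for country, score in democracy_index.items()}
```

Keeping the thresholds in a single function makes it straightforward to regroup the corpus if a different edition of the index is used.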
3.1.2 Scraping

Given that scraping strategies differ by document type, we categorized the links into two groups: PDFs and non-PDFs. A link was considered a PDF if at least one of the following conditions was satisfied: the Content-Type was “application/pdf”, the link ended with “.pdf” or “.docsx”, or the Content-Disposition included an attachment. Originally, the CSV file contained data for 70 countries. However, 3 countries were dropped because they had no working links, resulting in 67 countries in total. After the PDF and non-PDF checks, 60 countries returned non-PDFs, whilst only 35 returned PDFs. The two lists of documents were combined, resulting in a total of 66 countries with scraped PDF and/or non-PDF content.

To scrape the text from the non-PDFs, BeautifulSoup, Cloudscraper, and Selenium were used. The BeautifulSoup library [56] extracts visible text and removes elements that are irrelevant for further processing (e.g., script and style elements, headers, and footers). Due to its efficiency, Cloudscraper [57] was used as the first approach to overcome anti-bot protections. If the module failed to access a link, Selenium [58] was used as a fallback. Although Selenium is computationally more expensive, its emulation of user interaction with websites provided satisfactory results and increased the number of links scraped.

To scrape the PDF files, we used the PyPDF2, BytesIO, and EasyOCR modules [59]–[61]. BytesIO was used to interact with the links directly, without requiring a separate download. PyPDF2 was used to efficiently split the files into pages and extract the text. However, some files resulted in noisy data, requiring an OCR-based method instead. Therefore, for several countries, including those with handwritten content and poorly formatted PDFs where traditional text extraction failed, EasyOCR was used. These tools were chosen for their complementary strengths, which ensured a high retrieval rate across diverse document formats.
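The three PDF conditions listed in this section can be expressed as a single predicate. The sketch below is illustrative only: the function name and the header handling are assumptions, while the suffixes and header checks are taken from the description above.

```python
from typing import Mapping
from urllib.parse import urlparse

# Suffixes as listed in the text of this section.
PDF_SUFFIXES = (".pdf", ".docsx")


def looks_like_pdf(url: str, headers: Mapping[str, str]) -> bool:
    """Return True if any of the three conditions from Section 3.1.2 holds."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {key.lower(): str(value).lower() for key, value in headers.items()}
    content_type = normalized.get("content-type", "")
    disposition = normalized.get("content-disposition", "")
    # Only the path is inspected, so query strings do not hide the suffix.
    path = urlparse(url).path.lower()
    return (
        "application/pdf" in content_type
        or path.endswith(PDF_SUFFIXES)
        or "attachment" in disposition
    )
```

Links for which the predicate is false would be routed to the non-PDF scrapers, and the rest to the PDF extraction step.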
3.1.3 Data Preprocessing (Post-Scraping Stage)

To finalize the data gathered from the scraping stage, we applied several preprocessing methods. To standardize the dataset language, we evaluated whether to exclude non-English data or translate it into English. The need for translation was assessed by comparing the words in the scraped text against a reference list of 479,000 English words [62]. This list was selected for its substantial size, approximating the estimated 600,000 words in the English language, and served as a reasonable proxy for estimating the proportion of English content in the dataset. The analysis revealed that approximately 1.7 million out of 4 million words did not match the reference list. Most of the matching English content originated from English-speaking countries such as the United Kingdom, Canada, and Australia. To maximize the size and comparability of our corpus, we opted to translate non-English text into English.

• Using the Google Translate API (Googletrans), we split the files into chunks and automatically detected the source language of each chunk. After translation, the translated chunks were reassembled into the original file structure.
• After applying the same English word check, we selected the country files containing less than 80% English text. For these files, the languages were identified manually, and Googletrans was then used to translate from those languages into English.
• A similar approach was used for countries using Latin-based alphabets with diacritical marks (e.g., Sweden, Poland, Lithuania). In these cases, certain characters specific to the respective languages (such as “ö”, “ł”, or “š”) were being removed or distorted during later processing steps, resulting in unintelligible or incorrect words. To prevent this, files from these countries were translated again using the Googletrans API with the source language manually set to the country’s native language.
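The English word check used to decide which files needed further translation can be sketched as a simple ratio. This is a minimal sketch under assumptions: the tokenization rule and function names are hypothetical, and `english_words` stands in for the 479,000-word reference list [62]. Only the 80% threshold comes from the text.

```python
import re
from typing import Set


def english_ratio(text: str, english_words: Set[str]) -> float:
    """Share of alphabetic tokens in `text` that appear in the reference list."""
    # Keep runs of letters only (Unicode-aware), so digits and punctuation
    # do not count as non-English words.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    matches = sum(1 for token in tokens if token in english_words)
    return matches / len(tokens)


def needs_translation(text: str, english_words: Set[str], threshold: float = 0.80) -> bool:
    """Flag files whose English share falls below the 80% threshold."""
    return english_ratio(text, english_words) < threshold
```

Because the check is purely list-based, valid English terms missing from the reference list lower the ratio, so the result is only a rough estimate.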
Figure 3.1 displays the distribution of the English words in each government type after translation and data processing.

[Figure 3.1: Total English Words by Government Type. Pie chart with segments for Full Democracy, Flawed Democracy, Hybrid Regime, and Authoritarian Regime; segment labels: 57.6%, 36.2%, 3.6%, and 2.6%.]

However, these numbers are only a rough estimate based solely on the English word list [62] and were used as a reference for deciding whether the files needed additional processing. A test run showed that the check against the word list classifies some English words as non-English (e.g., “cybersecurity”, “nanotechnology”, and dates).

Furthermore, some data cleaning was applied. For instance, some of the translated text combined sentences, resulting in paragraphs without proper spacing between them. Additionally, some files contained noise from scraped PDFs, including scraper interface residue such as “opens submenu items.” Moreover, there were instances of unigrams and bigrams repeated multiple times in a row, further contributing to noise. Removing this nonsensical text, along with unusual characters and extra spacing, finalized our data processing steps. Unlike these non-informative elements, country references were intentionally retained because they are meaningful textual features that can signal the geographic focus of a policy or reflect differences in discourse. These references are treated as relevant components of the topic model output, contributing to the interpretability of themes related to the national context.

Moreover, UTF-8 character encoding was used throughout the preprocessing stages, including reading, writing, and saving files. It encodes characters as sequences of 8-bit bytes and is widely preferred for web pages and electronic communication due to its compatibility with ASCII and its support for a wide range of characters. Lastly, model-specific processing was applied.
Because the BERT model has a token limit, the text was segmented into uniform 512-token chunks. Additionally, we generated a separate version of the dataset applying spaCy’s lemmatization for the LDA model. Lemmatization (e.g., converting “policies” to “policy”, “data” to “datum”) ensures that similar words are consolidated, which is crucial for accurately capturing word frequencies in probabilistic models.

Following preprocessing, the country-level dataset was split into four documents, one for each government type. Table 3.1 summarizes the key metrics for each government type. Detailed metrics for the countries obtained from the original CSV dataset are provided in Appendix A.

| Government Type | Countries per Gov Type | Working URLs | Total Tokens |
| Full Democracy | 23 | 307 | 1,050,225 |
| Flawed Democracy | 28 | 329 | 1,879,726 |
| Hybrid Regime | 10 | 60 | 78,080 |
| Authoritarian Regime | 9 | 65 | 106,976 |
| Total | 70 | 761 | 3,114,997 |

* Total number of countries per government type includes the four countries that result in 0 working links and 0 tokens.

Table 3.1: Aggregated Metrics by Government Type.

3.2 LDA

This section describes the methodology used for LDA to output topics from policy documents. In particular, preprocessing and chunking, as well as the hyperparameter search steps, are highlighted.

3.2.1 Text Processing and Chunking

Prior to LDA modeling, the data was preprocessed to build a Gensim dictionary and corpus. In addition to standard English stopwords, a set of domain-specific stopwords was compiled and removed. These included frequently occurring but semantically uninformative tokens such as “artificial”, “intelligence”, “cookie”, and “http”. These terms were overrepresented because the scraped policy documents discussed AI and contained web interface text, but they did not contribute meaningfully to distinguishing topics. Moreover, isolated characters (e.g., “a”, “b”, “c”) were removed to prevent noise in the data.
All policy text documents were tokenized using NLTK’s sentence and word tokenizers. The tokenized data was segmented into overlapping chunks to accommodate LDA’s assumptions regarding document size and topic distribution. Each chunk was capped at 512 tokens, with a 40-token overlap between consecutive chunks to preserve contextual continuity. Unlike BERTopic, which relies on semantic embeddings and benefits from larger overlaps to maintain contextual flow, LDA treats documents as bags-of-words and models topic distribution across the entire corpus. Therefore, a smaller overlap was sufficient to maintain coherence while improving computational efficiency. This approach ensured a balance between semantic completeness and computational tractability.

Each chunk was then tokenized into lowercase words and filtered to exclude custom stopwords. The processed chunks were then transformed into bag-of-words representations using Gensim’s dictionary object. This step created a mapping between each token and a numerical reference to standardize the vocabulary across the corpus.

3.2.2 LDA Modeling and Hyperparameter Tuning

For the purpose of topic modeling, this study employed LDA implemented via the LdaModel class from the Gensim library [63]. We selected it over Scikit-Learn because it allows a more in-depth analysis of the results and supports online variational Bayes inference, which is more computationally efficient and scalable. Given the objective of comparing the thematic structure of policy discourse across government regime types, Gensim provides methodological reliability while enabling the analysis to focus on the interpretation of cross-regime thematic patterns. Developing a custom LDA implementation would be more appropriate in a study aimed at improving or extending the algorithm itself, which falls outside the scope of the present research.
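For reference, the overlapping chunking step from Section 3.2.1 (512-token chunks with a 40-token overlap) can be sketched as below. The function name and its edge-case handling are assumptions made for illustration; only the chunk size and the overlap are taken from the text.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int = 512, overlap: int = 40) -> List[List[str]]:
    """Split a token sequence into chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks
```

The same function covers the BERTopic setting described in Section 3.3 by passing `overlap=100`.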
The LDA models were trained separately for each government category and tailored to their own data to account for variations in structure and quantity. A grid search approach was used to tune the hyperparameters for each type of government. The hyperparameters used were the number of topics the model should identify, the number of passes through the training corpus, alpha, and eta. To identify the most effective configuration for each regime type, the grid search was conducted over the predefined ranges of hyperparameters presented in Table 3.2.

| Parameter | Candidate Values |
| num_topics | 5, 10, 15 |
| passes | 20, 30 |
| alpha | ’symmetric’, ’asymmetric’, 0.01, 0.1 |
| eta | ’symmetric’, ’asymmetric’, 0.01, 0.1, 0.5 |

Table 3.2: Grid Search Parameters for LDA Model Optimization.

The model was evaluated using topic coherence and perplexity to assess both semantic interpretability and statistical fit. For each government type, except for Flawed Democracy, the five configurations with the highest coherence scores from the grid search were shortlisted. If multiple configurations had identical coherence scores, perplexity was used as a secondary criterion to break ties. From these five top candidates, the configuration with the highest number of topics was selected to enable a more detailed thematic analysis and facilitate cross-topic comparison in the subsequent qualitative analysis. The Flawed Democracy subset produced a relatively low number of topics. To facilitate a fairer comparison with BERTopic’s output, the number was manually augmented.

All models were trained with online learning enabled and a fixed random seed to ensure reproducibility. Additionally, word-level topic distributions were enabled during training to allow traceability for individual terms in the produced topics. The final set of hyperparameters selected for each government type is shown in the Results section in Chapter 4; see Table 4.1.
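The selection rule described in this section (shortlist the five most coherent configurations, break coherence ties by perplexity, then take the shortlisted configuration with the most topics) can be sketched as follows. The result records and function names are hypothetical, and the sketch assumes that a lower perplexity value is preferred in a tie, which the text does not state explicitly.

```python
from itertools import product
from typing import Any, Dict, List

# Candidate values as listed in Table 3.2.
GRID = {
    "num_topics": [5, 10, 15],
    "passes": [20, 30],
    "alpha": ["symmetric", "asymmetric", 0.01, 0.1],
    "eta": ["symmetric", "asymmetric", 0.01, 0.1, 0.5],
}


def grid_configurations(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Enumerate every combination of candidate values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def select_configuration(results: List[Dict[str, Any]], shortlist_size: int = 5) -> Dict[str, Any]:
    """Pick a configuration from records with 'params', 'coherence', 'perplexity'."""
    # Highest coherence first; ties broken by lower perplexity (assumption).
    ranked = sorted(results, key=lambda r: (-r["coherence"], r["perplexity"]))
    shortlist = ranked[:shortlist_size]
    # max() keeps the first maximum, i.e. the best-ranked entry among those
    # sharing the largest num_topics.
    return max(shortlist, key=lambda r: r["params"]["num_topics"])
```

With the candidate values in Table 3.2, `grid_configurations(GRID)` yields 3 × 2 × 4 × 5 = 120 combinations per government type.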
3.3 BERTopic

As an alternative to LDA, the BERTopic model was used to process a set of documents and, depending on the parameters, generate topics tailored to them. As mentioned in Section 3.1.3, we split the original dataset into four separate subsets, each containing policy texts for the respective government type. This was done to prevent the model from interpreting a single large document, containing policies from all regimes, as one topic.

To account for the diverse and, in some cases, noisy nature of our data (e.g., the presence of non-English words and incomplete sentences), we modified and fine-tuned several model components and hyperparameters after an initial run produced overly vague topics. In the following, we introduce the adjusted components of the methodology:

• Document Chunking: As mentioned, the BERT model has a token limit. Therefore, each government-type document was split into chunks of 512 tokens with an overlap of 100 tokens. This overlap is important because some sentences do not end with a full stop, so additional context from previous sentences is necessary for correct processing.
• Sentence Embedding Model: Initially, as recommended by the BERTopic creator, we used the all-MiniLM-L6-v2 sentence transformer to embed the input documents. However, because some non-English words appeared in the output, we switched to the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 model [64]. Like the original version, the multilingual sentence transformer effectively captures the semantic similarities between texts and is well-suited for tasks like semantic search and clustering, even for non-English documents.
  Note: The model still took the translated documents as input because, due to its probabilistic nature, the LDA model does not differentiate between multilingual outputs.
Thus, to keep the data as uniform as possible across the models and to allow a more consistent comparison of the outputs, both models receive the same dataset as input.
  Example: A sentence such as “Policies discussing artificial intelligence” might be embedded as a 384-dimensional vector like [0.11, 0.96, 0.45, ..., 0.21].
• UMAP Dimensionality Reduction: UMAP’s key hyperparameters, n_neighbors (which sets the neighborhood size) and n_components (the target dimensionality), were tuned via grid search, although its inherent stochasticity means that identical outputs are not guaranteed across runs.
  Example: The high-dimensional vector for our earlier sentence might be reduced to just 5 dimensions: [0.42, 0.73, 0.70, 0.64, 0.11].
• Clustering with HDBSCAN: HDBSCAN was employed as the clustering algorithm to group similar documents. Its min_cluster_size parameter, which determines the minimum size of a cluster, was optimized through grid search. This optimization helps prevent the creation of too many microclusters and ensures that the output topics are meaningful.
  Example: Suppose a chunk includes the terms “Spain”, “processor”, “algorithm”, “United Kingdom”, “Brazil”, “computer”, “United States of America”, “implementing”. One cluster might form around geographic entities ([“Spain”, “United Kingdom”, “Brazil”, “United States of America”]), and another around technology-related terms ([“processor”, “algorithm”, “computer”, “implementing”]).
• Topic Extraction with CountVectorizer and c-TF-IDF: We employed a CountVectorizer together with a topic-level bag-of-words c-TF-IDF transformer to obtain distinct topics. Custom stop words (e.g., “ai”, “artificial”, “intelligence”) were introduced since they appeared in every topic and contributed little to the later discussion, and we deliberately removed certain words (“not”, “no”) from the default list because they provide additional context (e.g., distinguishing “important” from “not important”).
We configured the vectorizer to extract one- to two-word n-grams since single-word topics might not accurately capture the themes. Additionally, common words that appear in most documents were removed by setting reduce_frequent_words=True in the c-TF-IDF model.
  Example: Suppose that we have chunks that discuss some data-related policies. After computing c-TF-IDF weights, the top 3 n-gram keywords might be: {(“data privacy”, 0.43), (“privacy cookies”, 0.33), (“request information”, 0.11)}. Based on these keywords, the label (topic) assigned to this group could be “Data Privacy”.
• POS-Based Topic Representation: A representation model based on POS tagging was used to obtain unique topic-level clusters instead of document-level clusters. By focusing on adjective+noun, noun+noun, verb+noun, and adjectives alone, the model outputs topics that more effectively represent the document content. Although running the model without POS might result in higher coherence scores, incorporating POS tagging generally improves topic interpretability.
  Example: In the earlier technology-related cluster, POS filtering might retain “computer”, “algorithm”, and “processor” but exclude all the verbs (e.g., “implementing”).
• Hyperparameter Grid Search: A grid search was performed to determine the optimal hyperparameters for each government type. We varied the target number of topics (nr_topics), UMAP’s n_neighbors, and HDBSCAN’s min_cluster_size. The nr_topics variable covers a wider range than LDA’s num_topics because, unlike in LDA, it indicates the maximum number of topics and not the exact number (i.e., the variables are not identical). Because of the disproportionate sizes of the government-type datasets, the best hyperparameters were selected according to the coherence score obtained for each type.
In the final step, for each government type, we selected the top five (except for Hybrid Regime) hyperparameter sets based on their coherence scores and then chose the set with the highest nr_topics for further detailed analysis.

Table 3.3 shows the parameters used for the grid search. However, since the Hybrid Regime could not produce more than 3 topics for any hyperparameter combination, we extended the hyperparameter range for this regime. In particular, the grid search was run with nr_topics=16, and the set that produced the most topics among the ten highest-coherence sets was selected. Note: nr_topics was set to 16 since it specifies the maximum number of topics the model can return, not an exact or minimum number.

| Parameter | Candidate Values |
| nr_topics | 4, 6, 8, 10, 12, 14, 16 |
| umap_n_neighbors | 2*, 5, 10, 15, 20* |
| hdbscan_min_cluster_size | 2*, 5, 10, 15, 20* |

Table 3.3: Hyperparameter Grid Search for BERTopic. * Values 2 and 20 were explicitly introduced for the Hybrid Regime, with nr_topics=16.

3.4 Qualitative Topic Analysis

To interpret the topics and keywords provided by the LDA and BERTopic models, a qualitative analysis was performed. The keywords for each government type topic, output by each topic model, were investigated in detail. Specifically, the aim was to find the exact documents where all 10 keywords of a specific topic were present. However, this was not always possible since the topics were constructed based on embeddings rather than originating from a specific document. Thus, the second approach was to find the documents (512-token chunks) that contained the largest number of the 10 keywords. For example, if a certain document included the most keywords (e.g., 9 out of 10), that document was selected. Depending on the depth and distribution of the topic, different documents could contain the same number of keywords. Therefore, all such documents were included for qualitative analysis.
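The document selection rule (keep every chunk that contains the largest number of a topic's keywords, including ties) can be sketched as below. The function names and the whitespace-based matching are assumptions for illustration; the matching supports both single-word and two-word keywords.

```python
import re
from typing import List, Sequence


def keyword_hits(document: str, keywords: Sequence[str]) -> int:
    """Count how many distinct topic keywords occur in the document."""
    # Normalize to lowercase words separated by single spaces, so that
    # unigram and bigram keywords are both matched on word boundaries.
    normalized = " " + " ".join(re.findall(r"\w+", document.lower())) + " "
    return sum(1 for kw in keywords if f" {kw.lower()} " in normalized)


def best_matching_documents(documents: Sequence[str], keywords: Sequence[str]) -> List[int]:
    """Return the indices of all documents tied for the most keyword hits."""
    counts = [keyword_hits(doc, keywords) for doc in documents]
    best = max(counts, default=0)
    if best == 0:
        return []  # no document mentions any keyword
    return [index for index, count in enumerate(counts) if count == best]
```

Returning indices rather than texts keeps the link to the original 512-token chunks, which can then be read in full.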
The number of selected documents varied between topics, ranging from one 512-token document to roughly ten. The analysis itself involved reading the extracted documents to understand the broader context of each topic.

Example: Suppose a topic model outputs Topic 0 with the following keywords: weather, sun, heatwave, rain, storm, lightning. In a case where all the keywords appear, the document might read:

Today, Texas is experiencing a heatwave, with a high-intensity sun. The weather is forecast to take a turn, with the upcoming week expected to bring a storm with heavy rain and lightning.

In other cases, some of the keywords might appear in separate documents:

The weather today in Sweden is moderate. Sun will not be visible due to the storm and continuous rain.

or

No heatwave or sun to be expected due to the upcoming storm, which will include torrential rain.

These contain only four of the six keywords but still contribute to understanding the topic.

3.5 Quantitative Comparison Using OCTIS

Following the training and tuning of both the BERTopic and LDA models, the OCTIS framework was applied to evaluate their technical performance. OCTIS enables fair comparison across models with differing architectures by providing standardized implementations of widely used topic modeling metrics. Since each model was trained separately for each government regime type, evaluation was likewise performed independently per category, using the corresponding topic-word distributions.

Four intrinsic evaluation metrics were computed using OCTIS: CV Coherence, Topic Diversity, WECoherencePairwise, and IRBO. Together, these metrics assess topic interpretability, semantic consistency, lexical uniqueness, and redundancy, as outlined in Section 2.2.2. All metrics were computed using their respective OCTIS classes with default settings.
CV Coherence relied on co-occurrence patterns within the corpus, IRBO was computed from the ranked top-word lists of the topics, and WECoherencePairwise used pretrained 300-dimensional Word2Vec embeddings from the Google News corpus. Topic Diversity was calculated based on the top 10 words per topic. Together, these metrics provide quantitative proxies for properties such as semantic coherence, lexical diversity, and topical redundancy, which are commonly associated with human interpretability.

Each metric was computed and stored per model and government type. Results were rounded to four decimal places and compiled into a summary table for comparison across models and governance categories. All evaluations were conducted using fixed random seeds and consistent preprocessing to ensure comparability of the final results.

It is important to note that although both models were evaluated under the same OCTIS framework, they were assessed on their own respective preprocessed corpora. This decision reflects the different architectural characteristics of LDA and BERTopic. Rather than enforcing identical preprocessing across models, the evaluation process aimed to respect each model’s strengths, ensuring that performance was measured under conditions optimized for each model’s architecture. This is in accordance with the exploratory nature of the thesis, which is to determine which model performs more effectively within the context of AI policy discourse across different political regimes.

3.6 Ethical Topic Variation Across Government Type

This section describes the UNESCO ethical recommendations, which are used as the basis for our analytical framework. The design of this framework is discussed in detail, alongside its comparison with the keywords extracted by each topic model. The goal of this framework (Table A.1) and its categories was to identify whether certain ethical concerns and nuances are present and discussed within the AI policies, rather than to extract or reproduce the 11 UNESCO categories.

3.6.1 UNESCO Recommendation on the Ethics of Artificial Intelligence

The adoption of AI worldwide has resulted in variations in how countries are expected to approach the regulation of AI. To ensure a structured and thorough analysis of the topics generated by LDA and BERTopic, we construct an analytical framework based on UNESCO’s Recommendation on the Ethics of Artificial Intelligence [54]. The UNESCO recommendation serves as a normative framework for evaluating and guiding the ethical development and governance of AI. It defines AI systems as those capable of processing data and information in ways that resemble intelligent behavior, such as reasoning, learning, and prediction. The document itself is directed at Member States, both as AI actors and as regulatory authorities. Additionally, it provides ethical guidance to all AI stakeholders, including the public and private sectors.

Unlike regional AI regulations such as the European Union’s AI Act, UNESCO’s recommendations are recognized by 193 member states and establish a universal set of ethical guidelines. From these guidelines, AI policy topics can be systematically assessed based on their alignment with ethical AI principles.

By aligning the topics extracted by our models with these predefined areas, we can evaluate the results for the different government structures (democracy indexes), find dominant themes, and assess whether variations exist with regard to ethical considerations, public accountability, or economic priorities. Moreover, this framework minimizes subjective biases and ensures a consistent interpretation throughout the analysis.

3.6.2 Framework

To systematically assess AI policies, this framework classifies policy themes into 11 key AI governance areas outlined by UNESCO (see Figure 3.2).
The document includes several sections, each dedicated to a different AI principle or ethical risk (e.g., Data Policy, Education and Research), and paragraphs on how member states should approach AI policies to ensure ethical usage of AI in the respective areas. These sections serve as the framework categories in our thesis for measuring differences in thematic emphases (see Table A.1 for more details).

1 Ethical Impact Assessment
2 Ethical Governance and Stewardship
3 Data Policy
4 Development and International Cooperation
5 Environment and Ecosystems
6 Gender
7 Culture
8 Education and Research
9 Communication and Information
10 Economy and Labour
11 Health and Social Well-Being

Figure 3.2: The 11 Framework Categories Based on UNESCO’s Recommendation.

To ensure that the analysis remains grounded in established ethical AI principles, a manual keyword extraction approach was employed. This method was chosen to maintain conceptual accuracy while remaining unbiased and consistent with the normative language used in UNESCO’s recommendation when analyzing the results. Manually extracting keywords to assess the ethical policy areas allows for a context-aware selection of terms that might not have been captured through automated methods.

The process followed UNESCO’s own categorization of ethical AI principles and policy areas to structure the analytical framework. Each section was read in full, and key terms were identified based on their explicit relevance to AI governance. Specifically, nouns, technical terms, and phrases describing AI impact assessments were identified.

Example: The “Data Policy” section has a paragraph that says the following:

Member States should work to develop data governance strategies that ensure the continual evaluation of the quality of training data for AI systems, including the adequacy of the data collection and selection processes, proper data security and [...].
In this case, we would identify the following keywords: data governance, data security, data quality. The same procedure would be applied to the rest of the paragraphs, and the results would be turned into a list representing the Data Policy topic.

3.6.3 Analytical Application

The results of the topic models were analyzed and compared to evaluate their utility in interpreting AI policies and correlating themes with governance indicators using the Economist Intelligence Unit’s Democracy Index [3].

The comparative analysis evaluated the outputs of two topic modeling approaches applied to AI policy documents. The aim was to determine which model most effectively extracts meaningful and interpretable themes, particularly concerning the level of democracy in the countries producing these policies, as reflected in the Democracy Index.

The evaluation examined the coherence and relevance of the topics generated by each model, with a focus on how thematic priorities differed across the Democracy Index spectrum. For example, democratic nations may emphasize themes like transparency, ethics, and accountability, whereas authoritarian regimes might prioritize control, innovation, or surveillance. This analysis also assessed how well the generated topics reflected governance-related patterns and aligned with frameworks like UNESCO’s AI Ethics Guidelines.

3.6.3.1 Government Types versus Framework

This section describes the analysis of the government types against the constructed framework. The topics and their keywords generated by the LDA and BERTopic models were compared to each of the 11 categories of the framework to see which ethical dimensions each regime emphasized and how emphases varied across regime types.

To compare the government types to each other and the framework, the following steps were implemented:

For a better analysis, the keywords representing each topic from the LDA and BERTopic models, as well as the framework keywords, were normalized.
Since both models included stopword removal, we applied it to the framework as well for consistency. Moreover, NLTK’s word stemming was used, since morphological variations of n-grams were not of interest for these comparisons. This step also included converting the phrases to lowercase and tokenizing bigrams into individual words. The normalization step was particularly useful for bigrams in the later stages. For instance, the phrase “host organization” would be transformed into a list: [“host”, “organ”].

To get the overlap scores for each framework category, two options were considered:

• Option 1: Full Overlap. For each government type’s topic, the unigram or bigram keyword was compared to each of the framework’s keywords. If the overlap was absolute, the framework category was returned with an assigned score of 1 (e.g., Data Privacy = 1).
• Option 2: Partial Overlap. If there was no perfect match between the keywords, which was more common with bigrams, the closest matching framework keyword (if any) was returned. The score was calculated as:

\[ \text{score} = \frac{\text{number of overlapping tokens}}{\text{total tokens in framework keyword}} \]

For example, the keyword [“discrimination”] (length = 1) would return a score of 0.5 if the bigram framework keyword was [“discrimination”, “policy”] (length = 2). If there was a partial overlap with several framework keywords, the category with the highest overlap score was selected.

For each topic model, the output included different sets of framework categories with their assigned scores. All overlap scores were kept in order to maintain a broad overview of all topics for further discussion.

Furthermore, the results were aggregated by government type in two steps: first by summing the overlap scores of keywords within each topic for the same framework category, then summing those topic-level totals across all topics for each government type. For instance, let’s say we have two topics within the same regime.
Topic 0 contains keywords keyword_1 and keyword_2, whilst Topic 1 has keyword_a, keyword_b, keyword_c, and keyword_d. Among these, keywords 1, 2, a, b, and c all overlap with the “Data Policy” framework category, with scores 0.50, 0.75, 0.33, 1.00, and 1.00, respectively. The first aggregation sums the scores within each topic (i.e., Data Policy for Topic 0: 0.50 + 0.75 = 1.25, Data Policy for Topic 1: 0.33 + 1.00 + 1.00 = 2.33). The second aggregation combines these into an overall regime-level score for “Data Policy”, yielding 1.25 + 2.33 = 3.58. This two-stage aggregation provided a clear overview of which ethical dimensions were most discussed by each government type.

Both raw and normalized data were used for comparison. The raw data (as described earlier in this section) allowed for a comprehensive analysis of topics within each government type. The normalized data was calculated by dividing the raw overlap score of a category by the total overlap score across all framework categories for each government type. The normalized data supported comparisons across government types, especially since different models could produce an uneven number of topics. These comparisons allowed us to identify the dominant themes emphasized by different regimes and assess whether distinctive differences emerge between them.

4 Results

This chapter includes all the results obtained by the LDA and BERTopic models, including the hyperparameters from the grid search and the topics with their respective keywords and evaluation metrics. Moreover, the Comparison Section 4.3 includes the results obtained by the custom method used for the government type-level comparison and by OCTIS for the model comparison.

4.1 Model Configuration and Setup

This section presents the final hyperparameter settings used for the topic models whose outputs form the basis of subsequent analysis.
4.1.1 LDA

We applied grid search tuning for LDA across each regime type to identify the best-performing hyperparameter configurations. Specifically, we tuned the number of topics (num_topics), the number of training passes (passes), and the priors (alpha and eta). As detailed in Section 3.2.2, the models were evaluated based on coherence and perplexity to ensure the interpretability of the topics. While using different hyperparameters across regime types introduces some challenges for direct comparison, this approach was necessary due to substantial differences in both the quantity and content of the data. Therefore, specific tuning for each government type ensured that the resulting topics were as meaningful and interpretable as possible within each context. For Flawed Democracy, in particular, we prioritized a slightly suboptimal configuration with a lower coherence score to support a more balanced comparison between LDA and BERTopic. Table 4.1 summarizes the final hyperparameters used for each government type. The complete grid search results for the top five configurations per regime are provided in Appendix A.4.

| Government Type | num_topics | passes | alpha | eta |
|---|---|---|---|---|
| Full Democracy | 10 | 30 | auto | 0.05 |
| Flawed Democracy | 10 | 30 | auto | 0.01 |
| Hybrid Regime | 15 | 20 | asymmetric | 0.01 |
| Authoritarian Regime | 15 | 20 | 0.01 | 0.01 |

Table 4.1: LDA Hyperparameters by Government Type.

4.1.2 BERTopic

For BERTopic, we ran a grid search using different combinations of hyperparameters, specifically varying the target number of topics (nr_topics), n_neighbors, and cluster_size, both with and without a representation model (POS tagging). Each combination yielded a set of topics and an associated coherence score. The results show that the model without the representation model generally produced higher coherence scores compared to the one with POS tagging (see Appendix A.6). However, POS tagging imposes restrictions on the types of topics that are extracted.
Because there is a trade-off between coherence and interpretability, we decided to retain the representation model to achieve a more meaningful analysis.

After the grid search, we selected the following strategy for choosing hyperparameters. For each government type (except Hybrid Regime, see Hyperparameter Grid Search in Section 3.3), we identified the top five hyperparameter sets that yielded the highest coherence scores (see Appendix A.5). From these sets, we chose the ones with the highest value for nr_topics. This decision was based on the observation that the coherence scores do not differ significantly, while a higher nr_topics value produced a more interpretable and detailed topic structure, which is especially important given the imbalance in our dataset. However, the nr_topics hyperparameter is the maximum number of topics BERTopic will produce; thus, we also report nr_output_topics, the exact number of topics the model returned with the respective hyperparameters. Moreover, the custom hyperparameters allowed for a wider range of topics for further analysis.

| Government Type | nr_topics | n_neighbors | cluster_size | nr_output_topics |
|---|---|---|---|---|
| Full Democracy | 16 | 5 | 5 | 10 |
| Flawed Democracy | 16 | 15 | 10 | 10 |
| Hybrid Regime | 16 | 20 | 2 | 10 |
| Authoritarian Regime | 10 | 10 | 5 | 9 |

Table 4.2: Hyperparameters Used for Each Government Type.

4.2 Qualitative Topic Analysis

To explore how LDA and BERTopic differ in their ability to extract meaningful and interpretable topics from AI policy documents, and to examine thematic variation across governance types, the results are presented by model and regime classification. Tables 4.3 to 4.10 were constructed based on the outputs of both models. Each table includes the topics and keywords produced by the models configured in the previous section, together with a qualitative interpretation based on the documents from which they were extracted.

4.2.1 LDA

As configured, LDA generated ten topics for Full Democracy, as seen in Table 4.3.
Several topics relate to regulatory structure and data governance (Topics 0, 5, and 7), often emphasizing oversight, compliance, and coordination across institutions. Others address specific areas of society, such as health care (Topic 8), or accountability and individual rights within algorithmic decision-making and law enforcement (Topic 7). A further group reflects broader economic themes, such as public investment, digital transformation, and EU-aligned recovery plans (Topics 4, 6, and 8).

The ten topics extracted by LDA for Flawed Democracy (Table 4.4) reflect a strong emphasis on U.S. federal documentation and digital governance themes. Topics 1, 5, and 8 focus on digital transformation in different contexts such as education and infrastructure. Topics 3, 6, and 9 relate to research and development policy and legislative amendments, including grant procedures, military provisions, and cybersecurity. Topic 2 addresses ethical oversight in medical research in India. The number of topics was increased to align with the topics in BERTopic for comparative purposes, which led to some thematic overlap across topics.

For the Hybrid Regime in LDA (Table 4.5), we output fifteen topics. Several topics in this governance type focus on education and digital literacy, particularly the integration of ICT into school systems and the development of local training ecosystems (Topics 0, 10, and 12). We also see broader national agendas aimed at preparing societies for digital transformation (Topics 8 and 11). A second group focuses on data governance and security (Topics 3, 5, and 6), and a third group emphasizes economic and industrial transformation (Topics 2, 7, 9, and 13). Some topics exhibit keyword overlap (e.g., Topics 0 and 10 both contain “education”, “ICT”, and “ministry”), indicating closely related themes. While several topics exhibit thematic overlap, they often diverge in national context, policy focus, or implementation approach.
Lastly, with LDA, we output fifteen topics for the Authoritarian Regime (Table 4.6). As with the previous government types, the keywords include a number of country and place names such as “China”, “Kazakhstan”, “Uzbekistan”, “Egypt”, “Russian”, “Dubai”, and “Vietnam”. A number of topics focus on digital infrastructure and personal data governance, including national strategies for data protection and cybersecurity (Topics 0, 1, 9, and 11). Some topical overlap is present in this group, reflecting the trade-off between coherence-based model selection and topic distinctiveness. Additionally, three topics address the health and education sectors (Topics 2, 6, and 10), including standardization of systems, integration of machine learning, and building technical capacity. A small number of topics reflect administrative reporting, archival material, or metadata, such as Topic 13, which appears to capture monthly archival records.

| Topic | Keywords | Qualitative Interpretation |
|---|---|---|
| 0 | algorithm, knowledge, de, supervisor, european, dutch, van, company, en, million | A discourse focused on Dutch and European strategies for algorithm governance, including national supervision structures, public-private research partnerships, human capital initiatives, and regulatory frameworks within the Netherlands’ AI infrastructure development. |
| 1 | learn, strategy, user, society, science, center, network, machine, company, principle | A national AI strategy anchored in Japan’s AIP network and supported by ministries and research centers, emphasizing explainability, user trust, and policy coordination. Highlights integration of scientific principles with public services, societal inclusion, and international research dissemination. |
| 2 | uk, gov, cookie, university, we, council, page, lead, professor, help | UK government communications on national AI strategy, featuring academic leadership, research councils, and policy announcements. Includes standard website elements like cookie notices and user feedback prompts. |
| 3 | article, law, aid, beneficiary, entity, subsidy, establish, provision, grant, activity | Legal and procedural framework outlining direct subsidy grants to eligible entities, detailing aid provisions, beneficiary obligations, and justifications based on public interest and regulatory compliance. |
| 4 | investment, recovery, spain, promote, economic, component, spanish, transformation, european, reform | Spain’s recovery and transformation plan outlining investment and reform components aimed at promoting economic growth, digital transformation, and EU-aligned modernization across sectors. |
| 5 | uk, regulatory, regulator, across, individual, organisation, approach, personal, guidance, worker | Relates to UK-wide approaches to data regulation, outlining organisational and individual responsibilities for lawful processing, personal data use, and regulatory coordination across sectors. |
| 6 | fund, language, european, eu, program, initiative, energy, employment, investment, economy | EU-aligned public investment strategies, spanning energy, language, employment, and digital economy programs aimed at economic resilience, inclusion, and sustainability. |
| 7 | algorithm, bias, organisation, algorithmic, individual, human, tool, could, police, group | Regulation and oversight of algorithmic systems, focusing on organisational practices for mitigating bias, ensuring human involvement, and protecting individual rights across contexts, including law enforcement. |
| 8 | trial, health, solution, vehicle, cdaa, safety, automatic, science, training, analysis | Strategy for data-driven innovation in health and science, specifically on trials, automatic systems, training, vehicle-based safety monitoring, through collaboration between the public and private entities. |
| 9 | test, vehicle, standard, traffic, road, publication, de, automate, drive, driver | Austrian regulations for automated vehicle testing on public roads, safety standards, driver roles, legal requirements, and coordination between public authorities and industry. |

Table 4.3: LDA: Full Democracy Topics and Qualitative Interpretation.

| Topic | Keywords | Qualitative Interpretation |
|---|---|---|
| 0 | document, federal, content, search, order, register, official, office, detail, https | U.S. federal documentation system, focusing on the structure, publication, and public access to official government content through the federal register. |
| 1 | digital, solution, sector, administration, education, company, work, create, skill, area | Digital transformation in education, healthcare, and public administration, with emphasis on workforce, institutional collaboration, and the development of digital services and platforms. |
| 2 | india, health, ec, participant, study, risk, medical, review, ensure, must | Ethical oversight and risk management in medical and health research in India, with emphasis on participant protection, informed consent, and EC (Ethics Committee) review. |
| 3 | rd, budget, nsf, federal, nist, nitrd, health, network, advance, nih | U.S. federal research and development strategy and budgeting for scientific advancement, including initiatives by NSF, NIH, NIST, and related agencies, with a focus on health and national infrastructure. |
| 4 | explanation, sec, decision, paragraph, model, title, explainable, amend, principle, strike | Legal and policy adjustments promoting explainable models and decision-making through amendments and principles. |
| 5 | digital, innovation, country, sector, industry, field, model, university, ministry, infrastructure | Public sector digital innovation, focusing on service design, institutional transformation, and building infrastructure and skills across ministries and administrations. |
| 6 | nsf, proposal, gov, award, comment, nist, submit, fairness, organization, grant | U.S. federal policy on testing and approving innovative services, focusing on transparency relevant to grant submission and organizational accountability. |
| 7 | employer, employee, consortium, google, laboratory, website, computing, employment, person, ibm | Partnership between industry and academia with partners such as Google and IBM aimed at advancing AI research with ethical and employment challenges in computing and public-private collaboration. |
| 8 | al, standard, rd, human, federal, response, strategy, application, test, strategic | Federal digital transformation strategy focusing on infrastructure, emerging technologies to support innovation and public service modernization. |
| 9 | title, sec, paragraph, force, strike, amend, inserting, code, military, insert | U.S. federal legislative text relating to amendments, code insertions, and military provisions, references titles, sections, subsections, and inserting clauses into U.S. Code regarding defense authorization, cybersecurity, and government operations. |

Table 4.4: LDA: Flawed Democracy Topics and Qualitative Interpretation.

| Topic | Keywords | Qualitative Interpretation |
|---|---|---|
| 0 | education, transformation, ict, ministry, plan, implementation, page, read, integrate, society | Kenya’s Ministry of Education’s plan to integrate ICT into the national education system through the Digital Literacy Programme and prepare learners for a digitally-driven society. |
| 1 | april, people, partner, user, hold, phase, responsible, facebook, train, transformation | Nigeria’s effort in digital transformation through policy development and stakeholder training. |
| 2 | april, people, say, general, reach, executive, facebook, email, standard, high | Armenian digital public communication and digitalization strategy, promoting high-standard digital adoption in governance and private sector transformation. |
| 3 | datum, risk, page, impact, governance, current, agenda, responsible, challenge, future | Mexico’s national data governance strategy, addressing current institutional challenges, responsible innovation, and future societal impacts through coordinated policy and stakeholder engagement. |
| 4 | industry, drive, potential, europe, education, every, read, aim, transition, find | Ukraine’s national strategy to drive industrial modernization and digital transformation by integrating technology and reforming education to enable future economic potential. |
| 5 | datum, security, responsible, must, risk, express, impact, say, education, human | Responsible data governance, emphasizing human rights, education, and security in managing digital risks and ethical technology development. |
| 6 | security, say, express, agency, seek, establish, training, cooperation, general, among | National security and technological innovation through military training, and the establishment of advanced research centers, aiming to develop robotics and emerging technologies for public service and defense development. |
| 7 | say, revolution, must, like, force, user, partner, people, many, economic | Uganda’s strategy to capitalize on the Fourth Industrial Revolution by establishing a task force, partnering with global tech leaders, and promoting inclusive economic development through local technology adoption and innovation. |
| 8 | agenda, minister, implementation, photo, view, africa, industry, number, approve, revolution | Industrial and digital transformation strategy, driven by a formal agenda, ministers, task forces, and technological infrastructure. |
| 9 | transformation, user, plan, state, change, internet, company, main, telecommunication, save | Peru’s digital transformation efforts, focused on telecommunication regulation, user empowerment, state planning, internet service transparency, and cultural change within public digital governance. |
| 10 | education, ict, programme, ministry, phase, learn, primary, adopt, content, implementation | Implement and promote ICT in primary and secondary schools to adopt digital learning tools, support competency-based curricula, and enhance teacher training, content development, and infrastructure across education phases. |
| 11 | agenda, implementation, minister, republic, approve, official, present, issue, society, also | National reforms for digital transformation across society, emphasizing public service modernization and stakeholder cooperation. |
| 12 | education, link, ict, startup, learn, need, local, ecosystem, knowledge, training | Development of local ICT education ecosystems linking training, startups, and knowledge to meet digital learning and innovation needs. |
| 13 | research, official, center, minister, initiative, agenda, application, industry, include, council | Government-led innovation agendas fostering research-industry collaboration through strategic policy, funding programs, and institutional coordination. |
| 14 | april, stakeholder, contribute, article, phase, facebook, like, partner, hold, seek | Policy development and public engagement in digital initiatives, highlighted through events and stakeholder contributions across platforms. |

Table 4.5: LDA: Hybrid Regime Topics and Qualitative Interpretation.

| Topic | Keywords | Qualitative Interpretation |
|---|---|---|
| 0 | ng, viet nam, hc, hi, can, ch, state, personal, data, protection | Viet Nam’s strategies focused on personal data protection, digital infrastructures, and state-led innovation to support secure digital transformation. |
| 1 | algorithm, security, governance, strengthen, supervision, social, china, enterprise, risk, right | A national framework to strengthen algorithmic governance in China, emphasizing enterprise responsibility, netizen oversight, social supervision and regulatory mechanisms to ensure security and protect rights. |
| 2 | china, improve, major, intelligent, college, set, generation, computing, increase, basic | Initiatives in China advancing intelligent technologies through major computing research, college reform, and next-generation talent development. |
| 3 | grant, fund, report, phd, agreement, result, candidate, applicant, republic kazakhstan, environmental | Government-administered research funding initiatives in Kazakhstan supporting PhD candidates and innovation through structured grant programs with environmental oversight and performance-based reporting. |
| 4 | city, smart, state, uzbekistan, institute, society, communication, open, high, electronic | Implementation and development of digital economy and electronics to build smart cities and enhancing public services. |
| 5 | egypt, company, egyptian, level, model, one, student, number, phase, different | National AI and digital innovation strategy in Egypt. Specifically education reform, local startup growth, and phased adoption of machine learning models across sectors to address economic and societal challenges. |
| 6 | health, standardization, healthcare, read, group, personal, committee, access, patient, working | Russian initiatives to standardize healthcare and education through technical committees and data protection. |
| 7 | lab, team, idea, user, solution, step, innovator, story, product, stakeholder | Innovation in Abu Dhabi’s health sector through design-thinking labs that engage stakeholders, iterate on user-centered ideas, and develop solutions. |
| 8 | ministry, uzbekistan, document, also, vietnam, minister, republic, communication, foreign, unit | Digital development and international policy coordination across ministries and nations (e.g., Uzbekistan, Vietnam), with emphasis on communication, governance, and foreign affairs. |
| 9 | ministry, agency, open, portal, digitalization, state, search, russian, procurement, email | Efforts by Russian state ministries and agencies to enhance transparency and efficiency in governance through digital portals, particularly for procurement, public access to information, and administrative services in national digitalization. |
| 10 | egypt, also, course, student, economy, communication, many, improve, problem, level | Egypt’s efforts to improve its economy through education, digital skills, and innovation. |
| 11 | dubai, law, vietnam, director, uae, head, minh, data, notice, group | National data sovereignty and digital infrastructure for innovation. |
| 12 | oecd, governance, trustworthy, policy, website, risk, issue, privacy, approach, principle | International efforts to promote trustworthy and accountable data governance and AI policies, emphasizing privacy, risk management, and ethical principles. |
| 13 | april, ncai, february, january, july, march, may, december, november, october | Monthly archival records and activities related to Saudi Arabia’s National Center for Artificial Intelligence (NCAI). |
| 14 | call, april, online, privacy, problem, owner, cybersecurity, federal, foreign, free | Legal and regulatory updates around privacy, cybersecurity, and federal or international data governance, especially involving new calls, policy revisions, and problem-solving initiatives across jurisdictions. |

Table 4.6: LDA: Authoritarian Regime Topics and Qualitative Interpretation.

4.2.2 BERTopic

For Full Democracy, we set up BERTopic to output ten topics, as presented in Table 4.7. Several topics relate to public-sector digital transformation, with particular attention to transparency, automation, and ethical oversight (Topics 0, 1, 2, and 6). These themes include human-centric regulation, algorithmic accountability, and integration of national and international digital strategies.
Other topics address challenges posed by emerging technologies such as deepfakes and surveillance systems (Topics 6 and 8), while some reflect metadata or navigational content from digital government documents (Topics 4, 5, and 9).

The BERTopic model identified ten topics for Flawed Democracy (Table 4.8), several of which mirror institutional and legal frameworks from the United States and India. Topics 0, 3, and 7 refer to U.S. federal law, national defense, and cybersecurity strategies, particularly concerning technical standards and military training. Other topics emphasize responsible AI development and implementation in public administration and healthcare contexts, such as India’s national efforts (Topic 1). Additional themes include explainable AI (Topic 4), technological collaboration (Topics 5 and 6), and international competitiveness in AI governance (Topic 8).

For Hybrid Regime, we generated ten topics using BERTopic (see Table 4.9). Half of these topics discuss policies from a national perspective, explicitly naming the country (e.g., Turkey and Kenya). Most of these (Topics 0, 1, 8, and 9) discuss action plans, programs, and preparations for advancing technology. Another set of topics (Topics 2, 3, 4, and 6) highlights higher education and conferences, focusing on research projects, skill development, doctoral or academic programs, and technology-oriented events. Other topics (Topics 0 and 4) address issues related to human rights, notably the protection of rights and juvenile justice matters.

Using BERTopic, we generated nine topics for the Authoritarian Regime (Table 4.10). These topics reflect national digital governance strategies, with an emphasis on data protection law, algorithmic supervision, and digital infrastructure development. Topic 1 outlines advanced algorithm governance and citizen supervision in China, while Topics 2 and 6 describe data protection revisions.
Education and research programs also feature in grant-related topics (Topics 5 and 8), alongside national strategy planning for AI deployment and economic modernization (Topics 3 and 4). 44 4. Results Topic Keywords Qualitative Interpretation 0 digital, public, new, national, euro- A joint European AI strategy covering cooperation on AI pean, social, economic, international, competence centers, sovereign data infrastructure, and strategic, human national civilian plans, and ethical, legal, and transparent frameworks for human rights and democracy. 1 decision making, automated, algo- Integration of automated decision-making into public and rithmic, public, data protection, new, regulatory processes, including document digitization, discrimination, personal, human, reg- ML tools, data-protection impact assessments, personal- ulatory data transparency, and bias risks in training and infer- ence. 2 public, automatic, new, automatic Outlines a national data-science and automated-learning learning, national, digital, social, hu- framework for government, emphasizing transparency in man, strategic, autonomous decision-making, funding mechanisms, strategic goals, and human involvement. 3 beneficiary, general, concession res- Defines eligible subsidy actions and expenses, sets limits olution, subsidable, prior, following, on aid amounts, and specifies the documentation needed corresponding, economic, beneficiary to justify payments. entities, technical 4 additional cookies, additional, cookie Repeated banners explaining why, where, and how web- settings, obtain permission, copy- site cookies – both essential and additional – are used, right holders, party copyright, main and how to set preferences. Likely from the same UK content, visit nationalarchives, copy- government site. 
right information, ukdocopen 5 startxref, adobe design, service cen- PDF-export metadata combined with footer navigation tre, capable supercomputer, inter- list of digital-hub programs and webpage menu items. nal, data subjects, missing compo- nents, unchanged, ambassadors part- ners, systems mends 6 facial, live, facial identification, fa- Facial recognition technology – what it is, facial matching cial recognition, society groups, po- uses (one-to-one vs. one-to-many), and legal safeguards lice forces, identification systems, fa- (Data Protection, Human Rights and Equality Acts) cial verification, private, recognition around live police deployments. technology 7 doctoral, doctoral training, quantum, An overview of Vrije Universiteit Brussel’s (VUB) ecosys- menu group, international, centres tem – its doctoral and research groups and programs, and bpost, bpost parcel, campus manage- on-campus support and services (shops, parcel locker, ment, sports infrastructure, funded etc.) centres 8 deepfake, doctored, audio, visual, Discussion of deepfakes and audio-visual content as forms face replacement, enactment, dis- of disinformation. Highlights the need for media trans- information, fake, speech synthesis, parency on both pros and cons and encourages public media platforms vigilance and fact-checking. 9 visit today, card details, blank, credit Standard disclaimer text appearing across different UK card, new tab, wrong, financial infor- government webpages on AI regulation and strategies, mation, useful, personal, financial advising users not to share personal information and inviting them to complete surveys. Table 4.7: BERTopic: Full Democracy Topics and Qualitative Interpretation. 45 4. Results Topic Keywords Qualitative Interpretation 0 military, national, federal, following, Sections outlining U.S. 
defense and national-security fiscal, general, subparagraph, appro- law, detailing the duties and reporting obligations for priate, foreign, new the Secretaries of Defense and State, and setting rules on international maritime law, military installations, defense, funding, and oversight by various congressional committees. 1 indian, medical, national, clinical, Challenges of AI and its development in agriculture and new, responsible, ethical, potential, healthcare, with references to ethical policies by the key, human Indian Council of Medical Research and the importance of responsible nationwide implementation. 2 digital, public, public administration, Potential of a new industrial revolution driven by AI digital transformation, new, national, and digital technologies, focusing on national strategies country, industrial, industrial revolu- for digital transformation, U.S. digital security, public tion, fourth administration, and the role of education and research institutions. 3 federal, human, american, technical Due to the U.S. leadership in AI, federal agencies are standards, national, federal govern- responsible for fostering reliable AI development through ment, technical, strategic, regulatory, technical standards, with active participation from the standards development private sector and academia to support new industries. 4 explainable, explanation accuracy, Self-explainable, interpretable ML models provide both human, interpretable, neural, mean- global and per-decision explanations. When explanations ingful, counterfactual, decision accu- are not meaningful, alternative algorithms are used for racy, black box, knowledge limits additional information, and some metrics can be used to assess explanation accuracy. 
5 | digital, european, public, high, public sector, private, possible, digital government, action plan, main | Estonia’s Digital Agenda 2030 and National AI Strategy that include private and public sectors (i.e., roadmap for development projects), and detail action plans for training, funding, and ongoing implementation updates. Also, funding and plans for Italian companies in both sectors focused on AI research.
6 | technological development, technological, scientific, prestigious, international, summer school, korean, center, global, international conference | Lists research institutions and programs, including the Chinese Economic and Scientific Delegation visit, a prestigious Industry 4.0 conference, Splitech 2025 on sustainable and smart technologies, and Eastern European Machine Learning School.
7 | national, public, military, digital, economic, new, armed, fiscal, strategic, subparagraph | Sections of U.S. law covering national defense management, cybersecurity policies, and training programs across the Navy, Army, Air Force, Marine Corps, and other branches.
8 | israeli, cloud, intelligent, human capital, national, human, technological, high, team, various | Technological revolution describing Israel’s advancements in AI technologies, human capital development, cloud services, and national funding plans.
9 | ordinary skill, fair use, ordinary, fair, natural person, trade secret, secret, claimed invention, sui generis, natural | Guidance for defense and intelligence agencies on handling personal and sensitive information and human-capital planning, alongside public debate on whether existing laws for intellectual-property systems should be updated for AI-generated content.

Table 4.8: BERTopic: Flawed Democracy Topics and Qualitative Interpretation.

Topic | Keywords | Qualitative Interpretation
0 | human rights, human, action plan, public, social, effective, doctoral, judicial, international, legal | Turkey’s action plan regarding protection and promotion of human rights and freedom, including alternative sanctions to short-term prison sentences and convict rights, rights to property, and victims of violence.
1 | digital, government, digital transformation, digital agenda, digital literacy, primary, local, key, appropriate, technical | Kenya’s three-phase program on technology integration in basic education systems through Competency-Based Curriculum. It highlights that a new policy guide (e.g., integrating ICT, smart classroom setup), with the government’s involvement, aims to lead this implementation to success.
2 | human, multimodal, multimodal perception, societal, responsible, delle ricerche, societal use, use cases, trustworthy, industrial | Global universities and their respective research projects, primarily focused on ethical and responsible AI development and applications (e.g., evidence-based chatbot interactions, segmentation in automatic captioning systems, and multimodal perception and modeling).
3 | personal data, personal, data controller, subject, data subject, data controllers, necessary, international, relevant, international organization | Policies on personal data collection and handling, particularly when data controllers and data processors can and should erase, destroy, or anonymize data. Moreover, it highlights conditions for domestic and international data transfers, different purposes, special data categories, and measures to be taken by data importers.
4 | scientific, right, human rights, professional, reference experiments, rights boards, indispensable, high education, management skills, professional skills | Human rights, higher education, and skills development in the context of the Juvenile Justice System. Themes include support for convicts to acquire professional skills, maintain contact with families, and the needs of minorities.
5 | doctoral, academic, human rights, human, higher, judiciary, higher education, doctoral programs, new, public | A booklet of policies outlining academia, higher education requirements, and administrative affairs. Additionally, it discusses the EU Action Plan on Human Rights and Democracy, planning costs, and the exclusion of discrimination for decisions made through the e-Government gateway.
6 | machine learning, high school, cover letter, great enthusiasm, diverse, technical, high, relevant experience, human, advanced | The Hybrid Human-AI Conference and its events focused on AI developments, human-AI collaboration, and research in machine learning, human-computer interaction, and psychology. Additionally, technology-oriented and awareness-raising training sessions held to prepare young people to gain skills for the job market.
7 | political party, party groups, political, annexed, public, personal, membership positions, vacant, personal data, total number | Turkey’s rules and timeframes regarding the election of political party groups, Board members, and Authority personnel. It also discusses exemptions from the personal data law, particularly the conditions under which personal data is processed.
8 | information technology, digital economy, new, emerging technologies, digital technologies, exponential, exponential growth, digital, corporate, urged stakeholders | Nigeria National Information Technology Development Agency’s approach to the exponential growth of technology and AI. It is responsible for developing frameworks and guidelines for the IT sector to support a sustainable digital economy.
9 | task force, industrial revolution, emerging technologies, harness opportunities, national guidance, annual report, advise government, national task, industrial, digital innovation | Uganda’s preparation for the Industrial Revolution to drive economic development, including the creation of a task force of scientists, policymakers, and engineers. The focus is on increasing agricultural production and adopting domesticated technologies, instead of foreign-driven innovations (e.g., automated vehicles).

Table 4.9: BERTopic: Hybrid Regime Topics and Qualitative Interpretation.

Topic | Keywords | Qualitative Interpretation
0 | innovator teams, design thinking, safe, key, product owner, value testing, project, prototyping, portal, different | Documentation on the AI Lab: what it is, its web portal, and its design-thinking process. Includes team structure, prototypes, and benefits of user-value testing and delivery.
1 | algorithm, security governance, informative service, service algorithms, social, algorithm filing, orderly, social supervision, security risks, positive energy | A guide to advanced algorithm filing – how it works and how to use it – alongside continuous netized participation and the improvement and development of technologies, algorithms, and models focused on supervision, transparency, legality, security risks, and overall security governance.
2 | data protection, personal data, public, personal, federal, national registry, data transfers, trade zones, free trade, privacy framework | Amendments to Saudi Arabia’s Personal Data Protection Law – resolving stakeholder concerns, allowing controllers to collect and use third-party personal data (unless sensitive), and removing the national registry. Also, a discussion of the UAE’s privacy framework, including general privacy rights and prohibitions on data misuse.
3 | national, new, international, public, economic, human, level, major, social, smart | AI’s importance in modern society, and a national plan on how China will implement and promote it – including lawmaking, safety assessments, labor training, and academic scholarships for AI programs.
4 | national, national center, global, cloud, large, civil law, large data, trustworthy, economic, generative | Saudi Arabia’s National Center for AI – covering applications, risks, challenges, and data protection – and Vietnam’s recognition of gaps in AI development, leading to a Tactical Targeting Network Technology plan and development of broader AI strategies.
5 | trng, financial, financial report, working days, free, environmental, evaluation, implementation, grant funding, technical | Contract for an internship in Kazakhstan, outlining requirements for receiving financial grant funding and detailing rules, including financial, technical, and administrative details of the research implementation.
6 | information technologies, digital, coordination commission, working groups, international, digital transformation, state bodies, road maps, software products, digital technologies | Uzbekistan’s development of IT and digital transformation roadmaps, highlighting increases in broadband ports, communication lines, implementation of information systems and electronic services, and training workers in the IT sector.
7 | digital, personal data, personal, data protection, scientific, grant program, research institute, digital technologies, grant agreement, information technologies | Description of a research institute for digital technology and AI development in Uzbekistan, established by presidential decree, outlining its main goals and functions.
8 | grant program, scientific, financing, environmental, notarized, following, grant funds, financial, host organization, technical | Financial reporting requirements and documentation rules for a PhD research and training grant program under EMF, including required documents, forms, and conditions.

Table 4.10: BERTopic: Authoritarian Regime Topics and Qualitative Interpretation.

4.3 Ethical Topic Variation Across Government Type

Table 4.11 gives a quantitative summary of the LDA and BERTopic models: for each government type, the number of the 11 framework categories (Table A.1) that the model’s topics overlap with. A topic is said to overlap with a category if at least one of its keywords partially or perfectly matches a framework keyword. Overall, LDA overlaps with 10 of the ethical framework categories, while BERTopic overlaps with all 11.

Government Type | LDA Model | BERTopic Model
Full Democracy | 9 | 9
Flawed Democracy | 7 | 8
Hybrid Regime | 10 | 11
Authoritarian Regime | 9 | 9

Table 4.11: The Number of Government Type Overlaps With the Created Ethical Framework’s Topics (Out of 11) (A.1) for LDA and BERTopic.

Figure 4.1 shows the normalized topic-framework scores produced by the LDA and BERTopic models. These scores provide a comparative overview of how much emphasis each government type places on particular framework categories relative to their total topic distribution. The higher the score, the stronger the overlap with the framework category.

[Figure 4.1, panels (a) LDA (normalized scores) and (b) BERTopic (normalized scores). x-axis: Framework Topic (1 to 11); y-axis: Normalized Score (0.0 to 1.0); legend: Government Type (full democracy, flawed democracy, hybrid, authoritarian). Framework topics: 1 Ethical Impact Assessment, 2 Ethical Governance and Stewardship, 3 Data Policy, 4 Development and International Cooperation, 5 Environment and Ecosystems, 6 Gender, 7 Culture, 8 Education and Research, 9 Communication and Information, 10 Economy and Labour, 11 Health and Social Well-Being.]

Figure 4.1: Stacked Bar Charts of Normalized Topic-Framework Scores from Both Topic Modeling Approaches.

Framework Category | LDA Overlap Score | BERTopic Overlap Score
1 - Ethical Impact Assessment | 0.51 | 0.95
2 - Ethical Governance and Stewardship | 0.94 | 0.79
3 - Data Policy | 0.37 | 0.46
4 - Development and International Cooperation | 0.93 | 0.70
5 - Environment and Ecosystems | 0.11 | 0.07
6 - Gender | 0.15 | 0.16
7 - Culture | 0.22 | 0.12
8 - Education and Research | 0 | 0.03
9 - Communication and Information | 0.17 | 0.04
10 - Economy and Labour | 0.19 | 0.34
11 - Health and Social Well-Being | 0.41 | 0.31

Table 4.12: Model Topic Overlap with the Framework Categories, Normalized Scores. Green Color Indicates the Highest Score Overlaps, and Red Color Indicates the Lowest Scores. The Framework Categories That Both Models Match Are Also Highlighted Respectively.

4.4 Quantitative Results Using OCTIS

This section presents OCTIS evaluation results to complement the qualitative evaluation and to answer the main research question of how BERTopic and LDA compare in their ability to extract meaningful and interpretable topics. The metrics Coherence CV, WECoherencePairwise, Topic Diversity, and IRBO are used to quantify various dimensions of model quality.
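As a concrete illustration of the simplest of these metrics, Topic Diversity is commonly computed as the share of unique words among the top-k words of all topics. The sketch below is a minimal stand-alone version under that assumption; it is not the OCTIS implementation used in this thesis, and the function name, the `top_k` parameter, and the toy topics are hypothetical.

```python
from typing import List


def topic_diversity(topics: List[List[str]], top_k: int = 10) -> float:
    """Share of unique words among the top_k words of every topic.

    1.0 means no word is shared between topics; lower values mean
    that topics reuse the same top words.
    """
    if not topics:
        raise ValueError("at least one topic is required")
    # Collect the top_k words of each topic into one flat list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics contain no words")
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics, not taken from the thesis results.
toy_topics = [
    ["data", "privacy", "personal"],
    ["data", "digital", "government"],
    ["military", "defense", "national"],
]

# "data" appears twice, so 8 of the 9 top words are unique.
score = topic_diversity(toy_topics, top_k=3)
print(f"Topic Diversity: {score:.4f}")
```

Under this definition, a model whose topics share many top words is penalized, which is the behavior the Topic Diversity metric is meant to capture.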
Coherence, specifically CV, evaluates the semantic consistency of top keywords based on their co-occurrence in the input corpus. WECoherencePairwise computes the average pairwise cosine similarity between top topic words using word embeddings. Topic Diversity quantifies the uniqueness of words across topics and penalizes models that produce highly similar topics with slight variations. Lastly, IRBO complements Topic Diversity by measuring redundancy based on how often the same words appear in similar positions across topics. More detailed descriptions of these metrics are provided in Section 2.2.2. Results are organized by evaluation metric, with comparisons shown across models and government types to provide an overview of performance differences across configurations.

Presented below are the OCTIS results by government type and their corresponding metric values. The better-performing results are highlighted in bold. The table is followed by a description of the results for LDA and BERTopic. Each cell is given as LDA / BERTopic.

Government Type | Topic Diversity | Coherence (CV) | WECoherencePairwise | IRBO
Full Democracy | 0.7933 / 0.8867 | 0.5660 / 0.5772 | 0.0344 / 0.0118 | 0.9690 / 0.9824
Flawed Democracy | 0.8600 / 0.8333 | 0.5385 / 0.5157 | 0.0370 / 0.0055 | 0.9671 / 0.9627
Hybrid Regime | 0.6333 / 0.9714 | 0.4766 / 0.4567 | 0.0183 / 0.0130 | 0.9300 / 0.9713
Authoritarian Regime | 0.8400 / 0.8778 | 0.5037 / 0.4713 | 0.0404 / 0.0105 | 0.9802 / 0.9715

Table 4.13: Comparative OCTIS Metrics for LDA vs BERTopic by Government Type. Higher Values Are Bolded.

For Coherence (CV), LDA achieved better coherence in three out of four government types (Flawed Democracy, Hybrid Regime, and Authoritarian Regime). This suggests that LDA’s topics were, on average, more semantically consistent and interpretable based on word co-occurrence patterns. The only exception was Full Democracies, where BERTopic slightly outperformed LDA and achieved the highest coherence score overall.
Taken together, the CV scores suggest that LDA more reliably produced coherent topics across the different government types.

In the metric WECoherencePairwise, LDA again performed best, with a notable margin across all regimes. LDA produced more internally coherent topics while BERTopic, which often selects top words based on conceptual similarity rather than lexical proximity, performed lower on this measure.

For Topic Diversity, BERTopic outperformed LDA in three out of four regime types, indicating that it resulted in more varied and less repetitive topics. The exception was Flawed Democracies, where LDA showed higher diversity. While higher diversity can indicate broader thematic coverage, it does not necessarily guarantee interpretability.

Lastly, IRBO saw mixed results, with LDA scoring higher for Flawed and Authoritarian regimes, while BERTopic was stronger in Full and Hybrid regimes. However, the differences in IRBO values were relatively small, suggesting that both models produced topics with similarly low redundancy among their top-ranked words.

These results provide an initial indication of model behavior across regimes in terms of topic coherence, diversity, and redundancy. While LDA scored higher on the coherence-based metrics in most cases, BERTopic showed strengths in topic diversity. These patterns are discussed further in the following sections.

5 Discussions

The following sections thoroughly discuss the results obtained in the previous chapter. Specifically, the LDA and BERTopic models are compared on both quantitative and qualitative levels. Additionally, we perform an in-depth analysis of the topics obtained by the models, particularly in relation to government types.

5.1 Model Comparison and Qualitative Analysis

This section discusses the thematic patterns identified in AI policy documents across regime types, focusing on how key issues are framed differently depending on governance context, and how BERTopic and LDA capture and represent these variations.
5.1.1 Model Comparison per Government Type

The topics BERTopic extracted from the Full Democracy documents strongly emphasize transparency in policymaking. In particular, the texts explore different strategies for automation tools in government and decision-making, highlighting democratic human rights and the need for transparent models. Additionally, Full Democracy addresses modern technologies, facial recognition, deepfakes, and audiovisual content, stressing the media’s need for transparency and information about disinformation risks. Moreover, about one-third of the topics consist of scraped metadata, including footers, banners, etc., which do not contribute directly to the policy analysis. This indicates that further data cleaning was needed to reduce the amount of metadata. However, due to time constraints, more extensive preprocessing was not feasible.

LDA’s topics for Full Democracy show a strong focus on regulatory and ethical application of technology. Several topics include references to specific national or regional contexts, often appearing as named entities (e.g., countries, institutions, or programs). Even though the documents were not analyzed by country, some topics show consistent signals from national strategies. While the topics vary by national context, some recurring patterns are visible. Several topics relate to regulatory structure and data governance, often emphasizing oversight, compliance, and coordination across institutions, but also within specific areas of society, such as health care, accountability, and individual rights within algorithmic decision-making and law enforcement. Others reflect broader economic themes, such as public investment, digital transformation, and an EU-aligned recovery plan. These patterns indicate that Full Democracies are positioned at a later stage of digital governance, where regulatory frameworks, transparency, and ethical oversight play a central role in shaping technological development.

Flawed Democracy topics from BERTopic focus on national AI policies and defense. A large share, at least four of the ten topics, outline U.S. leadership and military-related regulations. Moreover, these documents highlight the potential of AI in driving a technical revolution, introducing the need for national strategy plans, implementation roadmaps, and funding allocations. Similar to Full Democracy, the documents also emphasize transparency and explainability of AI models and the role of academic and research institutions in shaping AI development.

For LDA’s 10 topics, there was a large focus on the U.S., particularly on digital innovation, research infrastructure, and legislative processes across different contexts. For example, both Topics 5 and 6 mention digital innovation, where Topic 5 centers on public sector innovation and Topic 6 highlights transparency and accountability in federal grant submissions and organizational evaluation. The large number of topics focusing on the U.S. is likely influenced by the large volume of U.S.-based documents in the dataset. As a global hegemon with a considerable amount of policy documentation, the U.S. influence is captured in the prevalence of topics where the country appears. This reflects not only data quantity but also geopolitical influence, which shapes the visibility of U.S. policies in the global AI discourse. Furthermore, as in BERTopic, “military” appears in LDA’s topics, but to a lesser extent: it is mentioned only once. Additionally, because we opted to use the lesser-performing hyperparameters in this regime type to allow for more topics in LDA, we also observe some redundancy, particularly in topics related to federal structures (e.g., Topics 0 and 8).
The Hybrid Regime for BERTopic had 10 topics, with a strong emphasis on advancing or developing technologies across varying contexts, including education, human rights, or economic development. Compared to BERTopic’s output in other regime types, the topics under Hybrid Regime display a more distinctly national focus. This may suggest that countries within this regime type exhibit less policy overlap, making country references a stronger indicator of topic relevance. It is worth noting that this pattern appears more pronounced in BERTopic than in LDA, which shows a stronger country-specific signal across all regime types. The difference is likely due to LDA’s probabilistic structure, which relies more heavily on frequent word co-occurrence patterns and could therefore more often pick up country mentions as a dominant topic feature.

In LDA’s topics there was an overarching focus on transformation. The topics cover integrating information and communication technology into educational systems and modernizing industrial capacity, showing a focus on building technological foundations. This suggests that Hybrid Regimes are in an earlier phase of technological development, with an emphasis on foundational implementation rather than developing regulatory frameworks. Furthermore, one LDA topic includes keywords related to training and security; while the term “military” is not explicitly present, further qualitative analysis shows that the topic pertains to military training.

Authoritarian Regime countries acknowledge the growing importance of AI. In particular, a Chinese document stresses this by establishing laws, training programs, and scholarships to make AI more appealing and engaging to more people. Other nations outline national plans and infrastructure roadmaps (broadband ports, communication lines, IT systems, and workforce development) to support future digital transformation.
Some texts discuss the legal and transparent deployment of AI models and websites. Moreover, the regime includes sections on research institutes, grant programs, and technical contracts for AI development. These observations imply that the Authoritarian Regime prioritizes AI as a strategic tool for modernization, simultaneously underlining state-backed research programs.

LDA had similar topics, such as the implementation of digital infrastructure, the development of national research initiatives and funding, and the promotion of smart cities and digital governance. Additionally, LDA topics include references to education reform, with several also addressing privacy. However, certain topics show a distinctive character not found in the other government types. Notably, mentions of “supervision” and “social” point, upon qualitative examination, to a discourse emphasizing enterprise responsibility, social supervision, and state regulatory mechanisms. These findings suggest that while both models highlight the innovative focus of AI, LDA reveals an additional layer of governance through topics emphasizing social supervision, an aspect less visible in BERTopic, which centers more on innovation and digital infrastructure.

5.1.2 Cross-Regime Comparison

The thematic differences in the government types identified by both models suggest a broader structural distinction in how regimes approach technological development. Hybrid Regimes appear to prioritize foundational implementation and capacity building, while Full Democracies more often emphasize the governance and regulation of already deployed technologies. This contrast may indicate different stages of technological adoption and institutional maturity across regime types. However, this distinction may not only be a reflection of political systems, but can also point to economic factors.
For instance, countries with higher GDPs could be better positioned to focus on more advanced technology, whereas regimes with limited economic resources might need to prioritize infrastructure development and digital capacity.

Recurring themes such as “military” and “health” also reveal significant regime-specific variations. The term “military,” for instance, appears explicitly in Flawed Democracy topics, where it is framed within legislative and safety protocol contexts, emphasizing national strategy and oversight. In Hybrid Regimes, however, military discourse centers on training, international cooperation, and the development of emerging technologies, suggesting a more operational and capability-building orientation. Similarly, the topic of “health” is addressed with different emphasis. In Full Democracies, health is often linked to individual rights and accountability in algorithmic decision-making, reflecting practical technological regulations. By contrast, in Hybrid Regimes, BERTopic highlights health within the legalistic context of data consent and human rights frameworks, focusing more on structural safeguards than on individual-level protections.

Finally, while all regime types exhibit an interest in digital innovation and modernization, the discursive framing of these developments varies notably across the spectrum of democratic governance. In Authoritarian Regimes, the focus on infrastructure, national AI strategies, and technological advancement mirrors themes seen in other regime types. However, a distinctive perspective appears in one of the topics, where terms such as “supervision” and “social” indicate a concern with enterprise responsibility, citizen oversight, and state-regulated monitoring. This element does not appear in the other government types and suggests a more centralized and control-oriented interpretation of digital governance.
This contrasts with Full Democracies, where discussions of emerging technologies, such as facial recognition, deepfakes, and audiovisual manipulation, are situated within a discourse of transparency and public accountability. There, the emphasis lies on mitigating disinformation and preserving democratic norms. Flawed Democracies and Hybrid Regimes, positioned between these two ends of the spectrum, reveal more mixed patterns: Hybrid Regimes emphasize infrastructure and transformation, while Flawed Democracies show signs of both regulatory formality and strategic assertiveness, particularly in relation to U.S. leadership.

5.2 Ethical Topic Variation Across Government Types

This section interprets how the extracted topics vary across the four regime types defined by the Democracy Index: Full Democracies, Flawed Democracies, Hybrid Regimes, and Authoritarian Regimes. In particular, the comparison was split into two parts: topic-level and government-level comparisons. This allowed an in-depth inspection of how the two models compare to each other in terms of overlap scores with our created ethical framework (see Table A.1). Additionally, it was examined how much the ethical framework categories were discussed within the extracted topics, especially how those categories were distributed within the same government type documents. Based on this analysis, we investigate whether there is evidence that different regimes discuss different ethical aspects when implementing AI policies.

5.2.1 Topic-Level Comparison

This section includes a topic-level discussion, particularly about the framework categories that the LDA and BERTopic models discuss and emphasize, using the normalized score plot (Figure 4.1). Additionally, Table 4.12 provides normalized overlap scores for both models, highlighting the top and bottom categories. For each topic, we count how many of the 11 ethics framework categories it overlaps with.
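The counting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce Tables A.7 and A.8: here a "partial or perfect overlap" is assumed to mean that a topic keyword and a framework keyword share at least one word, and the framework keywords, function names, and toy topic are hypothetical (only the category names come from Table 4.12).

```python
from typing import Dict, List, Set


def keywords_overlap(topic_keyword: str, framework_keyword: str) -> bool:
    """Assumed rule: two keywords overlap if they share at least one word.

    A perfect match (identical phrases) is a special case of this rule.
    """
    topic_words = set(topic_keyword.lower().split())
    framework_words = set(framework_keyword.lower().split())
    return bool(topic_words & framework_words)


def overlapping_categories(
    topic_keywords: List[str], framework: Dict[str, List[str]]
) -> Set[str]:
    """Return the framework categories with at least one overlapping keyword."""
    return {
        category
        for category, framework_keywords in framework.items()
        if any(
            keywords_overlap(topic_kw, framework_kw)
            for topic_kw in topic_keywords
            for framework_kw in framework_keywords
        )
    }


# Hypothetical framework keywords; the real ones are listed in Table A.1.
toy_framework = {
    "Data Policy": ["personal data", "privacy"],
    "Gender": ["gender equality"],
    "Education and Research": ["education", "research"],
}

# Hypothetical topic keywords, not an actual model output.
toy_topic = ["personal data", "data controller", "higher education"]

matched = overlapping_categories(toy_topic, toy_framework)
print(f"{len(matched)} of {len(toy_framework)} categories overlap: {sorted(matched)}")
```

A word-level rule like this is deliberately permissive; a stricter rule (for example, exact phrase matches only) would lower the counts, so the choice of matching rule should be reported alongside any overlap scores.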
Tables A.7 and A.8 show these overlaps and scores, while the panels of Figure 4.1 visualize their distributions.

The two models share the three highest-score categories – “Ethical Impact Assessment”, “Ethical Governance and Stewardship” and “Development and International Cooperation”. This shows that the key topics discussed for each model include similar issues and ethical concerns. Similarly, three of each model’s four lowest-scoring categories coincide – “Environment and Ecosystems”, “Education and Research”, and “Communication and Information”. The overlap in the top and bottom ranking categories indicates that both models share a similar topic distribution, which is expected given their use of the same dataset. Next, we examine how these overlaps play out proportionally across regime types in the Government-Level Comparison.

5.2.2 Government-Level Comparison

This part includes a comparison of the four regimes to see whether there are identifiable differences between them, and which categories receive the most significant portion of the policy texts. As mentioned, the normalized plots in Figure 4.1 provide a good overview of how the topics are distributed across the four government types. This is particularly important since the regimes have different numbers and sizes of documents, thus introducing the need for normalization for reasonable and consistent comparison.

Full Democracy displays a broad policy focus, but the two models emphasize different framework categories. As evident in LDA (Figure 4.1(a)), the most discussed topics – “Development and International Cooperation” (31%) and “Health and Social Well-Being” (14%) – account for 45% of the framework-aligned content. In contrast, BERTopic (Figure 4.1(b)) highlights “Ethical Impact Assessment” (23%) and “Development and International Cooperation” (22%), again totaling 45% of the overlaps.
Both models thus place “Development and International Cooperation” among their two most emphasized categories for Full Democracy.

Flawed Democracy places a strong emphasis on the ethics-themed categories. In LDA, “Ethical Impact Assessment” (approximately 29%) and “Ethical Governance and Stewardship” (23%) account for 52% of the framework-aligned content. In BERTopic, the same categories make up 54% of the overlap – 29% and 25%, respectively. The two models therefore agree, with Flawed Democracy focusing on ethical topics in both.

Hybrid Regime covers the widest range of framework categories (Table 4.11), although the share held by its top two categories differs between the models. In LDA, the most considered classifications – “Ethical Governance and Stewardship” (32%) and “Development and International Cooperation” (29%) – make up 61% of the total framework-aligned content, the highest top-two share of the four regimes. In BERTopic, the most common framework topics covered are “Ethical Impact Assessment” (24%) and “Development and International Cooperation” (19%), totaling 43% of the content. The BERTopic results thus indicate a slightly more spread-out distribution of topics, covering more framework category aspects throughout the documents.

Authoritarian Regime, similarly to the Flawed Democracy, highlights ethics. Besides the “Ethical Governance and Stewardship” (27%) category, topics in LDA also emphasize “Data Policy” (20%); together these account for 47% of the total overlap with the framework. In BERTopic, “Ethical Governance and Stewardship” (21%) and “Ethical Impact Assessment” (19%) make up 40% of the framework-aligned content. Again, these results underline the regime’s consistent ethical focus across both models.

Overall, the most prominent framework category across all regimes relates to ethics. While Flawed Democracy and Authoritarian Regime place more emphasis on ethical topics (i.e., “Ethical Governance and Stewardship”, “Ethical Impact Assessment”), Full Democracy and Hybrid Regime prioritize “Development and International Cooperation” among others.
However, even with the inconsistencies between the models, almost all regimes (except Full Democracy in LDA) highlight at least one of the two ethics topics.

5.3 Quantitative Analysis Using OCTIS

This subsection includes a discussion on the comparative performance of the two topic models, LDA and BERTopic, based on quantitative evaluation metrics. This analysis complements the previous qualitative evaluation and directly contributes to answering the main research question by assessing how well each model performs in extracting coherent and diverse topics from AI policy documents.

To assess the performance of LDA and BERTopic, it is useful to compare the models directly across evaluation metrics, rather than by government type. This approach provides a clearer picture of each model’s strengths and limitations.

In terms of Topic Diversity, BERTopic outperforms LDA in three out of the four government categories. The only exception is in Flawed Democracies, which also happen to contain the largest dataset. One plausible explanation for BERTopic’s lower Topic Diversity score in Flawed Democracies is that LDA, when provided with ample data, can have many well-separated topics, reducing vocabulary overlap and enhancing diversity scores. Conversely, BERTopic’s reliance on embeddings may begin to cluster thematically similar content more tightly as corpus size grows, resulting in broader but less lexically distinct topics.

When evaluating IRBO, BERTopic and LDA performed better in two government types each. Overall, the differences in IRBO scores between the models are relatively minor.

In Coherence (CV), LDA consistently outperforms BERTopic across all government types, except for Full Democracies. This aligns with LDA’s modeling approach, which favors internally coherent groupings of top-ranked terms based on co-occurrence statistics.
In contrast, BERTopic prioritizes semantic context through embeddings, which can result in broader conceptual coverage at the expense of tight lexical cohesion.

The metric where LDA clearly dominates is WECoherencePairwise, in which it consistently outperforms BERTopic across all government types. A plausible explanation is that LDA groups words by their co-occurrence within documents, and words that frequently co-occur also tend to lie close together in embedding space. However, while these quantitative metrics offer a structured means to compare model performance, they do not always capture how meaningful or interpretable the resulting topics are to human readers.

In particular, BERTopic’s lower WECoherencePairwise scores may reflect the model’s tendency to include top words in a topic that are semantically related but lexically diverse. While this leads to lower lexical coherence by standard measures, it can actually enhance interpretability by capturing broader, real-world thematic groupings. For example, the recurrence of domain-relevant tokens, such as terms related to law, health, or regulation, across topics often reflects legitimate thematic intersections rather than redundancy. This overlap can help human interpretation by revealing nuanced variations and relationships between conceptually connected topics. The appearance of similar regulatory terminology in both public health and environmental topics, for instance, may indicate a shared conceptual framework, rather than poor topic separation. BERTopic’s embedding-based architecture enables it to capture such semantic proximity and contextual overlap, offering a more nuanced view of thematic content.

In contrast, LDA’s stricter lexical boundaries between topics contribute to its higher scores in WECoherencePairwise and Coherence (CV), but this comes at the cost of thematic flexibility.
While such strict boundaries improve quantitative evaluations, they may obscure meaningful connections between topics, particularly when similar concepts are expressed in slightly different lexical forms or embedded within different discursive contexts.

Ultimately, the differences between model evaluation metrics and human interpretability indicate a methodological tension. LDA performs best in generating tightly bound, internally consistent topics, which are rewarded by standard coherence metrics. BERTopic, while often penalized by these same metrics, may better align with how humans understand and navigate complex, overlapping thematic landscapes. Our results reflect this widely established trade-off between better metric performance and qualitative interpretability, highlighting the current limitations in evaluating topic models in a way that reflects human reasoning and understanding, particularly in domains like political science, where semantic nuance and discourse structure are central to interpretation.

6 Conclusion

This chapter concludes the thesis by summarizing the key observations from the previous sections and by highlighting the limitations encountered throughout the thesis and the further research that could follow.

6.1 Conclusion

The thesis aimed to see whether there are quantitative and qualitative differences between BERTopic and LDA models, and what the similarities and differences are for interpreting topics generated from AI policy documents. To answer this, two sub-questions were introduced: (1) regarding the themes and keywords found across the AI policies and different government types, and (2) whether there are notable differences in what the government types discuss regarding ethical, economic, etc. considerations.

To answer the first sub-question, we examined the topics identified across different government types.
Some themes, such as digital transformation, were present across all government types, while others were more government-specific. For instance, Full Democracies emphasized ethical concerns such as transparent algorithmic decision-making, facial recognition, deepfakes, and disinformation. Flawed Democracies, heavily influenced by U.S. policy documents, showed a strong focus on military perspectives of AI. Hybrid Regimes were found to prioritize foundational capacity building. Meanwhile, Authoritarian Regimes uniquely addressed social supervision and centralized regulatory mechanisms, framing AI development within a broader context of state control and oversight.

The second sub-question asked whether different government regimes place special emphasis on distinct ethical considerations. We addressed this on a topic and government level by examining overlap scores with the ethical framework that we created. Both LDA and BERTopic models produced consistent results, with the top 3 and bottom 3 topics aligning with the same framework categories. Flawed Democracy and Authoritarian Regime had more topics aligning with the two ethics topics (“Ethical Impact Assessment” and “Ethical Governance and Stewardship”), while Full Democracy and Hybrid Regime also touched on other topics (e.g., “Development and International Cooperation”) alongside the ethics ones. Overall, all regimes more or less discussed ethics, though the nuances in emphasis were not enough to draw strong, regime-specific conclusions.

To complement the two sub-questions and fully address the main research question regarding the differences between LDA and BERTopic, we employed OCTIS to quantitatively compare their topic modeling performance. The OCTIS results indicate that LDA outperforms BERTopic in two out of four evaluation metrics, suggesting it is more effective at generating internally coherent topics.
However, BERTopic demonstrated better topic diversity, which supports the qualitative observations that its topics were more distinct and overlapped less. This again reinforces the finding that BERTopic produces themes that are easier for humans to interpret and connect.

Finally, this study also aimed to reflect on topic modeling as a methodological tool in political science. While LDA performed better on traditional coherence metrics, our qualitative analysis found BERTopic’s output to be more meaningful to human interpretation. Despite the methodological differences, both models often identified similar topics and themes related to social science within the government types. The overlap suggests that, while the model selection impacts the interpretability of the results, the topics remain consistent, supporting the validity of the results related to the social science domain. This points to a broader issue within topic modeling: there remains a gap between how models are evaluated and how their outputs are used in practice, as standard metrics do not fully capture the value of coherence, clarity, or relevance in human-centered research contexts.

6.2 Limitations

Even though we have successfully answered our primary questions, we encountered several limitations along the way. To begin with, data processing faced computational and time constraints. Finding, collecting, and processing data takes time, and many things need to be considered. For instance, the majority of documents had to be translated because the corpus was multilingual. The Google Translate API was used since more precise and accurate translation models are computationally expensive. However, this library struggles with input containing more than one language, requiring manual and thorough inspection. Additionally, introducing automated translation risked altering or rephrasing the original ideas of the corpus.
Moreover, a deeper understanding of websites’ anti-scraping protection mechanisms could have increased the size and quality of the scraped dataset. Even though we tried considering several edge cases to overcome the website bot protection, a deeper inspection of how certain websites handle automatic access (i.e., by code and not manual human access) could be explored.

Beyond these two points, while most of the topics are coherent and interpretable, the data could have benefited from a more thorough cleaning process. While interpreting the topics, the noisy text from PDF scraping and translation was visible, limiting the full understanding of those documents.

Furthermore, some country documents were notably larger than others. In particular, the U.S. documents in the Flawed Democracy group contained several times more tokens than the next largest country within that regime. Although these documents contributed valuable insights to our analysis and discussion, the output topics of both models were heavily influenced by them, potentially underrepresenting other countries.

Finally, time constraints limited both the depth of qualitative analysis and the extent of model optimization. In particular, a more thorough hyperparameter search and fine-tuning process could have led to more robust topic-modeling results. This, in turn, would have supported a more nuanced interpretation of thematic patterns across regime types.

6.3 Further Research

Further research may be conducted to address the limitations discussed in the previous section and explore other potential fields within AI policies across different government types. Future work could consider the following areas:

• Exploration of Different Models: Only two topic models were explored in this research. Initially, our goal was to include a third topic model, LEGAL-BERT, to measure whether a BERT model fine-tuned on legal data would outperform the other classic topic models.
Additionally, a hybrid of BERT and LDA models could be implemented to incorporate strengths from both models. Moreover, the BERT model has a limitation of input not exceeding 512 tokens. Therefore, it might be of interest to explore other potential models for this task.

• Parameter Selection: Additional attention could be paid to the hyperparameter selection, as it is as important as selecting the right model. Alternative methods could include relying only on quantitative metrics (e.g., coherence score) or changing the number of output topics.

• Compare Policies across Different Countries or Over Time: A potential area to explore could include a more in-depth comparison between the policies themselves. For instance, one could see whether a country’s geographical position has an impact on its policies. In particular, compare regions such as China, the European Union, and the U.S. and see whether there are distinct differences. Moreover, focusing on a certain country over time could reveal how the policies change over time with different administrations in charge and what they prioritize.

A Appendix 1

Framework Overview

The following framework outlines key dimensions for analyzing ethical and societal impacts of AI systems. Each category includes relevant focus areas and guiding principles for analysis.
Category | Focus Areas and Considerations
Ethical Impact Assessment | ethical review, impact assessment, risk prevention, human rights impact, fundamental freedom, due diligence, oversight mechanisms, impact evaluation, socioeconomic assessment, digital divide, transparency protocols, access to information, decision-making autonomy, regulatory framework, auditability, traceability, explainability, inclusion, public authorities, citizen participation
Ethical Governance and Stewardship | AI governance, inclusive, transparent, multidisciplinary, human rights law obligation, remediation mechanisms, enforcement mechanisms, accountability, responsibility, liability frameworks, auditability, system robustness, safety and security risks, explainability, inclusive development, innovation, SMEs, civil society organizations, fundamental freedoms, cultural and social diversities, disinformation, misinformation, algorithmic stereotyping, access to AI, freedom of expression, policy prototypes, strategic research, global collaboration, public oversight
Data Policy | data governance, privacy by design, privacy impact assessments, right to privacy, data security, personal and sensitive data, data quality, gold standard datasets, annotating, disaggregated data, surveillance concerns, data protection legislation, transparency mechanisms, fair data sharing, consent, data trust, open data, interoperability, cross-border data flow, responsible AI development
Development and International Cooperation | AI ethics, ethical frameworks, international collaboration, platforms for cooperation, AI for development, education, science, healthcare, agriculture, environment, natural resources, infrastructure, economy, Global AI research, data sharing, geo-technical divide, international law, tech exchange, funding, policy consulting
Environment and Ecosystems | environmental impact assessments, AI system lifecycle, carbon footprint, energy consumption, raw material extraction, sustainability, ecosystem monitoring, disaster resilience, circular economy, sustainable finance, climate mitigation, pollution detection and prevention, energy, resource-efficient AI, safeguards and justification for AI use
Gender | gender equality, AI system lifecycle, transversal gender perspective, gender action plans, dedicated public funds, digital gender gaps, STEM education for girls and women, career development, online violence prevention, AI bias and stereotyping, economic incentives, best practice transfer
Culture | cultural heritage preservation, accessibility, endangered languages, indigenous languages, cultural programs, AI’s cultural impact, automated translation, language reduction, promoting diversity in algorithms, local content, visibility, AI, arts, IP rights
Education and Research | AI literacy, public education, digital divides, critical thinking, media literacy, ethics in AI curricula, children’s rights, gender inclusion, accessibility for disabilities and minorities, ethical design, interdisciplinary research, AI risks and limitations, AI in policy and academia
Communication and Information | access to knowledge, freedom of expression, information disclosure, automated content, communication regulation, diverse viewpoints, disinformation, misinformation, journalism, transparency, media recommendations
Economy and Labour | labor markets, skill requirements, reskilling, job transitions, AI unemployment protection, social protection, fair competition, monopoly prevention, market exploitation, compliance, trade, labour-intensive sector support
Health and Social Well-Being | healthcare, mental health, physical health, disease mitigation, privacy, informed consent, human oversight, diagnostics, treatment, AI safety, validation, psychological impact, youth, social isolation, addiction, elderly care, disability support, human dignity in health-AI interactions

Table A.1

Data Collection and Processing

Table A.2 provides an overview of the status codes obtained by the URL addresses in the initial dataset. The links were split into two categories – Working Links and Broken Links – where the Working Links were selected for further data extraction and processing.

Category | Status Code | Count
Working Links | 200 | 461
Working Links | 202 | 4
Working Links | Total | 765
Broken Links | 400 | 2
Broken Links | 403 | 95
Broken Links | 404 | 76
Broken Links | 471 | 1
Broken Links | 520 | 1
Broken Links | Exception | 105
Broken Links | Total | 280
Overall Total | | 1045

Table A.2: “Public access URL” Status Codes and Counts.

Table A.3 includes information for each government type. Particularly, it shows what countries belong to a specific regime, the number of working links per country, and total tokens obtained from those scraped websites.

Government Type | Country | Working URLs | Total Tokens
Full Democracies | Australia | 13 | 11052
Full Democracies | Austria | 9 | 15973
Full Democracies | Canada | 17 | 16628
Full Democracies | Costa Rica | 1 | 1152
Full Democracies | Denmark | 8 | 4228
Full Democracies | Finland | 7 | 10893
Full Democracies | France | 40 | 24490
Full Democracies | Germany | 39 | 76876
Full Democracies | Greece | 1 | 34
Full Democracies | Iceland | 0 | -
Full Democracies | Ireland | 4 | 2869
Full Democracies | Japan | 19 | 23807
Full Democracies | Korea | 4 | 1456
Full Democracies | Luxembourg | 8 | 5570
Full Democracies | Mauritius | 1 | 442
Full Democracies | Netherlands | 13 | 62478
Full Democracies | New Zealand | 9 | 22345
Full Democracies | Norway | 21 | 38373
Full Democracies | Spain | 25 | 331675
Full Democracies | Sweden | 10 | 9688
Full Democracies | Switzerland | 5 | 2327
Full Democracies | United Kingdom | 49 | 360136
Full Democracies | Uruguay | 4 | 26297
Flawed Democracies | Argentina | 4 | 2000
Flawed Democracies | Belgium | 19 | 4703
Flawed Democracies | Brazil | 10 | 34219
Flawed Democracies | Bulgaria | 2 | 575
Flawed Democracies | Chile | 6 | 22094
Flawed Democracies | Colombia | 29 | 141279
Flawed Democracies | Cyprus | 1 | 892
Flawed Democracies | Czechia | 14 | 5878
Flawed Democracies | Estonia | 13 | 51812
Flawed Democracies | Hungary | 4 | 1723
Flawed Democracies | India | 26 | 168613
Flawed Democracies | Indonesia | 1 | 346
Flawed Democracies | Israel | 12 | 79732
Flawed Democracies | Italy | 12 | 26522
Flawed Democracies | Latvia | 4 | 49817
Flawed Democracies | Lithuania | 3 | 2291
Flawed Democracies | Malta | 7 | 10025
Flawed Democracies | Poland | 5 | 72125
Flawed Democracies | Portugal | 20 | 28252
Flawed Democracies | Romania | 1 | 7
Flawed Democracies | Serbia | 22 | 35158
Flawed Democracies | Singapore | 26 | 16317
Flawed Democracies | Slovak Republic | 0 | -
Flawed Democracies | Slovenia | 6 | 1863
Flawed Democracies | South Africa | 4 | 5026
Flawed Democracies | Thailand | 5 | 8213
Flawed Democracies | United States | 73 | 1107352
Hybrid Regimes | Armenia | 4 | 2624
Hybrid Regimes | Kenya | 1 | 2572
Hybrid Regimes | Mexico | 11 | 1856
Hybrid Regimes | Morocco | 0 | -
Hybrid Regimes | Nigeria | 3 | 2209
Hybrid Regimes | Peru | 11 | 3129
Hybrid Regimes | Tunisia | 3 | 511
Hybrid Regimes | Türkiye | 25 | 63371
Hybrid Regimes | Uganda | 1 | 1182
Hybrid Regimes | Ukraine | 1 | 1288
Authoritarian Regimes | China | 20 | 22248
Authoritarian Regimes | Egypt | 7 | 28497
Authoritarian Regimes | Kazakhstan | 6 | 17904
Authoritarian Regimes | Russian Federation | 6 | 3565
Authoritarian Regimes | Rwanda | 4 | 2065
Authoritarian Regimes | Saudi Arabia | 4 | 1783
Authoritarian Regimes | United Arab Emirates | 6 | 12072
Authoritarian Regimes | Uzbekistan | 5 | 5431
Authoritarian Regimes | Viet Nam | 7 | 12992

Table A.3: Detailed Metrics by Government Type and Country.

LDA Hyperparameters

The following table shows the 5 (and 6 for Flawed Democracy) sets of hyperparameters for each government type that resulted in the highest coherence and lowest perplexity scores.
Government Type | num_topics | passes | alpha | eta | Coherence | Perplexity
Full Democracy | 10 | 20 | 0.01 | 0.05 | 0.4395 | -7.6850
Full Democracy | 10 | 30 | auto | 0.05 | 0.4409 | -7.6617
Full Democracy | 10 | 30 | symmetric | auto | 0.4400 | -7.5846
Full Democracy | 10 | 30 | auto | auto | 0.4400 | -7.5846
Full Democracy | 10 | 30 | auto | symmetric | 0.4400 | -7.6171
Flawed Democracy | 5 | 30 | auto | 0.05 | 0.4484 | -7.8093
Flawed Democracy | 5 | 30 | 0.01 | 0.05 | 0.4282 | -7.8090
Flawed Democracy | 5 | 30 | asymmetric | 0.05 | 0.4491 | -7.8087
Flawed Democracy | 5 | 20 | auto | 0.05 | 0.4491 | -7.8087
Flawed Democracy | 5 | 20 | 0.01 | 0.05 | 0.4491 | -7.8092
Flawed Democracy | 10 | 30 | auto | 0.01 | 0.3671 | -8.5689
Hybrid Regime | 15 | 20 | asymmetric | auto | 0.4686 | -7.3301
Hybrid Regime | 15 | 20 | asymmetric | symmetric | 0.4686 | -7.3667
Hybrid Regime | 15 | 20 | asymmetric | 0.01 | 0.4650 | -7.9421
Hybrid Regime | 15 | 30 | asymmetric | symmetric | 0.4686 | -7.3577
Hybrid Regime | 15 | 30 | asymmetric | auto | 0.4684 | -7.3248
Authoritarian Regime | 15 | 20 | 0.01 | 0.01 | 0.4507 | -7.8995
Authoritarian Regime | 5 | 30 | auto | auto | 0.4416 | -7.4900
Authoritarian Regime | 5 | 30 | 0.01 | auto | 0.4416 | -7.4952
Authoritarian Regime | 5 | 30 | 0.01 | symmetric | 0.4416 | -7.5757
Authoritarian Regime | 5 | 30 | symmetric | auto | 0.4416 | -7.4998

Table A.4: Grid Search Results for LDA Hyperparameters Across Government Types.

BERTopic Hyperparameters

Table A.5 shows the top 5 (top 10 for Hybrid Regime) sets of hyperparameters for each government type that resulted in the highest coherence scores. For Hybrid Regime, the hyperparameter grid search range was increased due to the limited number of output topics.

Government Type | nr_topics | n_neighbors | cluster_size | Coherence Score
Full Democracy | 4 | 10 | 10 | 0.5349
Full Democracy | 4 | 15 | 5 | 0.5349
Full Democracy | 4 | 15 | 10 | 0.5349
Full Democracy | 16 | 5 | 5 | 0.5219
Full Democracy | 10 | 5 | 10 | 0.5150
Flawed Democracy | 16 | 15 | 10 | 0.5449
Flawed Democracy | 6 | 15 | 5 | 0.5434
Flawed Democracy | 14 | 15 | 10 | 0.5386
Flawed Democracy | 16 | 10 | 15 | 0.5298
Flawed Democracy | 8 | 10 | 10 | 0.5251
Hybrid Regime | 16 | 20 | 10 | 0.7126
Hybrid Regime | 16 | 15 | 10 | 0.6760
Hybrid Regime | 16 | 2 | 15 | 0.6429
Hybrid Regime | 16 | 5 | 20 | 0.5942
Hybrid Regime | 16 | 5 | 5 | 0.5900
Hybrid Regime | 16 | 5 | 10 | 0.5815
Hybrid Regime | 16 | 20 | 5 | 0.5664
Hybrid Regime | 16 | 5 | 15 | 0.5444
Hybrid Regime | 16 | 20 | 2 | 0.5091
Hybrid Regime | 16 | 10 | 2 | 0.4948
Authoritarian Regime | 6 | 10 | 5 | 0.5539
Authoritarian Regime | 8 | 10 | 5 | 0.5473
Authoritarian Regime | 10 | 10 | 5 | 0.5406
Authoritarian Regime | 14 | 10 | 5 | 0.5406
Authoritarian Regime | 16 | 10 | 5 | 0.5406

Table A.5: Top 5 (and Top 10 for Hybrid Regime) Hyperparameter Sets for Each Government Type.

Table A.6 shows the coherence scores obtained by BERTopic for each government type. These scores were measured for the model with and without a representation model (POS Tagging).

Government Type | Coherence Without POS | Coherence With POS
Full Democracy | 0.7548 | 0.5349
Flawed Democracy | 0.9139 | 0.5449
Hybrid Regime | 0.7107 | 0.7126
Authoritarian Regime | 0.7225 | 0.5539

Table A.6: Coherence Scores per Government Type: With and Without POS Tagging.

Framework Overlap Evaluation

Tables A.7 and A.8 provide the topic overlap of both models, LDA and BERTopic, with the framework in Table A.1. The Framework Topics with Raw Scores column lists the framework categories that the specific topic overlapped with, along with the overlap score. The higher the number, the stronger the overlap. A row containing only “-” indicates that no topic keyword matched any framework category.

Topic ID | Framework Topics with Raw Scores

LDA Model - Full Democracy
Topic 0 | Communication and Information (0.50), Ethical Governance and Stewardship (0.50)
Topic 1 | Development and International Cooperation (1.00), Ethical Governance and Stewardship (0.33)
Topic 2 | -
Topic 3 | Development and International Cooperation (0.50)
Topic 4 | Gender (0.50), Culture (0.33)
Topic 5 | Communication and Information (0.50), Ethical Impact Assessment (0.50), Data Policy (0.33)
Topic 6 | Development and International Cooperation (2.00), Culture (1.00), Environment and Ecosystems (1.00)
Topic 7 | Ethical Governance and Stewardship (1.00), Gender (0.50), Health and Social Well-Being (0.50)
Topic 8 | Health and Social Well-Being (1.50), Development and International Cooperation (1.00)
Topic 9 | Culture (0.50), Data Policy (0.33)

LDA Model - Flawed Democracy
Topic 0 | Culture (0.50)
Topic 1 | Development and International Cooperation (1.00), Economy and Labour (0.75), Ethical Impact Assessment (0.50)
Topic 2 | Ethical Impact Assessment (1.50), Health and Social Well-Being (0.50)
Topic 3 | Health and Social Well-Being (0.50)
Topic 4 | Ethical Impact Assessment (1.33)
Topic 5 | Development and International Cooperation (1.00), Ethical Governance and Stewardship (1.00), Ethical Impact Assessment (0.50), Economy and Labour (0.25)
Topic 6 | Economy and Labour (0.50), Ethical Governance and Stewardship (0.33)
Topic 7 | Data Policy (0.33)
Topic 8 | Ethical Governance and Stewardship (1.50), Health and Social Well-Being (0.50), Data Policy (0.33)
Topic 9 | -

LDA Model - Hybrid Regime
Topic 0 | Development and International Cooperation (1.00), Ethical Governance and Stewardship (0.33), Gender (0.33)
Topic 1 | Ethical Governance and Stewardship (1.00)
Topic 2 | Data Policy (0.33)
Topic 3 | Ethical Governance and Stewardship (2.00), Ethical Impact Assessment (1.00)
Topic 4 | Development and International Cooperation (1.00), Economy and Labour (0.50)
Topic 5 | Ethical Governance and Stewardship (1.50), Ethical Impact Assessment (1.00), Development and International Cooperation (1.00), Data Policy (0.50), Health and Social Well-Being (0.50)
Topic 6 | Data Policy (0.50), Development and International Cooperation (0.50), Ethical Governance and Stewardship (0.50)
Topic 7 | Gender (0.50)
Topic 8 | -
Topic 9 | Gender (0.33)
Topic 10 | Development and International Cooperation (1.00), Culture (0.50)
Topic 11 | Ethical Governance and Stewardship (0.33)
Topic 12 | Development and International Cooperation (1.00), Communication and Information (0.50), Culture (0.50), Environment and Ecosystems (0.50)
Topic 13 | Ethical Governance and Stewardship (0.50)
Topic 14 | -

LDA Model - Authoritarian Regime
Topic 0 | Data Policy (0.83), Economy and Labour (0.50)
Topic 1 | Ethical Governance and Stewardship (1.50), Data Policy (1.00), Economy and Labour (0.50), Ethical Impact Assessment (0.50)
Topic 2 | -
Topic 3 | Development and International Cooperation (1.00), Environment and Ecosystems (0.33)
Topic 4 | Communication and Information (0.50), Data Policy (0.50), Ethical Governance and Stewardship (0.33)
Topic 5 | -
Topic 6 | Development and International Cooperation (1.00), Ethical Governance and Stewardship (1.00), Data Policy (0.67), Health and Social
Well-Being (0.50) Topic 7 Ethical Governance and Stewardship (1.00) Topic 8 Communication and Information (0.50) Topic 9 Data Policy (0.50), Ethical Impact Assessment (0.50) Topic 10 Development and International Cooperation (1.00), Communication and Infor- mation (0.50) Topic 11 Data Policy (0.50), Development and International Cooperation (0.50) Topic 12 Ethical Governance and Stewardship (1.50), Health and Social Well-Being (1.00), Ethical Impact Assessment (0.50) Topic 13 - Topic 14 Health and Social Well-Being (1.00), Gender (0.33) Table A.7: LDA: Raw scores for Topics by Government Type. X A. Appendix 1 Topic ID Framework Topics with Raw Scores BERTopic Model - Full Democracy Topic 0 Ethical Impact Assessment (1.00), Development and International Cooperation (0.50), Economy and Labour (0.50), Ethical Governance and Stewardship (0.50), Gender (0.50), Health and Social Well-Being (0.50) Topic 1 Ethical Impact Assessment (1.67), Data Policy (1.00), Culture (0.50), Ethical Governance and Stewardship (0.50), Health and Social Well-Being (0.50) Topic 2 Ethical Impact Assessment (1.00), Economy and Labour (0.50), Ethical Gover- nance and Stewardship (0.50), Health and Social Well-Being (0.50) Topic 3 Gender (0.50), Development and International Cooperation (0.33) Topic 4 Culture (0.50), Ethical Impact Assessment (0.50) Topic 5 Data Policy (1.00), Development and International Cooperation (0.50), Ethical Governance and Stewardship (0.50) Topic 6 Ethical Governance and Stewardship (0.83) Topic 7 Development and International Cooperation (2.50) Topic 8 Ethical Governance and Stewardship (1.00), Development and International Cooperation (0.50) Topic 9 Ethical Impact Assessment (0.50), Data Policy (0.33), Environment and Ecosys- tems (0.25) BERTopic Model - Flawed Democracy Topic 0 - Topic 1 Development and International Cooperation (1.00), Ethical Governance and Stewardship (1.00), Health and Social Well-Being (0.50) Topic 2 Ethical Impact Assessment (2.00) Topic 3 
Ethical Governance and Stewardship (2.00), Health and Social Well-Being: (0.50), Ethical Impact Assessment (0.50), Data Policy (0.33), Development and International Cooperation (0.33) Topic 4 Ethical Impact Assessment (1.33), Communication and Information (0.50), Health and Social Well-Being (0.50) Topic 5 Ethical Impact Assessment (1.50), Ethical Governance and Stewardship (1.00), Gender (0.67) Topic 6 Development and International Cooperation (1.00), Ethical Governance and Stewardship (1.00) Topic 7 Ethical Impact Assessment (1.00), Ethical Governance and Stewardship (0.50), Gender (0.50) Topic 8 Health and Social Well-Being (1.00) Topic 9 Economy and Labour (2.50), Development and International Cooperation (1.00) BERTopic Model - Hybrid Regime Topic 0 Ethical Impact Assessment (1.17), Gender (0.67), Development and Interna- tional Cooperation (0.50), Economy and Labour (0.50), Health and Social Well-Being (0.50) Topic 1 Ethical Impact Assessment (1.50), Ethical Governance and Stewardship (1.00), Education and Research (1.00), Culture (0.50), Development and International Cooperation (0.33) Topic 2 Ethical Governance and Stewardship (1.00), Environment and Ecosystems (0.50), Health and Social Well-Being (0.50) XI A. 
Appendix 1 Continued from previous page Topic ID Framework Topics with Raw Scores Topic 3 Data Policy (2.50), Development and International Cooperation (1.00) Topic 4 Data Policy (1.00), Development and International Cooperation (1.00), Econ- omy and Labour (1.00), Ethical Impact Assessment (0.67) Topic 5 Ethical Impact Assessment (1.17), Development and International Cooperation (1.00), Culture (0.50), Health and Social Well-Being (0.50) Topic 6 Communication and Information (0.50), Health and Social Well-Being (0.50), Development and International Cooperation (0.33) Topic 7 Data Policy (1.00), Ethical Impact Assessment (0.50) Topic 8 Ethical Impact Assessment (1.50), Development and International Cooperation (1.00) Topic 9 Ethical Governance and Stewardship (2.00) BERTopic Model - Authoritarian Regime Topic 0 Ethical Governance and Stewardship (1.50), Data Policy (0.50) Topic 1 Ethical Governance and Stewardship (3.17), Economy and Labour (1.00), Ethical Impact Assessment (0.50) Topic 2 Data Policy (2.17), Economy and Labour (2.00), Health and Social Well-Being (1.00), Ethical Impact Assessment (0.50) Topic 3 Development and International Cooperation (0.50), Economy and Labour (0.50), Ethical Impact Assessment (0.50), Gender (0.50), Health and Social Well-Being (0.50) Topic 4 Data Policy (0.50), Development and International Cooperation (0.50), Ethical Governance and Stewardship (0.50), Gender (0.50) Topic 5 Development and International Cooperation (1.33), Ethical Impact Assessment (0.50), Environment and Ecosystems (0.33) Topic 6 Ethical Impact Assessment (2.00), Development and International Cooperation (0.50) Topic 7 Data Policy (1.67), Ethical Impact Assessment (1.50), Culture (0.50), Ethical Governance and Stewardship (0.50) Topic 8 Development and International Cooperation (1.33), Environment and Ecosys- tems (0.83), Culture (0.50), Ethical Governance and Stewardship (0.33) Table A.8: BERTopic: Raw scores for Topics by Government Type. XII A. 
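The cells in Tables A.7 and A.8 follow a regular "Category (score)" pattern, so they can be turned into data and tallied by score value. The following is a minimal sketch of one way to do that, not the procedure used in the thesis; the two cells are copied from Table A.7 (LDA, Full Democracy, Topics 0 and 1), while the helper name `parse_cell` and the regular expression are illustrative assumptions.

```python
import re
from collections import Counter

# Two cells copied from Table A.7 (LDA, Full Democracy, Topics 0 and 1).
cells = [
    "Communication and Information (0.50), Ethical Governance and Stewardship (0.50)",
    "Development and International Cooperation (1.00), Ethical Governance and Stewardship (0.33)",
]

# Assumed cell format: "Category name (score)", entries separated by commas.
PAIR = re.compile(r"\s*([^,(]+?)\s*\((\d+\.\d+)\)")


def parse_cell(cell):
    """Hypothetical helper: split one table cell into (category, score) pairs."""
    return [(name, float(score)) for name, score in PAIR.findall(cell)]


# Flatten all cells into one list of (category, score) pairs.
pairs = [pair for cell in cells for pair in parse_cell(cell)]

# Count how often each raw score value occurs across the parsed cells.
score_counts = Counter(score for _, score in pairs)

for score, count in sorted(score_counts.items()):
    print(f"{score:.2f}: {count}")
```

Applied to every row of both tables, and kept separate per model, a tally of this kind gives the frequency of each raw score value for LDA and BERTopic.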
Overlap Score Distribution

Figure A.1 shows the distribution of all raw overlap scores between topics and the framework, for each topic model. The values on the x-axis can be read as follows:

• Score < 1: Only part of a keyword overlaps with the framework.
• Score = 1: Either one keyword from a topic overlaps perfectly with a framework category, or two or more partial overlaps add up to one (e.g., two partial overlaps, each with score 0.5).
• Score > 1: More than one perfect overlap, or a combination of several partial overlaps, was found.

[Figure A.1: bar chart. x-axis: Score (0.25, 0.33, 0.5, 0.67, 0.75, 0.83, 1.0, 1.17, 1.33, 1.5, 1.67, 2.0, 2.17, 2.5, 3.17); y-axis: Frequency (0 to 50); series: LDA, BERTopic.]

Figure A.1: Distributions of Overlap Scores by Topic Model.

OCTIS Comparison

Figure A.2: Comparison of Topic Diversity by Model and Government Type.

Figure A.3: Comparison of CV Coherence by Model and Government Type.

Figure A.4: Comparison of IRBO by Model and Government Type.

Figure A.5: Comparison of WECoherencePairwise by Model and Government Type.
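The three score ranges described for Figure A.1 can be illustrated with a short sketch. It assumes that a raw score is the sum of per-keyword contributions, where a perfect overlap contributes 1 and a partial overlap contributes a fraction such as 0.5, as in the example given for a score of exactly 1. The function name `describe_score` and the sample contributions are hypothetical and are not taken from the thesis code; `Fraction` is used only so that the comparison with 1 is exact.

```python
from fractions import Fraction

PERFECT = Fraction(1)      # assumed contribution of one perfect keyword overlap
PARTIAL = Fraction(1, 2)   # assumed contribution of one partial overlap (0.5)


def describe_score(contributions):
    """Hypothetical helper: sum the contributions and name the score range."""
    total = sum(contributions, Fraction(0))
    if total < 1:
        label = "below 1"
    elif total == 1:
        label = "equal to 1"
    else:
        label = "above 1"
    return total, label


# Illustrative inputs only; they are not rows from Tables A.7 or A.8.
examples = {
    "one partial overlap": [PARTIAL],
    "one perfect overlap": [PERFECT],
    "two partial overlaps": [PARTIAL, PARTIAL],
    "one perfect and one partial overlap": [PERFECT, PARTIAL],
}

for name, contributions in examples.items():
    total, label = describe_score(contributions)
    print(f"{name}: score {float(total):.2f} ({label})")
```

Under these assumptions, a score of exactly 1 can arise in two different ways, which is why that range cannot be read as a single perfect match.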