Using Language Models to evaluate annotation bias in the Manifesto Project Corpus

Abstract
Good data is crucial for good research. Many organisations collect and provide data for research in the hope of furthering our understanding of society. One such organisation is the Manifesto Project, which curates a dataset of election manifestos as both raw text and coded data detailing the exact topics mentioned in each manifesto. While widely used, the dataset is also heavily criticized. In this thesis we analyze whether coder bias is noticeably present in this corpus. We focus on the bias that can arise because coders know which party a manifesto belongs to, which could allow preconceptions to influence the coding process. We use a cosine similarity analysis of sentence embeddings and fine-tuned RoBERTa models to analyze the data. The cosine similarity analysis shows widespread and general inconsistencies in the coding process, as similar sentences are often coded differently. The RoBERTa models show that green parties are treated differently regarding the topic of environmental protection. For instance, compared to non-green parties, their sentences about agriculture are disproportionately often assigned the code "environmental protection", while segments discussing sustainability are disproportionately often assigned the topic "anti-growth economy". The actual content of these segments is similar, but the coding appears to be influenced by party affiliation. Overall, our results indicate the presence of coder bias in the Manifesto Project data, which researchers and practitioners should be aware of when using this dataset in their work.
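The abstract mentions a cosine similarity analysis of sentence embeddings used to find similar sentences that received different codes. A minimal sketch of that kind of comparison is shown below; the library (sentence-transformers), the embedding model (all-MiniLM-L6-v2), the example sentences, and the similarity threshold are all illustrative assumptions, not the setup used in the thesis.

```python
# Sketch: flag pairs of highly similar manifesto sentences that were assigned
# different Manifesto Project codes. Library, model, data, and threshold are
# assumptions for illustration only.
from sentence_transformers import SentenceTransformer, util

# Hypothetical example segments: (sentence, assigned code)
segments = [
    ("We will support farmers in transitioning to organic agriculture.", "501"),
    ("Farmers should be supported in switching to organic agriculture.", "703"),
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([sentence for sentence, _ in segments], convert_to_tensor=True)

# Pairwise cosine similarity between all sentence embeddings
similarity = util.cos_sim(embeddings, embeddings)

THRESHOLD = 0.85  # assumed cut-off for "similar" sentences
for i in range(len(segments)):
    for j in range(i + 1, len(segments)):
        score = similarity[i, j].item()
        if score >= THRESHOLD and segments[i][1] != segments[j][1]:
            print(f"Similar sentences ({score:.2f}) coded differently: "
                  f"{segments[i][1]} vs {segments[j][1]}")
```

In this kind of analysis, a high rate of near-duplicate sentences carrying different codes would point to inconsistency in the coding process, which is the pattern the abstract reports.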
Degree
Student essay
URI
https://hdl.handle.net/2077/83668
Collections
  • Masteruppsatser
View/Open
CSE 24-13 LC KH.pdf (1.767Mb)
Date
2024-10-16
Author
Carlsson, Leo
Hanlon, Konstantin
Keywords
Coding bias
Transformers
Natural Language Processing