  • Student essays / Studentuppsatser
  • Department of Philosophy, Linguistics and Theory of Science / Institutionen för filosofi, lingvistik och vetenskapsteori
  • Masteruppsatser / Master in Language Technology
Don't Mention the Norm

Abstract
Reporting bias (the human tendency to not mention obvious or redundant information) and social bias (societal attitudes toward specific demographic groups) have both been shown to propagate from human text data to language models trained on such data (Shwartz and Choi, 2020; Paik et al., 2021; Caliskan, Bryson, and Narayanan, 2017; Garg et al., 2018). However, the two phenomena have not previously been studied in combination. This thesis aims to begin to fill this gap by studying the interaction between social biases and reporting bias in both human text and language models. We conduct a corpus study of human-written text, and find that n-gram frequencies in our chosen corpora show strong signs of reporting bias with regard to socially marked identities, mirroring current discourse in society. This thesis also introduces the MARB dataset for measuring model reporting bias with regard to socially marked attributes. We evaluate ten large pretrained language models on MARB and analyze the results in relation to both corpus frequencies and real-world frequencies. The results suggest a relationship between reporting bias and social bias in language models similar to that which was identified in human text. However, this relationship is not as straightforward in language models, and other factors, like sequence length and model vocabulary, are also observed to affect the outcome.
Degree level
Student essay
URL:
https://hdl.handle.net/2077/81766
Collections
  • Masteruppsatser / Master in Language Technology
File(s)
Master Thesis (1.105Mb)
Date
2024-06-17
Author
Södahl Bladsjö, Tom
Keywords
Language Technology
Language
eng