IDENTIFYING HATE SPEECH IN SOCIAL MEDIA THROUGH CONTENT AND SOCIAL CONNECTIONS ANALYSIS
Abstract
Hate speech puts its targets at risk of serious harm. Because the internet and social media are ubiquitous,
it spreads fast and has a real influence on society, and so considerable research effort has gone into
finding solutions for automatic hate speech detection. Despite major developments
in the field, challenges with data scarcity and characteristics often cause solutions reported in previous
research to overfit the datasets that were used to train and test them, which results in dramatic performance
losses and failures in generalization. This study addressed this issue by seeking a solution that would
mitigate the overfitting effects originating from these challenges and enhance a language-based classifier with extra
user information concerning one’s social connections. It compared two single-source models, one based on
textual information and the other based on information concerning one’s social connections, and proposed
a joint decision engine that selects the model whose class assignment was more certain for a given instance.
Although the single-source models’ performance dropped drastically on test data, the joint decision engine
succeeded in reducing some of the issues related to overfitting, improving the overall performance. This
observation suggests that simple solutions might be effective in reducing model overfitting and paves the way
for further work to validate these findings.
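
The joint decision engine described above can be illustrated with a minimal sketch. The abstract does not state how certainty is measured, so this example assumes it is the highest predicted class probability of each model; the function name `joint_decision` and the probability arrays are hypothetical and are not taken from the study.

```python
import numpy as np


def joint_decision(text_proba, social_proba):
    """Pick, per instance, the label from whichever model is more certain.

    Assumption: certainty is the maximum class probability a model assigns.
    Both inputs have shape (n_instances, n_classes). Ties go to the text model.
    """
    text_proba = np.asarray(text_proba, dtype=float)
    social_proba = np.asarray(social_proba, dtype=float)

    # Confidence of each model for each instance.
    text_conf = text_proba.max(axis=1)
    social_conf = social_proba.max(axis=1)

    # True where the text-based model is at least as certain as the social one.
    use_text = text_conf >= social_conf

    # Take the predicted class from the selected model.
    labels = np.where(
        use_text,
        text_proba.argmax(axis=1),
        social_proba.argmax(axis=1),
    )
    return labels, use_text


if __name__ == "__main__":
    # Made-up probabilities for three instances and two classes.
    text_proba = [[0.9, 0.1], [0.55, 0.45], [0.5, 0.5]]
    social_proba = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
    labels, use_text = joint_decision(text_proba, social_proba)
    print(labels.tolist(), use_text.tolist())
```

Raw probabilities from different model families are not necessarily comparable, so a real system might calibrate them before comparing; that step is omitted here.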
Degree
Student essay
Date
2023-06-19
Author
Stanišić, Milan
Keywords
hate speech, social media, natural language processing, classification
Language
eng