Evaluating Lexicon-Based Models versus BERT for Sentence-Level Sentiment Analysis in Swedish
Abstract
This thesis explores the development and evaluation of different approaches to sentiment
analysis for the Swedish language, focusing on sentence-level sentiment detection.
The study compares traditional rule- and lexicon-based models with modern
machine learning approaches, particularly the Bidirectional Encoder Representations
from Transformers (BERT), as well as a hybrid model combining the rule-based
model with Support Vector Machines (SVM). Using the Sparv pipeline for linguistic
analysis and breakdown in tandem with the sentiment lexicon SenSALDO, we
aim to extend existing research on Swedish rule-based models by the inclusion of
linguistic features. The research also involves expanding the lexicon with neutral,
positive and negative entries in order to improve the coverage and accuracy of
sentence-level sentiment analysis. The evaluation highlights the strengths and
weaknesses of each model: the BERT model performed best overall, especially on
neutral sentences, while the rule-based and hybrid models were much better on
positive sentences. For negative sentiment detection, the hybrid SVM model
performed best. Our thesis contributes to the ongoing discourse on effective sentiment
analysis in non-English languages and offers insights for further advancements in
natural language processing (NLP) for Swedish.
Degree
Student essay
Date
2024-10-16
Author
Mansour, Ricardo
Nilsson, Erik
Keywords
Computer science
Sentiment analysis
BERT
Lexicon-based
Rule-based
Support Vector Machines (SVM)
Swedish language
Natural language processing (NLP)
Sparv
SALDO
SenSALDO
Machine learning