Modelling rare events using non-parametric machine learning classifiers - Under what circumstances are support vector machines preferable to conventional parametric classifiers?

Ma, Lukas

Modellering av ”rare events” med hjälp av maskininlärningsmetoder -- under vilka omständigheter är det mer lämpligt att tillämpa SVM än de konventionella klassificeringsmetoderna?

Abstract

Rare event modelling is an important topic in quantitative social science research. Traditional classifiers based on generalized linear models (GLMs) might lead to biased results, yet the social science community has devoted little attention to methodological studies aimed at alleviating this bias, and even fewer studies have considered machine learning methods for the analytical problems that rare events pose. In this thesis, I compared the classification performance of support vector machines (SVMs), a group of machine learning classification algorithms, with that of GLMs in the presence of imbalanced classes and rare events. The results show that standard SVMs classify no better than traditional GLMs. Standard SVMs also tend to have low sensitivity, which makes them inappropriate for rare event modelling. Although cost-sensitive SVMs can identify more rare events, they tend to overfit as the events become rarer. Finally, the empirical analysis using the Militarized Interstate Dispute (MID) data implies that the probabilistic outputs produced by Platt scaling are not reliable. For these reasons, the results of this study do not support a wider application of SVMs in rare event modelling.

Degree

Student essay

Date

2021-04-06

Author

Ma, Lukas

Series/Report no.

202104:61

Uppsats

Language

eng
