Classification of role stereotypes for classes in UML class diagrams using machine learning
Abstract
The software development process has become increasingly complex in recent decades.
To reduce this complexity, developers and software practitioners are constantly looking for new approaches. One such approach is to understand the software design, for instance the UML models, earlier in the software
development process. For analyzing UML models, one can use knowledge about
role stereotypes. Knowledge about role stereotypes can help in software quality assessment and in summarizing software, and thereby eases the understanding of
software designs. This study presents a machine learning-based approach for classifying the role stereotype of classes in UML class diagrams. We have established a
ground truth by manually labelling 391+ classes from 15 open-source projects (written in various programming languages). We evaluate the performance of the machine
learning approach against the manually established ground truth. In addition, we
compare our approach with another machine learning approach from an
earlier case study that is based on source code. Furthermore, we compare different
machine learning (ML) algorithms to find the best ML algorithm for classifying
our dataset. Another noteworthy contribution of this study is an analysis of which
features are most relevant for classifying classes into role stereotypes and which features yield the best classification performance. According to our findings, the
J48 classifier performs best on the raw dataset, and the Random Forest classifier performs best on a more balanced dataset obtained
by applying SMOTE oversampling. By using our classifier, software developers can
analyze patterns in their software design at an early stage of the software development
process.
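The SMOTE oversampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the implementation used in the study; it is a simplified, standard-library-only version of the general SMOTE idea, in which each synthetic sample is placed on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters, and the two-dimensional feature vectors are hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority-class vectors.

    Simplified sketch: every synthetic point is an interpolation between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical feature vectors for an under-represented role stereotype.
minority = [[1.0, 9.0], [2.0, 11.0], [1.0, 12.0], [3.0, 10.0]]
new_samples = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the new samples stay inside the region already covered by that class. A classifier such as Random Forest would then be trained on the original data plus the synthetic samples.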
Degree
Student essay
Date
2021-03-03
Author
Ahmed, Jobaer
Huang, Maoyi
Keywords
role-stereotypes
machine learning algorithm
classification
data analysis
data mining
UML class diagram
software design
software engineering
Language
eng