Classification of role stereotypes for classes in UML class diagrams using machine learning
Software development has become increasingly complex in recent decades. To reduce this complexity, developers and software practitioners are constantly looking for new approaches. One such approach is to understand the software design, for instance the UML models, earlier in the software development process. Knowledge about role stereotypes is useful when analyzing UML models: it can help during software quality assessment and in summarizing software, and thereby ease the understanding of software designs. This study presents a machine learning-based approach for classifying the role stereotypes of classes in UML class diagrams. We established a ground truth by manually labelling 391+ classes from 15 open source projects written in various programming languages, and we evaluate the performance of the machine learning approach against this ground truth. In addition, we compare our approach with another machine learning approach, from an earlier case study, that is based on source code. Furthermore, we compare different machine learning (ML) algorithms to find the one best suited to classifying our dataset. Another noteworthy contribution of this study is an analysis of which features are most relevant for classifying classes into role stereotypes and which features yield the best classification performance. According to our findings, the J48 classifier performs best on the raw dataset, while the Random Forest classifier performs best on a more balanced dataset obtained by applying SMOTE oversampling. Using our classifier, software developers can analyze patterns in their software design at an early stage of the software development process.
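SMOTE balances a dataset by synthesizing new minority-class samples through interpolation between an existing sample and one of its nearest neighbours of the same class. The following is a minimal pure-Python sketch of that idea, not the implementation used in this study; the function name `smote`, its parameters, and the two-dimensional feature vectors are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical feature vectors for an under-represented role stereotype
# (the two features are placeholders, not the features used in the study).
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (2.0, 2.0)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a line segment between two real minority samples, the oversampled class stays within the region already occupied by that class rather than duplicating existing rows.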