Code smells in machine learning pipelines: an MSR sample study
As technical debt continues to negatively impact the development process in software engineering projects, this study focuses on technical debt in the form of code smells in machine learning pipelines and in code written by data scientists. It contributes to the body of knowledge on technical debt by attempting to quantify the assumption in the literature that scientists without a software engineering background struggle with software engineering best practices when writing code. Furthermore, as machine learning continues to evolve within software engineering, it makes sense to minimize technical debt in machine learning pipelines. To this end, source code from repositories hosted on GitHub was analyzed. The results show that data scientists do indeed produce more code smells than software engineers. However, the study fails to demonstrate that data pipelines yield more code smells than non-data pipelines.