Automatic Scoring of Plots in Higher Data Science Education: Exploring the Potential and Challenges of Machine Learning in Scoring Student-generated Plots
Abstract
This thesis explores the potential and challenges of using machine learning to score student-submitted plots in data science education, an area largely untouched in the existing literature. As the number of students in data science courses grows, so does the demand for teaching assistant hours. The thesis investigates how machine learning could reduce the workload of teaching assistants by automating the scoring of student plots, thus saving time and costs.
A pipeline was developed to convert each plot into a machine-learning-compatible format using student submissions and reference solutions written in Python. From this format, relevant plot features, chosen according to the pre-defined scoring rubrics, were extracted and compared for similarity against the reference solutions. These features were then used to train four machine learning models in order to identify the most accurate model for the task.
The results show the potential of the random forest regressor model and the broader feasibility of saving time and costs through automatic scoring of plots and fewer teaching assistant hours. However, the human-labelled dataset exposes several challenges: unconscious bias, erroneous scores and inconsistent interpretations of the rubrics produce scores that are difficult for the model to predict. The findings highlight a trade-off between staying consistent with human scoring and focusing on objective, automated evaluation, raising important questions for future development.
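The feature-comparison step described in the abstract could be sketched roughly as follows. This is only an illustration, not the pipeline used in the thesis: the dictionary representation of a plot, the field names (`title`, `xlabel`, `ylabel`, `plot_type`, `n_series`) and the helper functions `text_similarity` and `compare_to_reference` are all assumptions made for the example, and the abstract does not say which features or similarity measures were actually used.

```python
from difflib import SequenceMatcher

# Hypothetical text fields; the real rubric features are not given in the abstract.
TEXT_FIELDS = ("title", "xlabel", "ylabel")


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def compare_to_reference(student: dict, reference: dict) -> dict:
    """Turn a student plot and a reference plot into numeric similarity features.

    Both arguments are assumed to be plain dicts describing a plot.
    """
    features = {}
    # Fuzzy match for free-text elements such as titles and axis labels.
    for field in TEXT_FIELDS:
        features[f"{field}_similarity"] = text_similarity(
            student.get(field, ""), reference.get(field, "")
        )
    # Exact match for categorical or count-like properties.
    features["plot_type_match"] = float(
        student.get("plot_type") == reference.get("plot_type")
    )
    features["n_series_match"] = float(
        student.get("n_series") == reference.get("n_series")
    )
    return features


reference_plot = {
    "title": "Monthly sales",
    "xlabel": "Month",
    "ylabel": "Sales",
    "plot_type": "line",
    "n_series": 2,
}
student_plot = {
    "title": "Sales per month",
    "xlabel": "month",
    "ylabel": "Sales",
    "plot_type": "bar",
    "n_series": 2,
}

features = compare_to_reference(student_plot, reference_plot)
```

A feature vector of this kind, one per submission, is the sort of input that could then be passed to a regression model such as scikit-learn's `RandomForestRegressor`, with the human-assigned scores as targets.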