Automatic Scoring of Plots in Higher Data Science Education: Exploring the Potential and Challenges of Machine Learning in Scoring Student-generated Plots

Abstract

This thesis explores the potential and challenges of using machine learning to score student-submitted plots in data science education, an area largely unexplored in the existing literature. As the number of students in data science courses grows, so does the demand for teaching assistant hours. This thesis investigates how machine learning could reduce the workload of teaching assistants by automating the scoring of student plots, thus saving time and costs.

A pipeline was developed that converts each plot, from both student submissions and reference solutions written in Python, into a machine-learning-compatible format. From this format, plot features derived from the pre-defined scoring rubrics were extracted and compared against the reference solutions for similarity. These features were then used to train four machine learning models in order to identify the most accurate model for this task.
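The feature-extraction step described above can be pictured with a minimal sketch. It is not the thesis's actual pipeline: the dictionary format, the field names (`kind`, `title`, `xlabel`, `ylabel`, `y_values`), the example values and the helper functions are all assumptions made for illustration. It only shows the general idea of turning a submission and a reference solution into a vector of similarity features that a regression model could later be trained on.

```python
from difflib import SequenceMatcher

# Hypothetical machine-readable plot descriptions. The field names and values
# are illustrative assumptions, not the format used in the thesis.
reference = {
    "kind": "bar",
    "title": "Average score per course",
    "xlabel": "Course",
    "ylabel": "Average score",
    "y_values": [6.5, 7.2, 8.0],
}
submission = {
    "kind": "bar",
    "title": "Average Score per Course",
    "xlabel": "course",
    "ylabel": "",
    "y_values": [6.5, 7.2, 7.0],
}


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def value_similarity(a, b, tol=1e-6) -> float:
    """Fraction of positions where the plotted values agree within tol.

    Returns 0.0 when the series are empty or differ in length.
    """
    if not a or len(a) != len(b):
        return 0.0
    matches = sum(abs(x - y) <= tol for x, y in zip(a, b))
    return matches / len(a)


def extract_features(sub: dict, ref: dict) -> dict:
    """Compare a submission against the reference, one feature per rubric item."""
    return {
        "same_kind": float(sub["kind"] == ref["kind"]),
        "title_sim": text_similarity(sub["title"], ref["title"]),
        "xlabel_sim": text_similarity(sub["xlabel"], ref["xlabel"]),
        "ylabel_sim": text_similarity(sub["ylabel"], ref["ylabel"]),
        "data_sim": value_similarity(sub["y_values"], ref["y_values"]),
    }


features = extract_features(submission, reference)
for name, value in features.items():
    print(f"{name}: {value:.2f}")
```

In a setup like this, each submission yields one feature row, and the human-assigned score serves as the regression target.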

The results show the potential of the random forest regressor model and the broader feasibility of saving time and costs by replacing some teaching assistant hours with automatic scoring of plots. However, the human-labelled dataset exposes challenges: unconscious bias, erroneous scores and inconsistent interpretations of the rubrics produce labels that are difficult for the model to predict. The findings highlight a trade-off between staying consistent with human scoring and focusing on objective, automated evaluation, raising important questions for future development.

Keywords

Charts, Data Transformation, Education, Ensemble, Machine Learning, Pipeline, Plots, Similarity
