Identifying Students at Risk From Online Clickstream Data using Machine Learning

Authors

Hadeel Alhabdan and Ala Alluhaidan, Princess Nourah bint Abdulrahman University, Saudi Arabia

Abstract

This study examines four machine learning methods for identifying at-risk students from online clickstream data covering 60 courses, together with the students' grades in those courses. Students with grades of “F” or “D” were classified as at risk of failing, while students with grades of “A,” “B,” or “C” were classified as safe. Logistic regression, decision tree, neural network, and random forest models were used, with each model subjected to eight-fold cross-validation. The decision tree model had the lowest performance across all four metrics, followed by the logistic regression model, while the neural network model showed marginally higher accuracy, sensitivity, and F1 score than the random forest model. All four machine learning models were found to be reliable in identifying at-risk students from the provided online clickstream data.
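The labeling rule and evaluation setup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the choice of at-risk as the positive class, the unshuffled contiguous folds, and the sample grades and predictions are all assumptions made for illustration. Only the three metrics named in the abstract (accuracy, sensitivity, and F1 score) are computed.

```python
# Illustrative sketch only: names and sample data are hypothetical.
AT_RISK_GRADES = {"D", "F"}
SAFE_GRADES = {"A", "B", "C"}


def label_grade(grade):
    """Map a letter grade to 1 (at-risk: D/F) or 0 (safe: A/B/C)."""
    g = grade.strip().upper()
    if g in AT_RISK_GRADES:
        return 1
    if g in SAFE_GRADES:
        return 0
    raise ValueError(f"unexpected grade: {grade!r}")


def k_fold_indices(n_samples, k=8):
    """Split indices 0..n_samples-1 into k contiguous (train, test) folds.

    Assumption: no shuffling or stratification; the abstract does not say
    how the eight folds were formed.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), and F1, with at-risk (1) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Made-up grades and predictions, not data from the study.
grades = ["A", "F", "C", "D", "B", "D"]
y_true = [label_grade(g) for g in grades]
y_pred = [0, 1, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
folds = k_fold_indices(20, k=8)
```

In a full pipeline, each of the four classifiers would be trained on the training indices of each fold and scored on the held-out indices, with the per-fold metrics then averaged.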

Keywords

Decision tree, Logistic regression, Neural networks, Online clickstream data, Random forest.