Hadeel Alhabdan and Ala Alluhaidan, Princess Nourah bint Abdulrahman University, Saudi Arabia
This study examines the use of four machine learning methods to identify at-risk students from online clickstream data for 60 courses and the students' grades in those courses. Students with grades of “F” or “D” were classified as at-risk of failing, while students with grades of “A,” “B,” or “C” were classified as safe. Logistic regression, decision tree, neural network, and random forest models were used, with each model subjected to eight-fold cross-validation. The decision tree model had the lowest performance across all four metrics, followed by the logistic regression model, while the neural network model showed marginally higher accuracy, sensitivity, and F1 score than the random forest model. All four machine learning models were found to be reliable in identifying at-risk students from the provided online clickstream data.
Decision tree, Logistic regression, Neural networks, Online clickstream data, Random forest.
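
The labeling rule and evaluation setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`label_at_risk`, `k_fold_indices`, `binary_metrics`) are hypothetical, the folds are plain contiguous splits (the abstract does not say whether folds were shuffled or stratified), and only the three metrics named in the abstract are computed.

```python
from typing import Dict, List, Sequence

# Grade-to-label rule taken from the abstract: "F"/"D" are at-risk, "A"/"B"/"C" are safe.
AT_RISK_GRADES = {"D", "F"}
SAFE_GRADES = {"A", "B", "C"}


def label_at_risk(grade: str) -> int:
    """Return 1 for an at-risk grade and 0 for a safe grade (hypothetical helper)."""
    g = grade.strip().upper()
    if g in AT_RISK_GRADES:
        return 1
    if g in SAFE_GRADES:
        return 0
    raise ValueError(f"Unexpected grade: {grade!r}")


def k_fold_indices(n_samples: int, k: int = 8) -> List[List[int]]:
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Assumption: no shuffling or stratification, purely for illustration.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall on the at-risk class), and F1 score."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }
```

In a full pipeline, each of the eight folds would serve once as the held-out test set while a model is trained on the remaining seven, and `binary_metrics` would be averaged over the folds. The toy predictions in any usage of this sketch are made up and do not reproduce the study's results.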