A Data Science Approach To Predictive Analytic Research On Preventing Student Dropout In Higher Education
Student dropout in higher education is a critical challenge with significant social and economic implications. This study, conducted under the Design Science Research (DSR) paradigm, develops and validates a predictive artefact for the early identification of students at risk of dropping out. Using a public dataset with 37 socioeconomic and academic variables from 4,424 students, we implemented a Machine Learning (ML) pipeline that combines feature engineering, class imbalance correction using SMOTE-ENN, and a Stacking Ensemble model built from Random Forest, XGBoost, and SVM. The final artefact, once demonstrated and evaluated, achieved an accuracy of 96.4% and an F1-Score of 0.964. Model interpretability was provided through SHAP (SHapley Additive exPlanations), which assigns a relevance value to each feature and thereby makes the prediction results transparent. This work proposes an effective and interpretable system that higher education institutions can use to design targeted and personalised interventions for preventing student dropout.
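To illustrate what "assigning a relevance value to each feature" means, the following minimal sketch computes exact Shapley values for a single prediction by enumerating feature subsets. It is not the pipeline used in this study: the `toy_risk` function, its three features, its coefficients, and the `student` and `baseline` values are hypothetical, and the SHAP library relies on faster estimators and a background dataset rather than the single baseline assumed here.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x, measured against one baseline input."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take the instance's values; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(v):
    # Hypothetical dropout-risk score, NOT the stacking ensemble of this study.
    approved_units, tuition_up_to_date, age = v
    return (0.8
            - 0.1 * approved_units
            - 0.2 * tuition_up_to_date
            + 0.01 * age * (1 - tuition_up_to_date))


# Hypothetical student and reference ("average") student.
student = [2, 0, 25]
baseline = [5, 1, 20]

phi = shapley_values(toy_risk, student, baseline)
for name, contribution in zip(["approved_units", "tuition_up_to_date", "age"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the student's score and the baseline score, which is the additivity property that lets each value be read as that feature's share of the prediction. Exact enumeration grows exponentially with the number of features, so it is only practical for small illustrations like this one.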
