Predicting Depression Using Machine Learning Models With Class Imbalance Handling
Depression is a widespread and debilitating mental health condition affecting hundreds of millions of people worldwide. This study investigates the effectiveness of two tree-based machine learning models, XGBoost and LightGBM, in predicting depression from psychometric data derived from the AUDIT, PSS, UCLA and BIS subscales. To mitigate class imbalance, seven resampling techniques were systematically evaluated. Among these, XGBoost combined with Tomek links achieved the highest predictive performance, with an accuracy of 0.56 and an area under the curve (AUC) of 0.85. These results underscore the utility of self-reported psychological assessments in machine learning–driven mental health diagnostics and highlight the role of data preprocessing in improving model performance and reliability.
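For readers unfamiliar with Tomek links, the resampling step can be illustrated with a minimal sketch. A Tomek link is a pair of samples from opposite classes that are each other's nearest neighbour; removing the majority-class member of each pair cleans the class boundary. The code below is a pure-Python illustration under stated assumptions, not the study's implementation: the function names `tomek_link_indices` and `remove_majority_in_links` are hypothetical, and the toy points are invented for the example rather than taken from the psychometric data. In practice a library implementation would typically be applied to the training split before fitting a classifier such as XGBoost.

```python
from math import dist


def tomek_link_indices(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    Two samples form a Tomek link when they are mutual nearest
    neighbours (Euclidean distance) and carry different labels.
    """
    n = len(X)
    # Nearest neighbour of each sample among all other samples.
    nearest = [
        min((j for j in range(n) if j != i), key=lambda j: dist(X[i], X[j]))
        for i in range(n)
    ]
    return [
        (i, j)
        for i, j in enumerate(nearest)
        if i < j and nearest[j] == i and y[i] != y[j]
    ]


def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {
        k
        for pair in tomek_link_indices(X, y)
        for k in pair
        if y[k] == majority_label
    }
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Toy data (illustrative only): label 0 is the majority class,
# label 1 the minority class. The point (2.9, 3.0) sits right next
# to a minority sample and forms a Tomek link with it.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2.9, 3.0), (3, 3), (4, 4), (4, 3)]
y = [0, 0, 0, 0, 0, 1, 1, 1]

X_res, y_res = remove_majority_in_links(X, y, majority_label=0)
print(len(X), "samples before,", len(X_res), "after")
```

Only the borderline majority sample is removed here, so the class ratio changes very little; Tomek links act as a boundary-cleaning step rather than a full rebalancing method.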
