Optimizing Tax Revenue Through Machine Learning-Driven Fraud Detection: An Explainable Approach for Cameroon
Effective tax administration is critical for mobilizing domestic resources in developing countries. This paper presents an explainable machine learning (ML) system that proactively detects tax fraud in Cameroon, addressing critical gaps in tax revenue optimization. Audit procedures at Cameroon's Directorate General of Taxes (DGI) are inefficient: each investigation takes approximately 45 days, the false-positive rate is 34%, and only an estimated 12% of fraudulent transactions are recovered. The existing workflow relies on Excel files and tax databases and lacks advanced analytical capabilities, a gap that our approach addresses. The informal sector, which accounts for more than 80% of productive activity and contributes about 58% of gross domestic product (GDP) [9, 14, 15, 23, 27, 28], presents distinct challenges for tax collection; this work focuses instead on optimizing fraud detection in the formal tax sector. Using a dataset of 3.2 million transactions from fiscal, banking and customs sources, we trained supervised models (Extreme Gradient Boosting (XGBoost), Random Forest (RF) and Support Vector Machines (SVM)) and unsupervised models (Isolation Forest and autoencoders) with Bayesian optimization. The pipeline processes over 10,000 transactions per second with latency below 100 ms and generates risk scores within two seconds. The best ensemble model reached a precision of 97.2%, a recall of 94.8% and an F1-score of 96.0%. Explainability mechanisms based on Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) provide global and local explanations, increasing user acceptance by 67%. Pilot deployment across three tax centres reduced processing time by 67% and generated additional revenue. The results demonstrate that the proposed system significantly improves fraud detection and helps enhance fiscal transparency.
This paper describes the data pipeline, model design, experimental results and the implications for public finance governance, while highlighting the need for future research focused on addressing fraud in the informal economy.
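As a quick consistency check on the headline metrics, the reported F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only: the function name `f1_score` and the variable names are our own and are not taken from the system described in this paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the best ensemble model.
reported_precision = 0.972
reported_recall = 0.948

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.960"
```

The recomputed value agrees with the reported F1-score of 96.0% to one decimal place.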
