Jiawei Zhang¹, Xin Zhang², Xinyin Miao³; ¹Senior Investment Analyst, USA; ²Data Scientist, USA; ³Senior Data Analyst, USA
This paper proposes a partial penalty methodology for machine learning models to handle the data imbalance that arises in credit card fraud detection. Unlike conventional over-sampling or under-sampling, partial penalty directs the model to focus on learning the minority class of the target variable even when the class distribution is extremely imbalanced. In addition to comparing the partial penalty approach with over-sampling and under-sampling, we implement it with five classification models: Logistic Regression, Random Forest, kNN, Decision Tree, and Light Gradient Boosting Machine (LightGBM). With LightGBM, the partial penalty approach achieves an F1 score of 88.35% and an AUC of 98.79%, higher than the results reported for over-sampling or under-sampling approaches in similar articles.
Partial Penalty, Gradient Boosting, Data Imbalance, Credit Card Fraud Detection, SMOTE
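
To make the idea of penalizing minority-class errors more concrete, the sketch below uses a generic class-weighted log loss. This is an illustrative stand-in only: the abstract does not spell out the partial penalty formulation, so the function name `weighted_log_loss`, the parameter `minority_weight`, and the toy data are all assumptions rather than the authors' method.

```python
import math

def weighted_log_loss(y_true, p_pred, minority_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy in which errors on the positive (minority)
    class are scaled by `minority_weight`; the majority class keeps weight 1.

    Hypothetical helper for illustration, not the paper's formulation.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -minority_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)

# Toy data (assumed): 1 fraud case among 10 transactions, scored by a model
# that assigns a low fraud probability to every transaction.
y_true = [0] * 9 + [1]
p_pred = [0.05] * 10

plain = weighted_log_loss(y_true, p_pred)
penalized = weighted_log_loss(y_true, p_pred, minority_weight=9.0)
print(f"unweighted loss: {plain:.4f}")
print(f"penalized loss:  {penalized:.4f}")
```

With the extra weight, the single missed fraud case dominates the loss, so a model trained against it is pushed to fit the minority class without resampling the data.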