Customer churn prediction: balanced random forest with feature under-sampling

Conference: CAIBDA 2022 - 2nd International Conference on Artificial Intelligence, Big Data and Algorithms
06/17/2022 - 06/19/2022 at Nanjing, China

Proceedings: CAIBDA 2022

Pages: 7
Language: English
Type: PDF

Authors:
Ma, Wenbin; Xia, Guoen; Chen, Shuofeng; Li, Guoxiang (Guangxi University of Finance and Economics, Nanning, Guangxi, China)

Abstract:
Customer churn prediction faces two main problems: class imbalance and high dimensionality. Model performance declines when such data are used directly for prediction. Balanced random forest integrates random under-sampling into the training process of random forest and can thereby deal effectively with data imbalance. However, redundant and irrelevant features in the data reduce the accuracy of balanced random forest. Therefore, a balanced random forest algorithm with a feature under-sampling strategy (FUS-BRF) is proposed. In this algorithm, the samples are under-sampled first; feature groups are then constructed using L1-regularized logistic regression, and the grouped features are randomly under-sampled. The result is a sample subset with balanced classes, high feature quality and low dimensionality. Experiments on four high-dimensional, class-imbalanced customer data sets show that FUS-BRF improves AUC, recall and precision compared with balanced random forest, indicating better predictive ability.
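The subset-construction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the feature groups are formed or how many features are drawn from each. The sketch assumes two groups (features with a non-zero L1 coefficient are all kept, and zero-coefficient features are randomly thinned). The function names, the `n_keep_weak` parameter and the `fit_l1` callback (standing in for any L1-regularized logistic regression that returns one coefficient per feature) are all hypothetical.

```python
import random


def undersample_rows(labels, rng):
    """Return row indices with every class cut down to the minority-class size."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(rows) for rows in by_class.values())
    keep = []
    for rows in by_class.values():
        keep.extend(rng.sample(rows, n_min))
    return sorted(keep)


def undersample_features(coefs, n_keep_weak, rng):
    """Group features by L1 coefficient, then randomly thin the weak group.

    Assumption (not from the paper): non-zero coefficients form a 'strong'
    group that is kept whole; zero coefficients form a 'weak' group from
    which at most n_keep_weak features are drawn at random.
    """
    strong = [j for j, w in enumerate(coefs) if w != 0.0]
    weak = [j for j, w in enumerate(coefs) if w == 0.0]
    drawn = rng.sample(weak, min(n_keep_weak, len(weak)))
    return sorted(strong + drawn)


def build_tree_subset(X, y, fit_l1, n_keep_weak, seed=0):
    """Build one balanced, reduced-dimension training subset (e.g. for one tree).

    fit_l1 is a hypothetical callback: fit_l1(X_rows, y_rows) -> list of
    per-feature coefficients from an L1-regularized logistic regression.
    """
    rng = random.Random(seed)
    rows = undersample_rows(y, rng)                # step 1: balance the classes
    X_bal = [X[i] for i in rows]
    y_bal = [y[i] for i in rows]
    coefs = fit_l1(X_bal, y_bal)                   # step 2: score features with L1
    cols = undersample_features(coefs, n_keep_weak, rng)  # step 3: thin features
    X_sub = [[row[j] for j in cols] for row in X_bal]
    return X_sub, y_bal, cols
```

In a forest, a routine like `build_tree_subset` would be called once per tree with a different seed, so that each tree sees a different balanced, reduced subset; the tree learner itself is left out of the sketch.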