Research on data imbalance classification based on oversampling method

Conference: CAIBDA 2022 - 2nd International Conference on Artificial Intelligence, Big Data and Algorithms
17.06.2022 - 19.06.2022 in Nanjing, China

Proceedings: CAIBDA 2022

Pages: 4
Language: English
Type: PDF

Authors:
Bai, Yuzhu; Feng, Haiwen (Shenyang University of Technology, Tiexi District, Shenyang, Liaoning, China)
Yu, Wei (Neusoft, Hunnan District, Shenyang, Liaoning, China)

Abstract:
Classifying unbalanced data often yields a low recognition rate for minority-class samples, a problem that arises in many fields. To address it, this paper proposes an ensemble Random Forest classification algorithm for unbalanced data based on oversampling. The original data set is preprocessed with feature scaling and feature selection, the downsampling and oversampling methods are combined, and Logistic Regression and Random Forest classifiers are used for classification. Finally, the two methods are compared. The experimental results show that the oversampling algorithm can improve the overall classification performance of the classifier.
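
The abstract does not say which oversampling or downsampling algorithm the authors use, so the following is only a minimal sketch of the general idea, assuming plain random resampling. The function names `random_oversample` and `random_undersample` and the toy data are hypothetical illustrations, not the paper's method. Oversampling duplicates minority-class rows until the classes are balanced; downsampling discards majority-class rows instead.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (assumed strategy)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has as many
    rows as the smallest class (assumed strategy)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):  # sample without replacement
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 majority-class rows and 2 minority-class rows.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print(Counter(y_over))
print(Counter(y_under))
```

In a pipeline like the one the abstract describes, resampling of this kind would normally be applied to the training split only, after feature scaling and feature selection, and the balanced data would then be passed to the Logistic Regression and Random Forest classifiers.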