Optimization of Classification Results on Gene Expression Datasets Using Dimensionality Reduction

Conference: CAIBDA 2022 - 2nd International Conference on Artificial Intelligence, Big Data and Algorithms
06/17/2022 - 06/19/2022 at Nanjing, China

Proceedings: CAIBDA 2022

Pages: 11
Language: English
Type: PDF

Authors:
Sun, Yichen (College of Software, Nankai University, Tianjin, China)
Zhang, Fanyu (College of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin, China)

Abstract:
Bioinformatics datasets often suffer from high dimensionality, small sample sizes, and nonlinearity, which makes them prone to the curse of dimensionality. The goal of our project is to find a better dimensionality reduction algorithm for classifying gene expression datasets by comparing different dimensionality reduction algorithms. We use three datasets, DLBCL, Colon, and Leukemia, to demonstrate the performance of these algorithms on gene expression data. First, we preprocess the data with a linear transformation. Second, we visualize the datasets. Then, we apply several dimensionality reduction algorithms to reduce the dimension and classify the datasets. For the linear algorithms, we use PCA and MDS; for the nonlinear algorithms, we use ISOMAP and Laplacian Eigenmaps. We introduce and compare these four algorithms and the SVM algorithm, combining SVM with each of the four to find the combination with the best classification performance. Finally, we evaluate classification performance using several methods, such as visual analysis and balanced accuracy. Our work will help practitioners identify and select the most appropriate data processing method when they face the curse of dimensionality.
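
As an illustration of the balanced accuracy metric mentioned in the abstract, the following is a minimal sketch in plain Python. It uses the standard definition (the mean of per-class recall) and is not taken from the paper. The function name and the toy labels are hypothetical and chosen only to show how balanced accuracy can differ from plain accuracy when class sizes are unequal, as can happen with small-sample datasets.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not from the paper): 8 samples of class 0, 2 of class 1,
# and a classifier that always predicts class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

In this toy case the always-majority prediction reaches a plain accuracy of 0.8 but a balanced accuracy of only 0.5, since the recall on the minority class is zero.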