A Novel Patient Similarity Prediction Model Based on Semisupervised Learning

Conference: CAIBDA 2022 - 2nd International Conference on Artificial Intelligence, Big Data and Algorithms
06/17/2022 - 06/19/2022 at Nanjing, China

Proceedings: CAIBDA 2022

Pages: 9
Language: English
Type: PDF

Authors:
Zhang, Linlin (School of Software, Xinjiang University, Urumqi, China & College of Information Science and Engineering, Xinjiang University, Urumqi, China & College of Cyber Science and Engineering, Xinjiang University, Urumqi, China)
Li, Xinyao; Zhao, Kai (College of Information Science and Engineering, Xinjiang University, Urumqi, China & College of Cyber Science and Engineering, Xinjiang University, Urumqi, China)
Bi, Xuehua; Yu, Guanglei (College of Biomedical Engineering and Technology, Xinjiang Medical University, Urumqi, China)
Zhang, Ying (First Affiliated Hospital of Xinjiang Medical University, Urumqi, China)

Abstract:
Patient similarity prediction can contribute to personalized prediction of patient disease. Accurately identifying and ranking the similarity among patients based on their historical records is a key step in personalized healthcare. However, labeled data for patient similarity prediction tasks are scarce and time-consuming to obtain. It is therefore important to make use of both the limited labeled data and the large amount of unlabeled data. In this paper, we propose a patient similarity prediction model based on semi-supervised learning, in which an enhanced sample algorithm increases the number of labeled samples and improves the prediction accuracy of the classifier. First, Light Gradient Boosting Machine (LightGBM) and Random Forest (RF) are used as the base classifiers for co-training, and co-training is used to filter the candidate enhancement samples, which consist of the unlabeled samples on which the two classifiers' predictions agree. Second, following the idea of self-training, the final enhancement samples are the candidates with the highest confidence. Finally, to validate our approach, patient similarity prediction experiments were conducted on the new labeled sample set, using a real electronic medical record dataset from a hospital. The experiments show that our proposed method outperforms the baseline semi-supervised methods, with an F1 value of 90.7%.
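
The two-stage sample selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_enhanced_samples`, the confidence threshold of 0.9, and the choice to use the lower of the two classifiers' confidences are all assumptions made for illustration, since the abstract does not specify them. The two base classifiers (LightGBM and RF in the paper) are represented only by their predicted class probabilities for the unlabeled samples.

```python
from typing import List, Sequence, Tuple


def select_enhanced_samples(
    proba_a: Sequence[Sequence[float]],
    proba_b: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off; not given in the abstract
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs for unlabeled samples.

    proba_a and proba_b hold per-sample class probabilities produced by
    the two base classifiers for the same unlabeled samples.
    """
    selected = []
    for i, (pa, pb) in enumerate(zip(proba_a, proba_b)):
        # Predicted class = index of the largest probability.
        label_a = max(range(len(pa)), key=pa.__getitem__)
        label_b = max(range(len(pb)), key=pb.__getitem__)
        # Co-training filter: keep only samples both classifiers agree on.
        if label_a != label_b:
            continue
        # Self-training filter: keep only high-confidence candidates.
        # Taking the lower of the two confidences is an assumption.
        confidence = min(pa[label_a], pb[label_b])
        if confidence >= threshold:
            selected.append((i, label_a))
    return selected


# Made-up probabilities for four unlabeled samples, two classes.
proba_lgbm = [[0.95, 0.05], [0.40, 0.60], [0.08, 0.92], [0.70, 0.30]]
proba_rf = [[0.91, 0.09], [0.55, 0.45], [0.03, 0.97], [0.80, 0.20]]
enhanced = select_enhanced_samples(proba_lgbm, proba_rf)
print(enhanced)  # [(0, 0), (2, 1)]
```

In this toy input, sample 1 is dropped because the classifiers disagree, and sample 3 is dropped because the agreed prediction is not confident enough. The selected samples, with their pseudo-labels, would then be added to the labeled set before retraining.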