Multi-label topic classification for COVID-19 literature annotation: A BioBERT-based feature enhancement approach

Conference: CIBDA 2022 - 3rd International Conference on Computer Information and Big Data Applications
25 March 2022 - 27 March 2022 in Wuhan, China

Proceedings: CIBDA 2022

Pages: 4
Language: English
Type: PDF

Authors:
Wang, Xin; Wang, Jian; Tang, Wentai; Zhang, Hongtong (College of Computer Science and Technology, Dalian University of Technology, Dalian, China)

Abstract:
With the rapid, exponential growth of the biomedical literature, especially during the COVID-19 pandemic, there is an urgent need for effective techniques that automatically manage and categorize the massive volume of biomedical text. BERT has been widely applied and has shown strong performance in natural language processing. We therefore first take the improved pre-trained language models CovidBERT and BioBERT as the basis and, starting from the better-performing of the two, further enhance the semantic representation of the abstract with additional title information. Finally, we propose a novel feature enhancement method that exploits and integrates the distribution of label information effectively. Experimental results show that our model achieves an instance-based F1 score, precision and recall of 93.94%, 93.5% and 94.38%, respectively, on the multi-label topic classification task of track 5 of BioCreative VII.
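
The abstract reports instance-based scores without defining them. The sketch below shows one common reading of that term: precision, recall and F1 are computed per document from its gold and predicted label sets and then averaged over documents. This definition is an assumption and may differ in detail from the official BioCreative VII evaluation script. The function name `instance_based_scores` and the topic labels in the usage example are hypothetical placeholders, not taken from the paper.

```python
from typing import List, Set, Tuple


def instance_based_scores(
    gold: List[Set[str]], pred: List[Set[str]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), each averaged over instances.

    Assumed definition: every score is computed per document and the
    per-document values are averaged. A score whose denominator is
    zero (empty label set) is counted as 0.0 for that document.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one instance is required")

    p_sum = r_sum = f_sum = 0.0
    for g, p in zip(gold, pred):
        overlap = len(g & p)  # labels predicted correctly for this document
        p_sum += overlap / len(p) if p else 0.0
        r_sum += overlap / len(g) if g else 0.0
        denom = len(g) + len(p)
        f_sum += 2 * overlap / denom if denom else 0.0

    n = len(gold)
    return p_sum / n, r_sum / n, f_sum / n


if __name__ == "__main__":
    # Two toy documents with illustrative topic labels.
    gold_labels = [{"Treatment", "Mechanism"}, {"Prevention"}]
    pred_labels = [{"Treatment"}, {"Prevention", "Diagnosis"}]
    precision, recall, f1 = instance_based_scores(gold_labels, pred_labels)
    print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Under this definition the averaged F1 is not, in general, the harmonic mean of the averaged precision and recall, because the harmonic mean is taken per document before averaging.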