Perturbation-enhanced RoBERTa combined with BiLSTM model for text classification

Conference: ICETIS 2022 - 7th International Conference on Electronic Technology and Information Science
21.01.2022 - 23.01.2022 in Harbin, China

Proceedings: ICETIS 2022

Pages: 5
Language: English
Type: PDF

Authors:
Shi, Wei; Song, Miao; Wang, Yong (College of Information Engineering, Shanghai Maritime University, Shanghai, China)

Abstract:
Text classification is an essential task in natural language processing that aims to identify the most relevant category for a given piece of text. Although numerous text classification methods have been proposed, many difficulties remain, such as metaphorical expression, semantic diversity, and grammatical specificity. To address these problems, this paper proposes a new classification model, F-RoBERTa-BiLSTM. First, RoBERTa replaces the traditional static language model word2vec, so that word vector features can be adjusted dynamically according to context. Second, a BiLSTM, composed of a forward LSTM and a backward LSTM, is added on top of RoBERTa, which helps to extract semantic features effectively from both the forward and the backward context. Finally, considering the impact of word vector feature accuracy on BiLSTM semantic feature extraction, noise perturbation is added to RoBERTa's word embedding layer to further improve the model's classification accuracy. The proposed model was validated on two topic datasets (THUCNews and Shopping Review) and achieved better performance than conventional classification models.
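To make the described pipeline concrete, the following is a minimal sketch of such an architecture in PyTorch with Hugging Face Transformers: word embeddings perturbed with noise during training, RoBERTa contextual encoding, a BiLSTM over the contextual features, and a linear classification head. The class name, Gaussian noise scale, LSTM hidden size, and the roberta-base checkpoint are illustrative assumptions; the paper's exact perturbation scheme and hyperparameters are not given in this abstract.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel

class PerturbedRobertaBiLSTM(nn.Module):
    """Illustrative F-RoBERTa-BiLSTM-style classifier:
    noisy word embeddings -> RoBERTa encoder -> BiLSTM -> linear head."""

    def __init__(self, num_classes, noise_std=0.01, lstm_hidden=256,
                 pretrained="roberta-base"):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained(pretrained)
        self.noise_std = noise_std  # assumed Gaussian noise scale (not specified in the abstract)
        self.bilstm = nn.LSTM(
            input_size=self.roberta.config.hidden_size,
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # Look up word embeddings and perturb them with noise during training,
        # mirroring the perturbation applied to RoBERTa's word embedding layer.
        word_embeds = self.roberta.embeddings.word_embeddings(input_ids)
        if self.training and self.noise_std > 0:
            word_embeds = word_embeds + torch.randn_like(word_embeds) * self.noise_std

        # Contextual token features from RoBERTa (positional embeddings are
        # still added internally when inputs_embeds is supplied).
        hidden = self.roberta(inputs_embeds=word_embeds,
                              attention_mask=attention_mask).last_hidden_state

        # The BiLSTM reads the sequence forward and backward; the final hidden
        # states of both directions are concatenated as the text representation.
        _, (h_n, _) = self.bilstm(hidden)
        sentence_repr = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.classifier(sentence_repr)
```

For the Chinese datasets mentioned in the abstract (THUCNews and Shopping Review), a Chinese pretrained checkpoint and its matching tokenizer would be required; roberta-base above is only a placeholder.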