Research on text similarity calculation based on BERT and Word2Vec

Conference: ICETIS 2022 - 7th International Conference on Electronic Technology and Information Science
01/21/2022 - 01/23/2022 at Harbin, China

Proceedings: ICETIS 2022

Pages: 4
Language: English
Type: PDF

Authors:
Yuan, Wei (College of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin, China)
Lei, Yaoxu (Hebei Province Tobacco Company, Xingtai Company, Hebei, China)
Guo, Xin (Qinhuangdao Research Institute, National Rehabilitation Auxiliary Research Center, Qinhuangdao, China)

Abstract:
The rapid development of NLP technology has made intelligent information retrieval and screening increasingly convenient. Semantic retrieval algorithms can improve the retrieval rate by calculating the similarity between two sentences. This paper proposes a method for obtaining the similarity between sentences based on the BERT model and compares it with the traditional ALBERT, ESIM and BIMPM models. Experimental results show that the BERT model reaches an accuracy of 87% in calculating text similarity, which is clearly better than the other models. In addition, a synonym model is trained with Word2Vec to obtain synonyms related to a target word. The approach adopted in this paper can therefore effectively improve retrieval efficiency.
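
The abstract does not say how similarity scores are computed from the model outputs, so the following is only a minimal sketch of one common choice: cosine similarity between embedding vectors, reused for nearest-neighbour synonym lookup. The function names and the toy vectors are hypothetical; in practice the vectors would come from a trained BERT or Word2Vec model, and the paper's actual scoring method may differ.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target, embeddings, top_n=2):
    """Rank every other word in `embeddings` by cosine similarity to `target`."""
    target_vec = embeddings[target]
    scored = [
        (word, cosine_similarity(target_vec, vec))
        for word, vec in embeddings.items()
        if word != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical 3-dimensional vectors standing in for trained word embeddings.
TOY_WORD_VECTORS = {
    "search": [0.9, 0.1, 0.0],
    "retrieve": [0.8, 0.2, 0.1],
    "lookup": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    # Candidate synonyms for "search", most similar first.
    for word, score in most_similar("search", TOY_WORD_VECTORS):
        print(f"{word}: {score:.3f}")
```

The same `cosine_similarity` function applies unchanged to sentence-level vectors, so a query sentence can be compared against candidate sentences in the same way that a target word is compared against the vocabulary here.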