Research and Implementation of Sentence Error Correction Method in Thangka Field

Conference: CIBDA 2022 - 3rd International Conference on Computer Information and Big Data Applications
March 25-27, 2022, Wuhan, China

Proceedings: CIBDA 2022

Pages: 5; Language: English; Type: PDF

Authors:
Wang, Tiejun; Wang, Yu (Northwest Minzu University, School of Mathematics and Computer Science, Lanzhou, Gansu, China)
Cheng, Sujie (Northwest Minzu University, National Languages Information Technology, Boulder, Lanzhou, Gansu, China & Northwest Minzu University Key Laboratory of China’s Ethnic Languages and Information Technology of Ministry of Education, Lanzhou, Gansu, China)

Abstract:
Text error correction is a research area of natural language processing with applications in search engines, intelligent question answering, and associative input. Current work on text error correction mostly targets the general domain, and little research addresses error correction for Thangka sentences. This paper proposes an error correction method for Thangka sentences. First, Thangka data is pre-processed and used to train a language model, which improves the efficiency and accuracy of text recognition. Next, suspected error positions in a sentence are detected using a domain confusion set and word-granularity n-gram models. Finally, correction candidates are recalled by edit distance and by phonetically or visually similar words, and the candidates are ranked by perplexity. The proposed method has good application value for error correction in the Thangka domain.
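
The three stages named in the abstract (detection, candidate recall, ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English corpus, the `CONFUSION` dictionary, the add-one smoothed bigram model, and all function names are assumptions made for illustration, and recall by phonetic or shape similarity is represented only by the hypothetical confusion dictionary.

```python
import math
from collections import Counter

# Toy corpus standing in for pre-processed Thangka-domain text (hypothetical data).
CORPUS = [
    "the thangka depicts a seated buddha",
    "the thangka is painted on cotton",
    "a seated buddha holds a bowl",
]
# Hypothetical domain confusion set: known misspelling -> intended word.
CONFUSION = {"tangka": "thangka"}

sentences = [["<s>"] + s.split() + ["</s>"] for s in CORPUS]
unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
VOCAB = set(unigrams) - {"<s>", "</s>"}


def bigram_prob(prev, word):
    """Add-one smoothed bigram probability P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))


def perplexity(tokens):
    """Perplexity of a token list under the smoothed bigram model."""
    seq = ["<s>"] + tokens + ["</s>"]
    log_sum = sum(math.log(bigram_prob(p, w)) for p, w in zip(seq, seq[1:]))
    return math.exp(-log_sum / (len(seq) - 1))


def edit_distance(a, b):
    """Levenshtein distance, computed with a single rolling row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, prev + (ca != cb))
    return row[-1]


def detect(tokens):
    """Flag positions found in the confusion set or with unseen bigrams on both sides."""
    seq = ["<s>"] + tokens + ["</s>"]
    flagged = []
    for i, tok in enumerate(tokens):
        left_unseen = bigrams[(seq[i], tok)] == 0
        right_unseen = bigrams[(tok, seq[i + 2])] == 0
        if tok in CONFUSION or (left_unseen and right_unseen):
            flagged.append(i)
    return flagged


def correct(tokens, max_dist=1):
    """Replace each flagged token with the candidate that minimizes sentence perplexity."""
    tokens = list(tokens)
    for i in detect(tokens):
        # Recall candidates by edit distance, plus any confusion-set entry.
        candidates = {w for w in VOCAB if edit_distance(tokens[i], w) <= max_dist}
        if tokens[i] in CONFUSION:
            candidates.add(CONFUSION[tokens[i]])
        if candidates:
            tokens[i] = min(
                sorted(candidates),
                key=lambda w: perplexity(tokens[:i] + [w] + tokens[i + 1:]),
            )
    return tokens


print(" ".join(correct("the thangka depicts a seeted buddha".split())))
```

A production system for Chinese Thangka text would presumably replace the toy bigram counts with a language model trained on the domain corpus and add pinyin-based and glyph-based candidate recall; the ranking step would stay the same in spirit, preferring the candidate sentence with the lowest perplexity.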