Chinese text clustering and classification applications of meteorological observational remarks

Conference: CAIBDA 2022 - 2nd International Conference on Artificial Intelligence, Big Data and Algorithms
17 June 2022 - 19 June 2022 in Nanjing, China

Proceedings: CAIBDA 2022

Pages: 5
Language: English
Type: PDF

Authors:
Liu, Xiaoyan; Du, Jianhua; Yang, Qingwen (Hainan Meteorological Information Center, Hainan, China)

Abstract:
The remarks recorded in the monthly data sets of surface Automated Weather Stations (AWS) provide critical information about the quality of meteorological observations, the condition of observational instruments, changes in the surroundings of the AWS, and so on. Every month these remarks are examined manually, which takes a lot of time and effort. In this study, remarks were divided into pure and combined remarks, and the pure remarks were manually sorted into 27 categories. The pure remarks were then clustered using K-means and repeated bisection clustering; 8 categories showed low cluster accuracy (below 0.15) because each contained fewer than 10 remarks. The remaining remarks were used to train 12 classifiers. The decision tree classifier gave the best overall performance, with accuracy above 0.97 and a time cost of less than 2 seconds.
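
The step of setting aside sparsely populated categories before training can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the (remark, category) pair format, and the sample category names are hypothetical, and only the threshold of 10 remarks per category comes from the abstract.

```python
from collections import Counter

# Threshold taken from the abstract: categories with fewer than
# 10 remarks clustered poorly and were excluded from training.
MIN_REMARKS = 10


def split_by_category_size(labeled_remarks, min_remarks=MIN_REMARKS):
    """Separate remarks in well-populated categories from sparse ones.

    labeled_remarks: list of (remark_text, category) pairs
                     (hypothetical input format, assumed for illustration).
    Returns (kept, dropped), each a list of (remark_text, category) pairs.
    """
    # Count how many remarks were manually assigned to each category.
    counts = Counter(category for _, category in labeled_remarks)
    kept = [pair for pair in labeled_remarks if counts[pair[1]] >= min_remarks]
    dropped = [pair for pair in labeled_remarks if counts[pair[1]] < min_remarks]
    return kept, dropped


# Hypothetical sample: one category with 12 remarks, one with 3.
sample = [("remark %d" % i, "power failure") for i in range(12)]
sample += [("remark %d" % i, "instrument replacement") for i in range(3)]

kept, dropped = split_by_category_size(sample)
print(len(kept), "remarks kept for classifier training")
print(len(dropped), "remarks set aside as too sparse")
```

The kept remarks would then be vectorized and passed to whichever classifiers are being compared. The abstract does not describe the feature extraction, so that part is left out here.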