A study on the method of eliminating duplication of ocean temperature and salinity data

Conference: AIIPCC 2022 - The Third International Conference on Artificial Intelligence, Information Processing and Cloud Computing
21.06.2022 - 22.06.2022, online

Proceedings: AIIPCC 2022

Pages: 7
Language: English
Type: PDF

Authors:
Ji, Fengying; Dong, Mingmei; Liu, Yulong; Xu, Shanshan; Wan, Fangfang; Shi, Xiaoxiao; Han, Luyao; Yue, Xinyang; Zhang, Zengjian (National Marine Data and Information Service, China)

Abstract:
Duplicate data introduced during data collection, transmission, exchange, and management make the total amount of data uncertain and the data set as a whole unreliable, which leads to erroneous statistical analyses of marine characteristics. In this paper, drawing on extensive data processing practice, the authors first clarify the sources and types of duplicate temperature and salinity data, then propose a series of thresholds for identifying duplicate data from different instruments, and establish a process for identifying and eliminating duplicates. Applying this process to the WOD data effectively removes the duplicated data. The operational process improves the application value of the dataset and can also be applied to other types of data.
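The abstract does not give the thresholds or the exact steps, so the following is only a minimal Python sketch of what instrument-specific, threshold-based duplicate detection could look like. The `Cast` record, the `THRESHOLDS` table, the tolerance values, and the function names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cast:
    """One temperature profile (hypothetical record layout)."""
    cast_id: str
    instrument: str   # e.g. "CTD", "XBT"
    lat: float        # degrees north
    lon: float        # degrees east
    time_s: float     # seconds since an arbitrary epoch
    temps: tuple      # temperature values in degrees Celsius


# Hypothetical tolerances per instrument:
# (position in degrees, time in seconds, temperature in degrees Celsius).
# The values used in the paper are not stated in the abstract.
THRESHOLDS = {
    "CTD": (0.01, 3600.0, 0.01),
    "XBT": (0.05, 3600.0, 0.1),
}
DEFAULT_THRESHOLD = (0.01, 3600.0, 0.01)


def is_duplicate(a: Cast, b: Cast) -> bool:
    """Return True if two casts agree within the instrument's tolerances."""
    if a.instrument != b.instrument:
        return False
    pos_tol, time_tol, val_tol = THRESHOLDS.get(a.instrument, DEFAULT_THRESHOLD)
    if abs(a.lat - b.lat) > pos_tol or abs(a.lon - b.lon) > pos_tol:
        return False
    if abs(a.time_s - b.time_s) > time_tol:
        return False
    if len(a.temps) != len(b.temps):
        return False
    return all(abs(x - y) <= val_tol for x, y in zip(a.temps, b.temps))


def deduplicate(casts: list) -> list:
    """Keep the first cast of every group of near-identical casts."""
    kept = []
    for cast in casts:
        if not any(is_duplicate(cast, other) for other in kept):
            kept.append(cast)
    return kept


casts = [
    Cast("a", "CTD", 10.0, 120.0, 0.0, (20.0, 18.5)),
    Cast("b", "CTD", 10.005, 120.0, 600.0, (20.0, 18.5)),  # near-copy of "a"
    Cast("c", "CTD", 11.0, 120.0, 0.0, (20.0, 18.5)),      # different position
    Cast("d", "XBT", 10.0, 120.0, 0.0, (20.0, 18.5)),      # different instrument
]
unique_ids = [c.cast_id for c in deduplicate(casts)]
```

The pairwise comparison is quadratic in the number of casts. For a collection the size of WOD, one would presumably first group casts by coarse position and time bins and compare only within each bin, but that too is an assumption rather than something the abstract states.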