Intelligent Tracking Technology of Sensitive Data Based on Tag Distribution Learning

Conference: CIBDA 2022 - 3rd International Conference on Computer Information and Big Data Applications
03/25/2022 - 03/27/2022 at Wuhan, China

Proceedings: CIBDA 2022

Pages: 4
Language: English
Type: PDF

Authors:
Zhou, Xiaoming (State Grid Liaoning Electric Power Supply Co. Ltd, State Grid of China, Shenyang, China)
Yu, Hai; Teng, Ziyi (State Grid Liaoning Information and Communication Company, State Grid of China, Shenyang, China)
Yu, Pengfei (State Grid Key Laboratory of Information & Network Security, Global Energy Interconnection Research Institute Co. Ltd, Nanjing, China)

Abstract:
The frequent interaction and sharing of massive data in big data environments brings increasingly serious data security problems, such as theft of data by illegal users, unauthorized use of data by legitimate users, and illegal disclosure and dissemination of data. In practice, data leakage is inevitable because of environmental and human factors, yet few studies address how to trace data after it has leaked. Data traceability means that when data leaked to a third party is captured, its source and the responsible person can be traced by analyzing the characteristics of the leaked data. This paper studies an intelligent tracking technology for sensitive data based on tag distribution learning, which consists of three stages: tag generation for sensitive data content, tag distribution generation, and tag distribution comparison for sensitive data. First, to address the low recognition accuracy of existing machine learning algorithms on sensitive data without significant features, a content tag generation technology for sensitive data is studied, and tags are extracted from both structured and unstructured data. Second, a label distribution generation method is studied to realize multi-label identification of data based on a label distribution descriptor model. Finally, a label distribution comparison technology for sensitive data is studied to realize accurate identification and tracking of sensitive data based on label distribution learning. In addition, to address frequent data flows and the difficulty of monitoring sensitive data flows, the generation of sensitive digital tags and the construction of association relationships during data flow are studied, together with a tag-recognition-based data tracking technology, to realize dynamic tracking and security monitoring of sensitive data.
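
The comparison stage named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the descriptor model or the distance measure, so everything here is an assumption made for illustration: a tag distribution is taken to be the normalized tag frequencies over a fixed vocabulary, Jensen-Shannon divergence is swapped in as the comparison measure, and the names label_distribution, js_divergence, trace_source and the example tags are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def label_distribution(tags, vocabulary):
    """Turn a list of tags into a probability vector over a fixed vocabulary.

    Assumption: a distribution is simply the normalized tag frequency.
    Tags outside the vocabulary are ignored.
    """
    counts = Counter(tags)
    total = sum(counts[t] for t in vocabulary)
    if total == 0:
        # No known tags: fall back to a uniform distribution.
        return [1.0 / len(vocabulary)] * len(vocabulary)
    return [counts[t] / total for t in vocabulary]


def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2: 0 for identical inputs, at most 1."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def trace_source(leaked_tags, registered, vocabulary):
    """Return the registered dataset whose distribution is closest to the leak."""
    leaked = label_distribution(leaked_tags, vocabulary)
    scores = {
        name: js_divergence(leaked, label_distribution(tags, vocabulary))
        for name, tags in registered.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical example data, not from the paper.
vocabulary = ["id_number", "phone", "address", "meter_reading"]
registered = {
    "billing_export": ["id_number", "phone", "phone", "address"],
    "telemetry_dump": ["meter_reading", "meter_reading", "meter_reading", "address"],
}
source, score = trace_source(
    ["phone", "id_number", "phone", "address"], registered, vocabulary
)
print(source, score)
```

In this sketch the captured data is attributed to whichever registered dataset has the smallest divergence. A practical system would presumably also need a threshold above which no source is reported, which the abstract does not describe.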