Academy & Industry Research Collaboration Center (AIRCC)

Volume 12, Number 14, August 2022

An Optimized Method for Massive Sensitive Data Classification in an Industry Environment

  Authors

Qi Zhong, Shichang Gao and Bo Yi, Northeastern University, China

  Abstract

In the era of big data, data carries greater potential value, but it also poses new challenges to data security, especially for sensitive data in industrial environments. With the development of the industrial internet, enterprises are increasingly interconnected, so a slight oversight may lead to the leakage of sensitive data and cause inestimable losses. Sensitive data classification is therefore required as a safeguard against such leakage. This paper presents a sensitive data classification method based on an improved ID3 decision tree algorithm. First, we introduce attribute weighting to optimize the basic structure of the traditional ID3 algorithm. Second, we use weighted information gain to select nodes during tree construction, which mitigates the multi-value bias of the traditional algorithm (its tendency to favor attributes with many distinct values). Experimental results show that the method achieves a branching accuracy of up to 97.38%.
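The abstract does not state the paper's weighting formula, so the following is only a minimal sketch of weighted information gain for ID3 node selection, assuming the weighted gain is a per-attribute weight multiplied by the standard ID3 information gain. The weights, the attribute names (`field`, `id`), the sample records, and the helper functions are all hypothetical illustrations. The example shows the multi-value bias: an ID-like attribute has a distinct value per record, so it always attains the maximum gain, and a lower weight lets a more meaningful attribute win.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Standard ID3 information gain of splitting the records on `attr`."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def select_attribute(rows, labels, attrs, weights):
    """Pick the attribute maximizing weight * gain.

    Hypothetical weighting scheme: the paper's actual weights are not given here.
    """
    return max(attrs, key=lambda a: weights[a] * information_gain(rows, labels, a))


# Hypothetical records: "id" is unique per record (a many-valued attribute).
rows = [
    {"field": "phone", "id": "r0"},
    {"field": "phone", "id": "r1"},
    {"field": "note", "id": "r2"},
    {"field": "note", "id": "r3"},
]
labels = ["sensitive", "sensitive", "public", "sensitive"]

# With equal weights, plain ID3 behaviour picks the ID-like attribute.
print(select_attribute(rows, labels, ["field", "id"], {"field": 1.0, "id": 1.0}))  # id
# Down-weighting the many-valued attribute (hypothetical weight 0.3) picks "field".
print(select_attribute(rows, labels, ["field", "id"], {"field": 1.0, "id": 0.3}))  # field
```

Here the gain of `id` equals the full label entropy (about 0.811 bits) because every partition is pure, while `field` yields about 0.311 bits; with weight 0.3, `id` scores about 0.243 and loses. How the paper derives its weights is not specified in the abstract.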

  Keywords

Sensitive data, Data classification, ID3 decision tree, Industrial environment.