Academy & Industry Research Collaboration Center (AIRCC)

Volume 12, Number 23, December 2022

Active Learning Entropy Sampling based Clustering Optimization Method for Electricity Data

  Authors

Wang Qingnan and Zhang Zhaogong, Heilongjiang University, China

  Abstract

Clustering is a crucial part of data mining, and common clustering methods include partition-based, hierarchy-based, density-based, and grid-based methods. To improve clustering accuracy, this study optimizes the partition-based fuzzy C-means (FCM) clustering method and proposes an FCM clustering method that integrates active learning and principal component analysis (PCA). The method first applies PCA to reduce the dimensionality of the electricity data, which lowers the computational cost. It then trains the sample model through active learning, using entropy as the criterion in uncertainty sampling: the larger a sample's entropy, the greater its uncertainty, and the smaller its entropy, the smaller its uncertainty. This criterion is used to filter the electricity data, which are finally clustered by FCM. As the volume of power data grows, this method allows the data to be categorized more accurately, helping to maintain the stability of the power grid and improve its utilization rate. Experimental results on three datasets show that the method improves the accuracy of power data clustering by up to 2 percentage points compared with the traditional clustering method without active learning, and that it performs well on each dataset compared with other methods.
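The abstract names three building blocks (PCA, entropy-based uncertainty sampling, FCM) but does not specify the active-learning model, the entropy threshold, or how the filtered samples are passed to FCM. The Python sketch below is a minimal, standard-library-only illustration of those building blocks under stated assumptions, not the authors' implementation. PCA is computed by power iteration with deflation on the sample covariance matrix. Uncertainty is scored with the Shannon entropy H(p) = -sum_k p_k ln p_k, and the highest-entropy samples are selected, which is the usual form of entropy-based uncertainty sampling. Clustering uses standard fuzzy C-means with fuzzifier m = 2. As an assumption, the FCM membership matrix stands in for the class probabilities that the paper's active-learning model would produce. All function names and parameter defaults (`pca_reduce`, `fcm`, `entropy`, `select_most_uncertain`, `iters`, `tol`, `seed`) are hypothetical.

```python
import math
import random


def pca_reduce(X, n_components, iters=300, seed=0):
    """Project the rows of X onto the top principal components.

    Power iteration on the sample covariance matrix with deflation; adequate
    for the small, dense matrices in this sketch.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance C = Xc^T Xc / (n - 1).
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v: the eigenvalue of the direction just found.
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(r[j] * comp[j] for j in range(d)) for comp in components] for r in Xc]


def entropy(p):
    """Shannon entropy (natural log) of one probability vector; 0 * log 0 counts as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def select_most_uncertain(probs, k):
    """Entropy-based uncertainty sampling: indices of the k highest-entropy rows.

    Assumption: rows may be FCM memberships used as stand-in class probabilities.
    """
    order = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    return order[:k]


def fcm(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means. Returns (centers, U); U[i][k] is sample i's membership in cluster k."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the samples weighted by u_ik^m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * X[i][j] for i in range(n)) / sw for j in range(d)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        new_U = []
        for i in range(n):
            dist = [max(math.dist(X[i], ck), 1e-12) for ck in centers]  # floor avoids 0 division
            new_U.append([
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                for k in range(c)
            ])
        shift = max(abs(new_U[i][k] - U[i][k]) for i in range(n) for k in range(c))
        U = new_U
        if shift < tol:
            break
    return centers, U
```

In use, one would reduce the raw load profiles with `pca_reduce`, cluster the reduced data with `fcm`, and pass the membership matrix `U` to `select_most_uncertain` to obtain the samples an oracle would be asked to label next. Ranking by entropy puts ambiguous samples (memberships close to 1/c in every cluster) first and confident samples (one membership close to 1) last, matching the abstract's statement that larger entropy means greater uncertainty.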

  Keywords

Active Learning, Data Mining, FCM Clustering, Principal Component Analysis, Unsupervised Learning