Volume 13, Number 3
A Review on Classification of Data Imbalance using BigData
Authors
Ramasubramanian and Hariharan Shanmugasundaram, Shadan Women’s College of Engineering and Technology, India
Abstract
Classification is a data mining function that assigns the items in a collection to target categories or classes in order to support more accurate prediction and analysis. Classification with supervised learning aims to identify the class to which a new data item belongs. With advances in technology and the growing volume of real-time data generated by sources such as the Internet, IoT, and social media, data processing has become more demanding and more challenging. One such challenge is data imbalance. In an imbalanced dataset, the majority classes dominate the minority classes, which biases machine learning classifiers toward the majority classes; as a result, most classification algorithms tend to assign all of the test data to the majority classes. In this paper, the authors analyze data imbalance models using big data and classification algorithms.
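The effect of imbalance described above can be illustrated with a minimal sketch. It is not a method from this paper: the 95/5 class split, the label names, and the helper functions `majority_baseline`, `accuracy`, and `recall` are all assumptions chosen for illustration. The sketch shows that a classifier which always predicts the majority class can report high accuracy while never identifying a minority item.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a classifier that always predicts the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _item: majority_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of items of class `positive` that were predicted as `positive`."""
    preds_for_positive = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positive:
        return 0.0
    return sum(p == positive for p in preds_for_positive) / len(preds_for_positive)


# Hypothetical imbalanced label set: 95 majority items and 5 minority items.
labels = ["majority"] * 95 + ["minority"] * 5

predict = majority_baseline(labels)
predictions = [predict(item) for item in labels]

print("accuracy:", accuracy(labels, predictions))
print("minority recall:", recall(labels, predictions, "minority"))
```

Under these assumed numbers, accuracy alone would suggest a good model, which is why per-class measures such as minority recall are usually inspected when the data is imbalanced.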
Keywords
Data imbalance, Big data, IoT, Data analytics, Classification.