Authors
Palak Narula, Adobe Inc., India
Abstract
Clustering is a widely used unsupervised learning technique for discovering hidden patterns in data. However, high-dimensional datasets often pose challenges for both computational efficiency and clustering effectiveness. This study investigates the impact of dimensionality reduction on clustering performance by applying principal component analysis (PCA), independent component analysis (ICA), randomized projection, and feature agglomeration before clustering. The research applies the k-means and expectation-maximization (EM) clustering algorithms to two real-world datasets: bankruptcy prediction and breast cancer diagnosis. The study examines how each dimensionality reduction technique influences cluster formation, computational efficiency, and interpretability. The results indicate that dimensionality reduction reduces processing time and, in some cases, enhances clustering performance by removing noise and redundant features. However, certain techniques may lead to information loss, reducing cluster separability. This research provides insights into selecting appropriate dimensionality reduction methods to optimize clustering in unsupervised learning applications.
Keywords
Machine Learning, Unsupervised Learning, Clustering, Dimensionality Reduction, K-means Clustering, EM Clustering, Principal Component Analysis, Independent Component Analysis, Randomized Projection, Feature Agglomeration