Analysis of Unsupervised Clustering Algorithms and Impact of Dimensionality Reduction: A Data Driven Approach

doi:10.5121/mlaij.2025.12111

Authors

Palak Narula, Adobe Inc., India

Abstract

Clustering is a widely used unsupervised learning technique for discovering hidden patterns in data. However, high-dimensional datasets often pose challenges for both computational efficiency and clustering effectiveness. This study investigates the impact of dimensionality reduction on clustering performance by applying principal component analysis (PCA), independent component analysis (ICA), randomized projection, and feature agglomeration before clustering. The research applies the k-means and expectation-maximization (EM) clustering algorithms to two real-world datasets: bankruptcy prediction and breast cancer diagnosis. The study examines how different dimensionality reduction techniques influence cluster formation, computational efficiency, and interpretability. The results indicate that dimensionality reduction reduces processing time and, in some cases, improves clustering performance by removing noise and redundant features. However, certain techniques may lead to information loss, reducing cluster separability. This research provides insights into selecting appropriate dimensionality reduction methods to optimize clustering in unsupervised learning applications.
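
The pipeline summarized above (reduce dimensionality, then cluster, then compare) can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn, not the study's own code. It uses scikit-learn's bundled Wisconsin diagnostic breast cancer data as a stand-in, which may differ from the exact dataset the study used, and it does not reproduce the bankruptcy dataset. EM clustering is represented by a Gaussian mixture model (`GaussianMixture`, which is fit with EM), and randomized projection by `GaussianRandomProjection`. The target dimensionality (`N_COMPONENTS = 5`), the seed, and the use of the adjusted Rand index (ARI) against the diagnosis labels are hypothetical choices made for illustration; the labels are used only to score clusters after the fact, never to fit them.

```python
import time

from sklearn.cluster import FeatureAgglomeration, KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA, FastICA
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.random_projection import GaussianRandomProjection

N_COMPONENTS = 5  # hypothetical target dimensionality, for illustration only
SEED = 0          # hypothetical seed so runs are repeatable


def make_reducers(n_components, seed):
    """Return the four reduction methods named in the abstract, plus a no-reduction baseline."""
    return {
        "none": None,
        "pca": PCA(n_components=n_components, random_state=seed),
        "ica": FastICA(n_components=n_components, random_state=seed, max_iter=1000),
        "random_projection": GaussianRandomProjection(
            n_components=n_components, random_state=seed
        ),
        # Feature agglomeration merges similar features and averages each group.
        "feature_agglomeration": FeatureAgglomeration(n_clusters=n_components),
    }


def run_experiment(n_clusters=2, n_components=N_COMPONENTS, seed=SEED):
    X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features
    X = StandardScaler().fit_transform(X)       # put features on a common scale
    results = {}
    for name, reducer in make_reducers(n_components, seed).items():
        start = time.perf_counter()
        Z = X if reducer is None else reducer.fit_transform(X)
        km_labels = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed
        ).fit_predict(Z)
        em_labels = GaussianMixture(
            n_components=n_clusters, random_state=seed
        ).fit_predict(Z)
        # Wall-clock time for the reduction step plus both clusterings.
        elapsed = time.perf_counter() - start
        results[name] = {
            "dims": Z.shape[1],
            # ARI compares cluster assignments with the true diagnosis labels.
            "kmeans_ari": adjusted_rand_score(y, km_labels),
            "em_ari": adjusted_rand_score(y, em_labels),
            "seconds": elapsed,
        }
    return results


if __name__ == "__main__":
    for name, r in run_experiment().items():
        print(
            f"{name:>22}: dims={r['dims']:2d}  "
            f"k-means ARI={r['kmeans_ari']:.3f}  EM ARI={r['em_ari']:.3f}  "
            f"time={r['seconds']:.3f}s"
        )
```

Comparing each reduced row against the `none` baseline gives a rough sense of the trade-off the abstract describes: fewer dimensions usually cost less time, while a drop in ARI would hint at the information loss that reduces cluster separability.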

Keywords

Machine Learning, Unsupervised Learning, Clustering, Dimensionality Reduction, K-means Clustering, EM Clustering, Principal Component Analysis, Independent Component Analysis, Randomized Projection, Feature Agglomeration

MLAIJ, Volume 12
