Author(s):
1. RAMANA LAKSHMI ADUSUMILLI:
Dept of Computer Science and Systems Engineering, Andhra University, Visakhapatnam, India
2. M. SHASHI:
Dept of Computer Science and Systems Engineering, AU College of Engineering, Andhra University, Visakhapatnam
Abstract:
Clustering large datasets often suffers from scalability issues because it involves pair-wise distance computations among all instances of the dataset. Grid based algorithms achieve scalability by circumventing the distance estimation step, but they are vulnerable to quality and coverage issues. In this paper, the authors propose a new framework for cluster formation that achieves high scalability while maintaining coverage and quality. The Scalable and Robust Clustering (SRC) framework has three modules. The first module uses PCA to convert the multi-dimensional dataset into a low dimensional grid space; based on joinable boundaries, the dense grid cells are then merged to form macro-clusters representing dense regions. The second module applies density based clustering separately within each dense region to form micro-clusters. Finally, the third module uses a statistical technique to find appropriate clusters for data objects located in non-dense grid cells. The first two modules of the framework handle scalability issues, while the third module focuses on improving coverage without affecting the quality of the clusters. Experimental results on benchmark datasets show that the framework can achieve scalability, coverage and quality while handling large multi-dimensional datasets.
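The grid stage and the post-processing stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the number of principal components, the rule for "joinable boundaries", or the statistical technique used for non-dense cells. The sketch assumes two components, treats edge-adjacent dense cells as joinable, and attaches leftover points to the nearest macro-cluster centroid when they lie within a multiple of that cluster's spread. The function names `pca_2d`, `grid_macro_clusters` and `assign_sparse_points`, and the parameters `bins`, `min_pts` and `max_sigma`, are hypothetical. The second module (density based clustering inside each dense region) is omitted for brevity.

```python
from collections import deque

import numpy as np


def pca_2d(X):
    """Project the rows of X onto the top two principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


def grid_macro_clusters(Z, bins=10, min_pts=5):
    """Bin 2-D points into a bins x bins grid and merge adjacent dense cells.

    Assumption: two dense cells are "joinable" when they share an edge.
    Returns one label per point; -1 marks points that fall in non-dense cells.
    """
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against zero-width axes
    cells = np.minimum(((Z - lo) / span * bins).astype(int), bins - 1)

    counts = np.zeros((bins, bins), dtype=int)
    np.add.at(counts, (cells[:, 0], cells[:, 1]), 1)
    dense = counts >= min_pts

    # Breadth-first search over dense cells: each connected group of dense
    # cells becomes one macro-cluster.
    cell_label = -np.ones((bins, bins), dtype=int)
    next_label = 0
    for start in zip(*np.nonzero(dense)):
        if cell_label[start] != -1:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if (0 <= a < bins and 0 <= b < bins
                        and dense[a, b] and cell_label[a, b] == -1):
                    cell_label[a, b] = next_label
                    queue.append((a, b))
        next_label += 1
    return cell_label[cells[:, 0], cells[:, 1]]


def assign_sparse_points(Z, labels, max_sigma=3.0):
    """Attach unlabeled points to a nearby macro-cluster.

    Assumption: a stand-in for the paper's unspecified statistical technique.
    A point joins the cluster with the smallest spread-normalised centroid
    distance, provided that distance is at most max_sigma spreads.
    """
    out = labels.copy()
    ids = [k for k in np.unique(labels) if k != -1]
    if not ids:
        return out
    cents = np.array([Z[labels == k].mean(axis=0) for k in ids])
    # Spread = root-mean-square distance of cluster members to their centroid.
    spread = np.array([
        np.sqrt(((Z[labels == k] - c) ** 2).sum(axis=1).mean())
        for k, c in zip(ids, cents)
    ])
    spread = np.maximum(spread, 1e-12)
    for idx in np.nonzero(labels == -1)[0]:
        d = np.linalg.norm(cents - Z[idx], axis=1)
        best = int(np.argmin(d / spread))
        if d[best] <= max_sigma * spread[best]:
            out[idx] = ids[best]
    return out
```

A typical call sequence under these assumptions would be `Z = pca_2d(X)`, then `macro = grid_macro_clusters(Z)`, then `final = assign_sparse_points(Z, macro)`. Points still labeled -1 after the last step are left unassigned, which trades some coverage for cluster quality.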
Page(s):
2586-2592
Published:
Journal: Journal of Theoretical and Applied Information Technology, Volume: 100, Issue: 8, Year: 2022
Keywords:
Grid Based Approach, Density Based Clustering, Hybrid Clustering, Post Processing