Enhancing the Effectiveness of Clustering with Spectra Analysis

Many clustering algorithms, such as k-means, EM and CLOPE, require the user to set one or more parameters.
Often, these parameters directly or indirectly control the number of clusters, k, to return. Because data
characteristics and analysis contexts vary, it is often difficult for the user to estimate the number of clusters in a data set. This is
especially true for collections such as Web documents, images or biological data. In an effort to improve the effectiveness
of clustering, we seek the answer to a fundamental question: how can we effectively estimate the natural number of clusters
in a given data set? To answer it, we propose an efficient method based on spectra analysis of the eigenvalues (not the eigenvectors) of the data set.
We first present the relationship between a data set and its underlying spectra, supported by theoretical and
experimental results. We then show that our method can suggest a range of k suited to different analysis
contexts. Finally, we present further empirical results showing how the answer to this question enhances the
clustering process for large text collections and gene expression data.
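
To make the idea of reading k from eigenvalues concrete, the following is a minimal sketch of the generic eigengap heuristic. It is not the method proposed in the paper, whose details are not given in this abstract. It assumes a Gaussian similarity matrix and a symmetrically normalized affinity matrix; the function name estimate_k_by_eigengap and the parameters sigma and k_max are hypothetical choices made for illustration only.

```python
import numpy as np

def estimate_k_by_eigengap(X, sigma=1.0, k_max=10):
    """Hypothetical helper: guess k from the largest gap among the top eigenvalues.

    Assumption: X is an (n_samples, n_features) array and a Gaussian kernel
    with bandwidth sigma is a reasonable similarity for the data.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances, clipped at zero against round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian similarity matrix W (diagonal entries equal 1).
    W = np.exp(-d2 / (2.0 * sigma ** 2))

    # Symmetric normalization M = D^{-1/2} W D^{-1/2}; its eigenvalues lie in [-1, 1].
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    M = W * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

    # Only the eigenvalues are needed, not the eigenvectors.
    eigvals = np.linalg.eigvalsh(M)[::-1]  # sorted in descending order

    # Look for the largest drop between consecutive leading eigenvalues.
    top = eigvals[: min(k_max + 1, len(eigvals))]
    gaps = top[:-1] - top[1:]
    k = int(np.argmax(gaps)) + 1
    return k, top
```

When the clusters are well separated, the normalized matrix is close to block diagonal, so roughly one eigenvalue near 1 appears per cluster and the largest gap falls just after them. On overlapping or noisy data the gap is less pronounced, and the result depends strongly on the assumed sigma.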
