Author :
Goebl, Sebastian ; Xiao He ; Plant, Claudia ; Bohm, Christian
Abstract :
The ability to simplify and categorize things is one of the most important elements of human thought, understanding, and learning. The corresponding explorative data analysis techniques -- dimensionality reduction and clustering -- were initially studied by our community as two separate research topics. Later algorithms such as CLIQUE, ORCLUS, and 4C performed clustering and dimensionality reduction in a joint, alternating process to find clusters residing in low-dimensional subspaces. Such a low-dimensional representation is extremely useful because it allows us to visualize the relationships between the various objects of a cluster. However, previous methods of subspace, correlation, or projected clustering determine an individual subspace for each cluster. In this paper, we demonstrate that it is even more valuable to find clusters in one common low-dimensional subspace, because then we can study not only the intra-cluster but also the inter-cluster relationships of objects, as well as the relationships of whole clusters to each other. We develop the mathematical foundation ORT (Optimal Rigid Transform) to determine an arbitrarily oriented subspace suitable for a given cluster structure. Based on ORT, we propose FOSSCLU (Finding the Optimal Sub Space for Clustering), a new iterative clustering algorithm. Our extensive experiments demonstrate that FOSSCLU outperforms the previous methods in both aspects: clustering and dimensionality reduction.
Keywords :
data analysis; iterative methods; pattern clustering; FOSSCLU; ORT; dimensionality reduction; finding the optimal sub space for clustering; iterative clustering algorithm; low-dimensional representation; mathematical foundation; optimal rigid transform; optimal subspace; Clustering algorithms; Covariance matrices; Eigenvalues and eigenfunctions; Matrix decomposition; Noise; Principal component analysis; Transforms; Joint Subspace Clustering