Title :
Comparison of Cluster Representations from Partial Second- to Full Fourth-Order Cross Moments for Data Stream Clustering
Author :
Mingzhou Song ; Lin Zhang
Author_Institution :
Dept. of Comput. Sci., New Mexico State Univ., Las Cruces, NM
Abstract :
This paper compares cluster representations, from partial second-order to full fourth-order cross moments, for data stream clustering under seven external clustering evaluation measures. Two external measures previously adopted for evaluating data stream clustering, purity and cross entropy, penalize an algorithm when a hypothesized cluster contains points from different target classes (true clusters), but ignore the case where points of one target class fall into different hypothesized clusters. The seven measures address both sides of clustering performance. The geometry represented by the partial second-order statistics of a cluster is a non-oblique ellipsoid, which cannot describe the orientation, asymmetry, or peakedness of a cluster. The higher-order cluster representation presented in this paper introduces the third and fourth cross moments, allowing the cluster geometry to go beyond an ellipsoid. The higher-order statistics also allow two clusters with different representations to be merged into a multivariate normal cluster, using normality tests based on multivariate skewness and kurtosis. Clustering performance under the seven measures on one synthetic and two real data streams demonstrates the effectiveness of the higher-order cluster representations.
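The one-sidedness of purity noted in the abstract can be illustrated with a minimal sketch. The function names `purity` and `inverse_purity` and the toy labels are assumptions made for illustration; the class-to-cluster variant shown here is a common complementary measure and is not claimed to be one of the seven measures used in the paper.

```python
from collections import Counter


def purity(clusters, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    counts = {}
    for k, c in zip(clusters, classes):
        counts.setdefault(k, Counter())[c] += 1
    return sum(max(cnt.values()) for cnt in counts.values()) / len(classes)


def inverse_purity(clusters, classes):
    """Same computation with roles swapped: how concentrated each class is
    within a single hypothesized cluster (illustrative complement only)."""
    return purity(classes, clusters)


# Toy data (hypothetical): two target classes of four points each.
classes = ["a"] * 4 + ["b"] * 4
# Class "a" is shattered into four singleton clusters; class "b" stays whole.
over_split = [0, 1, 2, 3, 4, 4, 4, 4]

print(purity(over_split, classes))          # 1.0: the over-splitting is not penalized
print(inverse_purity(over_split, classes))  # 0.625: the split of class "a" is visible
```

Every hypothesized cluster in `over_split` is pure, so purity reports a perfect score even though class "a" is scattered over four clusters; only a measure that also looks from the class side reveals the problem.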
Keywords :
Gaussian processes; data structures; pattern clustering; Gaussian mixture model; cluster representations; data stream clustering; higher-order cluster representation; multivariate normal cluster; multivariate skewness; Clustering algorithms; Computer science; Data mining; Ellipsoids; Entropy; Geometry; Higher order statistics; Partitioning algorithms; Streaming media; Testing; Cluster representation; Cross moment
Conference_Title :
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3502-9
DOI :
10.1109/ICDM.2008.143