DocumentCode :
3428630
Title :
Order-preserving clustering and its application to gene expression data
Author :
Syeda-Mahmood, Tanveer
Author_Institution :
IBM Almaden Res. Center, San Jose, CA, USA
Volume :
4
fYear :
2004
fDate :
23-26 Aug. 2004
Firstpage :
637
Abstract :
Clustering of ordered data sets is a common problem faced in many pattern recognition tasks. Existing clustering methods either fail to capture the data or use restrictive models such as HMMs or AR models to model the data. In this paper, we present a general order-preserving clustering algorithm that allows arbitrary patterns of data evolution by representing each ordered set as a curve. Clustering of the data then reduces to grouping curves based on shape similarity. We develop a novel measure of shape similarity between curves using scale-space distance. Shape similarity or dissimilarity is judged by composing higher-dimensional curves from constituent curves and noting the additional twists and turns in such curves that can be attributed to shape differences. An algorithm analogous to K-means clustering is then developed that uses prototypical curves for cluster representation. Results are demonstrated on ordered gene expression data sets obtained from gene chips.
Keywords :
data structures; genetics; pattern clustering; K-means clustering; data evolution; data representation; gene chips; gene expression data; order-preserving clustering algorithm; pattern recognition; scale-space distance; shape similarity; Clustering algorithms; Clustering methods; Data analysis; Data models; Gene expression; Hidden Markov models; Information analysis; Pattern recognition; Prototypes; Shape measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
ISSN :
1051-4651
Print_ISBN :
0-7695-2128-2
Type :
conf
DOI :
10.1109/ICPR.2004.1333853
Filename :
1333853