DocumentCode
3428630
Title
Order-preserving clustering and its application to gene expression data
Author
Syeda-Mahmood, Tanveer
Author_Institution
IBM Almaden Res. Center, San Jose, CA, USA
Volume
4
fYear
2004
fDate
23-26 Aug. 2004
Firstpage
637
Abstract
Clustering of ordered data sets is a common problem faced in many pattern recognition tasks. Existing clustering methods either fail to capture the order in the data or use restrictive models such as HMMs or AR models to model the data. In this paper, we present a general order-preserving clustering algorithm that allows arbitrary patterns of data evolution by representing each ordered set as a curve. Clustering of the data then reduces to grouping curves based on shape similarity. We develop a novel measure of shape similarity between curves using scale-space distance. Shape similarity or dissimilarity is judged by composing higher-dimensional curves from constituent curves and noting the additional twists and turns in such curves that can be attributed to shape differences. An algorithm analogous to K-means clustering is then developed that uses prototypical curves for cluster representation. Results are demonstrated on ordered gene expression data sets obtained from gene chips.
Keywords
data structures; genetics; pattern clustering; K-means clustering; data evolution; data representation; gene chips; gene expression data; order-preserving clustering algorithm; pattern recognition; scale-space distance; shape similarity; Clustering algorithms; Clustering methods; Data analysis; Data models; Gene expression; Hidden Markov models; Information analysis; Pattern recognition; Prototypes; Shape measurement;
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
ISSN
1051-4651
Print_ISBN
0-7695-2128-2
Type
conf
DOI
10.1109/ICPR.2004.1333853
Filename
1333853