• DocumentCode
    3428630
  • Title

    Order-preserving clustering and its application to gene expression data

  • Author

    Syeda-Mahmood, Tanveer

  • Author_Institution
    IBM Almaden Res. Center, San Jose, CA, USA
  • Volume
    4
  • fYear
    2004
  • fDate
    23-26 Aug. 2004
  • Firstpage
    637
  • Abstract
    Clustering of ordered data sets is a common problem faced in many pattern recognition tasks. Existing clustering methods either fail to capture the data or use restrictive models such as HMMs or AR models to model the data. In this paper, we present a general order-preserving clustering algorithm that allows arbitrary patterns of data evolution by representing each ordered set as a curve. Clustering of the data then reduces to grouping curves based on shape similarity. We develop a novel measure of shape similarity between curves using scale-space distance. Shape similarity or dissimilarity is judged by composing the higher-dimensional curves from constituent curves and noting the additional twists and turns in such curves that can be attributed to shape differences. An algorithm analogous to K-means clustering is then developed that uses prototypical curves for cluster representation. Results are demonstrated on the ordered gene expression data sets obtained from gene chips.
  • Keywords
    data structures; genetics; pattern clustering; K-means clustering; data evolution; data representation; gene chips; gene expression data; order-preserving clustering algorithm; pattern recognition; scale-space distance; shape similarity; Clustering algorithms; Clustering methods; Data analysis; Data models; Gene expression; Hidden Markov models; Information analysis; Pattern recognition; Prototypes; Shape measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-2128-2
  • Type

    conf

  • DOI
    10.1109/ICPR.2004.1333853
  • Filename
    1333853