• DocumentCode
    918894
  • Title
    Cluster analysis of gene expression data based on self-splitting and merging competitive learning
  • Author
    Wu, Shuanhu; Liew, Alan Wee-chung; Yan, Hong; Yang, Mengsu
  • Author_Institution
    Dept. of Comput. Eng. & Inf. Technol., City Univ. of Hong Kong, China
  • Volume
    8
  • Issue
    1
  • fYear
    2004
  • fDate
    3/1/2004 12:00:00 AM
  • Firstpage
    5
  • Lastpage
    15
  • Abstract
    Cluster analysis of gene expression data from a cDNA microarray is useful for identifying biologically relevant groups of genes. However, finding the natural clusters in the data and estimating the correct number of clusters are still two largely unsolved problems. In this paper, we propose a new clustering framework that is able to address both of these problems. By using the one-prototype-take-one-cluster (OPTOC) competitive learning paradigm, the proposed algorithm can find natural clusters in the input data, and the clustering solution is not sensitive to initialization. To estimate the number of distinct clusters in the data, we propose a cluster splitting and merging strategy. We have applied the new algorithm to simulated gene expression data for which the correct distribution of genes over clusters is known a priori. The results show that the proposed algorithm can find natural clusters and give the correct number of clusters. The algorithm has also been tested on real gene expression changes during the yeast cell cycle, for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies. Comparative studies with several clustering algorithms illustrate the effectiveness of our method.
  • Keywords
    DNA; arrays; biology computing; cellular biophysics; genetic algorithms; genetics; microorganisms; pattern clustering; statistical analysis; unsupervised learning; biologically relevant groups; cDNA microarray; cluster self-splitting learning; clustering algorithms; clustering framework; gene expression data cluster analysis; gene expression data simulation; gene expression fundamental patterns; genes identification; merging competitive learning paradigm; natural clusters; one-prototype-take-one-cluster; yeast cell cycle; Clustering algorithms; Condition monitoring; Data analysis; Databases; Fungi; Gene expression; Information technology; Merging; Pattern analysis; Testing; Algorithms; Artificial Intelligence; Cell Cycle; Gene Expression Profiling; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity; Sequence Analysis, DNA; Yeasts
  • fLanguage
    English
  • Journal_Title
    Information Technology in Biomedicine, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1089-7771
  • Type
    jour
  • DOI
    10.1109/TITB.2004.824724
  • Filename
    1271296