  • DocumentCode
    49046
  • Title
    Rough Based Symmetrical Clustering for Gene Expression Profile Analysis
  • Author
    Sarkar, Anasua; Maulik, Ujjwal
  • Author_Institution
    Dept. of Inf. Technol., Gov. Coll. of Eng. & Leather Technol., Kolkata, India
  • Volume
    14
  • Issue
    4
  • fYear
    2015
  • fDate
    June 2015
  • Firstpage
    360
  • Lastpage
    367
  • Abstract
    Identification of coexpressed genes is the central goal of microarray gene expression data analysis. Point symmetry-based clustering is an important unsupervised learning technique for recognizing symmetrical clusters of convex or non-convex shape. To enable fast, automatic clustering of large microarray datasets, this article proposes a distributed, time-efficient, scalable, parallel rough set-based hybrid approach to point symmetry-based clustering. A natural basis for analyzing gene expression data with the symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. Rough set theory helps achieve faster convergence and an automatic, optimal initial classification, thereby addressing the problem that the number of clusters in microarray data is not known in advance. The new parallel implementation with the K-means algorithm also achieves linear speedup in running time on large microarray datasets. The proposed algorithm is compared with another parallel symmetry-based K-means algorithm and with a parallel version of the existing K-means algorithm on four artificial and benchmark microarray datasets. We also experimented on three skewed cancer gene expression datasets. Statistical analyses are performed to establish the significance of the new implementation, and the biological relevance of the clustering solutions is also analyzed.
  • Keywords
    bioinformatics; cancer; data analysis; genetics; genomics; parallel algorithms; pattern clustering; rough set theory; statistical analysis; unsupervised learning; K-means algorithm; artificial microarray datasets; benchmark microarray datasets; clustering solutions; coexpressed gene identification; convergence; distributed time-efficient scalable parallel rough set based hybrid approach; fast automatic clustering; gene expression profile analysis; initial automatic optimal classification; large microarray datasets; linear speedup; microarray gene expression data analysis; nonconvex shaped clusters; parallel symmetry-based K-means; point symmetry-based clustering algorithm; rough based symmetrical clustering; rough-set theory; skewed cancer gene expression datasets; symmetrical convex shaped clusters; symmetry-based algorithm; unsupervised learning technique; Algorithm design and analysis; Clustering algorithms; Gene expression; Indexes; Lungs; Partitioning algorithms; Program processors; Automatic clustering algorithm; microarray gene expression data; point-symmetry based distance; rough set decision rules
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on NanoBioscience
  • Publisher
    IEEE
  • ISSN
    1536-1241
  • Type
    jour
  • DOI
    10.1109/TNB.2015.2421323
  • Filename
    7097734