DocumentCode
49046
Title
Rough Based Symmetrical Clustering for Gene Expression Profile Analysis
Author
Sarkar, Anasua ; Maulik, Ujjwal
Author_Institution
Dept. of Inf. Technol., Gov. Coll. of Eng. & Leather Technol., Kolkata, India
Volume
14
Issue
4
fYear
2015
fDate
June 2015
Firstpage
360
Lastpage
367
Abstract
Identification of coexpressed genes is the central goal in microarray gene expression data analysis. Point symmetry-based clustering is an important unsupervised learning technique for recognizing symmetrical convex or non-convex shaped clusters. To enable fast automatic clustering of large microarray data, this article proposes a distributed, time-efficient, scalable, parallel, rough set based hybrid approach to point symmetry-based clustering. A natural basis for analyzing gene expression data with the symmetry-based algorithm is to group together genes with similar symmetrical patterns of expression. Rough-set theory helps in faster convergence and provides an initial automatic optimal classification, thereby solving the problem that the number of clusters in microarray data is unknown. The new parallel implementation with the K-means algorithm also achieves linear speedup in run time on large microarray datasets. The proposed algorithm is compared with another parallel symmetry-based K-means and a parallel version of the existing K-means over four artificial and benchmark microarray datasets. We have also experimented on three skewed cancer gene expression datasets. Statistical analysis is also performed to establish the significance of this new implementation. The biological relevance of the clustering solutions is also analyzed.
Keywords
bioinformatics; cancer; data analysis; genetics; genomics; parallel algorithms; pattern clustering; rough set theory; statistical analysis; unsupervised learning; K-means algorithm; artificial microarray datasets; benchmark microarray datasets; clustering solutions; coexpressed gene identification; convergence; distributed time-efficient scalable parallel rough set based hybrid approach; fast automatic clustering; gene expression profile analysis; initial automatic optimal classification; large microarray datasets; linear speedup; microarray gene expression data analysis; nonconvex shaped clusters; parallel symmetry-based K-means; point symmetry-based clustering algorithm; rough based symmetrical clustering; rough-set theory; skewed cancer gene expression datasets; statistical analysis; symmetrical convex shaped clusters; symmetry-based algorithm; unsupervised learning technique; Algorithm design and analysis; Clustering algorithms; Gene expression; Indexes; Lungs; Partitioning algorithms; Program processors; Automatic clustering algorithm; K-means algorithm; microarray gene expression data; point-symmetry based distance; rough set decision rules
fLanguage
English
Journal_Title
NanoBioscience, IEEE Transactions on
Publisher
IEEE
ISSN
1536-1241
Type
jour
DOI
10.1109/TNB.2015.2421323
Filename
7097734