DocumentCode :
1362413
Title :
Mutual Information-Based Supervised Attribute Clustering for Microarray Sample Classification
Author :
Maji, Pradipta
Author_Institution :
Machine Intell. Unit, Indian Stat. Inst., Kolkata, India
Volume :
24
Issue :
1
fYear :
2012
Firstpage :
127
Lastpage :
140
Abstract :
Microarray technology is one of the important biotechnological means that allows the expression levels of thousands of genes to be recorded simultaneously within a number of different samples. An important application of microarray gene expression data in functional genomics is to classify samples according to their gene expression profiles. Among the large number of genes present in gene expression data, only a small fraction is effective for performing a certain diagnostic test. Hence, one of the major tasks with gene expression data is to find groups of coregulated genes whose collective expression is strongly associated with the sample categories or response variables. In this regard, a new supervised attribute clustering algorithm is proposed to find such groups of genes. It directly incorporates the information of sample categories into the attribute clustering process. A new quantitative measure, based on mutual information, is introduced that incorporates the information of sample categories to measure the similarity between attributes. The proposed supervised attribute clustering algorithm is based on measuring the similarity between attributes using the new quantitative measure, whereby redundancy among the attributes is removed. The clusters are then refined incrementally based on sample categories. The performance of the proposed algorithm is compared with that of existing supervised and unsupervised gene clustering and gene selection algorithms based on the class separability index and the predictive accuracy of the naive Bayes classifier, the K-nearest neighbor rule, and the support vector machine on three cancer and two arthritis microarray data sets. The biological significance of the generated clusters is interpreted using the gene ontology. An important finding is that the proposed supervised attribute clustering algorithm is effective for identifying biologically significant gene clusters with excellent predictive capability.
Keywords :
Bayes methods; biology computing; biotechnology; diseases; genetics; genomics; ontologies (artificial intelligence); pattern clustering; support vector machines; arthritis microarray data sets; attribute clustering process; biological significance; biologically significant gene clusters; biotechnological means; cancer microarray data sets; class separability index; coregulated genes; diagnostic test; expression levels; functional genomics; gene expression profiles; gene ontology; gene selection algorithms; generated clusters; k-nearest neighbor rule; microarray gene expression data; microarray sample classification; microarray technology; mutual information-based supervised attribute clustering; naive Bayes classifier; predictive accuracy; predictive capability; quantitative measure; sample category; supervised attribute clustering algorithm; support vector machine; unsupervised gene clustering; Classification; Clustering algorithms; Gene expression; Information analysis; Mutual information; Prediction algorithms; Redundancy; Microarray analysis; attribute clustering; classification; gene selection; mutual information
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2010.210
Filename :
5611522