Title :
Revisiting Link-Based Cluster Ensembles for Microarray Data Classification
Author :
Iam-On, Natthakan ; Boongoen, Tossapon
Author_Institution :
Sch. of Inf. Technol., Mae Fah Luang Univ., Chiang Rai, Thailand
Abstract :
Cancer has been identified as the leading cause of death. It is predicted that around 20-26 million people will be diagnosed with cancer by 2020. Given this alarming rate, there is an urgent need for more effective methods to understand, prevent, and cure cancer. Microarray technology provides a useful basis for achieving this goal. In cancer research in particular, it has become almost routine to create gene expression profiles that can separate patients into good and poor prognosis groups and identify possible tumor subtypes. Such a classification or predictive model offers a useful tool for individualized treatment of disease. However, the accuracy of existing classifiers has been constrained by the curse of dimensionality typically observed in microarray data. In addition to gene selection, one may transform the original data into another representation in which only key gene components are included. Unlike conventional transformation-based techniques found in the literature, this paper presents a novel method that uses cluster ensembles, specifically the summarizing information matrix, as the transformed data for the subsequent classification step. Among the state-of-the-art methods, the link-based cluster ensemble approach (LCE) provides highly accurate clustering and is therefore employed here. The performance of this transformation model is evaluated on published microarray datasets with the C4.5 classifier, in comparison with benchmark techniques. The findings suggest that the new model can improve classification accuracy over the original data and performs better than the other transformation methods investigated in the empirical study.
Keywords :
cancer; data handling; medical computing; pattern classification; pattern clustering; C4.5; LCE; cancer research; dimensionality curse; disease treatment; gene components; gene expression profiles; link-based cluster ensemble approach; link-based cluster ensembles; microarray data classification; predictive model; prognosis groups; published microarray datasets; transformation methods; tumor subtypes; Accuracy; Cancer; Clustering algorithms; Data models; Error analysis; Gene expression; Principal component analysis; classification; cluster ensembles; link-based similarity; microarray data;
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on
Conference_Location :
Manchester
DOI :
10.1109/SMC.2013.773