Title :
Extending multi-label feature selection with KEGG pathway information for microarray data analysis
Author :
Jungjit, Suwimol ; Freitas, Alex A. ; Michaelis, Martin ; Cinatl, J.
Author_Institution :
Sch. of Comput., Univ. of Kent, Canterbury, UK
Abstract :
We propose three approaches to extend our previous Multi-Label Correlation-based Feature Selection (ML-CFS) method with cancer-related KEGG pathway information, in order to select a better set of genes (features) for cancer microarray data classification. In the approach that produced the best results, ML-CFS was extended with a weighted formula that combines genes' predictive power and their occurrence in cancer-related KEGG pathways as criteria for gene selection. We also investigated the effect of different weights for those two criteria. That approach obtained, in general, a statistically significantly smaller Hamming loss (i.e., higher predictive accuracy) than ML-CFS without KEGG pathway information, in two cancer-related microarray datasets, using two different multi-label classification algorithms: one based on neural networks, the other on nearest neighbors. In addition to significantly improving predictive performance, the genes selected by that approach were found to be more biologically relevant to the analysis of our datasets than genes selected without using KEGG pathway information. To the best of our knowledge, this is the first paper to propose a KEGG pathway-based feature selection method for multi-label classification.
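The abstract reports results as Hamming loss, the fraction of individual label predictions that are wrong across all examples and labels, so lower values are better. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `hamming_loss` and the toy label matrices are assumptions made for illustration only.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (example, label) pairs where prediction and truth differ."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n_labels = len(y_true[0])
    if n_labels == 0:
        raise ValueError("each example needs at least one label")
    mismatches = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != n_labels or len(pred_row) != n_labels:
            raise ValueError("every row must have the same number of labels")
        # Count the label positions where the two binary vectors disagree.
        mismatches += sum(1 for t, p in zip(true_row, pred_row) if t != p)
    return mismatches / (len(y_true) * n_labels)


# Toy data (hypothetical): 2 examples, 3 binary labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
loss = hamming_loss(y_true, y_pred)  # 2 wrong labels out of 6
```

In this toy case two of the six label predictions are wrong, so the loss is 2/6. A perfect multi-label classifier would score 0.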
Keywords :
DNA; cancer; data analysis; feature selection; genetic algorithms; genetics; medical computing; molecular biophysics; neural nets; pattern classification; cancer microarray data classification; cancer-related KEGG pathway information; cancer-related microarray datasets; extending multilabel feature selection; gene predictive power; gene selection; gene sets; higher predictive accuracy; microarray data analysis; multilabel correlation-based feature selection; nearest neighbors; neural networks; statistically significantly smaller Hamming loss; Accuracy; Biology; Cancer; Classification algorithms; Correlation; Correlation coefficient; Drugs; KEGG pathway; cancer-related microarray data; multi-label classification; multi-label feature selection; neuroblastoma
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on
Conference_Location :
Honolulu, HI
DOI :
10.1109/CIBCB.2014.6845501