Title :
A new GP-based wrapper feature construction approach to classification and biomarker identification
Author :
Ahmed, Shehab ; Mengjie Zhang ; Lifeng Peng
Author_Institution :
Sch. of Eng. & Comput. Sci., Victoria Univ. of Wellington, Wellington, New Zealand
Abstract :
Mass spectrometry (MS) is a technology used for the identification and quantification of proteins and metabolites. It helps in the discovery of proteomic or metabolomic biomarkers, which aid in disease detection and drug discovery. Biomarkers are detected by distinguishing patient samples from healthy samples through classification. The mass spectrometer produces high-dimensional data in which most of the features are irrelevant for classification. Therefore, feature reduction is needed before MS data can be classified effectively. Feature construction provides a means of dimensionality reduction and aims to improve classification performance. In this paper, genetic programming (GP) is used to construct multiple features, and two methods are proposed for this purpose. The proposed methods wrap a Random Forest (RF) classifier in GP to ensure the quality of the constructed features. Five other classifiers, in addition to RF, are used to test the impact of the constructed features on classifier performance. The results show that, on five MS data sets, the proposed GP methods improve classification performance over using the original set of features.
Keywords :
diseases; drugs; feature extraction; genetic algorithms; mass spectra; medical computing; pattern classification; proteins; proteomics; random processes; GP methods; GP-based wrapper feature construction approach; MS data classification; MS data sets; biomarker identification; dimensionality reduction; disease detection; drug discovery; feature construction; feature reduction; genetic programming; high dimensional data; mass spectrometer; mass spectrometry; metabolite identification; metabolite quantification; metabolomic biomarker discovery; metabolomic biomarkers; patient classification; protein identification; protein quantification; proteomic biomarker discovery; random forest classifier; Accuracy; Cancer; Entropy; Radio frequency; Smoothing methods; Training; Vegetation
Conference_Title :
Evolutionary Computation (CEC), 2014 IEEE Congress on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6626-4
DOI :
10.1109/CEC.2014.6900317