Title of article :
Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine
Author/Authors :
Yousefi Moteghaed, Niloofar Department of Medical Genetics - Faculty of Medical Sciences - Tarbiat Modares University, Tehran, Iran , Maghooli, Keivan Department of Medical Genetics - Faculty of Medical Sciences - Tarbiat Modares University, Tehran, Iran , Garshasbi, Masoud Department of Medical Genetics - Faculty of Medical Sciences - Tarbiat Modares University, Tehran, Iran
Abstract :
Background: Gene expression data are characteristically high dimensional, with a small sample size relative to the number of features, and the variability inherent in biological processes adds to the difficulty of analysis. Selecting highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability when predicting the class of new samples.
Methods: The present study used a hybrid of particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and to decrease the sensitivity of the system to outliers, which increases the generalization ability of the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the algorithm by finding the best parameters for the classifier during the training phase, without the need for trial and error by the user. The proposed approach was tested on four benchmark gene expression profiles.
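The sample-weighting idea in the Methods paragraph can be illustrated with a minimal sketch. The abstract does not say which membership function the authors used, so the distance-to-class-centroid rule below is an assumption, and the name `fuzzy_memberships` and the parameter `delta` are hypothetical. Each sample receives a weight close to 1 when it lies near the centre of its own class and close to 0 when it lies at the edge, so a likely outlier contributes less to training.

```python
import math


def fuzzy_memberships(samples, labels, delta=1e-6):
    """Return one weight in (0, 1] per sample.

    Assumed rule (not taken from the article): weight = 1 - d / (r + delta),
    where d is the distance of the sample to its class centroid and r is the
    largest such distance within that class.
    """
    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Centroid of each class: the per-feature mean.
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in by_class.items()
    }

    # Distance of every sample to the centroid of its own class.
    dists = [math.dist(x, centroids[y]) for x, y in zip(samples, labels)]

    # Radius of each class: the largest distance observed in it.
    radius = {}
    for d, y in zip(dists, labels):
        radius[y] = max(radius.get(y, 0.0), d)

    # delta keeps the farthest sample's weight strictly positive.
    return [1.0 - d / (radius[y] + delta) for d, y in zip(dists, labels)]
```

One way to use such weights, again as an assumption rather than the authors' implementation, is to pass them as the `sample_weight` argument of scikit-learn's `SVC.fit`, which rescales the penalty parameter C for each sample.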
Results: The proposed algorithm achieved a classification accuracy of 100% on the leukemia data, 96.67% on the colon cancer data, and 98% on the breast cancer data. The radial basis function was the best kernel for training the SVM classifier.
Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier without user intervention.
Keywords :
Cancer classification , fuzzy support vector machine , gene expression , genetic algorithm , particle swarm optimization algorithm
Journal title :
Astroparticle Physics