DocumentCode
2565440
Title
Feature selection and granular SVM classification for protein arginine methylation identification
Author
Ding, Zejin ; Zhang, Yan-Qing ; Zheng, Yujun George
Author_Institution
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
fYear
2009
fDate
11-14 Oct. 2009
Firstpage
2979
Lastpage
2983
Abstract
Protein methylation was discovered half a century ago but has been studied far less than other modifications. Computational analysis has recently been introduced to discover unknown methylation sites from the few known ones. Predicting possible methylation sites effectively requires a well-devised classification strategy. In this paper, we first extract informative features from methylated fragments in many protein sequences, including the physicochemical properties, secondary structure information, evolutionary profiles, and solvent accessibility of surrounding residues. Then, an efficient feature selection method (mRMR) is applied to eliminate redundant features while keeping important ones. Since methylated residues are far fewer than non-methylated ones, the collected data is relatively imbalanced. We therefore propose to use the granular support vector machine (GSVM), which is specially designed for imbalanced classification problems. A 7-fold cross validation shows that our strategy achieves prediction accuracy comparable to, or even better than, many current methods. Our method also provides insights into the underlying mechanisms of protein methylation.
Keywords
biology computing; feature extraction; molecular biophysics; pattern classification; proteins; support vector machines; 7-fold cross validation; evolutionary profile; feature selection; granular SVM classification; granular support vector machine; informative feature extraction; methylated fragment; methylated residue; physicochemical property; protein arginine methylation identification; protein methylation modification; protein sequence; solvent accessibility; Amino acids; Classification tree analysis; Data mining; Feature extraction; Proteins; Sequences; Solvents; Support vector machine classification; Support vector machines; USA Councils; Feature Selection; Granular Support Vector Machines (GSVM); Imbalanced Data Mining; Methylation Prediction; Protein Methylation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on
Conference_Location
San Antonio, TX
ISSN
1062-922X
Print_ISBN
978-1-4244-2793-2
Electronic_ISBN
1062-922X
Type
conf
DOI
10.1109/ICSMC.2009.5345973
Filename
5345973