• DocumentCode
    2565440
  • Title

    Feature selection and granular SVM classification for protein arginine methylation identification

  • Author

    Ding, Zejin ; Zhang, Yan-Qing ; Zheng, Yujun George

  • Author_Institution
    Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
  • fYear
    2009
  • fDate
    11-14 Oct. 2009
  • Firstpage
    2979
  • Lastpage
    2983
  • Abstract
    Protein methylation was discovered half a century ago, yet it remains far less studied than other modifications. Computational analysis has recently been introduced to discover unknown methylation sites from the few known ones. Predicting possible methylation sites effectively requires a well-designed classification strategy. In this paper, we first extract informative features from methylated fragments in many protein sequences, including the physicochemical properties, secondary structure information, evolutionary profiles, and solvent accessibility of surrounding residues. Then, an efficient feature selection method (mRMR) is applied to eliminate redundant features while keeping important ones. Since methylated residues are far fewer than non-methylated ones, the collected data are imbalanced. We therefore propose to use the granular support vector machine (GSVM), which is specially designed for imbalanced classification problems. A 7-fold cross validation shows that our strategy achieves prediction accuracy comparable to, or even better than, many current methods. Our method also provides insights into the underlying mechanisms of protein methylation.
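The abstract names mRMR (minimum redundancy, maximum relevance) as the feature selection step but does not give its details. The following is a minimal sketch of the generic greedy mRMR procedure in its difference form, assuming features have already been discretized; it is not the authors' implementation. The function names `mutual_information` and `mrmr_select` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices by relevance minus mean redundancy.

    columns: list of feature columns, each a sequence of discrete values
    labels:  sequence of class labels, same length as each column
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:

        def score(j):
            if not selected:
                return relevance[j]
            # Mean mutual information with the features already chosen
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): feature a is informative, a_copy duplicates it,
# and c is weakly informative but independent of a.
a = [0, 0, 0, 0, 1, 1, 1, 1]
a_copy = list(a)
c = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0, 0, 0, 1, 1, 1, 1, 1]

chosen = mrmr_select([a, a_copy, c], y, k=2)
print(chosen)
```

In this toy case the duplicate column is penalized for its redundancy with the first pick, so the weaker but independent feature is chosen second. Real-valued features such as evolutionary profiles would need discretization or a continuous mutual information estimator before a procedure like this applies.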
  • Keywords
    biology computing; feature extraction; molecular biophysics; pattern classification; proteins; support vector machines; 7-fold cross validation; evolutionary profile; feature selection; granular SVM classification; granular support vector machine; informative feature extraction; methylated fragment; methylated residue; physicochemical property; protein arginine methylation identification; protein methylation modification; protein sequence; solvent accessibility; Amino acids; Classification tree analysis; Data mining; Feature extraction; Proteins; Sequences; Solvents; Support vector machine classification; Support vector machines; USA Councils; Feature Selection; Granular Support Vector Machines (GSVM); Imbalanced Data Mining; Methylation Prediction; Protein Methylation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on
  • Conference_Location
    San Antonio, TX
  • ISSN
    1062-922X
  • Print_ISBN
    978-1-4244-2793-2
  • Electronic_ISBN
    1062-922X
  • Type

    conf

  • DOI
    10.1109/ICSMC.2009.5345973
  • Filename
    5345973