• DocumentCode
    1188843
  • Title
    Prediction of Protein Folds: Extraction of New Features, Dimensionality Reduction, and Fusion of Heterogeneous Classifiers
  • Author
    Ghanty, Pradip; Pal, Nikhil R.
  • Author_Institution
    Praxis Softek Solutions Pvt. Ltd., Kolkata
  • Volume
    8
  • Issue
    1
  • fYear
    2009
  • fDate
    3/1/2009 12:00:00 AM
  • Firstpage
    100
  • Lastpage
    110
  • Abstract
    We consider a two-level protein fold determination problem, with four structural classes at level 1 and 27 folds at level 2. We propose several new features and also use some existing ones, including the frequencies of adjacent residues, the frequencies of residues separated by one residue, and triplets (trio) of amino acid compositions (AACs). The dimensionality of the trio AAC features is drastically reduced using a novel neural-network-based online feature selection scheme. We also propose new sets of features, called trio potential, computed from hydrophobicity values using only the selected trio AACs. We demonstrate that the proposed features, including the selected trio AACs and the trio potential, have good discriminating power for protein fold determination. As machine learning tools, we use multilayer perceptron networks, radial basis function networks, and support vector machines. To further improve recognition accuracy, we fuse different classifiers, both classifiers that use the same set of features and classifiers that use different sets of features. The effectiveness of our schemes is demonstrated on a benchmark structural classification of proteins (SCOP) dataset. Our system achieves 84.9% test accuracy for SCOP structural class determination (four classes) and 68.6% test accuracy for fold recognition with 27 folds. To demonstrate the consistency of the feature sets and fusion schemes, we also perform fivefold cross-validation experiments.
  • Keywords
    biology computing; feature extraction; hydrophobicity; learning (artificial intelligence); molecular biophysics; multilayer perceptrons; pattern classification; proteins; radial basis function networks; support vector machines; SCOP dataset; amino acid composition; heterogeneous classifier fusion; hydrophobicity value; machine learning tool; multilayer perceptron network; neural network; online feature selection scheme; protein fold prediction; protein structural classification; radial basis function network; support vector machine; trio AAC feature dimensionality reduction; Fusion; majority voting; multilayer perceptron (MLP); online feature selection (OFS); protein structure prediction; radial basis function (RBF); structural classification of proteins (SCOP); support vector machine (SVM); triplets of amino acid composition (trio AAC); Algorithms; Amino Acid Sequence; Artificial Intelligence; Molecular Sequence Data; Pattern Recognition, Automated; Protein Folding; Proteins; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid
  • fLanguage
    English
  • Journal_Title
    NanoBioscience, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1536-1241
  • Type
    jour
  • DOI
    10.1109/TNB.2009.2016488
  • Filename
    4799190