Title :
Prediction of Protein Folds: Extraction of New Features, Dimensionality Reduction, and Fusion of Heterogeneous Classifiers
Author :
Ghanty, Pradip ; Pal, Nikhil R.
Author_Institution :
Praxis Softek Solutions Pvt. Ltd., Kolkata
fDate :
3/1/2009
Abstract :
We consider a two-level protein fold determination problem, with four classes in level 1 and 27 folds in level 2. We propose several new features and use some existing features, including frequencies of adjacent residues, frequencies of residues separated by one residue, and triplets (trio) of amino acid compositions (AACs). The dimensionality of the trio AAC features is drastically reduced using a novel neural-network-based online feature selection scheme. We also propose new sets of features, called trio potential, computed from the hydrophobicity values of only the selected trio AACs. We demonstrate that the proposed features, including the selected trio AACs and trio potential, have good discriminating power for protein fold determination. As machine learning tools, we use a multilayer perceptron network, a radial basis function network, and a support vector machine. To improve recognition accuracy further, we fuse different classifiers that use the same set of features as well as classifiers that use different sets of features. The effectiveness of our schemes is demonstrated on a benchmark structural classification of proteins (SCOP) dataset. Our system achieves 84.9% test accuracy for SCOP structural class (four classes) determination and 68.6% test accuracy for fold recognition with 27 folds. To demonstrate the consistency of the feature sets and fusion schemes, we also perform fivefold cross-validation experiments.
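The sequence features and the fusion step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 20-letter residue alphabet, normalization by the number of valid k-mers, and the tie-breaking rule in the vote are all assumptions made for illustration, and the online feature selection and trio potential steps are not covered.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VALID = set(AMINO_ACIDS)


def gapped_pair_frequencies(seq, gap=0):
    """400-dim vector of ordered residue-pair frequencies.

    gap=0 counts adjacent residues; gap=1 counts residues separated
    by one residue. Normalizing by the number of valid pairs is an
    assumption, not taken from the paper.
    """
    step = gap + 1
    pairs = [seq[i] + seq[i + step] for i in range(len(seq) - step)]
    counts = Counter(p for p in pairs if set(p) <= VALID)
    total = sum(counts.values())
    return [counts[a + b] / total if total else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


def trio_frequencies(seq):
    """8000-dim vector of residue-triplet (trio) frequencies."""
    trios = [seq[i:i + 3] for i in range(len(seq) - 2)]
    counts = Counter(t for t in trios if set(t) <= VALID)
    total = sum(counts.values())
    return [counts["".join(t)] / total if total else 0.0
            for t in product(AMINO_ACIDS, repeat=3)]


def majority_vote(predictions):
    """Fuse class labels from several classifiers by majority voting.

    Assumption: ties go to the earliest-listed classifier's label.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
```

The 8000-dimensional trio vector shows why the abstract stresses dimensionality reduction: it is far larger than the 400-dimensional pair vectors, so selecting a small subset of trios before classification is a natural step.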
Keywords :
biology computing; feature extraction; hydrophobicity; learning (artificial intelligence); molecular biophysics; multilayer perceptrons; pattern classification; proteins; radial basis function networks; support vector machines; SCOP dataset; amino acid composition; heterogeneous classifier fusion; hydrophobicity value; machine learning tool; multilayer perceptron network; neural network; online feature selection scheme; protein fold prediction; protein structural classification; radial basis function network; support vector machine; trio AAC feature dimensionality reduction; Fusion; majority voting; multilayer perceptron (MLP); online feature selection (OFS); protein structure prediction; radial basis function (RBF); structural classification of proteins (SCOP); support vector machine (SVM); triplets of amino acid composition (trio AAC); Algorithms; Amino Acid Sequence; Artificial Intelligence; Molecular Sequence Data; Pattern Recognition, Automated; Protein Folding; Proteins; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid
Journal_Title :
NanoBioscience, IEEE Transactions on
DOI :
10.1109/TNB.2009.2016488