DocumentCode :
844163
Title :
Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification
Author :
Huang, Chuen-Der ; Lin, Chin-Teng ; Pal, Nikhil Ranjan
Volume :
2
Issue :
4
fYear :
2003
Firstpage :
221
Lastpage :
232
Abstract :
The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. In bioinformatics applications, the role of appropriate features has not received adequate attention. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use the gating neural network, where each input node is associated with a gate. This network can select important features in an online manner as learning proceeds. At the beginning of training, all gates are almost closed, i.e., no feature is allowed to enter the network. Through training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may be partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of the HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. In the next level we have another set of classifiers, which further classify the protein features into 27 folds. The third novel idea is to induce indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with the new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically. Using only half of the original features, as selected by the gating neural network, achieves test accuracy comparable to that obtained with all the original features. The gating mechanism also helps us gain better insight into the folding process of proteins. For example, by tracking the evolution of the different gates we can find which characteristics (features) of the data are more important for the folding process. It also reduces the computation time.
Keywords :
biology computing; feature extraction; image classification; image sequences; learning (artificial intelligence); molecular biophysics; neural nets; proteins; amino-acid protein composition sequence; automatic feature selection; bioinformatics; computation time; feature extraction; gating neural network; hierarchical learning architecture; indirect coding features; input node; multiclass protein fold classification; protein fold classification accuracy; proteins; structure classification; Bioinformatics; Biological neural networks; Control engineering; Data mining; Databases; Neural networks; Protein engineering; Support vector machine classification; Support vector machines; Testing; Algorithms; Amino Acid Sequence; Artificial Intelligence; Cluster Analysis; Computing Methodologies; Molecular Sequence Data; Neural Networks (Computer); Pattern Recognition, Automated; Protein Conformation; Protein Folding; Proteins; Reproducibility of Results; Robotics; Sensitivity and Specificity; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid;
fLanguage :
English
Journal_Title :
NanoBioscience, IEEE Transactions on
Publisher :
IEEE
ISSN :
1536-1241
Type :
jour
DOI :
10.1109/TNB.2003.820284
Filename :
1254525