DocumentCode :
1929491
Title :
PCA feature extraction for protein structure prediction
Author :
Melo, Jeane C B ; Cavalcanti, George D C ; Guimarães, Katia S.
Author_Institution :
Phys. & Math. Dept., Fed. Rural Univ. of Pernambuco, Recife, Brazil
Volume :
4
fYear :
2003
fDate :
20-24 July 2003
Firstpage :
2952
Abstract :
Principal component analysis (PCA), a linear transformation method, is used for feature extraction in the secondary structure prediction problem. The dimensionality reduction is applied to PSI-Blast profiles built on NCBI's Nonredundant Protein database. Different numbers of extracted components are used as input to three artificial neural networks with 30, 35 or 40 nodes in the hidden layer. These classifiers are trained with the RPROP algorithm. To estimate the accuracy of the predictor, sevenfold cross-validation is applied to CB396, a database previously used to evaluate the performance of several predictors. To increase the efficiency of the predictor presented here, the outputs of the classifiers are combined through five simple rules: product, average, voting, minimum and maximum. This novel application of PCA yields relevant results. Even with a drastic reduction from 260 to 80 components, the accuracy obtained is at least 1% higher than the best published result for another predictor, CONSENSUS, which is a combination of four other predictors. With a reduction from 260 to 180 components the performance is even better, achieving a Q3 accuracy of 74.5%. The results indicate that PCA is a promising method for feature extraction in the secondary structure prediction problem.
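The five combination rules and the Q3 measure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `combine` and `q3`, the class labels H/E/C, the tie-breaking choices and the example scores are all assumptions made for illustration, and Q3 is taken in its usual sense as the percentage of residues whose three-state label is predicted correctly.

```python
import math
from collections import Counter

# Assumed three-state labels: helix, strand, coil.
CLASSES = ("H", "E", "C")


def combine(outputs, rule):
    """Fuse per-classifier class scores for one residue.

    outputs: one list of class scores per classifier (hypothetical values).
    rule: "product", "average", "voting", "minimum" or "maximum".
    Returns the index of the winning class in CLASSES.
    """
    n_classes = len(outputs[0])
    if rule == "voting":
        # Each classifier votes for its own top-scoring class.
        votes = Counter(
            max(range(n_classes), key=scores.__getitem__) for scores in outputs
        )
        # Ties go to the lowest class index (an arbitrary choice).
        return max(range(n_classes), key=lambda c: (votes[c], -c))

    columns = list(zip(*outputs))  # regroup the scores by class
    if rule == "product":
        fused = [math.prod(col) for col in columns]
    elif rule == "average":
        fused = [sum(col) / len(col) for col in columns]
    elif rule == "minimum":
        fused = [min(col) for col in columns]
    elif rule == "maximum":
        fused = [max(col) for col in columns]
    else:
        raise ValueError(f"unknown rule: {rule}")
    # The class with the largest fused score wins (first index on ties).
    return max(range(n_classes), key=fused.__getitem__)


def q3(predicted, observed):
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical scores from three classifiers for a single residue.
outputs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
]
for rule in ("product", "average", "voting", "minimum", "maximum"):
    print(rule, CLASSES[combine(outputs, rule)])
print(q3("HHEC", "HHCC"))
```

On these made-up scores the rules disagree: product and minimum select E, while average, voting and maximum select H. This is why a study would compare all five rules rather than assume they are interchangeable.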
Keywords :
biology computing; feature extraction; neural nets; pattern classification; principal component analysis; proteins; CB396 database; CONSENSUS; Nonredundant Protein database; PCA feature extraction; PCA linear transformation method; PSI-Blast profiles; RPROP algorithm; artificial neural networks; protein structure prediction; secondary structure prediction problem; Artificial neural networks; Feature extraction; Informatics; Mathematics; Neural networks; Physics; Principal component analysis; Proteins; Sequences; Spatial databases;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks, 2003. Proceedings of the International Joint Conference on
ISSN :
1098-7576
Print_ISBN :
0-7803-7898-9
Type :
conf
DOI :
10.1109/IJCNN.2003.1224040
Filename :
1224040