DocumentCode :
3136555
Title :
Nonlinear System Identification Provides Insight Into Protein Folding
Author :
Green, James R. ; Korenberg, Michael J.
Author_Institution :
Syst. & Comput. Eng., Carleton Univ., Ottawa, Ont.
fYear :
2006
fDate :
May 2006
Firstpage :
721
Lastpage :
724
Abstract :
Much like the shape of a tool suggests its intended purpose, knowledge of a protein's structure can provide substantial insight into its function. Therefore, computational prediction of protein structure based solely on protein sequence data is a challenge of fundamental importance to biomedical research. An effective solution promises significant advances in computational drug discovery and an increased understanding of complex disease processes such as cancer. We have recently developed a novel approach to determining the secondary structure of proteins from protein sequence data which makes use of parallel cascade identification (PCI), a powerful method of nonlinear system identification. PCI is used to create two layers of dynamic nonlinear systems that map divergent evolutionary profile input data into secondary structure assignment output data. PCI prediction accuracy compares well with eleven top contemporary methods over a dataset of new protein structures. Furthermore, PCI is a highly effective means to combine multiple experts, achieving the highest observed accuracy over two test datasets and also the lowest rate of occurrence of a particularly detrimental class of errors. One limitation of the PCI classifiers is that approximately 13% of all amino acids cannot readily be assigned predictions due to settling times introduced by the dynamic linear component in each cascade model. In this paper, we describe a number of methods designed to overcome this limitation. While zero-padding of the input sequence data proved to be the most effective solution in terms of prediction accuracy, an analysis of causal, anti-causal, and mixed cascades provides interesting insights into the biological mechanism of protein folding.
Keywords :
biology computing; cancer; drugs; molecular biophysics; molecular configurations; pattern classification; proteins; biological mechanism; cancer; drug discovery; nonlinear system identification; parallel cascade identification classifier; protein folding; protein sequence; protein structure; Accuracy; Biomedical computing; Cancer; Diseases; Drugs; Nonlinear dynamical systems; Nonlinear systems; Protein engineering; Protein sequence; Shape; Protein secondary structure prediction; bioinformatics; nonlinear system identification; parallel cascade identification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical and Computer Engineering, 2006. CCECE '06. Canadian Conference on
Conference_Location :
Ottawa, Ont.
Print_ISBN :
1-4244-0038-4
Electronic_ISBN :
1-4244-0038-4
Type :
conf
DOI :
10.1109/CCECE.2006.277670
Filename :
4054671