Title :
Prediction of protein solvent profile using SVR
Author :
Yuan, Zheng ; Bailey, Timothy L.
Author_Institution :
Inst. of Molecular Biosci., Queensland Univ., Brisbane, Qld., Australia
Abstract :
We describe a support vector regression (SVR) approach to predict the accessible surface area (ASA) of a protein from its sequence. Our approach encodes each protein residue as a vector of amino acid propensities derived from a multiple alignment of the subject protein with homologous proteins. The vector consists of the log-likelihood ratios of each of the twenty amino acids in the residue's multiple alignment column. Using a reference set of proteins of known structure and, hence, known ASA, we trained an SVR model. Each training sample consists of the fifteen log-likelihood vectors in a window of width fifteen surrounding a residue, along with the "true" ASA value, computed from the known structure. To apply the model to proteins of unknown structure, only the subject protein sequence is required. Our method uses PSI-BLAST to simultaneously determine a set of (putative) homologs and compute the log-likelihood vectors needed to encode the subject protein. We show that this method provides substantially improved accuracy in predicting ASA when compared with an earlier method.
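The window encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `window_features`, the zero-padding at the chain termini, and the random toy profile standing in for PSI-BLAST output are all assumptions made for illustration. Only the window width (fifteen) and the twenty log-likelihood ratios per residue come from the abstract.

```python
import numpy as np

WINDOW = 15    # window width stated in the abstract
N_AMINO = 20   # one log-likelihood ratio per amino acid


def window_features(profile, window=WINDOW):
    """Build one flattened window of profile rows per residue.

    profile: array of shape (L, 20) of log-likelihood ratios.
    Positions beyond the chain termini are zero-padded; the abstract
    does not say how termini are handled, so this is an assumption.
    """
    profile = np.asarray(profile, dtype=float)
    half = window // 2
    # Pad only along the sequence axis so every residue has a full window.
    padded = np.pad(profile, ((half, half), (0, 0)), mode="constant")
    length = profile.shape[0]
    # Row i holds the window centred on residue i, flattened to 15 * 20 values.
    return np.stack([padded[i:i + window].ravel() for i in range(length)])


# Toy profile for a 30-residue protein (random values, not real PSI-BLAST output).
rng = np.random.default_rng(0)
toy_profile = rng.normal(size=(30, N_AMINO))
X = window_features(toy_profile)
```

Each row of `X` has 15 × 20 = 300 features. Paired with the per-residue ASA values computed from known structures, such rows could serve as training samples for any SVR implementation.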
Keywords :
biochemistry; biology computing; genetics; learning (artificial intelligence); macromolecules; molecular biophysics; molecular configurations; organic compounds; proteins; regression analysis; support vector machines; vectors; PSI-BLAST method; accessible proteins surface area; amino acid; homologous proteins; log-likelihood ratios; log-likelihood vectors; multiple alignment column; protein encoding; protein sequence; protein solvent profile prediction; protein structure; support vector regression; Accuracy; Amino acids; Australia; Bioinformatics; Encoding; Prediction methods; Protein sequence; Solvents; Vectors; accessible surface area; protein; solvent profile; support vector regression;
Conference_Title :
Engineering in Medicine and Biology Society, 2004. IEMBS '04. 26th Annual International Conference of the IEEE
Conference_Location :
San Francisco, CA
Print_ISBN :
0-7803-8439-3
DOI :
10.1109/IEMBS.2004.1403822