Title :
Improving Prediction of Residue Solvent Accessibility with SVR and Multiple Sequence Alignment Profile
Author :
Wenlong Xu ; Ao Li ; Xian Wang ; Zhaohui Jiang ; Huanqing Feng
Author_Institution :
Dept. of Electron. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei
Abstract :
A new method based on support vector regression (SVR) is introduced to predict the relative solvent accessibility (RSA) of residues from protein primary sequences, using local sequence information as input. Unlike most previous methods, which are designed to predict the exposure state (exposed/buried, exposed/intermediate/buried, etc.) of a particular residue according to its relative solvent accessibility, this method predicts the real value of RSA, which retains more information about residue location in the protein 3D structure than state assignment does. Prediction performance measures, i.e. the mean absolute error (MAE) and correlation coefficient (CC), were compared with those of an earlier method, RVP-Net, which is based on a multilayer feed-forward neural network. In a 3-fold cross-validation test, the MAE and CC of the SVR method were consistently better on all data sets than those obtained by RVP-Net. In addition, using the multiple sequence alignment profile as input gave a significant improvement in prediction performance compared with using single-sequence information alone. The final prediction result of our method on the CB-513 data set was an MAE of 16.8% and a CC of 0.562. The results demonstrate that SVR is a useful tool for protein sequence analysis.
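The two performance measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes the standard definitions (mean of absolute differences for MAE, Pearson correlation for CC); the function names and the sample RSA values are hypothetical and are not taken from the paper.

```python
import math


def mean_absolute_error(observed, predicted):
    """Mean of |observed - predicted| over all residues (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def correlation_coefficient(observed, predicted):
    """Pearson correlation between observed and predicted values (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


# Made-up RSA values in percent, for illustration only.
observed = [10.0, 35.0, 60.0, 80.0]
predicted = [20.0, 30.0, 50.0, 90.0]

mae = mean_absolute_error(observed, predicted)
cc = correlation_coefficient(observed, predicted)
print(f"MAE = {mae:.2f}%, CC = {cc:.3f}")
```

With RSA expressed in percent, the MAE is also in percentage points, which matches the way the abstract reports it; a lower MAE and a higher CC both indicate better real-value prediction.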
Keywords :
biology computing; feedforward neural nets; molecular biophysics; molecular configurations; proteins; regression analysis; support vector machines; RVP-Net method; SVR; correlation coefficient; mean absolute error; multilayer feedforward neural network; multiple sequence alignment profile; protein 3D structure; protein primary sequences; protein sequence analyses; residue location; residue solvent accessibility prediction; support vector regression; Bioinformatics; Design methodology; Feedforward neural networks; Feedforward systems; Machine learning; Multi-layer neural network; Neural networks; Protein sequence; Solvents; Testing; Protein structure prediction; Relative solvent accessibility; Support vector regression;
Conference_Title :
Engineering in Medicine and Biology Society, 2005. IEEE-EMBS 2005. 27th Annual International Conference of the
Conference_Location :
Shanghai
Print_ISBN :
0-7803-8741-4
DOI :
10.1109/IEMBS.2005.1617000