DocumentCode :
3490872
Title :
Predicting functional impact of single amino acid polymorphisms by integrating sequence and structural features
Author :
Wang, Mingjun ; Shen, Hong-Bin ; Akutsu, Tatsuya ; Song, Jiangning
Author_Institution :
State Eng. Lab. for Ind. Enzymes, Chinese Acad. of Sci., Tianjin, China
fYear :
2011
fDate :
2-4 Sept. 2011
Firstpage :
18
Lastpage :
26
Abstract :
Single amino acid polymorphisms (SAPs) are the most abundant form of known genetic variation associated with human diseases. It is of great interest to study the sequence-structure-function relationship underlying SAPs. In this work, we collected human variant data from three databases and divided them into three categories, i.e. cancer somatic mutations (CSM), Mendelian disease-related variants (SVD) and neutral polymorphisms (SVP). We built support vector machine (SVM) classifiers to predict these three classes of SAPs, using the optimal features selected by a random forest algorithm. A total of 280 sequence-derived and structural features were initially extracted from the curated datasets, from which 18 optimal candidate features were further selected by random forest. Furthermore, we performed a stepwise feature selection to identify the characteristic sequence and structural features that are important for predicting each SAP class. As a result, our predictors achieved a prediction accuracy (ACC) of 84.97, 96.93, 86.98 and 88.24% for the three classes, CSM, SVD and SVP, respectively. Performance comparison with previously developed tools such as SIFT, SNAP and PolyPhen-2 indicates that our method provides favorable performance, with higher sensitivity scores and Matthews correlation coefficients (MCC). These results indicate that the prediction performance of SAP classifiers can be effectively improved by feature selection. Moreover, dividing SAPs into three categories and constructing accurate SVM-based classifiers for each class provides a practically useful way of investigating the difference between Mendelian disease-related variants and cancer somatic mutations.
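The abstract reports accuracy (ACC), sensitivity and the Matthews correlation coefficient (MCC). As a minimal sketch of how these measures are conventionally computed from binary (e.g. one-class-versus-rest) confusion-matrix counts, the snippet below uses the standard textbook definitions. The function name `binary_metrics` and the counts are hypothetical illustrations, not the authors' code or results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, MCC) from confusion-matrix counts.

    Hypothetical helper for illustration; uses the standard definitions.
    """
    total = tp + fp + tn + fn
    # Accuracy: fraction of all predictions that are correct.
    acc = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true positives that are recovered.
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sens, mcc


# Made-up counts for illustration only; these are NOT values from the paper.
acc, sens, mcc = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"ACC={acc:.4f} Sensitivity={sens:.4f} MCC={mcc:.4f}")
```

MCC is often preferred over accuracy alone when classes are imbalanced, because it uses all four cells of the confusion matrix.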
Keywords :
biological techniques; biology computing; cancer; genetics; molecular biophysics; molecular configurations; organic compounds; support vector machines; Mendelian disease-related variant; SAPs class; SVM-based classifiers; cancer somatic mutations; genetic variations; human variant data; neutral polymorphisms; random forest algorithm; single amino acid polymorphisms; stepwise feature selection; support vector machine; Accuracy; Amino acids; Databases; Feature extraction; Humans; Proteins; Support vector machines; feature selection; non-synonymous SNPs; random forest; single amino acid polymorphisms (SAPs); support vector machine;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems Biology (ISB), 2011 IEEE International Conference on
Conference_Location :
Zhuhai
Print_ISBN :
978-1-4577-1661-4
Electronic_ISBN :
978-1-4577-1665-2
Type :
conf
DOI :
10.1109/ISB.2011.6033115
Filename :
6033115