DocumentCode :
3582639
Title :
Improved protein disorder predictor by smoothing output
Author :
Iqbal, Sumaiya ; Islam, Md Nasrul ; Hoque, Md Tamjidul
Author_Institution :
Comput. Sci., Univ. of New Orleans, New Orleans, LA, USA
fYear :
2014
Firstpage :
110
Lastpage :
115
Abstract :
Intrinsically disordered regions (IDRs) and proteins (IDPs) are associated with important biological functions despite lacking a stable structure in their native state. Disordered proteins and residues are abundant in nature and are extensively involved in critical human diseases, which makes them relevant to drug discovery. Disorder prediction is therefore becoming crucial in proteomic research. The large-scale growth of genome databases demands high-performance computational methods for identifying protein disorder. We developed DisPredict, a canonical support vector machine (SVM) based disorder predictor that uses an RBF kernel. It employs a novel feature set for accurate characterization of disorder and, under ten-fold cross-validation, outperformed two leading predictors: the neural network based SPINE-D and the meta predictor MFDp. We propose a post-processing of the output probabilities to further improve accuracy; the resulting version, DisPredict1.1, improves both binary annotation and real-valued per-residue probability prediction in both short and long disordered regions. When compared with twenty existing predictors of several kinds on an independent benchmark dataset, it provides the highest Matthews Correlation Coefficient (MCC), a competitive Area Under the receiver operating characteristic Curve (AUC), and the lowest Mean Absolute Error (MAE). DisPredict is available online.
Keywords :
bioinformatics; probability; proteins; proteomics; radial basis function networks; support vector machines; AUC; DisPredict1.1; IDP; IDR; MAE; MCC; Matthews correlation coefficient; RBF kernel; accuracy improvement; area-under receiver operating characteristic curve; binary annotation; biological functions; canonical support vector machine based disorder predictor; critical human diseases; disorder characterization; disordered residues; drug discovery; feature set; genome database; high-performance computational methods; improved protein disorder predictor; intrinsically disordered proteins; intrinsically disordered regions; long-disordered regions; mean absolute error; output smoothing; probability postprocessing; proteomic research; real valued probability prediction; short-disordered regions; Accuracy; Computers; Databases; Kernel; Proteins; Support vector machines; Training; Bigram; Cross validation; Intrinsic disorder; Monogram; Pattern recognition; Probability smoothing; Protein prediction; RBF kernel; SVM;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Technology (ICCIT), 2014 17th International Conference on
Type :
conf
DOI :
10.1109/ICCITechn.2014.7073113
Filename :
7073113