DocumentCode
3431003
Title
A nearest neighbor method for predicting solenoid proteins
Author
Cheng, Wen ; Sanjaka, Malinda ; Yan, Changhui
Author_Institution
Department of Computer Science, North Dakota State University, Fargo, USA
fYear
2012
fDate
11-13 Aug. 2012
Firstpage
68
Lastpage
71
Abstract
Solenoid proteins are proteins containing repeats 5 to 40 residues in length. Identifying solenoid proteins is challenging because the repeat sequences are highly degenerate. Here, we present a nearest neighbor (NN) method for predicting solenoid proteins based on residue composition. The distance between two proteins is calculated as a weighted Euclidean distance between their residue composition vectors. The NN method predicts solenoid proteins with an overall accuracy of 95.5%, a sensitivity of 94.3%, and a specificity of 96%, outperforming other methods in direct comparisons. We also demonstrate that combining the NN method with HHrepID and Trust, two previously published methods for addressing the same problem, can dramatically reduce the false positive rate in predicting repeats.
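The classification scheme described in the abstract can be illustrated with a minimal sketch. It assumes that a protein's residue composition vector holds the fraction of each of the 20 standard amino acids, and that the distance is d(x, y) = sqrt(sum_i w_i (x_i - y_i)^2). The abstract does not state how the weights w_i are chosen, so the uniform weights used here are a placeholder, and all function names (`composition`, `weighted_euclidean`, `nn_predict`) are hypothetical. This is a generic 1-nearest-neighbor classifier, not the authors' implementation.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes); other characters are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional residue composition (fraction of each amino acid)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def nn_predict(query_seq, training, weights):
    """Return the label of the training protein closest to the query.

    training: list of (sequence, label) pairs, e.g. label "solenoid" or "non-solenoid".
    weights:  one weight per amino acid (hypothetical; the paper's weighting is not given).
    """
    q = composition(query_seq)
    best_label, best_dist = None, math.inf
    for seq, label in training:
        d = weighted_euclidean(q, composition(seq), weights)
        if d < best_dist:  # strict '<' keeps the first neighbor on ties
            best_dist, best_label = d, label
    return best_label


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not from the paper's dataset.
    training = [
        ("LLNLSNNQLTALPEGLF", "solenoid"),
        ("MKTAYIAKQRQISFVKSHFSRQ", "non-solenoid"),
    ]
    uniform = [1.0] * len(AMINO_ACIDS)  # placeholder weights
    print(nn_predict("LLNLSGNQLTSLPAGLF", training, uniform))
```

In practice, the composition vectors of the training set would be computed once and cached rather than recomputed for every query.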
Keywords
Accuracy; Databases; Proteins; Solenoids; nearest neighbor; prediction; solenoids; weighted Euclidean distance;
fLanguage
English
Publisher
IEEE
Conference_Titel
Granular Computing (GrC), 2012 IEEE International Conference on
Conference_Location
Hangzhou, China
Print_ISBN
978-1-4673-2310-9
Type
conf
DOI
10.1109/GrC.2012.6468600
Filename
6468600