DocumentCode
167265
Title
BLKnn: A K-nearest neighbors method for predicting bioluminescent proteins
Author
Jing Hu
Author_Institution
Dept. of Math. & Comput. Sci., Franklin & Marshall Coll., Lancaster, PA, USA
fYear
2014
fDate
21-24 May 2014
Firstpage
1
Lastpage
6
Abstract
Bioluminescence is a chemical process in which light is produced and emitted by a living organism. Recent biotechnological applications of bioluminescence include the use of bioluminescent proteins in gene expression analysis, bioluminescent imaging, the study of protein-protein interaction and disease progression, drug discovery, and toxicity determination. It is therefore of great medical and commercial significance to identify bioluminescent proteins accurately and efficiently. In this study, we present BLKnn, a K-nearest neighbors method for predicting bioluminescent proteins. The method is based on the bit-score weighted Euclidean distance, which is calculated from the compositions of selected amino acids and pseudo-amino acids. On a balanced training dataset, BLKnn achieved 74.9% sensitivity, 95.5% specificity, 85.2% accuracy, and 0.919 AUC (area under the ROC curve) in 10-fold cross-validation. When tested on a much larger independent test dataset, the method performed consistently, with 88.0% overall accuracy and 0.989 AUC. Comparisons showed that BLKnn outperformed previously published methods. The method is available at https://edisk.fandm.edu/jing.hu/blknn/blknn.html.
Keywords
biochemistry; biology computing; bioluminescence; genetics; molecular biophysics; proteins; sensitivity analysis; 0.919 AUC; 10-fold cross-validation; BLKnn; ROC curve; balanced training dataset; bioluminescent imaging; bioluminescent proteins; biotechnological applications; bit-score weighted Euclidean distance; chemical process; commercial significances; disease progression; drug discovery; gene expression analysis; independent test dataset; k-nearest neighbors method; living organism; medical significances; protein-protein interaction; pseudoamino acids; selected amino acid compositions; sensitivity; toxicity determination; Accuracy; Amino acids; Bioluminescence; Euclidean distance; Proteins; Sensitivity; Training; K-nearest neighbors method; bit-score weighted Euclidean distance; pseudo-amino acid composition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on
Conference_Location
Honolulu, HI
Type
conf
DOI
10.1109/CIBCB.2014.6845503
Filename
6845503