Title :
BLKnn: A K-nearest neighbors method for predicting bioluminescent proteins
Author_Institution :
Dept. of Math. & Comput. Sci., Franklin & Marshall Coll., Lancaster, PA, USA
Abstract :
Bioluminescence is a chemical process in which a living organism produces and emits light. Recent biotechnological applications of bioluminescence include the use of bioluminescent proteins in gene expression analysis, bioluminescent imaging, the study of protein-protein interaction and disease progression, drug discovery, and toxicity determination. It is therefore of great medical and commercial significance to identify bioluminescent proteins accurately and efficiently. In this study, we present BLKnn, a K-nearest neighbors method for predicting bioluminescent proteins. The method is based on a bit-score weighted Euclidean distance, which is calculated from the compositions of selected amino acids and pseudo-amino acids. On a balanced training dataset, BLKnn achieved 74.9% sensitivity, 95.5% specificity, 85.2% accuracy, and 0.919 AUC (area under the ROC curve) in 10-fold cross-validation. When tested on a much larger independent test dataset, the method achieved consistent performance, with 88.0% overall accuracy and 0.989 AUC. Comparisons showed that BLKnn outperformed previously published methods. The method is available at https://edisk.fandm.edu/jing.hu/blknn/blknn.html.
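The abstract names the ingredients of the method (composition features, a weighted Euclidean distance, K-nearest neighbors voting) but not their details. The following is a minimal illustrative sketch, not the authors' implementation. It assumes plain amino acid composition over the 20 standard residues as features, takes the per-feature weights as a caller-supplied list (how BLKnn derives them from bit scores is not stated here), and uses simple majority voting. All function names and the default k=3 are hypothetical.

```python
import math
from collections import Counter

# Assumption: features are fractions of the 20 standard amino acids.
# The abstract mentions "selected" amino acids and pseudo-amino acids,
# whose exact selection is not given here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq, residues=AMINO_ACIDS):
    """Return the fraction of each residue in a protein sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    return [counts[r] / len(seq) for r in residues]


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def knn_predict(query, train, weights, k=3):
    """Majority vote among the k nearest training points.

    train is a list of (feature_vector, label) pairs,
    with label 1 = bioluminescent and 0 = non-bioluminescent.
    """
    nearest = sorted(
        train, key=lambda item: weighted_distance(query, item[0], weights)
    )[:k]
    positive_votes = sum(label for _, label in nearest)
    return 1 if 2 * positive_votes > k else 0


def metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels.

    Assumes y_true contains at least one positive and one negative label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy
```

On a balanced dataset, accuracy is the mean of sensitivity and specificity, which matches the cross-validation figures reported above: (74.9% + 95.5%) / 2 = 85.2%.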
Keywords :
biochemistry; biology computing; bioluminescence; genetics; molecular biophysics; proteins; sensitivity analysis; 0.919 AUC; 10-fold cross-validation; BLKnn; ROC curve; balanced training dataset; bioluminescent imaging; bioluminescent proteins; biotechnological applications; bit-score weighted Euclidean distance; chemical process; commercial significances; disease progression; drug discovery; gene expression analysis; independent test dataset; k-nearest neighbors method; living organism; medical significances; protein-protein interaction; pseudoamino acids; selected amino acid compositions; sensitivity; toxicity determination; Accuracy; Amino acids; Bioluminescence; Euclidean distance; Proteins; Sensitivity; Training; K-nearest neighbors method; pseudo-amino acid composition
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on
Conference_Location :
Honolulu, HI
DOI :
10.1109/CIBCB.2014.6845503