• DocumentCode
    167265
  • Title

    BLKnn: A K-nearest neighbors method for predicting bioluminescent proteins

  • Author

    Jing Hu

  • Author_Institution
    Dept. of Math. & Comput. Sci., Franklin & Marshall Coll., Lancaster, PA, USA
  • fYear
    2014
  • fDate
    21-24 May 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Bioluminescence is a chemical process in which light is produced and emitted by a living organism. Recent biotechnological applications of bioluminescence include the use of bioluminescent proteins in gene expression analysis, bioluminescent imaging, the study of protein-protein interaction and disease progression, drug discovery, and toxicity determination. It is therefore of great medical and commercial significance to identify bioluminescent proteins accurately and efficiently. In this study, we present BLKnn, a K-nearest neighbors method for predicting bioluminescent proteins. The method is based on the bit-score weighted Euclidean distance, which is calculated from the compositions of selected amino acids and pseudo-amino acids. On a balanced training dataset, BLKnn achieved 74.9% sensitivity, 95.5% specificity, 85.2% accuracy, and 0.919 AUC (area under the ROC curve) in 10-fold cross-validation. When tested on a much larger independent test dataset, the method performed consistently, with 88.0% overall accuracy and 0.989 AUC. Comparisons showed that BLKnn outperformed previously published methods. The method is available at https://edisk.fandm.edu/jing.hu/blknn/blknn.html.
  • Keywords
    biochemistry; biology computing; bioluminescence; genetics; molecular biophysics; proteins; sensitivity analysis; 0.919 AUC; 10-fold cross-validation; BLKnn; ROC curve; balanced training dataset; bioluminescent imaging; bioluminescent proteins; biotechnological applications; bit-score weighted Euclidean distance; chemical process; commercial significances; disease progression; drug discovery; gene expression analysis; independent test dataset; k-nearest neighbors method; living organism; medical significances; protein-protein interaction; pseudoamino acids; selected amino acid compositions; sensitivity; toxicity determination; Accuracy; Amino acids; Bioluminescence; Euclidean distance; Proteins; Sensitivity; Training; K-nearest neighbors method; bit-score weighted Euclidean distance; pseudo-amino acid composition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on
  • Conference_Location
    Honolulu, HI
  • Type

    conf

  • DOI
    10.1109/CIBCB.2014.6845503
  • Filename
    6845503
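
The cross-validation figures quoted in the abstract can be related to one another with a short calculation. The sketch below is illustrative only and is not taken from the paper: the function names and the confusion-matrix counts are hypothetical, chosen so that a balanced set of 1000 positives and 1000 negatives reproduces the reported 74.9% sensitivity and 95.5% specificity. The record does not state the actual dataset sizes. On a balanced dataset, accuracy is the mean of sensitivity and specificity, which is consistent with the reported 85.2%.

```python
def sensitivity(tp, fn):
    """Fraction of true positives among all actual positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Fraction of all examples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for a balanced set (1000 positives, 1000 negatives).
# These are assumptions for illustration, not the paper's data.
tp, fn = 749, 251
tn, fp = 955, 45

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
acc = accuracy(tp, tn, fp, fn)

# With equal class sizes, accuracy equals the mean of the two rates.
mean_rate = (sens + spec) / 2

print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

The same identity does not hold for the independent test set if its classes are imbalanced, so the 88.0% overall accuracy cannot be decomposed this way from the information in this record.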