• DocumentCode
    2266718
  • Title

    A weighted k-nearest neighbor method for gene ontology based protein function prediction

  • Author

    Kharsikar, Saket ; Mugler, Dale ; Sheffer, Daniel ; Moore, Francisco ; Duan, Zhong-Hui

  • Author_Institution
    Univ. of Akron, Akron
  • fYear
    2007
  • fDate
    13-15 Aug. 2007
  • Firstpage
    25
  • Lastpage
    31
  • Abstract
    Numerous genome projects have produced a large and ever-increasing amount of genomic sequence data. However, the biological functions of many proteins encoded by the sequences remain unknown. Protein function annotation and prediction have become an essential and challenging task of post-genomic research. In this paper, we present an automated protein function prediction system based on a set of proteins with known biological functions. The functions of the proteins are characterized with gene ontology (GO) annotations. The prediction system uses a novel measure to calculate the pair-wise overall similarity between protein sequences. The protein function prediction is performed based on the GO annotations of similar sequences using a weighted k-nearest neighbor method. We show the prediction accuracies obtained using the model organism yeast (Saccharomyces cerevisiae). The results indicate that the weighted k-nearest neighbor method significantly outperforms the regular k-nearest neighbor method for protein molecular function prediction.
  • Keywords
    biology computing; genetics; proteins; biological functions; gene ontology; genome projects; genomic sequence data; k-nearest neighbor method; protein function annotation; protein function prediction; protein molecular function prediction; weighted k-nearest neighbor; Accuracy; Bioinformatics; Biological information theory; Biological system modeling; Biomedical computing; Genomics; Ontologies; Organisms; Protein engineering; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Computational Sciences, 2007. IMSCCS 2007. Second International Multi-Symposiums on
  • Conference_Location
    Iowa City, IA
  • Print_ISBN
    978-0-7695-3039-0
  • Type

    conf

  • DOI
    10.1109/IMSCCS.2007.61
  • Filename
    4392576
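
The abstract contrasts a weighted with a regular k-nearest neighbor vote over the GO annotations of similar sequences. The record does not give the paper's similarity measure or weighting scheme, so the following is only a minimal sketch under assumptions: similarity scores are taken as given inputs, each GO term is scored by the summed similarity of the top-k neighbors that carry it, and scores are normalized by the total weight. The function names and the sample neighbors are hypothetical.

```python
from collections import defaultdict


def top_k(neighbors, k):
    """Return the k neighbors with the highest similarity.

    neighbors: list of (similarity, set_of_go_terms) pairs.
    """
    return sorted(neighbors, key=lambda n: n[0], reverse=True)[:k]


def knn_go_scores(neighbors, k=3, weighted=True):
    """Score GO terms from the k most similar annotated proteins.

    Assumed scheme (not taken from the paper): each neighbor votes for
    its GO terms with weight equal to its similarity (weighted=True) or
    with weight 1 (weighted=False); scores are divided by total weight.
    """
    scores = defaultdict(float)
    total = 0.0
    for sim, terms in top_k(neighbors, k):
        w = sim if weighted else 1.0
        total += w
        for term in terms:
            scores[term] += w
    if total == 0:
        return {}
    return {term: s / total for term, s in scores.items()}


# Hypothetical neighbors of one query protein: (similarity, GO terms).
neighbors = [
    (0.9, {"GO:0003824"}),  # one close match
    (0.2, {"GO:0005215"}),  # two distant matches sharing another term
    (0.1, {"GO:0005215"}),
]

weighted = knn_go_scores(neighbors, k=3, weighted=True)
regular = knn_go_scores(neighbors, k=3, weighted=False)

print(max(weighted, key=weighted.get))  # term favored by the weighted vote
print(max(regular, key=regular.get))    # term favored by the regular vote
```

With these sample inputs the two votes disagree: the weighted vote favors the term carried by the single close neighbor, while the regular vote favors the term shared by the two distant neighbors. That is the kind of case in which weighting by similarity could change a prediction.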