• DocumentCode
    1514339
  • Title

    Protein Classification with Extended-Sequence Coding by Sliding Window

  • Author

    De Souza Rodrigues, Thiago ; Cardoso, Fernanda Caldas ; Teixeira, Santuza Maria Ribeiro ; Oliveira, Sérgio Costa ; Braga, Antônio Pádua

  • Author_Institution
    Comput. Dept., Fed. Center of Technol. Educ. of Minas Gerais, Belo Horizonte, Brazil
  • Volume
    8
  • Issue
    6
  • fYear
    2011
  • Firstpage
    1721
  • Lastpage
    1726
  • Abstract
    A large number of unclassified sequences are still found in public databases, which suggests that new investigations in the area are still needed. In this contribution, we present a methodology based on Artificial Neural Networks for protein functional classification. A new protein coding scheme, called here Extended-Sequence Coding by Sliding Windows, is presented with the goal of overcoming some of the difficulties of the well-known method Sequence Coding by Sliding Window. The new protein coding scheme uses more than one sliding window length with a weight factor that is proportional to the window length, avoiding the ambiguity problem without ignoring the identity of small subsequences. Accuracy for Sequence Coding by Sliding Windows ranged from 60.1 to 77.7 percent for the first bacterium protein set and from 61.9 to 76.7 percent for the second one, whereas the accuracy for the proposed Extended-Sequence Coding by Sliding Windows scheme ranged from 70.7 to 97.1 percent for the first bacterium protein set and from 61.1 to 93.3 percent for the second one. Additionally, protein sequences classified inconsistently by the Artificial Neural Networks were analyzed by CD-Search, revealing that there is some disagreement in public repositories and drawing attention to the relevant issue of error propagation in annotated databases due to incorrectly transferred annotations.
  • Keywords
    biology computing; molecular biophysics; molecular configurations; neural nets; proteins; CD-search; artificial neural networks; bacterium protein set; error propagation; extended-sequence coding; protein coding scheme; protein functional classification; protein sequences; sliding window; Amino acids; Artificial neural networks; Computational biology; Databases; Encoding; Error analysis; Proteins; Artificial neural network; protein coding; protein functional classification; protein functional classification error; Computational Biology; Databases, Protein; Molecular Sequence Annotation; Neural Networks (Computer); Proteins;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2011.78
  • Filename
    5765931