DocumentCode :
1514339
Title :
Protein Classification with Extended-Sequence Coding by Sliding Window
Author :
De Souza Rodrigues, Thiago ; Cardoso, Fernanda Caldas ; Teixeira, Santuza Maria Ribeiro ; Oliveira, Sérgio Costa ; Braga, Antônio Pádua
Author_Institution :
Comput. Dept., Fed. Center of Technol. Educ. of Minas Gerais, Belo Horizonte, Brazil
Volume :
8
Issue :
6
fYear :
2011
Firstpage :
1721
Lastpage :
1726
Abstract :
A large number of unclassified sequences is still found in public databases, which suggests that there is still a need for new investigations in the area. In this contribution, we present a methodology based on Artificial Neural Networks for protein functional classification. A new protein coding scheme, called here Extended-Sequence Coding by Sliding Windows, is presented with the goal of overcoming some of the difficulties of the well-known method Sequence Coding by Sliding Window. The new protein coding scheme uses more than one sliding window length with a weight factor that is proportional to the window length, avoiding the ambiguity problem without ignoring the identity of small subsequences. Accuracy for Sequence Coding by Sliding Windows ranged from 60.1 to 77.7 percent for the first bacterium protein set and from 61.9 to 76.7 percent for the second one, whereas the accuracy for the proposed Extended-Sequence Coding by Sliding Windows scheme ranged from 70.7 to 97.1 percent for the first bacterium protein set and from 61.1 to 93.3 percent for the second one. Additionally, protein sequences classified inconsistently by the Artificial Neural Networks were analyzed by CD-Search, revealing that there is some disagreement in public repositories and calling attention to the relevant issue of error propagation in annotated databases due to incorrectly transferred annotations.
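The abstract describes the coding scheme only in outline, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes that each subsequence is counted with a sliding window, and that counts from a window of length k are multiplied by a weight equal to k (one simple reading of "proportional to the window length"). The function names, the default window lengths, and the exact weighting are hypothetical choices made for illustration.

```python
from collections import Counter


def sliding_window_counts(sequence, window_length):
    """Count every contiguous subsequence of the given length."""
    last_start = len(sequence) - window_length + 1
    return Counter(
        sequence[i:i + window_length] for i in range(max(last_start, 0))
    )


def extended_sliding_window_features(sequence, window_lengths=(1, 2, 3)):
    """Combine counts from several window lengths into one feature map.

    Assumption: the weight for a window of length k is simply k. The
    abstract only states that the weight is proportional to the length.
    """
    features = {}
    for k in window_lengths:
        for subsequence, count in sliding_window_counts(sequence, k).items():
            features[subsequence] = k * count
    return features


# Toy amino-acid string, not taken from the paper's data sets.
features = extended_sliding_window_features("MKVMK", window_lengths=(1, 2))
print(features)
```

In this sketch, short windows preserve the identity of small subsequences, while longer windows receive larger weights. A feature map like this would still have to be turned into a fixed-length vector before being fed to a neural network, a step the abstract does not specify.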
Keywords :
biology computing; molecular biophysics; molecular configurations; neural nets; proteins; CD-search; artificial neural networks; bacterium protein set; error propagation; extended-sequence coding; protein coding scheme; protein functional classification; protein sequences; sliding window; Amino acids; Artificial neural networks; Computational biology; Databases; Encoding; Error analysis; Proteins; Artificial neural network; protein coding; protein functional classification; protein functional classification error; Computational Biology; Databases, Protein; Molecular Sequence Annotation; Neural Networks (Computer); Proteins;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2011.78
Filename :
5765931