DocumentCode
586364
Title
Annotation guided local similarity search in multiple sequences and its application to mitochondrial genomes
Author
Moritz, Ruby L. V. ; Bernt, Matthias ; Middendorf, Martin
Author_Institution
Parallel Comput. & Complex Syst. Group, Univ. Leipzig, Leipzig, Germany
fYear
2012
fDate
11-13 Nov. 2012
Firstpage
157
Lastpage
162
Abstract
Given a set of nucleotide sequences and corresponding gene annotations that may contain a moderate number of errors, we consider the problem of identifying common substrings that occur in homologous genes and of identifying putative errors in the given annotations. The problem is solved by identifying nodes in a suffix tree that contains all substrings occurring in the data set. Because the targeted data set is large, our approach employs a truncated version of suffix trees. The approach is successfully applied to the mitochondrial nucleotide sequences and the corresponding annotations available in RefSeq for more than 2000 metazoan species. We demonstrate that the approach finds appropriate subsequences despite errors in the given annotations. Moreover, it identifies several hundred errors within the RefSeq annotations.
Keywords
DNA; bioinformatics; data analysis; genomics; search problems; sequences; RefSeq annotations; annotation guided local similarity search; data analysis methods; gene annotations; homologous genes; mitochondrial genomes; mitochondrial nucleotide sequences; multiple sequences; nucleotide sequences; suffix tree; Encoding; Genomics; Indexes; Memory management; Proteins; Tree data structures; Vegetation;
fLanguage
English
Publisher
ieee
Conference_Titel
Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International Conference on
Conference_Location
Larnaca
Print_ISBN
978-1-4673-4357-2
Type
conf
DOI
10.1109/BIBE.2012.6399666
Filename
6399666