Title :
Annotation guided local similarity search in multiple sequences and its application to mitochondrial genomes
Author :
Moritz, Ruby L. V. ; Bernt, Matthias ; Middendorf, Martin
Author_Institution :
Parallel Comput. & Complex Syst. Group, Univ. Leipzig, Leipzig, Germany
Abstract :
Given a set of nucleotide sequences and corresponding gene annotations, which might contain a moderate number of errors, we consider the problem of identifying common substrings that occur in homologous genes and of identifying putative errors in the given annotations. The problem is solved by identifying nodes in a suffix tree that contains all substrings occurring in the data set. Because the targeted data set is large, our approach employs a truncated version of suffix trees. The approach is successfully applied to the mitochondrial nucleotide sequences and corresponding annotations available in RefSeq for more than 2000 metazoan species. We demonstrate that the approach finds appropriate subsequences despite errors in the given annotations. Moreover, it identifies several hundred errors within the RefSeq annotations.
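The idea of combining a depth-limited suffix index with gene annotations can be illustrated with a small sketch. This is not the authors' data structure or algorithm. It is an uncompressed suffix trie truncated at a hypothetical depth `max_depth`, so every leaf corresponds to one substring of that length. Annotations are assumed to be half-open `(gene, start, end)` intervals. The error rule is also an assumption: a majority vote controlled by the hypothetical thresholds `min_support` (number of distinct sequences) and `min_frac` (fraction of sequences). Under this rule, a substring that is conserved mostly inside one gene flags its occurrences carrying any other label, or no label, as putative annotation errors.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Node of an uncompressed suffix trie truncated at a fixed depth (illustrative only)."""
    children: dict = field(default_factory=dict)
    # (sequence id, start) of each suffix whose first max_depth characters end here;
    # filled only at the truncation depth to keep memory bounded.
    occurrences: list = field(default_factory=list)


def build_truncated_trie(seqs, max_depth):
    """Insert the first max_depth characters of every suffix that has at least that many."""
    root = Node()
    for sid, s in seqs.items():
        for i in range(len(s) - max_depth + 1):
            node = root
            for ch in s[i:i + max_depth]:
                node = node.children.setdefault(ch, Node())
            node.occurrences.append((sid, i))
    return root


def gene_at(annotation, start, end):
    """Return the gene whose half-open interval [a, b) contains [start, end), else None."""
    for gene, a, b in annotation:
        if a <= start and end <= b:
            return gene
    return None


def iter_leaves(root):
    """Yield every node that stores occurrences (i.e. nodes at the truncation depth)."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.occurrences:
            yield node
        stack.extend(node.children.values())


def find_putative_errors(seqs, annotations, max_depth, min_support, min_frac):
    """Flag occurrences of gene-conserved substrings whose label disagrees with the majority."""
    errors = []
    for leaf in iter_leaves(build_truncated_trie(seqs, max_depth)):
        labelled = [(sid, pos, gene_at(annotations.get(sid, []), pos, pos + max_depth))
                    for sid, pos in leaf.occurrences]
        # Support of a gene = number of distinct sequences with an occurrence inside it.
        pairs = {(sid, gene) for sid, _, gene in labelled if gene is not None}
        support = Counter(gene for _, gene in pairs)
        if not support:
            continue  # substring never falls inside an annotated gene
        best, count = support.most_common(1)[0]
        n_seqs = len({sid for sid, _, _ in labelled})
        if count >= min_support and count / n_seqs >= min_frac:
            errors.extend((sid, pos, gene, best)
                          for sid, pos, gene in labelled if gene != best)
    return sorted(errors)
```

For example, if the motif `GATTACA` lies inside a gene labelled `cox1` in three toy sequences but inside `nad1` in a fourth, each 4-character substring of the motif is reported for the fourth sequence as `(id, position, "nad1", "cox1")`. A plain dictionary trie like this needs memory proportional to the total sequence length times `max_depth`. That cost is the reason a compact, truncated suffix tree is preferable for data sets of the size described above.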
Keywords :
DNA; bioinformatics; data analysis; genomics; search problems; sequences; RefSeq annotations; annotation guided local similarity search; data analysis methods; gene annotations; homologous genes; mitochondrial genomes; mitochondrial nucleotide sequences; multiple sequences; nucleotide sequences; suffix tree; Encoding; Genomics; Indexes; Memory management; Proteins; Tree data structures; Vegetation;
Conference_Title :
2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE)
Conference_Location :
Larnaca
Print_ISBN :
978-1-4673-4357-2
DOI :
10.1109/BIBE.2012.6399666