• DocumentCode
    586364
  • Title

    Annotation guided local similarity search in multiple sequences and its application to mitochondrial genomes

  • Author

    Moritz, Ruby L. V. ; Bernt, Matthias ; Middendorf, Martin

  • Author_Institution
    Parallel Comput. & Complex Syst. Group, Univ. Leipzig, Leipzig, Germany
  • fYear
    2012
  • fDate
    11-13 Nov. 2012
  • Firstpage
    157
  • Lastpage
    162
  • Abstract
    Given a set of nucleotide sequences and corresponding gene annotations, which might contain a moderate number of errors, we consider the problem of identifying common substrings occurring in homologous genes and of identifying putative errors in the given annotations. The problem is solved by identifying nodes in a suffix tree that contains all substrings occurring in the data set. Due to the large size of the targeted data set, our approach employs a truncated version of suffix trees. The approach is successfully applied to the mitochondrial nucleotide sequences and the corresponding annotations available in RefSeq for more than 2000 metazoan species. We demonstrate that the approach finds appropriate subsequences despite errors in the given annotations. Moreover, it identifies several hundred errors within the RefSeq annotations.
  • Keywords
    DNA; bioinformatics; data analysis; genomics; search problems; sequences; RefSeq annotations; annotation guided local similarity search; data analysis methods; gene annotations; homologous genes; mitochondrial genomes; mitochondrial nucleotide sequences; multiple sequences; nucleotide sequences; suffix tree; Encoding; Genomics; Indexes; Memory management; Proteins; Tree data structures; Vegetation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International Conference on
  • Conference_Location
    Larnaca
  • Print_ISBN
    978-1-4673-4357-2
  • Type

    conf

  • DOI
    10.1109/BIBE.2012.6399666
  • Filename
    6399666