• DocumentCode
    1762587
  • Title
    GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations
  • Author
    Gremme, Gordon; Steinbiss, Sascha; Kurtz, S.
  • Author_Institution
    Center for Bioinf., Univ. of Hamburg, Hamburg, Germany
  • Volume
    10
  • Issue
    3
  • fYear
    2013
  • fDate
    May-June 2013
  • Firstpage
    645
  • Lastpage
    656
  • Abstract
    Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This makes it possible to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables developers to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools not only ensures a light-weight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of scripting languages (such as Python and Ruby) sharing the same interface.
  • Keywords
    authoring languages; bioinformatics; data compression; genomics; graphs; object-oriented languages; text analysis; GenomeTools; Python; Ruby; annotation graph approach; annotation graph conversion; annotation graph creation; annotation graph processing; associated software tools; bioinformatics software; careful C implementation; catalogue; comprehensive software library; compressed sequence data; efficient pull-based approach; efficient software library; efficient structured genome annotation processing; genomic features; human variations; implicit annotation graph; light-weight memory footprint; low memory overhead; object-oriented C-based software library; plain text files; random access; script programming languages; sequential processing; unified graph-based representation; Computer languages; Ontologies; Software; Software libraries; Scientific computing; biology and genetics; programming environments; reusable libraries; software engineering
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2013.68
  • Filename
    6529082