• DocumentCode
    2762482
  • Title
    Constructing Suffix Array During Decompression
  • Author
    Mahmoud, M. ; Abouelhoda, M.I. ; Kandil, A. ; Elbialy, A.
  • Author_Institution
    Fac. of Eng., Cairo Univ., Giza
  • fYear
    2008
  • fDate
    18-20 Dec. 2008
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The suffix array is an indexing data structure used in a wide range of bioinformatics applications. Biological DNA sequences are available for download from public servers as compressed files, typically produced with the popular lossless compression program gzip [1]. The straightforward way to construct the suffix array for such data is to decompress the sequence file, store it on disk, and then call a suffix array construction program. This approach, albeit feasible, requires disk access and throws away valuable information contained in the compressed file. In this paper, we present an algorithm that constructs the suffix array during decompression, requiring no disk access and using the decompression information to build the suffix array.
  • Keywords
    DNA; bioinformatics; data compression; data structures; biological DNA sequences; compressed files; decompression; gzip; indexing data structure; lossless compression program; suffix array; Algorithm design and analysis; Data engineering; File servers; Genomics; Indexing; Proteins; Sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Biomedical Engineering Conference, 2008. CIBEC 2008. Cairo International
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-2694-2
  • Electronic_ISBN
    978-1-4244-2695-9
  • Type
    conf
  • DOI
    10.1109/CIBEC.2008.4786040
  • Filename
    4786040