DocumentCode
2762482
Title
Constructing Suffix Array During Decompression
Author
Mahmoud, M. ; Abouelhoda, M.I. ; Kandil, A. ; Elbialy, A.
Author_Institution
Fac. of Eng., Cairo Univ., Giza
fYear
2008
fDate
18-20 Dec. 2008
Firstpage
1
Lastpage
4
Abstract
The suffix array is an indexing data structure used in a wide range of bioinformatics applications. Biological DNA sequences are available for download from public servers as compressed files, usually compressed with the popular lossless compression program gzip [1]. The straightforward way to construct the suffix array for such data is to decompress the sequence file, store it on disk, and then run a suffix array construction program on it. This approach, although feasible, requires disk access and discards valuable information contained in the compressed file. In this paper, we present an algorithm that constructs the suffix array during decompression, requiring no disk access and using the decompression information to build the suffix array.
Keywords
DNA; bioinformatics; data compression; data structures; biological DNA sequences; compressed files; decompression; gzip; indexing data structure; lossless compression program; suffix array; Algorithm design and analysis; Data engineering; File servers; Genomics; Indexing; Proteins; Sequences
fLanguage
English
Publisher
IEEE
Conference_Titel
Biomedical Engineering Conference, 2008. CIBEC 2008. Cairo International
Conference_Location
Cairo
Print_ISBN
978-1-4244-2694-2
Electronic_ISBN
978-1-4244-2695-9
Type
conf
DOI
10.1109/CIBEC.2008.4786040
Filename
4786040