Title :
System for random access DNA sequence compression
Author :
Kaipa, Kalyan Kumar ; Lee, Kyusang ; Ahn, TaeJin ; Narayanan, Rangavittal
Author_Institution :
Samsung India Software Oper., Bangalore, India
Abstract :
DNA sequences are generally compressed by algorithms that exploit the approximate repeats found in most DNA sequences. Regions of DNA that are not part of a repeat are encoded with an arithmetic coder, which estimates the probability of each symbol using a Markov model. Because arithmetic coding is used to compress the bitstream, random access is very difficult in these methods, as the bitstream is tightly packed. Random access is a desirable feature because it enables decompressing only the regions of interest in the sequence. This paper presents a system that uses the approximate-repeats-based compression algorithm and provides random access capability.
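The abstract does not describe how the system achieves random access, so the following is only a minimal sketch of one common general approach, not the authors' method: compress fixed-size blocks independently and keep an index of block offsets, so that a query decompresses only the blocks it overlaps. Everything here is an assumption made for illustration: the names `compress_blocks` and `fetch`, the block size of 1024 bases, and the use of Python's `zlib` as a stand-in for the approximate-repeats and arithmetic-coding stage.

```python
import zlib

BLOCK = 1024  # bases per independently compressed block (illustrative choice)


def compress_blocks(seq: str, block: int = BLOCK):
    """Compress each fixed-size block on its own and record where it starts.

    zlib is only a stand-in codec here; it is NOT the paper's
    approximate-repeats / arithmetic-coding scheme.
    """
    blobs, offsets, pos = [], [], 0
    for i in range(0, len(seq), block):
        blob = zlib.compress(seq[i:i + block].encode("ascii"))
        offsets.append(pos)  # byte offset of this block in the stream
        blobs.append(blob)
        pos += len(blob)
    offsets.append(pos)  # sentinel marking the end of the last block
    return b"".join(blobs), offsets


def fetch(stream: bytes, offsets, start: int, end: int, block: int = BLOCK) -> str:
    """Return seq[start:end], decompressing only the blocks that overlap it.

    Assumes 0 <= start and end <= original sequence length.
    """
    if start >= end:
        return ""
    first, last = start // block, (end - 1) // block
    parts = [
        zlib.decompress(stream[offsets[b]:offsets[b + 1]]).decode("ascii")
        for b in range(first, last + 1)
    ]
    joined = "".join(parts)
    lo = start - first * block  # position of `start` inside the first block
    return joined[lo:lo + (end - start)]


if __name__ == "__main__":
    seq = "ACGT" * 1000 + "GATTACA" * 300
    stream, offsets = compress_blocks(seq)
    # Only the blocks covering positions 1500..2599 are decompressed.
    assert fetch(stream, offsets, 1500, 2600) == seq[1500:2600]
```

The trade-off in this kind of design is that smaller blocks give finer-grained access but leave the compressor less context per block, which generally costs compression ratio.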
Keywords :
DNA; Markov processes; arithmetic codes; bioinformatics; data compression; genomics; molecular biophysics; random-access storage; DNA sequence compression; Markov model; arithmetic coding; bitstream; encoding; random access; repeats based compression algorithm; Genomic Data Storage
Conference_Title :
Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-8303-7
Electronic_ISBN :
978-1-4244-8304-4
DOI :
10.1109/BIBMW.2010.5703942