DocumentCode :
2130930
Title :
Fast compression of huge DNA sequence data
Author :
Jichao Ouyang ; Ping Feng ; Jichang Kang
Author_Institution :
Sch. of Comput. Sci. & Technol., Northwestern Polytech. Univ., Xi'an, China
fYear :
2012
fDate :
16-18 Oct. 2012
Firstpage :
885
Lastpage :
888
Abstract :
DNA sequences can be enormous in size. Several compression methods designed for DNA sequences exist, such as Biocompress, DNACompress, Cfact, CTW+LZ, and DNADP. These methods achieve high compression ratios but at a heavy cost in running time. For example, CTW+LZ takes several hours to compress the 227 KB sequence HEMCMVCG, and DNADP takes about 20 minutes to compress the standard benchmark sequences. Here we introduce an improved run-length encoding (RLE) method with lower computational complexity, which significantly reduces running time compared with previous DNA compression programs. Our improved RLE achieves a compression ratio of 1.862 bits per base and takes only about 1 minute on a 2.1 GHz Core 2 Duo processor to compress a 250 MB chromosome sequence file. We also use delta encoding to reduce the second sequence to 4.8 MB.
Keywords :
DNA; biology computing; biomedical electronics; cellular biophysics; computational complexity; data compression; field programmable gate arrays; CTW+LZ; Cfact; DNA compression programs; DNA sequence data; DNA sequence oriented compression methods; DNACompress; DNADP; HEMCMVCG; LRE; RLE method; biocompress; chromosomes sequence file; compression ratio; computational complexity; delta encoding; duo processor; standard benchmark sequences; Data compression; Delta Encoding; Field programmable gate arrays (FPGA); Run-length encoding; Sequence alignment; Variable Integers;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Biomedical Engineering and Informatics (BMEI), 2012 5th International Conference on
Conference_Location :
Chongqing
Print_ISBN :
978-1-4673-1183-0
Type :
conf
DOI :
10.1109/BMEI.2012.6512909
Filename :
6512909