Title :
Compression of Whole Genome Alignments
Author :
Hanus, Pavol ; Dingel, Janis ; Chalkidis, Georg ; Hagenauer, Joachim
Author_Institution :
Dept. of Electr. Eng. & Inf. Technol., Tech. Univ. München, Munich, Germany
Abstract :
Recent advances in DNA sequencing technology have caused an exponential growth of publicly available genomic sequence data. Whole genome alignments are a particularly voluminous and frequently used static data set. This paper introduces the first lossless compression algorithm for such data sets, based on well-established statistical evolutionary models and on prediction techniques from lossless binary image compression. Compared with the currently used Lempel-Ziv (LZ) compression, the compression rate is improved by a factor of 1.6.
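The sketch below illustrates the two ingredients the abstract names. It is not the authors' implementation, and all function names are hypothetical. For the image-compression side, it assumes the alignment's gap pattern (1 = gap, 0 = aligned base) is treated as a binary image. Each cell is predicted from a simplified JBIG-like causal template of four neighbours, using adaptive Krichevsky-Trofimov counts. For the evolutionary side, it uses the Jukes-Cantor (JC69) substitution model as the simplest textbook example. The abstract does not say which model the paper actually uses. Both functions return ideal code lengths in bits, which an arithmetic coder would approach. The zlib size of the raw cells is shown only as an LZ-family point of comparison.

```python
import math
import zlib
from collections import defaultdict


def context_code_length(image):
    """Ideal adaptive code length (bits) of a binary image given as rows of 0/1.
    Each cell is predicted from a 4-cell causal template (simplified, JBIG-like)."""
    rows, cols = len(image), len(image[0])
    counts = defaultdict(lambda: [0.5, 0.5])  # KT estimator: per-context counts start at 1/2

    def px(r, c):
        # Cells outside the image are treated as 0 (no gap).
        return image[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    bits = 0.0
    for r in range(rows):
        for c in range(cols):
            ctx = (px(r, c - 1), px(r - 1, c - 1), px(r - 1, c), px(r - 1, c + 1))
            n0, n1 = counts[ctx]
            bit = image[r][c]
            bits -= math.log2((n1 if bit else n0) / (n0 + n1))
            counts[ctx][bit] += 1  # adapt the model after coding the cell
    return bits


def zlib_bits(image):
    """Size in bits of the image compressed with zlib (LZ77-based), one byte per cell."""
    data = bytes(cell for row in image for cell in row)
    return 8 * len(zlib.compress(data, 9))


def jc69_code_length(reference, target, t):
    """Bits to code `target` given the aligned `reference` under Jukes-Cantor with
    branch length t > 0 (assumed model, for illustration only)."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e  # probability the base is unchanged
    p_diff = 0.25 - 0.25 * e  # probability of each of the three other bases
    bits = 0.0
    for a, b in zip(reference, target):
        bits -= math.log2(p_same if a == b else p_diff)
    return bits


if __name__ == "__main__":
    # Hypothetical 8-sequence x 60-column gap pattern with one block of gaps.
    gaps = [[1 if 2 <= r <= 5 and 10 <= c < 20 else 0 for c in range(60)] for r in range(8)]
    print(round(context_code_length(gaps), 1), "bits (context model)")
    print(zlib_bits(gaps), "bits (zlib, one byte per cell)")
    print(round(jc69_code_length("ACGTACGT", "ACGTACGA", 0.1), 2), "bits (JC69)")
```

Closely related sequences (small t) cost far less than 2 bits per base under JC69. As t grows, the cost approaches the 2 bits per base of an uninformed model.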
Keywords :
DNA; biological techniques; biology computing; data compression; evolution (biological); genetics; genomics; molecular biophysics; statistical analysis; DNA sequencing; Lempel-Ziv compression; genomic sequence data; lossless binary image compression; lossless compression algorithm; statistical evolutionary models; whole genome alignment compression; Bioinformatics; Compression algorithms; DNA; Databases; Evolution (biology); Genetics; Genomics; Image coding; Predictive models; Sequences; Compression; genetics; lossless binary image compression; multiple sequence alignment; probabilistic models of evolution; whole genome alignment;
Journal_Title :
Information Theory, IEEE Transactions on
DOI :
10.1109/TIT.2009.2037052