Title :
Algorithm for DNA sequence compression based on prediction of mismatch bases and repeat location
Author :
Kaipa, Kalyan Kumar ; Bopardikar, Ajit S. ; Abhilash, Srikantha ; Venkataraman, Parthasarathy ; Lee, Kyusang ; Ahn, TaeJin ; Narayanan, Rangavittal
Author_Institution :
SAIT India Lab., Samsung, Bangalore, India
Abstract :
For DNA sequence compression, methods based on Markov modeling and repeats have been observed to give the best results. However, these methods tend to assume that mismatches in approximate repeats are uniformly distributed. We show that these mismatches (base replacements) are not uniformly distributed and that compression efficiency can be improved by using a non-uniform distribution for them. We also propose a hash-table-based method to predict repeat locations, which works well for block-based genomic sequence compression algorithms. The proposed methods give good compression gains. They can be incorporated into any algorithm that uses approximate repeats to realize similar gains.
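The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mismatch_bits`, `build_kmer_index`, `candidate_repeat_positions`), the use of an empirical conditional distribution, and the fixed k-mer hash key are all assumptions made for illustration. The first function compares the ideal coding cost of mismatches under the uniform assumption (log2 3 bits each) with the cost under the observed distribution of the actual base given the predicted base, ignoring the cost of transmitting the model. The other two show one plausible way a hash table could propose earlier repeat locations.

```python
import math
from collections import Counter, defaultdict


def mismatch_bits(pairs):
    """Return (uniform_bits, empirical_bits) for (predicted, actual) mismatches.

    Hypothetical helper: the abstract does not specify how the non-uniform
    distribution is estimated; an empirical conditional model is assumed here.
    """
    # Uniform assumption: a mismatch is one of the 3 other bases, equally likely.
    uniform_bits = len(pairs) * math.log2(3)

    # Count actual bases separately for each predicted base.
    by_predicted = defaultdict(Counter)
    for predicted, actual in pairs:
        by_predicted[predicted][actual] += 1

    # Ideal code length under the empirical P(actual | predicted).
    empirical_bits = 0.0
    for counts in by_predicted.values():
        total = sum(counts.values())
        for n in counts.values():
            empirical_bits -= n * math.log2(n / total)
    return uniform_bits, empirical_bits


def build_kmer_index(seq, k):
    """Hash table mapping each k-mer to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def candidate_repeat_positions(index, seq, pos, k):
    """Earlier positions whose k-mer equals the k-mer starting at pos."""
    return [p for p in index.get(seq[pos:pos + k], []) if p < pos]


if __name__ == "__main__":
    # Made-up mismatch counts, for illustration only (not data from the paper).
    pairs = [("A", "G")] * 6 + [("A", "C")] + [("A", "T")]
    uniform_bits, empirical_bits = mismatch_bits(pairs)
    print(f"uniform: {uniform_bits:.2f} bits, empirical: {empirical_bits:.2f} bits")

    seq = "ACGTACGT"
    index = build_kmer_index(seq, 4)
    print(candidate_repeat_positions(index, seq, 4, 4))
```

When the observed mismatches are skewed, the empirical cost falls below the uniform cost; when they are evenly spread, the two coincide, so the uniform assumption is the worst case for this simple model.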
Keywords :
DNA; Markov processes; bioinformatics; data compression; molecular biophysics; molecular configurations; DNA sequence compression algorithm; block based genomic sequence compression algorithms; compression efficiency; hash table based method; mismatch base prediction; mismatch prediction; nonuniform mismatch distribution; repeat location prediction
Conference_Title :
Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-8303-7
Electronic_ISBN :
978-1-4244-8304-4
DOI :
10.1109/BIBMW.2010.5703941