DocumentCode :
2380778
Title :
Algorithm for DNA sequence compression based on prediction of mismatch bases and repeat location
Author :
Kaipa, Kalyan Kumar ; Bopardikar, Ajit S. ; Abhilash, Srikantha ; Venkataraman, Parthasarathy ; Lee, Kyusang ; Ahn, TaeJin ; Narayanan, Rangavittal
Author_Institution :
SAIT India Lab., Samsung, Bangalore, India
fYear :
2010
fDate :
18 Dec. 2010
Firstpage :
851
Lastpage :
852
Abstract :
For DNA sequence compression, methods based on Markov modeling and repeats have been observed to give the best results. However, these methods tend to assume that mismatches in approximate repeats are uniformly distributed. We show that these replacements are not uniformly distributed and that compression efficiency can be improved by using a nonuniform distribution for mismatches. We also propose a hash-table-based method to predict repeat location, which works well for block-based genomic sequence compression algorithms. The proposed methods give good compression gains, and they can be incorporated into any algorithm that uses approximate repeats to realize similar gains.
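The claim about nonuniform mismatches can be illustrated with a minimal sketch. It is not the authors' algorithm. It only compares the idealized cost of coding mismatch bases under a uniform assumption (each of the three other bases equally likely) with the cost under an empirical distribution conditioned on the reference base. The function name `mismatch_cost_bits` and the sample data are hypothetical, and the cost of transmitting the model itself is ignored.

```python
import math
from collections import Counter


def mismatch_cost_bits(pairs):
    """Return (uniform_bits, empirical_bits) for a list of mismatches.

    pairs: list of (reference_base, actual_base) tuples with
    reference_base != actual_base. Hypothetical helper for illustration.
    """
    # Uniform assumption: the mismatch is one of 3 equally likely bases.
    uniform_bits = len(pairs) * math.log2(3)

    # Empirical conditional distribution P(actual | reference).
    by_ref = Counter(ref for ref, _ in pairs)
    by_pair = Counter(pairs)

    # Idealized code length: sum of -log2 P(actual | reference).
    empirical_bits = -sum(
        count * math.log2(count / by_ref[ref])
        for (ref, _), count in by_pair.items()
    )
    return uniform_bits, empirical_bits


# Made-up, skewed sample: A is mostly replaced by G, C always by T.
skewed = [("A", "G")] * 6 + [("A", "C")] + [("A", "T")] + [("C", "T")] * 8
uniform_bits, empirical_bits = mismatch_cost_bits(skewed)
print(f"uniform: {uniform_bits:.2f} bits, empirical: {empirical_bits:.2f} bits")
```

When the mismatches really are uniform, the two costs coincide, so any saving comes entirely from skew in the replacement statistics.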
Keywords :
DNA; Markov processes; bioinformatics; data compression; molecular biophysics; molecular configurations; DNA sequence compression algorithm; block based genomic sequence compression algorithms; compression efficiency; hash table based method; mismatch base prediction; mismatch prediction; nonuniform mismatch distribution; repeat location prediction;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-8303-7
Electronic_ISBN :
978-1-4244-8304-4
Type :
conf
DOI :
10.1109/BIBMW.2010.5703941
Filename :
5703941