DocumentCode :
2300399
Title :
Genome compression using normalized maximum likelihood models for constrained Markov sources
Author :
Tabus, Ioan ; Korodi, Gergely
Author_Institution :
Dept. of Signal Process., Tampere Univ. of Technol., Tampere
fYear :
2008
fDate :
5-9 May 2008
Firstpage :
261
Lastpage :
265
Abstract :
The paper presents exact and implementable solutions to the problem of universal coding of approximate repeats, using the normalized maximum likelihood (NML) model for the class of first-order Markov sources and incorporating constraints that are standard in fast similarity search over full genomes. A coding scheme combining universal codes for memoryless sources and for sources with memory is then presented. Results on compressing the full human genome show that the combined scheme provides slight improvements over the existing state of the art. As a side result, interesting pairs of sequences may be found that are highly similar under the new NML model for Markov sources but have a lower similarity score when evaluated with the NML model for memoryless sources.
Keywords :
Markov processes; genetic engineering; genetics; maximum likelihood estimation; Markov sources; coding scheme; constrained Markov sources; genome compression; memoryless sources; normalized maximum likelihood models; universal coding; Bioinformatics; Context modeling; DNA; Encoding; Genomics; Humans; Paper technology; Pattern matching; Sequences; Signal processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Theory Workshop, 2008. ITW '08. IEEE
Conference_Location :
Porto
Print_ISBN :
978-1-4244-2269-2
Electronic_ISBN :
978-1-4244-2271-5
Type :
conf
DOI :
10.1109/ITW.2008.4578663
Filename :
4578663