DocumentCode :
2742305
Title :
Motifs in Ziv-Lempel-Welch Clef
Author :
Apostolico, Alberto ; Comin, Mattec ; Parida, Laxmi
Author_Institution :
Dipt. di Ingegneria dell´´ Informazione, Padova Univ., Italy
fYear :
2004
fDate :
23-25 March 2004
Firstpage :
72
Lastpage :
81
Abstract :
We present variants of classical data compression paradigms by Ziv, Lempel, and Welch in which the phrases used in compression are selected among suitably chosen motifs, defined here as strings of intermittently solid and wild characters that recur more or less frequently in the source textstring. This notion emerged primarily in the analysis of biological sequences and molecules. Whereas the number of motifs in a sequence or family may be exponential in the size of the input, a linear-sized basis of irredundant motifs may be defined such that any other motif can be obtained by the union of a suitable subset from the basis. Previous study has exposed the advantages of using irredundant motifs in lossy as well as lossless offline compression. In the present paper, we examine adaptations and extensions of classical incremental ZL and ZLW paradigms. First, hybrid schemata are proposed along these lines, in which motifs may be discovered and selected off-line, while the parse and encoding is still conducted on-line. The performances thus obtained improve on the one hand over previous off-line implementations of motif-based compression, and on the other, over the traditionally best implementations of ZLW. On the basis of this, both lossy and lossless motif-based schemata are introduced and tested that follow more closely the ZL and ZLW paradigms.
Keywords :
data compression; encoding; grammars; sequences; Motif compression; Ziv-Lempel-Welch Clef; biological sequence analysis; classical data compression paradigm; encoding; lossless offline compression; lossy scheme; molecule sequence; parse; phrase compression; source textstring; string; wild character; Data compression;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference, 2004. Proceedings. DCC 2004
ISSN :
1068-0314
Print_ISBN :
0-7695-2082-0
Type :
conf
DOI :
10.1109/DCC.2004.1281452
Filename :
1281452
Link To Document :
بازگشت