DocumentCode
2707804
Title
Off-line compression by extensible motifs
Author
Apostolico, Alberto ; Comin, Matteo ; Parida, Laxmi
fYear
2005
fDate
29-31 March 2005
Firstpage
450
Abstract
Summary form only given. We present lossy off-line data compression techniques by textual substitution in which the patterns used in compression are chosen among the extensible motifs that are found to recur in the textstring with a minimum pre-specified frequency. A motif is to be interpreted here as a sequence of intermixed solid and don't care characters that obeys, in addition, some conditions of saturation: most notably, it must not be possible to eliminate some don't cares in the pattern without having to forfeit some of its occurrences. Motif discovery and motif-driven parses of various kinds have been previously introduced and used in Apostolico et al. (2004) and Apostolico et al. (2003). Whereas the motifs considered in those studies are "rigid", here we assume that each sequence of gaps present in a motif comes endowed with some individually prescribed degree of elasticity, whereby the same pattern may be stretched to fit segments of the source that match at all the solid characters but are otherwise of different lengths. This is expected to save on the size of the codebook, and hence to improve compression.
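The difference between a rigid and an extensible motif can be illustrated with a minimal sketch. It is not the authors' discovery or parsing algorithm: it assumes a motif is given as a list of solid strings separated by gaps, each gap carrying its own hypothetical (lo, hi) range of don't care characters, and it simply translates that description into a regular expression. The names `motif_to_regex` and `occurrence_starts` are invented for this illustration.

```python
import re


def motif_to_regex(solids, gaps):
    """Compile a motif into a regular expression (illustrative assumption).

    solids: list of solid substrings, e.g. ["a", "b"]
    gaps:   list of (lo, hi) pairs, one between each pair of consecutive
            solids, giving the allowed number of don't care characters.
            A rigid gap has lo == hi; an extensible gap has lo < hi.
    """
    if len(gaps) != len(solids) - 1:
        raise ValueError("need exactly one gap between consecutive solids")
    parts = [re.escape(solids[0])]
    for (lo, hi), solid in zip(gaps, solids[1:]):
        parts.append(".{%d,%d}" % (lo, hi))  # don't cares match any character
        parts.append(re.escape(solid))
    return re.compile("".join(parts), re.DOTALL)


def occurrence_starts(pattern, text):
    """Return every start position at which the motif occurs in text."""
    return [i for i in range(len(text)) if pattern.match(text, i)]


text = "axbayybazzzb"
rigid = motif_to_regex(["a", "b"], [(1, 1)])       # exactly one don't care
elastic = motif_to_regex(["a", "b"], [(1, 2)])     # one or two don't cares

print(occurrence_starts(rigid, text))    # [0]
print(occurrence_starts(elastic, text))  # [0, 3]
```

In this toy input the single elastic pattern covers two segments of different lengths, where rigid motifs would need two separate patterns. This is the kind of saving on codebook entries that the abstract anticipates.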
Keywords
data compression; string matching; table lookup; text analysis; codebook; extensible motifs; lossy off-line data compression; motif discovery; motif-driven parses; textstring; textual substitution; Data compression; Elasticity; Encoding; Error analysis; Error correction codes; Frequency; Image coding; Pattern matching; Solids; USA Councils;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Compression Conference, 2005. Proceedings. DCC 2005
ISSN
1068-0314
Print_ISBN
0-7695-2309-9
Type
conf
DOI
10.1109/DCC.2005.59
Filename
1402207