• DocumentCode
    2707804
  • Title

    Off-line compression by extensible motifs

  • Author

    Apostolico, Alberto ; Comin, Matteo ; Parida, Laxmi

  • fYear
    2005
  • fDate
    29-31 March 2005
  • Firstpage
    450
  • Abstract
    Summary form only given. We present lossy off-line data compression techniques by textual substitution in which the patterns used in compression are chosen among the extensible motifs that are found to recur in the textstring with a minimum pre-specified frequency. A motif is to be interpreted here as a sequence of intermixed solid and don't care characters that obeys, in addition, some conditions of saturation: most notably, it must not be possible to eliminate some don't cares in the pattern without having to forfeit some of its occurrences. Motif discovery and motif-driven parses of various kinds have been previously introduced and used in Apostolico et al. (2004) and Apostolico et al. (2003). Whereas the motifs considered in those studies are "rigid", here we assume that each sequence of gaps present in a motif comes endowed with some individually prescribed degree of elasticity, whereby the same pattern may be stretched to fit segments of the source that match at all the solid characters but are otherwise of different lengths. This is expected to save on the size of the codebook, and hence to improve compression.
  • Keywords
    data compression; string matching; table lookup; text analysis; codebook; extensible motifs; lossy off-line data compression; motif discovery; motif-driven parses; textstring; textual substitution; Data compression; Elasticity; Encoding; Error analysis; Error correction codes; Frequency; Image coding; Pattern matching; Solids; USA Councils;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 2005. Proceedings. DCC 2005
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-2309-9
  • Type

    conf

  • DOI
    10.1109/DCC.2005.59
  • Filename
    1402207