Phrase elimination in greedy parsing dictionary coders with deferred innovation

Author

Yao, Zhen

Author_Institution

Dept. of Comput. Sci., Warwick Univ., Coventry, UK

fYear

2003

fDate

25-27 March 2003

Firstpage

456

Abstract

Summary form only given. LZ dictionary coders parse the message into successive substrings, each consisting of two parts: the citation, which is the longest prefix phrase already held in the dictionary, and the innovation, which is the symbol immediately following the citation. Suppose the input alphabet is A and the dictionary D = {p₁, p₂, ..., pₙ} is a set of phrases, with pᵢ ∈ A*, built by a greedy-parsing LZ coder. When the dictionary is represented as a search tree, matching a phrase in D against a citation can be viewed as traversing the tree from its root, matching consecutive input symbols until the mismatching innovation occurs. The dictionary is reduced to D´ = D/E. Its phrase index is then encoded by a less redundant code (LRC) with an upper bound on codeword length. The expected number of phrases in D´ was estimated, and experiments verified that the estimate is accurate. A typical improvement of 3% over LZW coders with LRC, and of 5% over standard LZW, was also shown.
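
The citation/innovation parse and the tree traversal described in the abstract can be illustrated with a minimal sketch. It assumes a plain LZ78-style coder, and it does not implement the deferred innovation, the phrase elimination, or the LRC that the paper reports. The function name lz78_parse and the nested-dict trie layout are hypothetical choices made for this illustration only.

```python
def lz78_parse(message):
    """Greedy LZ78-style parse (illustrative sketch, not the paper's coder).

    Returns a list of (citation_index, innovation) pairs, where index 0
    stands for the empty phrase.
    """
    root = {}        # trie node: maps symbol -> (phrase_index, child_node)
    next_index = 1   # index 0 is reserved for the empty phrase
    output = []
    node, citation = root, 0
    for symbol in message:
        if symbol in node:
            # Symbol matches: descend one level in the dictionary tree.
            citation, node = node[symbol]
        else:
            # Mismatch: this symbol is the innovation. Add the new phrase
            # (citation + innovation) and restart from the root.
            node[symbol] = (next_index, {})
            next_index += 1
            output.append((citation, symbol))
            node, citation = root, 0
    if citation != 0:
        # Input ended inside a known phrase, so there is no innovation.
        output.append((citation, None))
    return output


print(lz78_parse("abababa"))
```

For the input "abababa" the sketch builds the phrases a, b, ab and aba in that order, and each output pair names the index of the longest matching phrase together with the symbol that ended the match.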

Keywords

codes; data compression; grammars; search problems; string matching; LRC; citation; codeword length; deferred innovation; greedy parsing dictionary coders; less redundant code; longest prefix phrase; phrase elimination; Code standards; Compressors; Computer science; Data compression; Dictionaries; Image coding; Impedance matching; Technological innovation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Data Compression Conference, 2003. Proceedings. DCC 2003

ISSN

1068-0314

Print_ISBN

0-7695-1896-6

Type

conf

DOI

10.1109/DCC.2003.1194075

Filename

1194075