Title :
Applications of YK algorithm to the Internet transmission of Web-data: implementation issues and modifications
Author :
Banerji, Ashish ; Yang, En-Hui
Author_Institution :
Hughes Network Syst. Inc., Germantown, MD, USA
Abstract :
Summary form only given. Recently, Yang and Kieffer (2000) proposed a novel lossless grammar-based data compression algorithm, called the YK algorithm, in which a greedy sequential grammar transform is applied to the original data to construct an irreducible context-free grammar, which is then encoded indirectly using an arithmetic coder. The basic implementation of the YK encoding algorithm consists of a sequentially iterative application of three fundamental steps: parsing, arithmetic encoding, and updating. This paper proposes five modifications of the basic YK algorithm, motivated by applications of the algorithm to the Internet transmission of Web-data. 1) Fast YK encoder: the parsing operation is a major step of the YK algorithm. A variant of the tree data structure is proposed for fast parsing, which makes the encoder applicable to real-time compression of IP datagrams. 2) Pre-defined source statistics: known source statistics can be exploited to improve compression efficiency, which is particularly effective for small IP datagrams with a known structure. 3) Pre-defined grammar: starting with a “typical” pre-defined grammar can significantly improve the compression efficiency for applications such as HTML Web-page compression. 4) Memory constrained implementation: during YK compression, as the length of the data sequence increases, the grammar also continues to grow in size, which can potentially exhaust the available memory in the system. This paper proposes a way to keep the memory requirement in check by reusing variables in the grammar once a user-chosen limit on grammar size is reached. 5) Error handling capability: the paper identifies all possible contingencies that can arise when an erroneous bit-stream is fed to the YK decoder, and provides explicit ways to handle them. This is important in applications where compressed IP datagrams are transmitted over unreliable links.
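The parsing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that parsing means finding the longest prefix of the remaining input that equals the expansion of some existing grammar variable, and falling back to a single input symbol otherwise. The trie below is an ordinary prefix tree, not the tree variant proposed in the paper (which this summary does not describe), and the names `TrieNode`, `build_trie`, `parse_next`, and `parse_all` are hypothetical. The grammar is held fixed here, whereas the YK algorithm updates it after every parsed phrase.

```python
class TrieNode:
    """Node of a plain prefix tree over variable expansions (illustrative only)."""
    __slots__ = ("children", "variable")

    def __init__(self):
        self.children = {}    # next symbol -> TrieNode
        self.variable = None  # variable whose expansion ends at this node, if any


def build_trie(expansions):
    """Build a trie from a hypothetical {variable_name: expanded_string} mapping."""
    root = TrieNode()
    for name, text in expansions.items():
        node = root
        for ch in text:
            node = node.children.setdefault(ch, TrieNode())
        node.variable = name
    return root


def parse_next(root, data, pos):
    """Return (token, length) for the longest expansion matching data[pos:].

    If no variable expansion matches, the token is the single symbol data[pos].
    """
    node = root
    best = None
    i = pos
    while i < len(data) and data[i] in node.children:
        node = node.children[data[i]]
        i += 1
        if node.variable is not None:
            best = (node.variable, i - pos)  # remember the longest match so far
    if best is None:
        return data[pos], 1
    return best


def parse_all(expansions, data):
    """Greedily parse data into variables and raw symbols (grammar kept static)."""
    root = build_trie(expansions)
    pos = 0
    tokens = []
    while pos < len(data):
        token, length = parse_next(root, data, pos)
        tokens.append(token)
        pos += length
    return tokens
```

With the assumed expansions `{"s1": "ab", "s2": "abc"}`, calling `parse_all` on `"abcabx"` yields `["s2", "s1", "x"]`: the longer match `s2` wins at the start, `s1` matches next, and `x` is emitted as a raw symbol. Each lookup costs time proportional to the length of the match rather than to the number of variables, which is the kind of saving a tree-based parser aims for.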
Keywords :
Internet; arithmetic codes; data compression; grammars; iterative methods; transforms; tree data structures; HTML Web-page compression; IP datagrams; Internet transmission; NNTP datagrams; Web-data; YK algorithm; YK decoder; arithmetic coder; arithmetic encoding; compressed IP datagrams; dynamic alphabet; error handling capability; frequency counts; greedy sequential grammar transform; irreducible context free grammar; lossless grammar-based data compression; memory constrained implementation; parsed substring; parsing; pre-defined grammar; pre-defined source statistics; real-time compression; sequentially iterative application; stationary ergodic sources; unreliable links; updating; Arithmetic; Decoding; Encoding; HTML; Iterative algorithms; Real time systems; Statistics
Conference_Title :
Data Compression Conference, 2000. Proceedings. DCC 2000
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-0592-9
DOI :
10.1109/DCC.2000.838193