DocumentCode :
1384707
Title :
The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression
Author :
Witten, Ian H. ; Bell, Timothy C.
Author_Institution :
Dept. of Comput. Sci., Calgary Univ., Alta., Canada
Volume :
37
Issue :
4
fYear :
1991
fDate :
7/1/1991
Firstpage :
1085
Lastpage :
1094
Abstract :
Approaches to the zero-frequency problem in adaptive text compression are discussed. This problem relates to the estimation of the likelihood of a novel event occurring. Although several methods have been used, their suitability has been based on empirical evaluation rather than a well-founded model. The authors propose the application of a Poisson process model of novelty. Its ability to predict novel tokens is evaluated, and it consistently outperforms existing methods. It is applied to a practical statistical coding scheme, where a slight modification is required to avoid divergence. The result is a well-founded zero-frequency model that explains observed differences in the performance of existing methods, and offers a small improvement in the coding efficiency of text compression over the best method previously known.
Keywords :
data compression; encoding; probability; Poisson process model; adaptive text compression; novel events; statistical coding scheme; zero-frequency problem; Arithmetic; Computer errors; Computer science; Context modeling; Councils; Data compression; Decoding; Drives; Encoding; Probability;
fLanguage :
English
Journal_Title :
Information Theory, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9448
Type :
jour
DOI :
10.1109/18.87000
Filename :
87000