DocumentCode :
2021042
Title :
On vocabulary size of grammar-based codes
Author :
Dębowski, L.
Author_Institution :
Inst. of Comput. Sci., Polish Acad. of Sci., Warsaw
fYear :
2007
fDate :
24-29 June 2007
Firstpage :
91
Lastpage :
95
Abstract :
We discuss inequalities holding between the vocabulary size, i.e., the number of distinct nonterminal symbols in a grammar-based compression for a string, and the excess length of the respective universal code, i.e., the code-based analog of algorithmic mutual information. The aim is to strengthen inequalities which were discussed in a weaker form in linguistics but shed some light on redundancy of efficiently computable codes. The main contribution of the paper is a construction of universal grammar-based codes for which the excess lengths can be bounded easily.
Keywords :
codes; data compression; grammars; redundancy; algorithmic mutual information; computable code redundancy; nonterminal symbol; string compression; universal grammar-based codes; vocabulary size; Computer science; Decoding; Encoding; Entropy; Integrated circuit noise; Mutual information; Natural languages; Stochastic processes; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Theory, 2007. ISIT 2007. IEEE International Symposium on
Conference_Location :
Nice
Print_ISBN :
978-1-4244-1397-3
Type :
conf
DOI :
10.1109/ISIT.2007.4557209
Filename :
4557209