DocumentCode :
3430849
Title :
A codebook generation algorithm for document image compression
Author :
Zhang, Qin ; Danskin, John M. ; Young, Neal E.
Author_Institution :
Dept. of Comput. Sci., Dartmouth Coll., Hanover, NH, USA
fYear :
1997
fDate :
25-27 Mar 1997
Firstpage :
300
Lastpage :
309
Abstract :
Pattern-matching-based document compression systems rely on finding a small set of patterns that can be used to represent all of the ink in the document. Finding an optimal set of patterns is NP-hard; previous compression schemes have resorted to heuristics. We extend the cross-entropy approach, used previously for measuring pattern similarity, to this problem. Using this approach we reduce the problem to the fixed-cost k-median problem, for which we present a new algorithm with a good provable performance guarantee. We test our new algorithm in place of the previous heuristics (First Fit, with and without generalized Lloyd's (k-means) postprocessing steps). The new algorithm generates a better codebook, resulting in an overall improvement in compression performance of almost 17%.
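To make the problem statement concrete, here is a minimal sketch of the fixed-cost k-median objective: each codebook entry costs a fixed `open_cost`, and every pattern pays its distance to the nearest entry. This is an illustration under stated assumptions, not the authors' algorithm. The names `fixed_cost_kmedian_cost`, `first_fit`, and `greedy_add` are hypothetical; a generic pairwise `dist` stands in for the cross-entropy pattern cost; codebook entries are assumed to be drawn from the patterns themselves; the First Fit rule "reuse the first entry within a threshold, else open a new one" is an assumed simplification; and `greedy_add` is a plain greedy facility-location heuristic with no performance guarantee.

```python
from typing import Any, Callable, List, Sequence, Set

# Hypothetical stand-in for the cross-entropy pattern cost used in the paper.
Dist = Callable[[Any, Any], float]


def fixed_cost_kmedian_cost(points: Sequence[Any], centers: Set[int],
                            dist: Dist, open_cost: float) -> float:
    """open_cost per codebook entry plus each pattern's distance to its nearest entry.

    Centers are indices into `points` (assumption: entries are chosen from the patterns).
    """
    if not centers:
        return float("inf")  # no codebook means nothing can be represented
    assign = sum(min(dist(p, points[c]) for c in centers) for p in points)
    return open_cost * len(centers) + assign


def first_fit(points: Sequence[Any], dist: Dist, threshold: float) -> Set[int]:
    """Assumed First Fit rule: scan in order, reuse the first entry within threshold."""
    centers: List[int] = []
    for i, p in enumerate(points):
        if not any(dist(p, points[c]) <= threshold for c in centers):
            centers.append(i)  # no close-enough entry yet, so open a new one
    return set(centers)


def greedy_add(points: Sequence[Any], dist: Dist, open_cost: float) -> Set[int]:
    """Generic greedy heuristic: open the entry that lowers total cost most; stop when none helps."""
    centers: Set[int] = set()
    best = float("inf")
    while True:
        candidates = [
            (fixed_cost_kmedian_cost(points, centers | {j}, dist, open_cost), j)
            for j in range(len(points)) if j not in centers
        ]
        if not candidates:
            return centers
        cost, j = min(candidates)  # ties broken by the lower index
        if cost >= best:
            return centers
        centers.add(j)
        best = cost
```

On a toy 1-D input `points = [0, 1, 2, 20, 21, 40]` with `dist = lambda a, b: abs(a - b)` and `open_cost = 5.0`, `greedy_add` opens `{2, 4, 5}` at total cost 19.0, while `first_fit` with `threshold = 1.0` opens `{0, 2, 3, 5}` at cost 22.0. The example only shows how the objective trades codebook size against matching cost; it says nothing about the paper's reported results.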
Keywords :
data compression; document image processing; entropy; image coding; optimisation; pattern matching; NP-hard; codebook generation algorithm; compression performance; cross-entropy approach; document image compression; first fit; fixed-cost k-median problem; generalized Lloyd's postprocessing; heuristics; pattern matching; pattern similarity; performance guarantee; Books; Computer science; Costs; Entropy; Image coding; Ink; Laboratories; Pattern matching; Probability distribution; Propagation losses
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 1997. DCC '97. Proceedings
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-8186-7761-9
Type :
conf
DOI :
10.1109/DCC.1997.582053
Filename :
582053