DocumentCode :
2916814
Title :
Word-based block-sorting text compression
Author :
Isal, R. Yugo Kartono ; Moffat, Alistair
Author_Institution :
Dept. of Comput. Sci. & Software Eng., Melbourne Univ., Vic., Australia
fYear :
2001
fDate :
2001
Firstpage :
92
Lastpage :
99
Abstract :
Block sorting is an innovative compression mechanism introduced by M. Burrows and D.J. Wheeler (1994). It involves three steps: permuting the input one block at a time through the use of the Burrows-Wheeler transform (BWT); applying a move-to-front (MTF) transform to each of the permuted blocks; and then entropy coding the output with a Huffman or arithmetic coder. Until now, block-sorting implementations have assumed that the input message is a sequence of characters. In this paper, we extend the block-sorting mechanism to word-based models. We also consider other transformations as alternatives to MTF, and are able to show improved compression results compared to MTF. For large text files, the combination of word-based modelling, BWT and MTF-like transformations allows excellent compression effectiveness to be attained within reasonable resource costs.
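The first two steps named in the abstract, applied to word tokens rather than characters, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the tokenizing rule, the function names, and the naive rotation-sorting BWT are all assumptions made for the example, and the final entropy-coding step is omitted.

```python
import re

def tokenize(text):
    # Assumed parsing rule: alternating runs of word and non-word characters.
    return re.findall(r"\w+|\W+", text)

def bwt(symbols):
    # Naive BWT: sort all rotations of the block and keep the last column.
    # Returns the last column and the row holding the original sequence.
    n = len(symbols)
    if n == 0:
        return [], 0
    rows = sorted(range(n), key=lambda i: symbols[i:] + symbols[:i])
    last = [symbols[(i - 1) % n] for i in rows]
    return last, rows.index(0)

def inverse_bwt(last, primary):
    # A stable sort of positions by symbol links each row to its successor.
    order = sorted(range(len(last)), key=lambda i: last[i])
    out, row = [], primary
    for _ in range(len(last)):
        row = order[row]
        out.append(last[row])
    return out

def mtf_encode(symbols, alphabet):
    # Replace each symbol by its current rank, then move it to the front.
    table = list(alphabet)
    ranks = []
    for s in symbols:
        r = table.index(s)
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks

def mtf_decode(ranks, alphabet):
    # Mirror of mtf_encode: look up the symbol by rank, move it to the front.
    table = list(alphabet)
    out = []
    for r in ranks:
        s = table.pop(r)
        out.append(s)
        table.insert(0, s)
    return out

if __name__ == "__main__":
    text = "the cat sat on the mat the cat"
    words = tokenize(text)
    alphabet = sorted(set(words))      # the word "dictionary" for this block
    last, primary = bwt(words)
    ranks = mtf_encode(last, alphabet)
    print(ranks)                       # small ranks are what an entropy coder exploits
    restored = inverse_bwt(mtf_decode(ranks, alphabet), primary)
    assert "".join(restored) == text
```

Treating each word as a single symbol means the MTF table ranges over a vocabulary rather than a 256-entry character set, so the linear `table.index` lookup used here would need a faster structure for large texts.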
Keywords :
Huffman codes; arithmetic codes; data compression; entropy codes; sorting; text analysis; transform coding; Burrows-Wheeler transform; Huffman coder; arithmetic coder; entropy coding; input permutation; large text files; move-to-front transform; permuted blocks; resource costs; word-based block-sorting text compression; word-based modelling; Arithmetic; Computer science; Costs; Decoding; Dictionaries; Entropy coding; Frequency; Software engineering; Tree data structures;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science Conference, 2001. ACSC 2001. Proceedings. 24th Australasian
Conference_Location :
Gold Coast, Qld.
ISSN :
1530-0900
Print_ISBN :
0-7695-0963-0
Type :
conf
DOI :
10.1109/ACSC.2001.906628
Filename :
906628