DocumentCode
3460957
Title
An efficient data compression scheme based on semi-adaptive Huffman coding for moderately large Chinese text files
Author
Ong, Ghim Hwee ; Huang, Shell Ying
Author_Institution
Dept. of Inf. Syst. & Comput. Sci., Nat. Univ. of Singapore, Singapore
fYear
1995
fDate
3-7 Jul 1995
Firstpage
332
Lastpage
336
Abstract
This paper presents a data compression scheme for Chinese text files. Because the frequency distribution of Chinese ideograms is skewed, Huffman coding is adopted. By storing the Huffman tree in the coding table and representing the tree as a Zaks sequence, the algorithm significantly improves the compression results. The proposed method is evaluated by comparing its performance with that of three well-known compression algorithms and of an algorithm specially designed to compress the coding table. The algorithm should also be applicable to other ideogram-based or oriental language texts, and it has the potential to reduce the dictionary size in a bigram- or trigram-based semi-adaptive compression scheme for English texts.
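The abstract does not give the layout of the coding table, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the common Zaks convention of a preorder traversal that emits 1 for an internal node and 0 for a leaf, with the leaf symbols stored separately in the same order. The function names (`build_huffman_tree`, `zaks_encode`, `zaks_decode`) and the sample string are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(text):
    """Build a Huffman tree from symbol frequencies.

    A leaf is a single-character string; an internal node is a
    (left, right) tuple. This representation is an assumption of the sketch.
    """
    freq = Counter(text)
    tiebreak = count()  # unique counter so heap never compares nodes
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
    return heap[0][2]


def zaks_encode(tree):
    """Preorder walk: 1 for an internal node, 0 for a leaf.

    Returns the bit list and the leaf symbols in the order visited.
    """
    bits, leaves = [], []
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            bits.append(1)
            stack.append(node[1])  # right child is visited after left
            stack.append(node[0])
        else:
            bits.append(0)
            leaves.append(node)
    return bits, leaves


def zaks_decode(bits, leaves):
    """Rebuild the tree from the bit list and the ordered leaf symbols."""
    bit_iter, leaf_iter = iter(bits), iter(leaves)

    def build():
        if next(bit_iter) == 1:
            left = build()
            right = build()
            return (left, right)
        return next(leaf_iter)

    return build()


sample = "的的的的一一是不"  # hypothetical sample text
tree = build_huffman_tree(sample)
bits, leaves = zaks_encode(tree)
restored = zaks_decode(bits, leaves)
```

Under this convention a tree with n leaves needs 2n - 1 shape bits plus the n symbols themselves, so the decoder can rebuild every code word without the code words being stored explicitly. The recursive decoder is adequate for small examples; a very deep tree would call for an iterative version.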
Keywords
Huffman codes; adaptive codes; data compression; Chinese ideograms; Chinese text files; Huffman tree; Zaks sequence; binary tree coding; data compression scheme; ideogram-based texts; oriental language texts; semi-adaptive Huffman coding; Algorithm design and analysis; Compression algorithms; Computer science; Data compression; Dictionaries; Encoding; Frequency; Huffman coding; Information systems; Natural languages
fLanguage
English
Publisher
IEEE
Conference_Titel
Proceedings of IEEE Singapore International Conference on Networks, 1995. Theme: Electrotechnology 2000: Communications and Networks [in conjunction with the] International Conference on Information Engineering
Print_ISBN
0-7803-2579-6
Type
conf
DOI
10.1109/SICON.1995.526073
Filename
526073