• DocumentCode
    3460957
  • Title

    An efficient data compression scheme based on semi-adaptive Huffman coding for moderately large Chinese text files

  • Author

    Ong, Ghim Hwee ; Huang, Shell Ying

  • Author_Institution
    Dept. of Inf. Syst. & Comput. Sci., Nat. Univ. of Singapore, Singapore
  • fYear
    1995
  • fDate
    3-7 Jul 1995
  • Firstpage
    332
  • Lastpage
    336
  • Abstract
    This paper presents a data compression scheme for Chinese text files. Because the frequency distribution of Chinese ideograms is skewed, the Huffman coding method is adopted. By storing the Huffman tree in the coding table and representing the tree as a Zaks sequence, the algorithm significantly improves the compression results. The proposed method is evaluated by comparing its performance with that of three well-known compression algorithms and of an algorithm specially designed to compress the coding table. The algorithm should also be applicable to other ideogram-based or oriental language texts. It also has the potential to reduce the dictionary size in a bigram- or trigram-based semi-adaptive compression scheme for English texts.
  • Keywords
    Huffman codes; adaptive codes; data compression; Chinese ideograms; Chinese text files; Huffman tree; Zaks sequence; binary tree coding; data compression scheme; ideogram-based texts; oriental language texts; semi-adaptive Huffman coding; Algorithm design and analysis; Compression algorithms; Computer science; Data compression; Dictionaries; Encoding; Frequency; Huffman coding; Information systems; Natural languages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of IEEE Singapore International Conference on Networks, 1995. Theme: Electrotechnology 2000: Communications and Networks. [in conjunction with the] International Conference on Information Engineering
  • Print_ISBN
    0-7803-2579-6
  • Type

    conf

  • DOI
    10.1109/SICON.1995.526073
  • Filename
    526073
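
The abstract's central idea, storing the Huffman tree in the coding table as a Zaks sequence, can be illustrated with a minimal sketch. The abstract does not give the paper's actual table layout or tie-breaking rules, so everything below is an assumption made for illustration: the function names (`build_huffman_tree`, `to_zaks`, `from_zaks`, `code_table`) are hypothetical, the tree is held as nested tuples, and the Zaks sequence is taken in its usual form (a preorder walk that emits 1 for an internal node and 0 for a leaf), with the leaf symbols listed separately in the same preorder.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(text):
    """Build a Huffman tree from symbol counts in `text`.

    A leaf is a one-character string; an internal node is a (left, right) tuple.
    Assumption: the text contains at least two distinct symbols.
    """
    freq = Counter(text)
    if len(freq) < 2:
        raise ValueError("need at least two distinct symbols")
    tie = count()  # unique tie-breaker so heapq never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (a, b)))
    return heap[0][2]


def to_zaks(tree):
    """Serialize the tree shape as a preorder bit list (1 = internal, 0 = leaf)
    plus the leaf symbols in the same preorder."""
    bits, leaves = [], []

    def visit(node):
        if isinstance(node, tuple):
            bits.append(1)
            visit(node[0])
            visit(node[1])
        else:
            bits.append(0)
            leaves.append(node)

    visit(tree)
    return bits, leaves


def from_zaks(bits, leaves):
    """Rebuild the nested-tuple tree from the bit list and the leaf list."""
    bit_iter, leaf_iter = iter(bits), iter(leaves)

    def build():
        if next(bit_iter) == 1:
            left = build()
            right = build()
            return (left, right)
        return next(leaf_iter)

    return build()


def code_table(tree, prefix=""):
    """Derive the symbol-to-codeword mapping by walking the tree (0 = left, 1 = right)."""
    if isinstance(tree, tuple):
        table = code_table(tree[0], prefix + "0")
        table.update(code_table(tree[1], prefix + "1"))
        return table
    return {tree: prefix}
```

In this sketch a decoder that receives `bits` and `leaves` can call `from_zaks` and then `code_table` to recover exactly the codewords the encoder used, so the codewords themselves never need to be stored. For a tree with n distinct symbols the shape takes 2n - 1 bits, which is the kind of saving on the coding table that the abstract refers to; how the paper encodes the symbol list itself is not stated there.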