• DocumentCode
    3062334
  • Title

    Application of a word-based text compression method to Japanese and Chinese texts

  • Author

    Yoshida, S. ; Morihara, T. ; Yahagi, H. ; Satoh, N.

  • Author_Institution
    Fujitsu Labs. Ltd., Atsugi, Japan
  • fYear
    1999
  • fDate
    29-31 Mar 1999
  • Firstpage
    561
  • Abstract
    Summary form only given. 16-bit Asian language texts are difficult to compress using conventional text compression schemes that sample 8 bits at a time. Recently, word-based text compression has been studied with the intention of compressing Japanese and Chinese texts individually. In order to compress large numbers of small Japanese documents, such as groupware and E-mail, we applied a semi-adaptive word-based method to Japanese at DCC '98. To further enable multilingual text compression, we also applied a static word-based method to both Japanese and Chinese texts and evaluated its compression characteristics and performance using a computer simulation.
  • Keywords
    character sets; data compression; text analysis; 16-bit Asian language texts; 8-bit sampling; Chinese texts; E-mail; Japanese documents; groupware; multilingual text compression; semi-adaptive word-based method; word-based text compression; Compression algorithms; Computer simulation; Databases; Dictionaries; Laboratories; Natural languages; Sampling methods; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 1999. Proceedings. DCC '99
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-0096-X
  • Type

    conf

  • DOI
    10.1109/DCC.1999.785718
  • Filename
    785718