Title :
Application of a word-based text compression method to Japanese and Chinese texts
Author :
Yoshida, S. ; Morihara, T. ; Yahagi, H. ; Satoh, N.
Author_Institution :
Fujitsu Labs. Ltd., Atsugi, Japan
Abstract :
Summary form only given. 16-bit Asian language texts are difficult to compress using conventional 8-bit sampling text compression schemes. Recently, word-based text compression has been studied as a way to compress Japanese and Chinese texts individually. In order to compress a large number of small Japanese documents, such as groupware and e-mail documents, we applied a semi-adaptive word-based method to Japanese at DCC '98. To further enable multilingual text compression, we also applied a static word-based method to both Japanese and Chinese texts and evaluated its compression characteristics and performance using a computer simulation.
Keywords :
character sets; data compression; text analysis; 16-bit Asian language texts; 8-bit sampling; Chinese texts; E-mail; Japanese documents; groupware; multilingual text compression; semi-adaptive word-based method; word-based text compression; Compression algorithms; Computer simulation; Databases; Dictionaries; Laboratories; Natural languages; Sampling methods; Testing
Conference_Title :
Data Compression Conference, 1999. Proceedings. DCC '99
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-0096-X
DOI :
10.1109/DCC.1999.785718