DocumentCode
3421441
Title
Compression of multilingual aligned texts
Author
Conley, Ehud S. ; Klein, Shmuel T.
Author_Institution
Dept. of Comput. Sci., Bar-Ilan Univ., Ramat-Gan
fYear
2006
fDate
28-30 March 2006
Lastpage
442
Abstract
Summary form only given. Multilingual text compression depends primarily on the ability to match the corresponding parts of related texts by identifying semantic correspondences across the various sub-texts, a task generally referred to as text alignment. Savings in storage space can be obtained by replacing words and phrases with pointers to their translations, as determined by any alignment algorithm. The suggested method was tested on an English-French corpus of the European Union: the French part was compressed using pointers into the English part. The obtained compression rate (22.0%) is similar to the performance of Bzip and HuffWord and better than that of Gzip. However, the performance of Bzip and Gzip degrades when small sub-sections are processed separately, which makes them inappropriate for systems that frequently decode only small pieces of text.
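The pointer-replacement idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the pointer format, the alignment representation, or how translations are recovered, so everything here is an assumption: the alignment is a hypothetical word-level map from French positions to English positions, a hypothetical bilingual dictionary lists the French translations of each English word, and a pointer is the pair (English position, index into that dictionary entry). Words without a usable alignment are stored literally. A real system would further entropy-code the resulting token stream; that step is omitted.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical token format for the encoded French stream:
#   ("ptr", english_position, translation_index)  or  ("lit", word)
Token = Union[Tuple[str, int, int], Tuple[str, str]]


def encode_with_pointers(
    french: List[str],
    english: List[str],
    alignment: Dict[int, int],        # assumed: French index -> English index
    bilingual: Dict[str, List[str]],  # assumed: English word -> French translations
) -> List[Token]:
    """Replace each aligned French word with a pointer to its English counterpart."""
    out: List[Token] = []
    for i, fr_word in enumerate(french):
        j: Optional[int] = alignment.get(i)
        if j is not None:
            candidates = bilingual.get(english[j], [])
            if fr_word in candidates:
                # The decoder can rebuild fr_word from english[j] and this index.
                out.append(("ptr", j, candidates.index(fr_word)))
                continue
        # No usable alignment or dictionary entry: store the word itself.
        out.append(("lit", fr_word))
    return out


def decode_with_pointers(
    tokens: List[Token],
    english: List[str],
    bilingual: Dict[str, List[str]],
) -> List[str]:
    """Invert encode_with_pointers, given the already decoded English text."""
    french: List[str] = []
    for tok in tokens:
        if tok[0] == "ptr":
            _, j, k = tok
            french.append(bilingual[english[j]][k])
        else:
            french.append(tok[1])
    return french


if __name__ == "__main__":
    english = ["the", "european", "union"]
    french = ["l'", "union", "européenne"]
    alignment = {1: 2, 2: 1}
    bilingual = {"union": ["union"], "european": ["européen", "européenne"]}
    tokens = encode_with_pointers(french, english, alignment, bilingual)
    print(tokens)  # [('lit', "l'"), ('ptr', 2, 0), ('ptr', 1, 1)]
    assert decode_with_pointers(tokens, english, bilingual) == french
```

Because decoding a French word needs only the English text and the dictionary, any small sub-section can in principle be decoded on its own, which is the property the abstract contrasts with Bzip and Gzip.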
Keywords
data compression; encoding; linguistics; English-French corpus; European Union; alignment algorithm; multilingual aligned text compression; storage space; Computer science; Concurrent computing; Data compression; Decoding; Degradation; Dictionaries; Encoding; Natural languages; Terminology; Testing;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Compression Conference, 2006. DCC 2006. Proceedings
Conference_Location
Snowbird, UT
ISSN
1068-0314
Print_ISBN
0-7695-2545-8
Type
conf
DOI
10.1109/DCC.2006.15
Filename
1607285