• DocumentCode
    3421441
  • Title
    Compression of multilingual aligned texts
  • Author
    Conley, Ehud S.; Klein, Shmuel T.
  • Author_Institution
    Dept. of Comput. Sci., Bar-Ilan Univ., Ramat-Gan
  • fYear
    2006
  • fDate
    28-30 March 2006
  • Lastpage
    442
  • Abstract
    Summary form only given. Multilingual text compression depends primarily on the ability to match the corresponding parts of related texts by identifying semantic correspondences across the various sub-texts, a task generally referred to as text alignment. Savings in storage space can be obtained by replacing words and phrases with pointers to their translations, as determined by any alignment algorithm. The suggested method was tested on an English-French corpus of the European Union. The French part was compressed using pointers to the English part. The obtained compression rate (22.0%) is similar to the performance of Bzip and HuffWord and better than that of Gzip. However, the performance of Bzip and Gzip degrades when small sub-sections are processed separately, which makes them inappropriate for systems which often decode only small pieces
  • Keywords
    data compression; encoding; linguistics; English-French corpus; European Union; alignment algorithm; multilingual aligned text compression; storage space; Computer science; Concurrent computing; Data compression; Decoding; Degradation; Dictionaries; Encoding; Natural languages; Terminology; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 2006. DCC 2006. Proceedings
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-2545-8
  • Type
    conf
  • DOI
    10.1109/DCC.2006.15
  • Filename
    1607285
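
The abstract's core idea, replacing a word in one text with a pointer to its translation in the aligned text, can be illustrated with a toy sketch. This is not the authors' scheme: the pointer format, the one-to-one `lexicon`, the `alignment` map, and the sample sentences are all assumptions made for illustration, and a real system would further entropy-code the resulting pointer and literal stream.

```python
from typing import Dict, List, Tuple, Union

# An encoded item is either ("ptr", source_index) or ("lit", word).
Item = Tuple[str, Union[int, str]]


def encode(target: List[str], source: List[str],
           alignment: Dict[int, int], lexicon: Dict[str, str]) -> List[Item]:
    """Replace target words by pointers into the source where possible."""
    encoded: List[Item] = []
    for j, word in enumerate(target):
        i = alignment.get(j)  # aligned source position, if any
        # A pointer is usable only if the decoder can regenerate the exact
        # target word from the source word via the shared lexicon.
        if i is not None and lexicon.get(source[i]) == word:
            encoded.append(("ptr", i))
        else:
            encoded.append(("lit", word))
    return encoded


def decode(encoded: List[Item], source: List[str],
           lexicon: Dict[str, str]) -> List[str]:
    """Rebuild the target text from pointers and literals."""
    words: List[str] = []
    for kind, value in encoded:
        if kind == "ptr":
            words.append(lexicon[source[value]])
        else:
            words.append(value)
    return words


# Hypothetical sample data (not taken from the corpus used in the paper).
english = ["the", "european", "parliament", "adopted", "the", "resolution"]
french = ["le", "parlement", "européen", "a", "adopté", "la", "résolution"]
alignment = {0: 0, 1: 2, 2: 1, 4: 3, 5: 4, 6: 5}  # French index -> English index
lexicon = {"the": "le", "european": "européen", "parliament": "parlement",
           "adopted": "adopté", "resolution": "résolution"}

encoded = encode(french, english, alignment, lexicon)
restored = decode(encoded, english, lexicon)
```

In this sketch, words with no alignment or with a translation the lexicon cannot reproduce (here "a" and "la") fall back to literals, so decoding stays lossless. Each short segment can also be decoded on its own given the source text, which is the property the abstract contrasts with Bzip and Gzip.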