• DocumentCode
    147059
  • Title

    Universal Text Preprocessing and Postprocessing for PPM Using Alphabet Adjustment

  • Author

    Alhawiti, Khaled M. ; Teahan, William J.

  • fYear
    2014
  • fDate
    26-28 March 2014
  • Firstpage
    395
  • Lastpage
    395
  • Abstract
    In this paper, we introduce several new universal preprocessing techniques to improve Prediction by Partial Matching (PPM) compression of UTF-8 encoded natural language text. These methods essentially 'adjust' the alphabet in some manner (for example, by expanding or reducing it) before the compression algorithm is applied to the amended text.
  • Keywords
    data compression; natural language processing; pattern matching; text analysis; PPM compression algorithm; UTF-8 encoded natural language text; alphabet adjustment; prediction by partial matching; universal text postprocessing; universal text preprocessing; Compression algorithms; Compressors; Computer science; Data compression; Educational institutions; Natural languages; Vocabulary; Bi-graphs; PPM; Text compression
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference (DCC), 2014
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Type

    conf

  • DOI
    10.1109/DCC.2014.12
  • Filename
    6824447