DocumentCode
147059
Title
Universal Text Preprocessing and Postprocessing for PPM Using Alphabet Adjustment
Author
Alhawiti, Khaled M. ; Teahan, William J.
fYear
2014
fDate
26-28 March 2014
Firstpage
395
Lastpage
395
Abstract
In this paper, we introduce several new universal pre-processing techniques to improve Prediction by Partial Matching (PPM) compression of UTF-8 encoded natural language text. These methods essentially 'adjust' the alphabet in some manner (for example, by expanding or reducing it) before the compression algorithm is applied to the amended text.
Keywords
data compression; natural language processing; pattern matching; text analysis; PPM compression algorithm; UTF-8 encoded natural language text; alphabet adjustment; prediction by partial matching; universal text postprocessing; universal text preprocessing; Compression algorithms; Compressors; Computer science; Data compression; Educational institutions; Natural languages; Vocabulary; Bi-graphs; PPM; Text compression
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Compression Conference (DCC), 2014
Conference_Location
Snowbird, UT
ISSN
1068-0314
Type
conf
DOI
10.1109/DCC.2014.12
Filename
6824447