DocumentCode :
147059
Title :
Universal Text Preprocessing and Postprocessing for PPM Using Alphabet Adjustment
Author :
Alhawiti, Khaled M. ; Teahan, William J.
fYear :
2014
fDate :
26-28 March 2014
Firstpage :
395
Lastpage :
395
Abstract :
In this paper, we introduce several new universal pre-processing techniques to improve Prediction by Partial Matching (PPM) compression of UTF-8 encoded natural language text. These methods essentially 'adjust' the alphabet in some manner (for example, by expanding or reducing it) before the compression algorithm is applied to the amended text.
Keywords :
data compression; natural language processing; pattern matching; text analysis; PPM compression algorithm; UTF-8 encoded natural language text; alphabet adjustment; prediction by partial matching; universal text postprocessing; universal text preprocessing; Compression algorithms; Compressors; Computer science; Data compression; Educational institutions; Natural languages; Vocabulary; Bi-graphs; PPM; Text compression
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference (DCC), 2014
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Type :
conf
DOI :
10.1109/DCC.2014.12
Filename :
6824447