DocumentCode
3288681
Title
An Experiment Study on Text Transformation for Compression Using Stoplists and Frequent Words
Author
Tadrat, Jirapond ; Boonjing, Veera
Author_Institution
King Mongkut's Inst. of Technol. Ladkrabang, Bangkok
fYear
2008
fDate
7-9 April 2008
Firstpage
709
Lastpage
713
Abstract
The paper presents a new text transformation algorithm suitable for embedding in compression algorithms. To improve text compression performance, the algorithm replaces words with predefined codes. Instead of using a huge dictionary containing an exhaustive word list, as in previous work, the new algorithm uses a list of stoplist words and/or frequent words. The research devised different encoding schemes for such a list and then ran experiments applying these schemes with different compression algorithms on standard texts. The results show that each scheme increases compression when used with specific compression algorithms.
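The word-replacement idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's encoding scheme, which the abstract does not specify: the stoplist, the one-character control codes, and the names `transform`, `inverse_transform`, and `compress` are all assumptions chosen for illustration, and zlib stands in for whichever back-end compressor is used.

```python
import re
import zlib

# Hypothetical stoplist; the paper's actual word lists are not given here.
STOPLIST = ["the", "of", "and", "to", "in", "a", "is", "that"]

# Assumed scheme: each stoplist word maps to one control character
# (\x01, \x02, ...) that is not expected to occur in ordinary text.
CODES = {word: chr(1 + i) for i, word in enumerate(STOPLIST)}
DECODE = {code: word for word, code in CODES.items()}

WORD_RE = re.compile(r"[A-Za-z]+")


def transform(text: str) -> str:
    """Replace exact lowercase stoplist words with their one-character codes."""
    if any(code in text for code in DECODE):
        raise ValueError("input already contains a reserved code character")
    return WORD_RE.sub(lambda m: CODES.get(m.group(0), m.group(0)), text)


def inverse_transform(coded: str) -> str:
    """Expand every code character back into its stoplist word."""
    return "".join(DECODE.get(ch, ch) for ch in coded)


def compress(text: str) -> bytes:
    """Apply the transform, then hand the result to a general-purpose compressor."""
    return zlib.compress(transform(text).encode("utf-8"))
```

The transform is lossless as long as the reserved code characters never appear in the input, which is why `transform` rejects such input. Whether the transformed text actually compresses better depends on the word list, the encoding scheme, and the compressor, which is the question the paper's experiments address.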
Keywords
data compression; text analysis; dictionary; frequent word; predefined codes; stoplist word; text compression algorithm; text transformation algorithm; Compression algorithms; Computer science; Dictionaries; Encoding; Information technology; Laboratories; Mathematics; Natural languages; Software systems; Systems engineering and theory; LIPT; LPT; RLPT; SCLPT; Star encoding; Text preprocessing; Text transformation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference on
Conference_Location
Las Vegas, NV
Print_ISBN
0-7695-3099-0
Type
conf
DOI
10.1109/ITNG.2008.178
Filename
4492565