• DocumentCode
    3288681
  • Title

    An Experiment Study on Text Transformation for Compression Using Stoplists and Frequent Words

  • Author

    Tadrat, Jirapond ; Boonjing, Veera

  • Author_Institution
    King Mongkut's Inst. of Technol. Ladkrabang, Bangkok
  • fYear
    2008
  • fDate
    7-9 April 2008
  • Firstpage
    709
  • Lastpage
    713
  • Abstract
    The paper presents a new text transform algorithm suitable for embedding in compression algorithms. To improve text compression performance, the new algorithm replaces words with predefined codes. Instead of using a huge dictionary containing an exhaustive set of words, as in previous works, the new algorithm uses a list of stoplist words and/or frequent words. The research devised different encoding schemes for such a list and then ran experiments using these schemes with different compression algorithms on standard texts. The results show that each scheme improves compression when used with specific compression algorithms.
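
    The word-replacement idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's encoding scheme: the word list, the one-byte codes starting at 0x80, and the function names `transform` and `inverse` are all assumptions made for illustration, and the input is assumed to be plain ASCII so that the code bytes cannot collide with the text.

```python
import re
import zlib

# Hypothetical word list; the paper's actual stoplist and frequent-word
# list are not reproduced in this record.
WORDS = ["the", "of", "and", "to", "in", "that", "is", "for"]

# Assumed scheme: each listed word maps to a single byte >= 0x80,
# a range that never occurs in ASCII input.
ENCODE = {w.encode("ascii"): bytes([0x80 + i]) for i, w in enumerate(WORDS)}
DECODE = {0x80 + i: w.encode("ascii") for i, w in enumerate(WORDS)}

TOKEN = re.compile(rb"[A-Za-z]+")


def transform(text: str) -> bytes:
    """Replace whole-word, exact-case matches from WORDS with one-byte codes."""
    data = text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    return TOKEN.sub(lambda m: ENCODE.get(m.group(0), m.group(0)), data)


def inverse(coded: bytes) -> str:
    """Expand one-byte codes back into the original words."""
    out = b"".join(DECODE.get(b, bytes([b])) for b in coded)
    return out.decode("ascii")


if __name__ == "__main__":
    sample = "the cat is in the hat and the dog is in the house " * 20
    coded = transform(sample)
    assert inverse(coded) == sample
    # Compare a back-end compressor with and without the transform.
    # Whether the transform helps depends on the text and the compressor.
    print(len(zlib.compress(sample.encode("ascii"))), len(zlib.compress(coded)))
```

    The transform is lossless, so it can be placed in front of any compressor and undone after decompression. Only lowercase whole-word matches are replaced here; handling capitalised forms would need an extra rule that this sketch leaves out.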
  • Keywords
    data compression; text analysis; dictionary; frequent word; predefined codes; stoplist word; text compression algorithm; text transformation algorithm; Compression algorithms; Computer science; Dictionaries; Encoding; Information technology; Laboratories; Mathematics; Natural languages; Software systems; Systems engineering and theory; LIPT; LPT; RLPT; SCLPT; Star encoding; Text preprocessing; Text transformation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    0-7695-3099-0
  • Type

    conf

  • DOI
    10.1109/ITNG.2008.178
  • Filename
    4492565