  • DocumentCode
    3431900
  • Title
    Data compression using text encryption
  • Author
    Kruse, Holger; Mukherjee, Amar
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
  • fYear
    1997
  • fDate
    25-27 Mar 1997
  • Firstpage
    447
  • Abstract
    Summary form only given. We discuss the use of a new algorithm to preprocess text in order to improve the compression ratio of textual documents, in particular online documents such as web pages on the World Wide Web. The algorithm was first introduced in an earlier paper; in this paper we discuss its applicability in Internet and Intranet environments and present additional performance measurements regarding compression ratios, memory requirements and run time. Our results show that our preprocessing algorithm usually leads to a significantly improved compression ratio. Our algorithm requires a static dictionary shared by the compressor and the decompressor. The basic idea of the algorithm is to define a unique encryption or signature for each word in the dictionary, and to replace each word in the input text by its signature. Each signature consists mostly of the special character '*' plus as many alphabetic characters as necessary to make the signature unique among all words of the same length in the dictionary. In the resulting cryptic text the most frequently used character is typically '*', and standard compression algorithms like LZW applied to the cryptic text can exploit this redundancy to achieve better compression ratios. We compared the performance of our algorithm to other text compression algorithms, including standard compression algorithms such as gzip, Unix 'compress' and PPM, and to one text compression algorithm which uses a static dictionary.
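The abstract states only that each signature is mostly '*' plus as many letters as needed to be unique among dictionary words of the same length; it does not give the exact assignment rule. The following is a minimal sketch of one possible construction, not the authors' algorithm. It assumes that a signature has the same length as its word, that words of each length are taken in dictionary order (hypothetically most frequent first, so earlier words get more '*'), that the dictionary holds lowercase words, and that the input text contains no '*'. The names `signatures`, `build_tables`, `encode` and `decode` are hypothetical.

```python
import re
import zlib
from itertools import combinations, product
from string import ascii_lowercase


def signatures(n):
    """Yield distinct length-n signatures, fewest alphabetic characters first.

    At most n - 1 letters are used, so every signature keeps at least one '*'
    and can never collide with a plain (unencoded) word.
    """
    for k in range(n):  # k = number of alphabetic characters
        for positions in combinations(range(n), k):
            for letters in product(ascii_lowercase, repeat=k):
                sig = ["*"] * n
                for pos, ch in zip(positions, letters):
                    sig[pos] = ch
                yield "".join(sig)


def build_tables(dictionary):
    """Assign each word a signature unique among dictionary words of its length.

    Words are processed in the given order (assumed: most frequent first), so
    earlier words get signatures with fewer letters. If a length runs out of
    signatures, the remaining words of that length are left unencoded.
    """
    encode_map = {}
    generators = {}
    for word in dictionary:
        if word in encode_map:
            continue  # ignore duplicate dictionary entries
        n = len(word)
        if n not in generators:
            generators[n] = signatures(n)
        sig = next(generators[n], None)
        if sig is not None:
            encode_map[word] = sig
    decode_map = {sig: word for word, sig in encode_map.items()}
    return encode_map, decode_map


def encode(text, encode_map):
    """Replace each dictionary word with its signature (assumes no '*' in text)."""
    return re.sub(r"[A-Za-z]+",
                  lambda m: encode_map.get(m.group(0), m.group(0)), text)


def decode(cryptic, decode_map):
    """Map signatures back to words; tokens without '*' pass through unchanged."""
    return re.sub(r"[A-Za-z*]+",
                  lambda m: decode_map.get(m.group(0), m.group(0)), cryptic)


if __name__ == "__main__":
    enc, dec = build_tables(["the", "and", "for", "a", "of", "to", "is"])
    text = "the cat and a dog sat for a while"
    cryptic = encode(text, enc)
    print(cryptic)
    assert decode(cryptic, dec) == text
    # The cryptic text, not the original, is what the standard compressor sees.
    print(len(zlib.compress(text.encode())), len(zlib.compress(cryptic.encode())))
```

Here "the" becomes "***", "and" becomes "a**" and "for" becomes "b**", while words outside the dictionary ("cat", "dog") pass through unchanged. Python's standard library has no LZW codec, so `zlib` (DEFLATE, the method behind gzip) stands in as the back-end compressor; a sample this small says nothing about the ratios reported in the paper. One consequence of this particular scheme is that a length-n word class has only a limited number of signatures (a single one for n = 1), so real dictionaries would need a different rule for short words.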
  • Keywords
    Internet; cryptography; data compression; document image processing; image coding; word processing; Intranet; Unix; World Wide Web; alphabetic characters; compression ratio; compressor; cryptic text; decompressor; memory requirements; performance measurements; run time; signature; standard compression algorithms; static dictionary; text compression algorithms; text encryption; textual documents; web pages; Compression algorithms; Computer science; Cryptography; Data compression; Dictionaries; HTML; Measurement; Web pages; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 1997. DCC '97. Proceedings
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    0-8186-7761-9
  • Type
    conf
  • DOI
    10.1109/DCC.1997.582107
  • Filename
    582107