• DocumentCode
    1801289
  • Title

    Data compression using encrypted text

  • Author

    Franceschini, Robert ; Mukherjee, Amar

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
  • fYear
    1996
  • fDate
    13-15 May 1996
  • Firstpage
    130
  • Lastpage
    138
  • Abstract
    We present an algorithm for text compression. The basic idea is to define a unique encryption, or signature, for each word in the dictionary by replacing certain characters in the word with a special character “*” and retaining a few characters so that the word is still retrievable. In any encrypted text the most frequently used character is “*”, and standard compression algorithms can exploit this redundancy effectively. We advocate the following compression paradigm: given a compression algorithm A and a text T, we apply the same algorithm A to an encrypted text *T and retrieve the original text via a dictionary that maps the decompressed text *T to the original text T. We report better results for the most widely used compression algorithms, such as Huffman, LZW, arithmetic coding, Unix compress, and GNU zip, with respect to a text corpus. The compression rates using these algorithms are much better than those of the dictionary-based methods reported in the literature. One basic assumption of our algorithm is that the system has access to a dictionary of the words used in all the texts, along with a corresponding “cryptic” dictionary. The cost of this dictionary is amortized over the compression savings for all the text files handled by the organization. If two organizations wish to exchange information using our compression algorithm, they must share a common dictionary. We compare our methods with other dictionary-based methods and present future research problems.
  • Keywords
    cryptography; data compression; glossaries; word processing; common dictionary; compression rates; decompressed text; dictionary based methods; encrypted text; redundancy; special character; standard compression algorithms; text compression; text corpus; text files; Arithmetic; Bandwidth; Compression algorithms; Computer science; Costs; Dictionaries; Explosions; Memory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Third Forum on Research and Technology Advances in Digital Libraries (ADL '96)
  • Conference_Location
    Washington, DC
  • Print_ISBN
    0-8186-7403-2
  • Type

    conf

  • DOI
    10.1109/ADL.1996.502523
  • Filename
    502523
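
The paradigm described in the abstract (map each dictionary word to a mostly-“*” signature, compress the starred text *T with a standard algorithm A, then invert the mapping after decompression) can be illustrated with a minimal Python sketch. The abstract does not specify how signatures are chosen, so the assignment rule below (take the first unused signature that retains as few of the word's own characters as possible) is an assumption made for illustration only, as are the function names. zlib stands in for algorithm A. The sketch also assumes lowercase alphabetic words and text that contains no literal “*”.

```python
import re
import zlib
from itertools import combinations


def build_star_dictionary(words):
    """Assign each distinct word a unique same-length signature made mostly of '*'.

    Hypothetical rule (not taken from the paper): retain as few of the
    word's own characters as needed to obtain an unused signature.
    """
    encode = {}
    used = set()
    for word in dict.fromkeys(words):  # drop duplicates, keep order
        n = len(word)
        signature = None
        for keep in range(n + 1):
            for positions in combinations(range(n), keep):
                candidate = "".join(
                    word[i] if i in positions else "*" for i in range(n)
                )
                if candidate not in used:
                    signature = candidate
                    break
            if signature is not None:
                break
        used.add(signature)
        encode[word] = signature
    decode = {sig: word for word, sig in encode.items()}
    return encode, decode


def star_encode(text, encode):
    """Replace every dictionary word in text by its signature."""
    return re.sub(r"[a-z]+", lambda m: encode.get(m.group(), m.group()), text)


def star_decode(star_text, decode):
    """Map signatures back to the original words."""
    return re.sub(r"[a-z*]+", lambda m: decode.get(m.group(), m.group()), star_text)


def compress(text, encode):
    """Apply the stand-in algorithm A (zlib) to the starred text *T."""
    return zlib.compress(star_encode(text, encode).encode("utf-8"))


def decompress(blob, decode):
    """Decompress to *T, then recover T through the shared dictionary."""
    return star_decode(zlib.decompress(blob).decode("utf-8"), decode)


words = ["the", "cat", "sat", "on", "mat"]
encode, decode = build_star_dictionary(words)
text = "the cat sat on the mat"
star_text = star_encode(text, encode)
restored = decompress(compress(text, encode), decode)
```

Both sides need the same `encode`/`decode` tables, which mirrors the abstract's requirement that communicating organizations share a common dictionary. Words missing from the dictionary pass through unchanged in this sketch; the sketch makes no claim about the compression gains reported in the paper.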