DocumentCode
1801289
Title
Data compression using encrypted text
Author
Franceschini, Robert ; Mukherjee, Amar
Author_Institution
Dept. of Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
fYear
1996
fDate
13-15 May 1996
Firstpage
130
Lastpage
138
Abstract
We present an algorithm for text compression. The basic idea of our algorithm is to define a unique encryption, or signature, of each word in the dictionary by replacing certain characters in the word with a special character “*” and retaining a few characters so that the word is still retrievable. In any encrypted text the most frequently used character is “*”, and standard compression algorithms can exploit this redundancy effectively. We advocate the following compression paradigm: given a compression algorithm A and a text T, we apply the same algorithm A to an encrypted text *T and retrieve the original text via a dictionary that maps the decompressed text *T to the original text T. On a text corpus, we report better results for most widely used compression algorithms, such as Huffman, LZW, arithmetic coding, Unix compress, and GNU zip. The compression rates obtained with these algorithms are much better than those of the dictionary-based methods reported in the literature. One basic assumption of our algorithm is that the system has access to a dictionary of the words used in all the texts, along with a corresponding “cryptic” dictionary. The cost of this dictionary is amortized over the compression savings for all the text files handled by the organization. If two organizations wish to exchange information using our compression algorithm, they must share a common dictionary. We compare our methods with other dictionary-based methods and present future research problems.
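The paradigm described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the signature-assignment rule (try signatures that retain as few letters as possible, in dictionary order), the function names, and the use of zlib as the stand-in for algorithm A are all assumptions made for illustration. The sketch also assumes lowercase words, a deduplicated dictionary, and input text that contains no “*” character.

```python
import itertools
import re
import zlib


def build_cryptic_dictionary(words):
    """Assign each word a unique same-length signature made mostly of '*'.

    Hypothetical rule: retain as few letters as possible. Words earlier in
    the list get first pick, so listing frequent words first gives them
    the most stars.
    """
    used = set()
    encode = {}
    for word in dict.fromkeys(words):  # drop duplicates, keep order
        n = len(word)
        for keep in range(n + 1):
            signature = None
            for positions in itertools.combinations(range(n), keep):
                candidate = "".join(
                    c if i in positions else "*" for i, c in enumerate(word)
                )
                if candidate not in used:
                    signature = candidate
                    break
            if signature is not None:
                used.add(signature)
                encode[word] = signature
                break
    return encode


def star_encode(text, encode):
    """Replace every dictionary word in text with its signature (T -> *T)."""
    return re.sub(r"[a-z]+", lambda m: encode.get(m.group(), m.group()), text)


def star_decode(text, encode):
    """Invert star_encode using the shared dictionary (*T -> T)."""
    decode = {sig: word for word, sig in encode.items()}
    return re.sub(r"[a-z*]+", lambda m: decode.get(m.group(), m.group()), text)


# Both sides must hold the same dictionary, as the abstract requires.
words = ["the", "cat", "sat", "on", "mat", "and", "dog"]
cryptic = build_cryptic_dictionary(words)

original = "the cat sat on the mat and the dog sat on the cat"
encrypted = star_encode(original, cryptic)

# Apply the same algorithm A (here zlib) to the encrypted text.
compressed = zlib.compress(encrypted.encode("ascii"))
restored = star_decode(zlib.decompress(compressed).decode("ascii"), cryptic)
```

Because every signature has the same length as its word and is unique, decoding is a plain dictionary lookup. Words missing from the dictionary pass through unchanged in this sketch.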
Keywords
cryptography; data compression; glossaries; word processing; common dictionary; compression rates; decompressed text; dictionary based methods; encrypted text; redundancy; special character; standard compression algorithms; text compression; text corpus; text files; Arithmetic; Bandwidth; Compression algorithms; Computer science; Costs; Cryptography; Data compression; Dictionaries; Explosions; Memory
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Libraries, 1996. ADL '96., Proceedings of the Third Forum on Research and Technology Advances in
Conference_Location
Washington, DC
Print_ISBN
0-8186-7403-2
Type
conf
DOI
10.1109/ADL.1996.502523
Filename
502523