DocumentCode
2684762
Title
Data compression using long common strings
Author
Bentley, Jon ; McIlroy, Douglas
Author_Institution
AT&T Bell Labs., Murray Hill, NJ, USA
fYear
1999
fDate
29-31 Mar 1999
Firstpage
287
Lastpage
295
Abstract
We describe a precompression algorithm that effectively represents any long common strings that appear in a file. The algorithm interacts well with standard compression algorithms that represent shorter strings that are near in the input text. Our experiments show that some real data sets do indeed contain many long common strings. We extend the fingerprint mechanisms of our algorithm to a program that identifies long common strings in an input file. This program gives interesting insights into the structure of real data files that contain long common strings.
Keywords
data compression; data structures; string matching; text analysis; data file structure; data sets; fingerprint mechanisms; long common strings; precompression algorithm; string representation; text; Code standards; Compression algorithms; Computer science; Constitution; Educational institutions; Fingerprint recognition; Plagiarism; Software libraries; Software systems
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Compression Conference, 1999. Proceedings. DCC '99
Conference_Location
Snowbird, UT
ISSN
1068-0314
Print_ISBN
0-7695-0096-X
Type
conf
DOI
10.1109/DCC.1999.755678
Filename
755678