• DocumentCode
    2684762
  • Title
    Data compression using long common strings
  • Author
    Bentley, Jon ; McIlroy, Douglas
  • Author_Institution
    AT&T Bell Labs., Murray Hill, NJ, USA
  • fYear
    1999
  • fDate
    29-31 Mar 1999
  • Firstpage
    287
  • Lastpage
    295
  • Abstract
    We describe a precompression algorithm that effectively represents any long common strings that appear in a file. The algorithm interacts well with standard compression algorithms that represent shorter strings that are near in the input text. Our experiments show that some real data sets do indeed contain many long common strings. We extend the fingerprint mechanisms of our algorithm to a program that identifies long common strings in an input file. This program gives interesting insights into the structure of real data files that contain long common strings.
  • Keywords
    data compression; data structures; string matching; text analysis; data file structure; data sets; fingerprint mechanisms; long common strings; precompression algorithm; string representation; text; Code standards; Compression algorithms; Computer science; Constitution; Data compression; Educational institutions; Fingerprint recognition; Plagiarism; Software libraries; Software systems
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 1999. Proceedings. DCC '99
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-0096-X
  • Type
    conf
  • DOI
    10.1109/DCC.1999.755678
  • Filename
    755678