• DocumentCode
    2054671
  • Title

    dSCAM: finding document copies across multiple databases

  • Author

    García-Molina, Héctor ; Gravano, Luis ; Shivakumar, Narayanan

  • Author_Institution
    Dept. of Comput. Sci., Stanford Univ., CA, USA
  • fYear
    1996
  • fDate
    18-20 Dec 1996
  • Firstpage
    68
  • Lastpage
    79
  • Abstract
    The advent of the Internet has made the illegal dissemination of copyrighted material easy. An important problem is how to automatically detect when a “new” digital document is “suspiciously close” to existing ones. The SCAM project at Stanford University has addressed this problem when there is a single registered-document database. However, in practice, test documents may appear in many autonomous databases, and one would like to discover copies without having to exhaustively search in all databases. The authors' approach, dSCAM, is a distributed version of SCAM that keeps succinct metainformation about the contents of the available document databases. Given a suspicious document S, dSCAM uses its information to prune all databases that cannot contain any document that is close enough to S, and hence the search can focus on the remaining sites. They also study how to query the remaining databases so as to minimize different querying costs. They empirically study the pruning and searching schemes, using a collection of 50 databases and two sets of test documents.
  • Keywords
    Internet; computer crime; copy protection; copyright; distributed databases; query processing; SCAM project; autonomous database; dSCAM; database pruning; database querying; digital document; document copy finding; illegal copyrighted material dissemination; multiple databases; querying costs; searching schemes; succinct metainformation; test documents; Computer science; Costs; Distributed databases; Information retrieval; Law; Legal factors; Software libraries; Testing; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Information Systems, 1996., Fourth International Conference on
  • Conference_Location
    Miami Beach, FL
  • Print_ISBN
    0-8186-7475-X
  • Type

    conf

  • DOI
    10.1109/PDIS.1996.568668
  • Filename
    568668