Title :
Copy detection systems for digital documents
Author :
Campbell, Douglas M. ; Chen, Wendy R. ; Smith, Randy D.
Author_Institution :
Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA
Abstract :
Partial or total duplication of document content is common in large digital libraries. We present a copy detection system that automates the detection of duplication in digital documents. The system is sentence-based and makes three contributions: it proposes an intuitive definition of similarity between documents; it produces the distribution of overlap that exists between overlapping documents; and it is resistant to inaccuracy caused by large variations in document size. We report the results of several experiments that illustrate the behavior and functionality of the system.
Keywords :
Internet; copy protection; copyright; digital libraries; document handling; copy detection systems; digital documents; document content duplication; experiments; large digital libraries; overlapping documents; sentence-based system; system functionality; Software libraries
Conference_Title :
Advances in Digital Libraries, 2000. Proceedings. IEEE
Conference_Location :
Washington, DC
Print_ISBN :
0-7695-0659-3
DOI :
10.1109/ADL.2000.848372