Title :
Detection of Cut-and-Paste in Document Images
Author :
Gandhi, Anshul ; Jawahar, C.V.
Author_Institution :
Center for Visual Inf. Technol., IIIT Hyderabad, Hyderabad, India
Abstract :
Many documents are created by cut-and-paste (CAP) of existing documents. In this paper, we propose a novel technique to detect CAP in document images. This can help in detecting unethical CAP in document image collections. Our solution is recognition-free and scales to large collections of documents. Our formulation is also independent of the imaging process (camera-based or scanner-based) and does not use any language-specific information for matching across documents. We model the problem as finding a mixture of homographies and design a linear programming (LP) based solution to compute it. Our method does not currently support detection of CAP in documents formed by editing the textual content. Our experiments demonstrate that, without loss of generality (i.e., without assuming the number of source documents), we can correctly detect and match the CAP content in a questioned document image by simultaneously comparing it with a large number of images in the database. We achieve a CAP detection accuracy as high as 90%, even when the spatial extent of the CAP content in a document image is as small as 15% of the entire image area.
Keywords :
document image processing; linear programming; cut-and-paste detection; document image collections; imaging process; linear programming based solution; textual content; unethical CAP detection; Accuracy; Cameras; Databases; Plagiarism; Robustness; Visualization; camera-based document image processing; document retrieval; linear programming and optimization; plagiarism detection
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.134