DocumentCode :
3485474
Title :
VisualDiff: Document Image Verification and Change Detection
Author :
Jain, R.; Doermann, David
Author_Institution :
Language & Multimedia Process. Lab., Univ. of Maryland, College Park, MD, USA
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
40
Lastpage :
44
Abstract :
This paper explores the related problems of verification and change detection in document images. The goal is to determine whether two document images differ and, if so, to determine precisely what content has been added, deleted, or otherwise modified. This problem has many potential applications, especially for important legal documents such as contractual agreements. These agreements are often edited, shared, and stored as scanned or hardcopy documents, where small, undetected changes between edits could create major differences in the contractual language and thus have severe repercussions. One can view the problem of change detection as tracing the revision history of a set of documents. Thus, in order to validate the performance of this approach, we created the "Enron Revisions" dataset. This dataset contains realistic revisions obtained from attachments in the Enron Corpus, and a series of before-and-after snapshots of the revisions in images with varying levels of noise from resolution, binarization, and blur. The approach taken in this paper uses the SIFT descriptor to align two document images without the benefit of OCR and, once they are aligned, compares dense descriptors to determine the changes that have occurred within the image. As a baseline, this "VisualDiff" is compared to a UNIX diff-like approach on text extracted through OCR, and the results demonstrate the effectiveness of this approach.
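The text-based baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the OCR step has already produced two plain-text strings, and it uses Python's standard `difflib` module as a stand-in for a UNIX diff-like comparison at word level. The function name `text_changes` and the sample sentences are hypothetical.

```python
import difflib


def text_changes(before: str, after: str):
    """Return (operation, old_words, new_words) for every changed span.

    Assumption: `before` and `after` are OCR outputs already available as
    plain strings; the OCR step itself is outside this sketch.
    """
    a, b = before.split(), after.split()
    # autojunk=False disables difflib's popularity heuristic so that
    # frequent words are never silently ignored during matching.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only insert / delete / replace spans
            changes.append((op, a[i1:i2], b[j1:j2]))
    return changes


# Hypothetical contract sentence before and after a one-word revision.
before = "The buyer shall pay within 30 days of delivery"
after = "The buyer shall pay within 90 days of delivery"
for change in text_changes(before, after):
    print(change)
```

A comparison of this kind depends entirely on the quality of the OCR text, which is the limitation that the image-based alignment described in the abstract is meant to avoid.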
Keywords :
document image processing; image resolution; OCR; SIFT descriptor; UNIX diff-like approach; VisualDiff; change detection; contractual agreements; contractual language; document image verification; Enron Revisions dataset; image binarization; image blur; important legal documents; realistic revisions; Accuracy; Change detection algorithms; Contracts; Feature extraction; Image segmentation; Optical character recognition software; Robustness; Change Detection; Document Image; Document Verification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.17
Filename :
6628582