DocumentCode
2061134
Title
Document image similarity and equivalence detection
Author
Hull, Jonathan J. ; Cullen, John F.
Author_Institution
Ricoh California Res. Center, Menlo Park, CA, USA
Volume
1
fYear
1997
fDate
18-20 Aug 1997
Firstpage
308
Abstract
A hierarchical algorithm is presented for determining the similarity and equivalence of document images. Features extracted from the CCITT fax compressed representations of two images are compared to determine their visual similarity and whether they are equivalent. Pass codes in the compressed data are used as features. A fixed grid is imposed on the image and a feature vector is derived from the number of pass codes in each grid cell. The feature vectors are compared to locate a group of documents that are visually similar to the input image. The equivalence of two documents is determined by applying the Hausdorff distance to the two-dimensional arrangement of pass codes in small patches of each image.
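The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the pass-code positions have already been decoded from the fax-compressed stream into (x, y) coordinates, which is not shown here. The function names, the grid parameters, and the use of an L1 distance to compare feature vectors are all assumptions made for illustration; the abstract does not state which vector comparison is used.

```python
from math import hypot


def grid_feature_vector(points, width, height, rows, cols):
    """Count points (assumed pass-code positions) in each cell of a fixed grid.

    Returns a flat list of rows * cols counts in row-major order.
    """
    counts = [0] * (rows * cols)
    for x, y in points:
        # Clamp so that points on the right or bottom edge fall in the last cell.
        r = min(int(y * rows / height), rows - 1)
        c = min(int(x * cols / width), cols - 1)
        counts[r * cols + c] += 1
    return counts


def l1_distance(u, v):
    """Sum of absolute differences; the choice of metric is an assumption."""
    return sum(abs(a - b) for a, b in zip(u, v))


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical pass-code positions in a 10 x 10 image, on a 2 x 2 grid.
vec = grid_feature_vector([(0, 0), (9, 9), (5, 0)], 10, 10, 2, 2)
patch_a = [(0, 0), (1, 0)]
patch_b = [(0, 0), (4, 0)]
dist = hausdorff(patch_a, patch_b)
```

In this sketch a small L1 distance between grid vectors would mark two images as candidates for visual similarity, and a small Hausdorff distance between corresponding patches would then support equivalence. Any thresholds would have to be chosen empirically.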
Keywords
document image processing; facsimile; feature extraction; image coding; image representation; telecommunication standards; CCITT fax compressed representations; Hausdorff distance; compressed data; document image similarity; document images; equivalence detection; feature extraction; feature vector; fixed grid; grid cell; hierarchical algorithm; pass codes; small patches; two dimensional arrangement; visual similarity; Business; Data mining; Feature extraction; Grid computing; Image analysis; Image coding; Image databases; Spatial databases; Text analysis; Visual databases;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
Conference_Location
Ulm
Print_ISBN
0-8186-7898-4
Type
conf
DOI
10.1109/ICDAR.1997.619862
Filename
619862