Title :
Document image similarity and equivalence detection
Author :
Hull, Jonathan J. ; Cullen, John F.
Author_Institution :
Ricoh California Res. Center, Menlo Park, CA, USA
Abstract :
A hierarchical algorithm is presented for determining the similarity and equivalence of document images. Features extracted from the CCITT fax-compressed representations of two images are compared to determine their visual similarity and whether they are equivalent. Pass codes in the compressed data are used as features. A fixed grid is imposed on the image and a feature vector is derived from the number of pass codes in each grid cell. The feature vectors are compared to locate a group of documents that are visually similar to the input image. The equivalence of two documents is determined by applying the Hausdorff distance to the two-dimensional arrangement of pass codes in small patches of each image.
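The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the pass code positions have already been decoded from the compressed data into (x, y) pixel coordinates, and it uses Euclidean distance to compare feature vectors, which the abstract does not specify. The function names, grid dimensions, and input sets are hypothetical.

```python
from math import hypot


def grid_feature_vector(points, width, height, rows, cols):
    """Count points per cell of a fixed rows x cols grid (row-major order).

    `points` stands in for decoded pass code positions (an assumption).
    """
    counts = [0] * (rows * cols)
    for x, y in points:
        r = min(y * rows // height, rows - 1)  # clamp to the last row
        c = min(x * cols // width, cols - 1)   # clamp to the last column
        counts[r * cols + c] += 1
    return counts


def euclidean_distance(u, v):
    """Compare two feature vectors; the choice of metric is an assumption."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

In this sketch, a small `euclidean_distance` between grid vectors would mark two images as candidates for visual similarity, and a small `hausdorff` value over the points in corresponding patches would suggest equivalence. Any thresholds would have to be chosen empirically.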
Keywords :
document image processing; facsimile; feature extraction; image coding; image representation; telecommunication standards; CCITT fax compressed representations; Hausdorff distance; compressed data; document image similarity; document images; equivalence detection; feature extraction; feature vector; fixed grid; grid cell; hierarchical algorithm; pass codes; small patches; two dimensional arrangement; visual similarity; Business; Data mining; Feature extraction; Grid computing; Image analysis; Image coding; Image databases; Spatial databases; Text analysis; Visual databases;
Conference_Title :
Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997
Conference_Location :
Ulm
Print_ISBN :
0-8186-7898-4
DOI :
10.1109/ICDAR.1997.619862