• DocumentCode
    2061134
  • Title

    Document image similarity and equivalence detection

  • Author

    Hull, Jonathan J. ; Cullen, John F.

  • Author_Institution
    Ricoh California Res. Center, Menlo Park, CA, USA
  • Volume
    1
  • fYear
    1997
  • fDate
    18-20 Aug 1997
  • Firstpage
    308
  • Abstract
    A hierarchical algorithm is presented for determining the similarity and equivalence of document images. Features extracted from the CCITT fax-compressed representations of two images are compared to determine their visual similarity and whether they are equivalent. Pass codes in the compressed data are used as features. A fixed grid is imposed on the image, and a feature vector is derived from the number of pass codes in each grid cell. The feature vectors are compared to locate a group of documents that are visually similar to the input image. The equivalence of two documents is determined by applying the Hausdorff distance to the two-dimensional arrangement of pass codes in small patches of each image.
  • Keywords
    document image processing; facsimile; feature extraction; image coding; image representation; telecommunication standards; CCITT fax compressed representations; Hausdorff distance; compressed data; document image similarity; document images; equivalence detection; feature extraction; feature vector; fixed grid; grid cell; hierarchical algorithm; pass codes; small patches; two dimensional arrangement; visual similarity; Business; Data mining; Feature extraction; Grid computing; Image analysis; Image coding; Image databases; Spatial databases; Text analysis; Visual databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997
  • Conference_Location
    Ulm
  • Print_ISBN
    0-8186-7898-4
  • Type

    conf

  • DOI
    10.1109/ICDAR.1997.619862
  • Filename
    619862
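
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the pass-code positions have already been extracted from the compressed data as integer (x, y) pixel coordinates; decoding the CCITT stream is not shown. The function names, the grid dimensions, and the use of an L1 distance to compare feature vectors are all assumptions made for illustration, since the abstract does not name a vector metric or say how patches are chosen.

```python
from math import hypot


def grid_feature_vector(points, width, height, rows, cols):
    """Count points per cell of a fixed rows x cols grid (row-major order).

    `points` stands in for pass-code positions (an assumption: the
    abstract does not describe how positions are represented).
    """
    counts = [0] * (rows * cols)
    for x, y in points:
        # Map pixel coordinates to a cell index, clamping the far edge.
        r = min(y * rows // height, rows - 1)
        c = min(x * cols // width, cols - 1)
        counts[r * cols + c] += 1
    return counts


def l1_distance(u, v):
    """Sum of absolute differences (an assumed metric for the coarse stage)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`.

    Raises ValueError if either set is empty.
    """
    return max(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


if __name__ == "__main__":
    # Hypothetical pass-code positions in a 10 x 10 image, 2 x 2 grid.
    doc = [(0, 0), (1, 1), (9, 9)]
    print(grid_feature_vector(doc, 10, 10, 2, 2))

    # Hypothetical patches from two images.
    patch_a = [(0, 0), (1, 0)]
    patch_b = [(0, 0), (4, 0)]
    print(hausdorff(patch_a, patch_b))
```

In a hierarchical use, the grid vectors would first narrow a database to visually similar candidates, and the costlier Hausdorff comparison would then run only on patches from those candidates. The brute-force nearest-point search here is quadratic in patch size, which is acceptable only because the patches are small.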