• DocumentCode
    1366729
  • Title

    Large-Scale Discovery of Spatially Related Images

  • Author

    Chum, Ondrej ; Matas, Jirí

  • Author_Institution
    Fac. of Electr. Eng., Czech Tech. Univ., Prague, Czech Republic
  • Volume
    32
  • Issue
    2
  • fYear
    2010
  • Firstpage
    371
  • Lastpage
    377
  • Abstract
    We propose a randomized data mining method that finds clusters of spatially overlapping images. The core of the method relies on the min-Hash algorithm for fast detection of pairs of images with spatial overlap, the so-called cluster seeds. The seeds are then used as visual queries to obtain clusters which are formed as transitive closures of sets of partially overlapping images that include the seed. We show that the probability of finding a seed for an image cluster rapidly increases with the size of the cluster. The properties and performance of the algorithm are demonstrated on data sets with 10^4, 10^5, and 5 × 10^6 images. The speed of the method depends on the size of the database and the number of clusters. The first stage of seed generation is close to linear for database sizes up to approximately 2^34 ≈ 10^10 images. On a single 2.4 GHz PC, the clustering process took only 24 minutes for a standard database of more than 100,000 images, i.e., only 0.014 seconds per image.
  • Keywords
    data mining; image retrieval; pattern clustering; probability; image clustering process; large-scale discovery; min-Hash algorithm; randomized data mining; spatially overlapping image; visual queries; bag of words; image clustering; minHash
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2009.166
  • Filename
    5235143
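
The two stages named in the abstract (min-Hash seed detection, then clusters formed as transitive closures) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each image is already reduced to a set of integer visual-word IDs, uses linear hash functions modulo a prime as a stand-in for random permutations, and replaces the query-based cluster growth and any spatial verification with a plain union-find over candidate pairs. All function and parameter names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # 2^31 - 1; visual-word IDs are assumed to be in [0, PRIME)


def sketch(words, params):
    """Tuple of min-Hashes, one per (a, b) hash function h(w) = (a*w + b) mod PRIME."""
    return tuple(min((a * w + b) % PRIME for w in words) for a, b in params)


def candidate_seed_pairs(images, sketch_size=2, num_sketches=64, seed=0):
    """Return image pairs that collide in at least one sketch.

    images: dict mapping image id -> non-empty set of integer visual-word IDs.
    Pairs with a larger word-set overlap collide with higher probability.
    """
    rng = random.Random(seed)
    pairs = set()
    for _ in range(num_sketches):
        # Fresh hash functions for every sketch (a != 0 keeps h a bijection).
        params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                  for _ in range(sketch_size)]
        buckets = defaultdict(list)
        for image_id, words in images.items():
            if words:
                buckets[sketch(words, params)].append(image_id)
        for ids in buckets.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs


def clusters_from_pairs(pairs):
    """Connected components of the pair graph (simplified transitive closure)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    toy_images = {
        "a": {1, 2, 3, 4},
        "b": {1, 2, 3, 4},      # same visual words as "a"
        "c": {100, 101, 102},   # shares no visual words with the others
    }
    seeds = candidate_seed_pairs(toy_images)
    print(seeds)
    print(clusters_from_pairs(seeds))
```

In this toy setup, images with identical word sets always share a sketch, and images with disjoint word sets never do, because each hash function is a bijection on the ID range. Real image pairs fall in between, which is why several independent sketches are drawn.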