• DocumentCode
    2716405
  • Title
    Image categorization using Fisher kernels of non-iid image models
  • Author
    Cinbis, Ramazan Gokberk ; Verbeek, Jakob ; Schmid, Cordelia
  • Author_Institution
    LEAR, INRIA Grenoble, Grenoble, France
  • fYear
    2012
  • fDate
    16-21 June 2012
  • Firstpage
    2184
  • Lastpage
    2191
  • Abstract
    The bag-of-words (BoW) model treats images as an unordered set of local regions and represents them by visual word histograms. Implicitly, regions are assumed to be identically and independently distributed (iid), which is a poor assumption from a modeling perspective. We introduce non-iid models by treating the parameters of BoW models as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel we encode an image by the gradient of the data log-likelihood w.r.t. hyper-parameters that control priors on the model parameters. Our representation naturally involves discounting transformations similar to taking square-roots, providing an explanation of why such transformations have proven successful. Using variational inference we extend the basic model to include Gaussian mixtures over local descriptors, and latent topic models to capture the co-occurrence structure of visual words, both improving performance. Our models yield state-of-the-art categorization performance using linear classifiers, without non-linear transformations such as taking square-roots of features and without (approximate) explicit embeddings of non-linear kernels.
  • Keywords
    Gaussian processes; gradient methods; image classification; image coding; inference mechanisms; BoW models; Fisher kernels; Gaussian mixtures; bag-of-words model; data log-likelihood gradient; hyper-parameters; identically and independently distributed regions; image categorization; image encoding; latent topic models; linear classifiers; non-iid image models; square-roots; variational inference; visual word histograms; Computational modeling; Histograms; Image representation; Kernel; Mathematical model; Vectors; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on
  • Conference_Location
    Providence, RI
  • ISSN
    1063-6919
  • Print_ISBN
    978-1-4673-1226-4
  • Electronic_ISBN
    1063-6919
  • Type
    conf
  • DOI
    10.1109/CVPR.2012.6247926
  • Filename
    6247926
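
The encoding described in the abstract can be illustrated with a minimal sketch. It assumes the simplest case only: a multinomial BoW model whose parameters have a Dirichlet prior with hyper-parameters alpha and are integrated out, so the image likelihood is a Dirichlet-multinomial. That choice of prior, the function name `polya_fisher_gradient`, and the toy counts are assumptions made for illustration; the record does not give the authors' implementation, and the Fisher-information normalization of the kernel is omitted. For integer counts the digamma differences reduce to finite sums, so only the standard library is needed.

```python
from math import fsum


def polya_fisher_gradient(counts, alpha):
    """Gradient of the Dirichlet-multinomial log-likelihood w.r.t. alpha.

    counts: integer visual-word counts n_k for one image (hypothetical input).
    alpha:  positive Dirichlet hyper-parameters alpha_k (assumed prior).

    d/d alpha_k log p(counts | alpha)
        = psi(alpha_k + n_k) - psi(alpha_k) + psi(A) - psi(A + N),
    with A = sum(alpha) and N = sum(counts). For integer n,
    psi(a + n) - psi(a) = sum_{i=0}^{n-1} 1 / (a + i).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    total_count = sum(counts)
    alpha_sum = fsum(alpha)
    # Term shared by all words: psi(A) - psi(A + N).
    shared = -fsum(1.0 / (alpha_sum + i) for i in range(total_count))
    # Per-word term grows roughly logarithmically in n_k, i.e. it discounts
    # large counts instead of scaling linearly with them.
    return [
        fsum(1.0 / (a + i) for i in range(n)) + shared
        for n, a in zip(counts, alpha)
    ]


if __name__ == "__main__":
    toy_counts = [0, 1, 4, 16]          # made-up histogram
    toy_alpha = [1.0, 1.0, 1.0, 1.0]    # made-up hyper-parameters
    for n, g in zip(toy_counts, polya_fisher_gradient(toy_counts, toy_alpha)):
        print(n, round(g, 4))
```

In this sketch each component depends on its count through a concave, increasing function, so quadrupling a count far less than quadruples the feature value. This is the kind of discounting behaviour the abstract compares to taking square-roots.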