• DocumentCode
    3424951
  • Title
    Action and Event Recognition with Fisher Vectors on a Compact Feature Set
  • Author
    Oneata, Dan ; Verbeek, Jakob ; Schmid, Cordelia
  • Author_Institution
    Lab. Jean Kuntzmann, INRIA Grenoble, Grenoble, France
  • fYear
    2013
  • fDate
    1-8 Dec. 2013
  • Firstpage
    1817
  • Lastpage
    1824
  • Abstract
    Action recognition in uncontrolled video is an important and challenging computer vision problem. Recent progress in this area is due to new local features and models that capture spatio-temporal structure between local features, or human-object interactions. Instead of working towards more complex models, we focus on the low-level features and their encoding. We evaluate the use of Fisher vectors as an alternative to bag-of-word histograms to aggregate a small set of state-of-the-art low-level descriptors, in combination with linear classifiers. We present a large and varied set of evaluations, considering (i) classification of short actions in five datasets, (ii) localization of such actions in feature-length movies, and (iii) large-scale recognition of complex events. We find that for basic action recognition and localization MBH features alone are enough for state-of-the-art performance. For complex events we find that SIFT and MFCC features provide complementary cues. On all three problems we obtain state-of-the-art results, while using fewer features and less complex models.
  • Keywords
    computer vision; feature extraction; image classification; video signal processing; Fisher vectors; MBH feature localization; SIFT features; action recognition; bag-of-word histograms; compact feature set; complementary cues; complex event recognition; computer vision problem; feature-length movies; human-object interactions; linear classifiers; local features; low-level features; spatio-temporal structure; uncontrolled video; Encoding; Hidden Markov models; Histograms; Motion pictures; Vectors; Visualization; action localization; bag of visual words; dense trajectories; evaluation; event recognition; uncontrolled realistic videos
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision (ICCV), 2013 IEEE International Conference on
  • Conference_Location
    Sydney, NSW
  • ISSN
    1550-5499
  • Type
    conf
  • DOI
    10.1109/ICCV.2013.228
  • Filename
    6751336