• DocumentCode
    1403450
  • Title
    Learning Sparse Representations for Human Action Recognition
  • Author
    Guha, Tanaya; Ward, Rabab Kreidieh
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of British Columbia, Vancouver, BC, Canada
  • Volume
    34
  • Issue
    8
  • fYear
    2012
  • Firstpage
    1576
  • Lastpage
    1588
  • Abstract
    This paper explores the effectiveness of sparse representations obtained by learning an overcomplete basis (dictionary) in the context of action recognition in videos. Although this work concentrates on recognizing human movements (physical actions as well as facial expressions), the proposed approach is fairly general and can be used to address other classification problems. To model human actions, three overcomplete dictionary learning frameworks are investigated. An overcomplete dictionary is constructed from a set of spatio-temporal descriptors (extracted from the video sequences) in such a way that each descriptor is represented by a linear combination of a small number of dictionary elements. This yields a more compact and richer representation of the video sequences than existing methods that rely on clustering and vector quantization. For each framework, a novel classification algorithm is proposed. This work also introduces a new local spatio-temporal feature that is distinctive, scale invariant, and fast to compute. The proposed approach consistently achieves state-of-the-art results on several public data sets containing various physical actions and facial expressions.
  • Keywords
    dictionaries; face recognition; gesture recognition; image classification; image representation; image sequences; learning (artificial intelligence); pattern clustering; vector quantisation; video signal processing; classification problem; clustering; dictionary element; facial expression; human action model; human action recognition; human movement recognition; overcomplete dictionary learning framework; physical action; sparse representation; spatio-temporal descriptor; vector quantization; video sequence representation; Detectors; Dictionaries; Feature extraction; Humans; Vectors; Video sequences; Videos; Action recognition; dictionary learning; expression recognition; orthogonal matching pursuit; overcomplete; sparse representation; spatio-temporal descriptors; Algorithms; Artificial Intelligence; Dancing; Databases, Factual; Facial Expression; Humans; Image Processing, Computer-Assisted; Models, Theoretical; Movement; Pattern Recognition, Automated; Sports; Terminology as Topic; Video Recording
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2011.253
  • Filename
    6109282
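The abstract describes representing each spatio-temporal descriptor as a linear combination of a small number of dictionary elements, and the keywords list orthogonal matching pursuit (OMP). As a rough illustration of that sparse-coding step only, the following is a minimal pure-Python sketch of OMP against a fixed dictionary. It is not the authors' implementation: the function names `omp`, `dot`, and `solve`, the toy dictionary, and the sparsity level `k` are hypothetical, atoms are assumed to be unit-norm, and neither dictionary learning nor the proposed classifiers are shown.

```python
# Hypothetical sketch of sparse coding with orthogonal matching pursuit (OMP);
# not the paper's code. Atoms are assumed to be unit-norm lists of floats.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def omp(D, x, k):
    """Approximate signal x with at most k atoms of dictionary D (a list of atoms).

    Returns one coefficient per atom; at most k entries are nonzero.
    """
    residual = list(x)
    support = []
    c = []
    for _ in range(k):
        # Greedy step: choose the unselected atom most correlated with the residual.
        scores = [abs(dot(d, residual)) if j not in support else -1.0
                  for j, d in enumerate(D)]
        j = max(range(len(D)), key=lambda i: scores[i])
        if scores[j] <= 1e-12:
            break  # residual is (numerically) zero; nothing left to explain
        support.append(j)
        # Orthogonal step: least-squares refit on all selected atoms
        # via the normal equations (D_S^T D_S) c = D_S^T x.
        G = [[dot(D[a], D[b]) for b in support] for a in support]
        rhs = [dot(D[a], x) for a in support]
        c = solve(G, rhs)
        approx = [sum(ci * D[a][t] for ci, a in zip(c, support)) for t in range(len(x))]
        residual = [xi - ai for xi, ai in zip(x, approx)]
    coef = [0.0] * len(D)
    for ci, a in zip(c, support):
        coef[a] = ci
    return coef

# Toy overcomplete dictionary in R^3: three axis atoms plus one diagonal atom.
s = 2 ** -0.5
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
print(omp(D, [2.0, 0.0, 3.0], k=2))  # [2.0, 0.0, 3.0, 0.0]
```

The resulting coefficient vector is the kind of sparse code the abstract contrasts with clustering and vector quantization: instead of assigning a descriptor to a single codeword, it is explained by a few weighted atoms.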