  • DocumentCode
    79738
  • Title
    Discovering Motion Primitives for Unsupervised Grouping and One-Shot Learning of Human Actions, Gestures, and Expressions
  • Author
    Yang, Yang; Saleemi, Imran; Shah, Mubarak
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci. (EECS), Univ. of Central Florida (UCF), Orlando, FL, USA
  • Volume
    35
  • Issue
    7
  • fYear
    2013
  • fDate
    July 2013
  • Firstpage
    1635
  • Lastpage
    1648
  • Abstract
    This paper proposes a novel representation of articulated human actions and gestures and facial expressions. The main goals of the proposed approach are: 1) to enable recognition using very few examples, i.e., one-shot or k-shot learning, and 2) meaningful organization of unlabeled datasets by unsupervised clustering. Our proposed representation is obtained by automatically discovering high-level subactions or motion primitives, by hierarchical clustering of observed optical flow in four-dimensional, spatial, and motion flow space. The completely unsupervised proposed method, in contrast to state-of-the-art representations like bag of video words, provides a meaningful representation conducive to visual interpretation and textual labeling. Each primitive action depicts an atomic subaction, like directional motion of a limb or the torso, and is represented by a mixture of four-dimensional Gaussian distributions. For one-shot and k-shot learning, the primitives discovered in a test video are labeled using KL divergence, and the resulting sequence of primitive labels can then be represented as a string and matched against similar strings of training videos. The same sequence can also be collapsed into a histogram of primitives or be used to learn a hidden Markov model to represent classes. We have performed extensive experiments on recognition by one-shot and k-shot learning as well as unsupervised action clustering on six human actions and gesture datasets, a composite dataset, and a database of facial expressions. These experiments confirm the validity and discriminative nature of the proposed representation.
  • Keywords
    Gaussian distribution; face recognition; gesture recognition; hidden Markov models; image matching; image motion analysis; image representation; image sequences; learning (artificial intelligence); pattern clustering; video signal processing; KL divergence; atomic subaction; composite dataset; facial expression database; four-dimensional Gaussian distributions; four-dimensional space; gesture datasets; hidden Markov model; hierarchical clustering; high-level subaction discovery; human action representation; human actions; k-shot learning; motion flow space; motion primitive discovery; one-shot learning; optical flow; primitive histogram; spatial space; string matching; textual labeling; training videos; unsupervised clustering; unsupervised grouping; visual interpretation; Histograms; Humans; Joints; Optical imaging; Spatiotemporal phenomena; Training; Vectors; Hidden Markov model; Human actions; action recognition; action representation; facial expressions; gestures; histogram of motion primitives; motion patterns; motion primitives; motion primitives strings; one-shot learning; unsupervised clustering; Algorithms; Cluster Analysis; Databases, Factual; Facial Expression; Gestures; Human Activities; Humans; Image Processing, Computer-Assisted; Markov Chains; Movement; Pattern Recognition, Automated; Video Recording;
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2012.253
  • Filename
    6365192
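
  • Illustrative sketch (not part of the original record)
    The abstract describes labeling primitives by KL divergence, matching the resulting label strings, and collapsing them into histograms. The Python sketch below is one possible reading of those three steps, not the authors' implementation. It rests on several assumptions: each primitive is summarized by a single four-dimensional Gaussian with diagonal covariance rather than a full mixture, string matching is done with Levenshtein edit distance, and the names kl_diag_gaussian, label_primitive, edit_distance, primitive_histogram and the toy library are hypothetical.

```python
import math
from collections import Counter


def kl_diag_gaussian(mu0, var0, mu1, var1):
    """KL(N0 || N1) for two Gaussians with diagonal covariance (assumption)."""
    return 0.5 * sum(
        v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )


def label_primitive(observed, library):
    """Return the label of the library Gaussian closest to `observed` in KL."""
    mu, var = observed
    return min(library, key=lambda name: kl_diag_gaussian(mu, var, *library[name]))


def edit_distance(a, b):
    """Levenshtein distance between two sequences of primitive labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def primitive_histogram(labels, vocabulary):
    """Normalized histogram of primitive labels over a fixed vocabulary."""
    counts = Counter(labels)
    total = len(labels) or 1
    return [counts[v] / total for v in vocabulary]


# Toy library: label -> (mean, variance) over (x, y, u, v); values are made up.
library = {
    "A": ((0.0, 0.0, 1.0, 0.0), (1.0, 1.0, 1.0, 1.0)),
    "B": ((0.0, 0.0, -1.0, 0.0), (1.0, 1.0, 1.0, 1.0)),
}

# Label one observed primitive, then match a test string to training strings.
observed = ((0.1, 0.0, 0.9, 0.0), (1.0, 1.0, 1.0, 1.0))
label = label_primitive(observed, library)

training = {"wave": "ABAB", "push": "AAAA"}  # one example string per class
test_string = "ABAA"
predicted = min(training, key=lambda c: edit_distance(test_string, training[c]))
```

    With one training string per class, the nearest-string rule above corresponds to the one-shot setting; for k-shot, each class would hold k strings and the minimum (or a vote) over them could be used. The histogram function covers the order-free alternative mentioned in the abstract.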