• DocumentCode
    253698
  • Title
    The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities
  • Author
    Kuehne, Hilde; Arslan, A.; Serre, Thomas
  • Author_Institution
    Fraunhofer FKIE, Bonn, Germany
  • fYear
    2014
  • fDate
    23-28 June 2014
  • Firstpage
    780
  • Lastpage
    787
  • Abstract
    This paper describes a framework for modeling human activities as temporally structured processes. Our approach is motivated by the inherently hierarchical nature of human activities and the close correspondence between human actions and speech: We model action units using Hidden Markov Models, much like words in speech. These action units then form the building blocks to model complex human activities as sentences using an action grammar. To evaluate our approach, we collected a large dataset of daily cooking activities: The dataset includes 52 participants, each performing 10 cooking activities in multiple real-life kitchens, resulting in over 77 hours of video footage. We evaluate the HTK toolkit, a state-of-the-art speech recognition engine, in combination with multiple video feature descriptors, for both the recognition of cooking activities (e.g., making pancakes) and the semantic parsing of videos into action units (e.g., cracking eggs). Our results demonstrate the benefits of structured temporal generative approaches over existing discriminative approaches in coping with the complexity of human daily life activities.
  • Keywords
    context-free grammars; hidden Markov models; speech recognition; video signal processing; HTK toolkit; action grammar; cooking activity recognition; goal-directed human activity semantics recovery; goal-directed human activity syntax recovery; human action units; human activity modeling; semantic video parsing; speech recognition engine; structured temporal generative approaches; video feature descriptors; video footage; Accuracy; Dairy products; Grammar; Hidden Markov models; Speech; Speech recognition; Sugar
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on
  • Conference_Location
    Columbus, OH
  • Type
    conf
  • DOI
    10.1109/CVPR.2014.105
  • Filename
    6909500