DocumentCode
932799
Title
Modeling individual and group actions in meetings with layered HMMs
Author
Zhang, Dong ; Gatica-Perez, Daniel ; Bengio, Samy ; McCowan, Iain
Volume
8
Issue
3
fYear
2006
fDate
6/1/2006 12:00:00 AM
Firstpage
509
Lastpage
520
Abstract
We address the problem of recognizing sequences of human interaction patterns in meetings, with the goal of structuring them in semantic terms. The investigated patterns are inherently group-based (defined by the individual activities of meeting participants, and their interplay), and multimodal (as captured by cameras and microphones). By defining a proper set of individual actions, group actions can be modeled as a two-layer process, one that models basic individual activities from low-level audio-visual (AV) features, and another one that models the interactions. We propose a two-layer hidden Markov model (HMM) framework that implements such a concept in a principled manner, and that has advantages over previous work. First, by decomposing the problem hierarchically, learning is performed on low-dimensional observation spaces, which results in simpler models. Second, our framework is easier to interpret, as both individual and group actions have a clear meaning, and thus easier to improve. Third, different HMMs can be used in each layer, to better reflect the nature of each subproblem. Our framework is general and extensible, and we illustrate it with a set of eight group actions, using a public 5-hour meeting corpus. Experiments and comparison with a single-layer HMM baseline system show its validity.
Keywords
hidden Markov models; human computer interaction; speech processing; speech recognition; HMM; human interaction pattern sequence recognition; low-level audio-visual features; multimodal processing; public 5-hour meeting corpus; two-layer hidden Markov model framework; Cameras; Computer vision; Hidden Markov models; Humans; Information analysis; Information retrieval; Microphones; Pattern recognition; Speech analysis; Speech processing; Human interaction recognition; multimodal processing and multimedia applications; statistical models
fLanguage
English
Journal_Title
Multimedia, IEEE Transactions on
Publisher
IEEE
ISSN
1520-9210
Type
jour
DOI
10.1109/TMM.2006.870735
Filename
1632036