• DocumentCode
    109969
  • Title
    Two-Level Hierarchical Alignment for Semi-Coupled HMM-Based Audiovisual Emotion Recognition With Temporal Course
  • Author
    Chung-Hsien Wu; Jen-Chun Lin; Wen-Li Wei
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • Volume
    15
  • Issue
    8
  • fYear
    2013
  • fDate
    Dec. 2013
  • Firstpage
    1880
  • Lastpage
    1895
  • Abstract
    A complete emotional expression typically follows a complex temporal course in face-to-face natural conversation. To model this temporal course for audio and visual signal streams, a bimodal hidden Markov model (HMM)-based emotion recognition scheme is adopted. The scheme is constructed in terms of sub-emotional states, which are defined to represent the temporal phases of onset, apex, and offset. A two-level hierarchical alignment mechanism is proposed to align the relationship within and between the temporal phases in the audio and visual HMM sequences at the model and state levels in a proposed semi-coupled hidden Markov model (SC-HMM). Furthermore, by integrating a sub-emotion language model, which considers the temporal transition between sub-emotional states, the proposed two-level hierarchical alignment-based SC-HMM (2H-SC-HMM) can provide a constraint on allowable temporal structures to determine an optimal emotional state. Experimental results show that the proposed approach yields satisfactory results on both the posed MHMC and the naturalistic SEMAINE databases, and that modeling the complex temporal structure helps improve emotion recognition performance, especially for the naturalistic database (i.e., natural conversation). The experimental results also confirm that the proposed 2H-SC-HMM can achieve acceptable performance for systems with sparse training data or under noisy conditions.
  • Keywords
    audio signal processing; audio streaming; audio-visual systems; emotion recognition; hidden Markov models; human computer interaction; video streaming; 2H-SC-HMM; MHMC; audio HMM sequence; audio signal streaming; audiovisual emotion recognition; emotional expression recognition; face-to-face natural conversation; hidden Markov model; hierarchical alignment mechanism; naturalistic SEMAINE database; naturalistic database; semicoupled HMM; sparse training data; subemotion language model; subemotional state; temporal course; temporal phases represent; temporal structure; temporal transition; visual HMM sequence; visual signal streaming; Emotion recognition; semi-coupled hidden Markov model (SC-HMM); temporal course
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2013.2269314
  • Filename
    6542683