• DocumentCode
    1524951
  • Title
    Generating Human-Like Behaviors Using Joint, Speech-Driven Models for Conversational Agents
  • Author
    Mariooryad, Soroosh ; Busso, Carlos
  • Author_Institution
    Multimodal Signal Processing (MSP) laboratory, The University of Texas at Dallas, Richardson, TX, USA
  • Volume
    20
  • Issue
    8
  • fYear
    2012
  • Firstpage
    2329
  • Lastpage
    2340
  • Abstract
    During human communication, every spoken message is intrinsically modulated within different verbal and nonverbal cues that are externalized through various aspects of speech and facial gestures. These communication channels are strongly interrelated, which suggests that generating human-like behavior requires a careful study of their relationship. Neglecting the mutual influence of different communicative channels in the modeling of natural behavior for a conversational agent may result in unrealistic behaviors that can affect the intended visual perception of the animation. This relationship exists both between audiovisual information and within different visual aspects. This paper explores the idea of using joint models to preserve the coupling not only between speech and facial expression, but also within facial gestures. As a case study, the paper focuses on building a speech-driven facial animation framework to generate natural head and eyebrow motions. We propose three dynamic Bayesian networks (DBNs), which make different assumptions about the coupling between speech, eyebrow and head motion. Synthesized animations are produced based on the MPEG-4 facial animation standard, using the audiovisual IEMOCAP database. The experimental results based on perceptual evaluations reveal that the proposed joint models (speech/eyebrow/head) outperform audiovisual models that are separately trained (speech/head and speech/eyebrow).
  • Keywords
    Eyebrows; Facial animation; Hidden Markov models; Humans; Speech; Speech processing; Conversational agent (CA); dynamic Bayesian network (DBN); facial animation; visual prosody
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2201476
  • Filename
    6205334