• DocumentCode
    179225
  • Title

    Speech driven talking head from estimated articulatory features

  • Author

    Ben-Youssef, Atef ; Shimodaira, Hiroshi ; Braude, David A.

  • Author_Institution
    Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    4573
  • Lastpage
    4577
  • Abstract
    In this paper, we present a talking head in which lip and head motion are controlled using articulatory movements estimated from speech. A phone-size HMM-based inversion mapping is employed and trained in a semi-supervised fashion. The advantage of using articulatory features is that they can drive lip motion and are closely linked to head movements. Speech inversion normally requires training data recorded with an electromagnetic articulograph (EMA), which restricts the naturalness of head movements. The present study considers a more realistic recording condition in which the training data for the target speaker are recorded with a conventional motion capture system rather than EMA. Different temporal clustering techniques are investigated for the HMM-based mapping, with a GMM-based frame-wise mapping as a baseline system. Objective and subjective experiments show that the synthesised motions are more natural with an HMM system than with a GMM one, and that estimated EMA features outperform prosodic features.
  • Keywords
    Gaussian processes; data recording; hidden Markov models; mixture models; speech processing; EMA features; GMM-based frame-wise mapping; Gaussian mixture model; baseline system; electromagnetic articulograph; estimated articulatory features; head motion; hidden Markov model; lips motion; motion capture system; phone-size HMM-based inversion mapping; realistic recording condition; semisupervised fashion; speech driven talking head; speech inversion; target speaker; temporal clustering techniques; training data record; Acoustics; Animation; Hidden Markov models; Lips; Magnetic heads; Motion segmentation; Speech; clustering; head motion synthesis; inversion mapping;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type

    conf

  • DOI
    10.1109/ICASSP.2014.6854468
  • Filename
    6854468