Speech driven talking head from estimated articulatory features

Author

Ben-Youssef, Atef ; Shimodaira, Hiroshi ; Braude, David A.

Author_Institution

Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK

fYear

2014

fDate

4-9 May 2014

Firstpage

4573

Lastpage

4577

Abstract

In this paper, we present a talking head whose lip and head motion are controlled by articulatory movements estimated from speech. A phone-size HMM-based inversion mapping is employed and trained in a semi-supervised fashion. Articulatory features have two advantages: they can drive the lip motion, and they are closely linked to head movements. Speech inversion normally requires training data recorded with an electromagnetic articulograph (EMA), which restricts the naturalness of head movements. The present study considers a more realistic recording condition in which the training data for the target speaker are recorded with a conventional motion capture system rather than EMA. Different temporal clustering techniques are investigated for the HMM-based mapping, with a GMM-based frame-wise mapping as a baseline system. Objective and subjective experiments show that the synthesised motions are more natural with the HMM system than with the GMM one, and that estimated EMA features outperform prosodic features.

Keywords

Gaussian processes; data recording; hidden Markov models; mixture models; speech processing; EMA features; GMM-based frame-wise mapping; Gaussian mixture model; baseline system; electromagnetic articulograph; estimated articulatory features; head motion; hidden Markov model; lips motion; motion capture system; phone-size HMM-based inversion mapping; realistic recording condition; semisupervised fashion; speech driven talking head; speech inversion; target speaker; temporal clustering techniques; training data record; Acoustics; Animation; Hidden Markov models; Lips; Magnetic heads; Motion segmentation; Speech; clustering; head motion synthesis; inversion mapping;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854468

Filename

6854468