DocumentCode
3153122
Title
Hybrid phoneme based clustering approach for audio driven facial animation
Author
Havell, Benjamin ; Rosin, Paul L. ; Sanei, Saeid ; Aubrey, Andrew ; Marshall, David ; Hicks, Yulia
Author_Institution
Sch. of Comput. Sci. & Inf., Cardiff Univ., Cardiff, UK
fYear
2012
fDate
25-30 March 2012
Firstpage
2261
Lastpage
2264
Abstract
We consider the problem of producing accurate facial animation corresponding to a given input speech signal. A popular technique previously used for Audio Driven Facial Animation is to build a joint audio-visual model, using Active Appearance Models (AAMs) to represent possible facial variations and Hidden Markov Models (HMMs) to select the correct appearance based on the input audio. However, several questions remain unanswered. In particular, the choice of clustering technique and the number of clusters in the HMM may have a significant influence on the quality of the produced videos. We investigated a range of clustering techniques in order to improve the quality of the resulting HMM, and we propose a new structure that uses Gaussian Mixture Models (GMMs) to model each phoneme separately. We compared our approach to several alternatives using a public dataset of 300 phonetically labeled sentences spoken by a single person and found that our approach produces more accurate animation. In addition, we use a hybrid approach in which the training data is phonetically labeled, producing a model with better separation of phonemes, while the test audio data is not labeled, making our approach for generating facial animation less laborious and fully automatic.
Keywords
Gaussian processes; computer animation; hidden Markov models; speech processing; AAM; GMM; Gaussian mixture models; HMM; active appearance models; audio driven facial animation; hybrid phoneme based clustering approach; input speech signal; joint audio-visual model; Active appearance model; Data models; Facial animation; Hidden Markov models; Principal component analysis; Speech
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location
Kyoto
ISSN
1520-6149
Print_ISBN
978-1-4673-0045-2
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2012.6288364
Filename
6288364