DocumentCode
179224
Title
Synthesizing real-time speech-driven facial animation
Author
Changwei Luo ; Jun Yu ; Zengfu Wang
Author_Institution
Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China
fYear
2014
fDate
4-9 May 2014
Firstpage
4568
Lastpage
4572
Abstract
We present a real-time speech-driven facial animation system in which Gaussian mixture models (GMMs) perform the audio-to-visual conversion. The conventional GMM-based method performs the conversion frame by frame using minimum mean square error (MMSE) estimation. It is reasonably effective, but discontinuities often appear in the sequences of estimated visual features. To solve this problem, we incorporate previous visual features into the conversion so that the procedure is performed in the manner of a Markov chain. After audio-to-visual conversion, the estimated visual features are transformed into blendshape weights to synthesize facial animation. Experiments show that our system accurately converts audio features into visual features, with conversion accuracy comparable to that of a current state-of-the-art trajectory-based approach. Moreover, the system runs in real time and outputs high-quality lip-sync animations.
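The conventional frame-by-frame MMSE conversion named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scalar audio and visual features, a joint GMM whose parameters are already trained, and hypothetical names (`mmse_convert`, `convert_sequence`, `TOY_GMM`) and toy values chosen only for illustration. Each frame is mapped independently as the posterior-weighted sum of per-component conditional means, which is the setting in which the abstract reports discontinuities.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mmse_convert(x, components):
    """MMSE estimate of a scalar visual feature y from a scalar audio feature x.

    Each component of the joint GMM over (x, y) is a dict with keys
    weight, mean_x, mean_y, var_x, cov_xy (hypothetical layout).
    """
    # Unnormalised posterior of each component given the audio feature.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    # Note: total can underflow to 0 for x far from every component mean;
    # a production version would work in the log domain.
    total = sum(likelihoods)
    y_hat = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total
        # Conditional mean of y given x within this component.
        cond_mean = c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"])
        y_hat += posterior * cond_mean
    return y_hat


def convert_sequence(xs, components):
    """Convert every frame independently, with no temporal context."""
    return [mmse_convert(x, components) for x in xs]


# Toy two-component model; the values are illustrative, not from the paper.
TOY_GMM = [
    {"weight": 0.5, "mean_x": -1.0, "mean_y": 0.0, "var_x": 1.0, "cov_xy": 0.5},
    {"weight": 0.5, "mean_x": 1.0, "mean_y": 2.0, "var_x": 1.0, "cov_xy": 0.5},
]
```

Because `convert_sequence` treats each frame in isolation, nothing ties consecutive estimates together. The abstract's remedy is to condition on previous visual features as well; in a sketch like this, that would presumably mean conditioning on the current audio feature together with the previous visual estimate, though the exact formulation is given only in the paper itself.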
Keywords
Gaussian processes; Markov processes; audio signal processing; computer animation; mixture models; GMM-based method; Gaussian mixture models; Markov chain; audio features; audio-to-visual conversion; blendshape weights; facial animation synthesis; high-quality lip-sync animations; real-time speech-driven facial animation synthesis; visual feature sequences; Facial animation; Real-time systems; Shape; Speech; Vectors; Visualization; GMM; audio-to-visual conversion; blendshape; facial animation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location
Florence
Type
conf
DOI
10.1109/ICASSP.2014.6854467
Filename
6854467