Title :
Synthesizing real-time speech-driven facial animation
Author :
Changwei Luo ; Jun Yu ; Zengfu Wang
Author_Institution :
Department of Automation, University of Science and Technology of China, Hefei, China
Abstract :
We present a real-time speech-driven facial animation system. In this system, Gaussian mixture models (GMMs) are employed to perform the audio-to-visual conversion. The conventional GMM-based method performs the conversion frame by frame using minimum mean square error (MMSE) estimation. While this method is reasonably effective, discontinuities often appear in the sequences of estimated visual features. To solve this problem, we incorporate previous visual features into the conversion, so that the conversion proceeds in the manner of a Markov chain. After audio-to-visual conversion, the estimated visual features are transformed into blendshape weights to synthesize facial animation. Experiments show that our system converts audio features into visual features accurately, with conversion accuracy comparable to a current state-of-the-art trajectory-based approach. Moreover, our system runs in real time and outputs high-quality lip-sync animations.
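The frame-by-frame baseline the abstract describes can be sketched as follows. Given a joint GMM over stacked audio and visual features, the MMSE estimate of the visual feature for an audio frame is the posterior-responsibility-weighted sum of the per-component conditional means. This is a minimal one-dimensional numpy sketch; the GMM parameters, feature dimensionality, and the name `mmse_visual` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical joint GMM over z = [audio; visual], two components.
# (Illustrative parameters only -- not the paper's trained model.)
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 1.0],    # [mu_audio, mu_visual], component 1
                  [2.0, -1.0]])  # component 2
covs = np.array([[[1.0, 0.5],    # full 2x2 covariance per component
                  [0.5, 1.0]],
                 [[1.0, -0.3],
                  [-0.3, 1.0]]])

def mmse_visual(x):
    """MMSE estimate of the visual feature given an audio feature x."""
    resp = np.empty(len(weights))
    cond = np.empty(len(weights))
    for k in range(len(weights)):
        mu_x, mu_y = means[k]
        sxx = covs[k, 0, 0]
        sxy = covs[k, 0, 1]
        # Marginal likelihood of x under component k (1-D Gaussian).
        resp[k] = weights[k] * np.exp(-0.5 * (x - mu_x) ** 2 / sxx) \
                  / np.sqrt(2.0 * np.pi * sxx)
        # Conditional mean E[visual | audio = x, component k].
        cond[k] = mu_y + sxy / sxx * (x - mu_x)
    resp /= resp.sum()           # posterior component responsibilities
    return float(resp @ cond)    # responsibility-weighted conditional means
```

Estimating each frame independently this way is what produces the discontinuities the abstract mentions; the paper's contribution is to additionally condition each estimate on the previously estimated visual features, which this sketch does not implement.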
Keywords :
Gaussian processes; Markov processes; audio signal processing; computer animation; mixture models; Gaussian mixture models (GMM); Markov chain; audio-to-visual conversion; blendshape; facial animation; lip-sync; real-time systems; speech; visualization
Conference_Title :
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Florence, Italy
DOI :
10.1109/ICASSP.2014.6854467