Title :
Speech driven photo-realistic face animation with mouth and jaw dynamics
Author :
Ying He ; Yong Zhao ; Dongmei Jiang ; Sahli, Hichem
Author_Institution :
VUB-NPU Joint Res. Group on AVSP, Northwestern Polytech. Univ., Xi'an, China
Date :
Oct. 29 2013-Nov. 1 2013
Abstract :
This paper proposes a system that transforms a speech waveform into photo-realistic, speech-synchronized talking face animations. We extend the multi-modal diviseme unit selection based mouth animation system of [8] to a full photo-realistic facial animation system based on (i) modeling the non-rigid deformations of the mouth and jaw with a general regression neural network, (ii) a multi-resolution image blending approach for fusing the synthesized mouth image into the full face image, and (iii) synthesizing natural head poses or deflections using a modified version of generalized Procrustes analysis for face image alignment. The paper describes the main principles of the proposed method and illustrates its results on a set of test speech sequences, together with qualitative and quantitative comparisons against the well-known Video Rewrite system. Experimental results show that the proposed method produces realistic facial animations with very natural mouth and jaw movements synchronized with the input speech.
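As a rough illustration of item (i), a general regression neural network can be read as a Gaussian-kernel weighted average of stored training targets. The sketch below shows only that standard formulation and is not the authors' implementation: the abstract does not state the network's inputs, outputs, or smoothing parameter, so the function name `grnn_predict`, the argument names, and the default `sigma` are all hypothetical.

```python
import numpy as np


def grnn_predict(train_x, train_y, query_x, sigma=0.5):
    """Standard GRNN prediction: kernel-weighted mean of training targets.

    train_x: (n_train, d_in) stored input patterns (hypothetical features)
    train_y: (n_train, d_out) stored targets (hypothetical shape parameters)
    query_x: (n_query, d_in) inputs to predict for
    sigma:   Gaussian smoothing parameter (assumed value, not from the paper)
    """
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query_x = np.asarray(query_x, dtype=float)

    # Squared Euclidean distance from every query to every stored pattern.
    diff = query_x[:, None, :] - train_x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)  # shape (n_query, n_train)

    # Shift each row by its minimum for numerical stability; the shift
    # cancels in the ratio below, so the prediction is unchanged.
    sq_dist = sq_dist - sq_dist.min(axis=1, keepdims=True)

    # Pattern-layer activations, then the summation/division layers.
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return (weights @ train_y) / weights.sum(axis=1, keepdims=True)
```

A small `sigma` makes the prediction follow the nearest stored pattern, while a large `sigma` pulls it toward the mean of all targets, so the value would need tuning on held-out data.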
Keywords :
computer animation; face recognition; image fusion; image resolution; neural nets; realistic images; regression analysis; speech processing; synchronisation; face image alignment; full-face image; general regression neural network; generalized procrustes analysis; image fusion; input speech sequences; jaw dynamics; mouth dynamics; multimodal diviseme unit selection-based mouth animation system; multiresolution image blending approach; natural head deflection synthesis; natural head pose synthesis; natural jaw movements; natural mouth movements; nonrigid deformation modeling; photo-realistic speech-synchronized talking face animations; qualitative analysis; quantitative analysis; speech driven photo-realistic face animation; speech waveform; synthesized mouth image; video rewrite recognition system; Face; Facial animation; Mouth; Shape; Speech; Visualization; Diviseme unit selection; Face image alignment; General regression neural network; Multi-resolution image blending;
Conference_Title :
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific
Conference_Location :
Kaohsiung
DOI :
10.1109/APSIPA.2013.6694186