Speech driven photo-realistic face animation with mouth and jaw dynamics

Author

Ying He ; Yong Zhao ; Dongmei Jiang ; Sahli, Hichem

Author_Institution

VUB-NPU Joint Res. Group on AVSP, Northwestern Polytech. Univ., Xi'an, China

fYear

2013

fDate

Oct. 29 2013-Nov. 1 2013

Firstpage

1

Lastpage

4

Abstract

This paper proposes a system that transforms a speech waveform into photo-realistic, speech-synchronized talking face animations. We extend the multi-modal diviseme unit selection based mouth animation system of [8] to a full photo-realistic facial animation system based on (i) modeling the non-rigid deformations of the mouth and jaw with a general regression neural network, (ii) a multi-resolution image blending approach for fusing the synthesized mouth image into the full face image, and (iii) synthesizing natural head poses or deflections using a modified version of generalized Procrustes analysis for face image alignment. The paper describes the main principles of the proposed method and illustrates its results on a set of test speech sequences, together with qualitative and quantitative comparisons against the well-known Video Rewrite system. Experimental results show that the proposed method produces realistic facial animations with very natural mouth and jaw movements synchronized with the input speech.

Keywords

computer animation; face recognition; image fusion; image resolution; neural nets; realistic images; regression analysis; speech processing; synchronisation; face image alignment; full-face image; general regression neural network; generalized procrustes analysis; input speech sequences; jaw dynamics; mouth dynamics; multimodal diviseme unit selection-based mouth animation system; multiresolution image blending approach; natural head deflection synthesis; natural head pose synthesis; natural jaw movements; natural mouth movements; nonrigid deformation modeling; photo-realistic speech-synchronized talking face animations; qualitative analysis; quantitative analysis; speech driven photo-realistic face animation; speech waveform; synthesized mouth image; video rewrite recognition system; Face; Facial animation; Mouth; Shape; Speech; Visualization; Diviseme unit selection; Face image alignment; General regression neural network; Multi-resolution image blending

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific

Conference_Location

Kaohsiung

Type

conf

DOI

10.1109/APSIPA.2013.6694186

Filename

6694186