Regional Information Center for Science and Technology - High quality lip-sync animation for 3D photo-realistic talking head

DocumentCode :

3164163

Title :

High quality lip-sync animation for 3D photo-realistic talking head

Author :

Wang, Lijuan ; Han, Wei ; Soong, Frank K.

Author_Institution :

Microsoft Res. Asia, Beijing, China

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

4529

Lastpage :

4532

Abstract :

We propose a new 3D photo-realistic talking head with high-quality lip-sync animation. It extends our prior high-quality 2D photo-realistic talking head to 3D. First, an audio/visual (a/v) recording is made of a person speaking a set of prompted sentences with good phonetic coverage, about 20 minutes in total. We then use a 2D-to-3D reconstruction algorithm to automatically adapt a generic 3D head mesh model to that person. In training, super feature vectors consisting of 3D geometry, texture and speech features are stacked together to train a statistical, multi-stream Hidden Markov Model (HMM). The HMM is then used to synthesize both the trajectories of the head motion animation and the corresponding texture dynamics. The resulting 3D talking head animation can be driven by the model-predicted geometric trajectory, while the articulator movements (e.g., the lips) are rendered with dynamic 2D texture image sequences. Head motions and facial expressions can also be controlled separately by manipulating the corresponding parameters. In a real-time demonstration, the lifelike 3D talking head takes arbitrary input text, converts it into speech and renders photo-realistic, lip-synced speech animation.
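The training step in the abstract (one HMM trained on stacked geometry, texture and speech features) can be illustrated with a generic multi-stream HMM state likelihood. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes one diagonal-covariance Gaussian per stream per state and a per-stream exponent weight w_s, so that log b_j(o) = sum_s w_s * log N(o_s; mu_js, var_js). The helper names (make_super_vector, split_streams, multistream_log_emission) and any stream dimensions passed in are hypothetical; the abstract does not give feature dimensions, mixture structure, or stream weights.

```python
import math
from typing import Dict, List, Sequence

# Stream order used to build and split super feature vectors. The three
# stream names follow the abstract; their dimensions are an assumption
# supplied by the caller.
STREAMS = ("geometry", "texture", "speech")


def make_super_vector(frame: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate one frame's per-stream features into a single super vector."""
    vec: List[float] = []
    for name in STREAMS:
        vec.extend(frame[name])
    return vec


def split_streams(vec: Sequence[float], dims: Dict[str, int]) -> Dict[str, List[float]]:
    """Undo make_super_vector: slice a super vector back into its streams."""
    if len(vec) != sum(dims[name] for name in STREAMS):
        raise ValueError("super vector length does not match stream dimensions")
    parts: Dict[str, List[float]] = {}
    start = 0
    for name in STREAMS:
        parts[name] = list(vec[start:start + dims[name]])
        start += dims[name]
    return parts


def diag_gauss_logpdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("dimension mismatch")
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_log_emission(vec, dims, state, weights) -> float:
    """Weighted state log-likelihood: sum over streams of w_s * log N(o_s; mu_s, var_s)."""
    parts = split_streams(vec, dims)
    return sum(
        weights[name] * diag_gauss_logpdf(parts[name], state[name]["mean"], state[name]["var"])
        for name in STREAMS
    )
```

Because each stream keeps its own Gaussian and weight, setting a stream's weight to 0 drops it from the state likelihood, which is one common way a multi-stream model is aligned or decoded with only a subset of its streams. Whether the authors used such weighting is not stated in the abstract.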

Keywords :

computer animation; face recognition; geometry; hidden Markov models; image sequences; image texture; motion estimation; speech processing; video recording; 2D-to-3D reconstruction algorithm; 3D geometry; 3D head mesh model; 3D image texture; 3D photo-realistic talking head; 3D speech processing; HMM; Hidden Markov Model; a-v recording; articulator movements; dynamic 2D texture image sequences; facial expression; head motions; high quality lip-sync animation; high-quality 2D photo-realistic talking head; phonetic coverage; super feature vectors; Animation; Face; Geometry; Hidden Markov models; Solid modeling; Visualization; 3D; audio/visual synthesis; lip-sync; photorealistic; talking head;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6288925

Filename :

6288925

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3164163