DocumentCode :
3164163
Title :
High quality lip-sync animation for 3D photo-realistic talking head
Author :
Wang, Lijuan ; Han, Wei ; Soong, Frank K.
Author_Institution :
Microsoft Res. Asia, Beijing, China
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4529
Lastpage :
4532
Abstract :
We propose a new 3D photo-realistic talking head with high-quality lip-sync animation. It extends our prior high-quality 2D photo-realistic talking head to 3D. First, an audio/visual (a/v) recording is made of a person speaking, for about 20 minutes, a set of prompted sentences with good phonetic coverage. We then use a 2D-to-3D reconstruction algorithm to automatically adapt a general 3D head mesh model to the person. In training, 3D geometry, texture and speech features are concatenated into super feature vectors, which are used to train a statistical, multi-streamed Hidden Markov Model (HMM). The HMM is then used to synthesize both the trajectories of head motion animation and the corresponding dynamics of texture. The resulting 3D talking head animation can be controlled by the model-predicted geometric trajectory, while articulator movements (e.g., of the lips) are rendered with dynamic 2D texture image sequences. Head motions and facial expressions can also be controlled separately by manipulating the corresponding parameters. In a real-time demonstration, the life-like 3D talking head can take any input text, convert it into speech and render lip-synced speech animation photo-realistically.
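The abstract describes stacking 3D geometry, texture and speech features into one "super feature vector" per frame and modelling them with a multi-streamed HMM. The sketch below shows one plausible way such a vector might be assembled and scored by a single HMM state. It assumes diagonal-covariance Gaussian output densities per stream and per-stream weights (a common multi-stream HMM formulation, not confirmed as the authors' exact setup); the stream names, dimensions and helper functions are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical stream layout: names and dimensions are illustrative only.
STREAMS = [("geometry", 3), ("texture", 2), ("speech", 4)]


def build_super_vector(frame: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-stream features for one frame into a super feature vector."""
    vec: List[float] = []
    for name, dim in STREAMS:
        feat = list(frame[name])
        if len(feat) != dim:
            raise ValueError(f"stream {name!r} expects {dim} values, got {len(feat)}")
        vec.extend(feat)
    return vec


def split_super_vector(vec: Sequence[float]) -> Dict[str, List[float]]:
    """Inverse of build_super_vector: slice the vector back into its streams."""
    out: Dict[str, List[float]] = {}
    start = 0
    for name, dim in STREAMS:
        out[name] = list(vec[start:start + dim])
        start += dim
    if start != len(vec):
        raise ValueError("vector length does not match the stream layout")
    return out


def diag_gauss_logpdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance (variances in `var`)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_log_emission(
    vec: Sequence[float],
    state: Dict[str, Tuple[List[float], List[float]]],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-stream log-likelihoods for one (hypothetical) HMM state.

    `state` maps stream name -> (mean, variance); `weights` maps stream name -> weight.
    """
    parts = split_super_vector(vec)
    return sum(
        weights[name] * diag_gauss_logpdf(parts[name], *state[name])
        for name, _ in STREAMS
    )


if __name__ == "__main__":
    frame = {"geometry": [0.1, 0.0, -0.2], "texture": [0.5, 0.5], "speech": [1.0, 0.0, 0.0, 0.0]}
    vec = build_super_vector(frame)
    state = {name: ([0.0] * dim, [1.0] * dim) for name, dim in STREAMS}
    weights = {"geometry": 1.0, "texture": 1.0, "speech": 1.0}
    print(len(vec), multistream_log_emission(vec, state, weights))
```

Keeping the streams separable inside one vector lets a single state sequence tie the geometry, texture and speech trajectories together in time, while the stream weights control how much each modality contributes to the state score (e.g., a weight of 0 ignores that stream).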
Keywords :
computer animation; face recognition; geometry; hidden Markov models; image sequences; image texture; motion estimation; speech processing; video recording; 2D-to-3D reconstruction algorithm; 3D geometry; 3D head mesh model; 3D image texture; 3D photo-realistic talking head; 3D speech processing; HMM; Hidden Markov Model; a-v recording; articulator movements; dynamic 2D texture image sequences; facial expression; head motions; high quality lip-sync animation; high-quality 2D photo-realistic talking head; phonetic coverage; super feature vectors; Animation; Face; Geometry; Hidden Markov models; Solid modeling; Visualization; 3D; audio/visual synthesis; lip-sync; photorealistic; talking head;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6288925
Filename :
6288925