DocumentCode
2150925
Title
Sample-based synthesis of photo-realistic talking heads
Author
Cosatto, Eric ; Graf, Hans Peter
Author_Institution
AT&T Labs.-Res., Red Bank, NJ, USA
fYear
1998
fDate
8-10 Jun 1998
Firstpage
103
Lastpage
110
Abstract
The paper describes a system that generates photo-realistic video animations of talking heads. First, the system derives head models from existing video footage using image recognition techniques. It locates, extracts, and labels facial parts such as the mouth, eyes, and eyebrows, and stores them in a compact library. Then, using these face models and a text-to-speech synthesizer, it synthesizes new video sequences of the head in which the lips are synchronized with the accompanying soundtrack. Emotional cues and conversational signals are produced by combining head movements, raised eyebrows, wide-open eyes, etc. with the mouth animation. For these animations to be believable, care must be taken in aligning the facial parts so that they blend smoothly into each other and produce seamless animations. Our system uses precise multi-channel facial recognition techniques to track facial parts, and it derives the exact 3D position of the head, enabling the automatic extraction of normalized face parts. Such talking-head animations are useful because they generally increase the intelligibility of the human-machine interface in applications where content needs to be narrated to the user, such as educative software.
Keywords
computer animation; face recognition; feature extraction; interactive systems; realistic images; speech synthesis; user interfaces; automatic extraction; compact library; conversational signals; educative software; emotional cues; exact 3D position; face models; facial parts; head models; head movements; human machine interface; image recognition techniques; intelligibility; mouth animation; normalized face parts; photo-realistic talking heads; photo-realistic video animations; precise multi channel facial recognition techniques; sample based synthesis; seamless animations; soundtrack; talking head animations; text-to-speech synthesizer; video footage; video sequences; Eyebrows; Eyes; Facial animation; Image recognition; Libraries; Magnetic heads; Mouth; Signal synthesis; Speech synthesis; Synthesizers
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Animation 98. Proceedings
Conference_Location
Philadelphia, PA
ISSN
1087-4844
Print_ISBN
0-8186-8541-7
Type
conf
DOI
10.1109/CA.1998.681914
Filename
681914