Acoustic driven viseme identification for face animation

Author

Zhong, Jialin ; Chou, Wu ; Petajan, Eric

Author_Institution

Lucent Technol., AT&T Bell Labs., Murray Hill, NJ, USA

fYear

1997

fDate

23-25 Jun 1997

Firstpage

7

Lastpage

12

Abstract

Unlike other image templates, visemes have identities in two different media. In the audio domain, they are often related to basic linguistic units such as phonemes. In the image domain, they are defined by images of the human articulators, such as mouth shapes and chin movements. This paper presents an approach to extracting visemes from both the image and acoustic domains. In the image domain, mouth shapes, represented by feature points on the inner lip contours, are extracted through face tracking and mouth image analysis. In the acoustic domain, viseme segments are obtained automatically by aligning phoneme strings to audio signals through a Viterbi alignment process.
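
The acoustic-domain step in the abstract, aligning a known phoneme string to audio frames with a Viterbi pass, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the acoustic models, so the sketch assumes the per-frame log-likelihoods have already been computed and are given as a matrix. The function names `forced_align` and `segments` and the toy scores are hypothetical.

```python
import math


def forced_align(log_lik):
    """Monotonic Viterbi alignment of frames to a known phoneme string.

    log_lik[t][j] is the (assumed, precomputed) log-likelihood of audio
    frame t under the j-th phoneme of the string. Each frame either stays
    in the current phoneme or advances to the next one. Returns a list
    `path` where path[t] is the phoneme index assigned to frame t.
    """
    T = len(log_lik)
    N = len(log_lik[0])
    if T < N:
        raise ValueError("need at least one frame per phoneme")

    neg_inf = -math.inf
    score = [[neg_inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_lik[0][0]  # the alignment must start in the first phoneme

    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else neg_inf
            if advance > stay:
                score[t][j] = advance + log_lik[t][j]
                back[t][j] = j - 1
            else:
                score[t][j] = stay + log_lik[t][j]
                back[t][j] = j

    # The alignment must end in the last phoneme; trace back from there.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def segments(path, phonemes):
    """Collapse a frame-level path into (phoneme, start_frame, end_frame) runs.

    end_frame is exclusive.
    """
    segs = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segs.append((phonemes[path[start]], start, t))
            start = t
    return segs


# Toy example: five frames, two phonemes (scores are made up for illustration).
toy_scores = [[-1, -5], [-1, -5], [-5, -1], [-5, -1], [-5, -1]]
toy_path = forced_align(toy_scores)
toy_segments = segments(toy_path, ["m", "aa"])
```

The resulting segment boundaries are frame indices; mapping each phoneme segment to a viseme class would need a phoneme-to-viseme table, which the abstract does not specify.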

Keywords

Viterbi detection; computer animation; feature extraction; image matching; speech processing; speech synthesis; Viterbi alignment process; acoustic driven viseme identification; audio signals; basic linguistic units; chin movements; face animation; human articulators; inner lip contours; mouth shapes; phoneme strings; phonemes; Face; Facial animation; Hidden Markov models; Humans; Image segmentation; Image sequences; Mouth; Shape; Speech synthesis; Viterbi algorithm

fLanguage

English

Publisher

IEEE

Conference_Titel

Multimedia Signal Processing, 1997., IEEE First Workshop on

Conference_Location

Princeton, NJ

Print_ISBN

0-7803-3780-8

Type

conf

DOI

10.1109/MMSP.1997.602605

Filename

602605