DocumentCode
1848456
Title
Synthesizing speech animation by learning compact speech co-articulation models
Author
Deng, Zhigang ; Lewis, J.P. ; Neumann, Ulrich
Author_Institution
Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA
fYear
2005
fDate
22-24 June 2005
Firstpage
19
Lastpage
25
Abstract
While speech animation fundamentally consists of a sequence of phonemes over time, sophisticated animation requires smooth interpolation and co-articulation effects, in which the preceding and following phonemes influence the shape of a phoneme. Co-articulation has been approached in speech animation research in several ways, most often by simply smoothing the mouth geometry motion over time. Data-driven approaches tend to generate realistic speech animation, but they need to store a large facial motion database, which is not feasible for real-time gaming and interactive applications on platforms such as PDAs and cell phones. In this paper we show that an accurate, compact speech co-articulation model can be learned from facial motion capture data. An initial phoneme sequence is generated automatically by a text-to-speech (TTS) system. Our learned co-articulation model is then applied to the resulting phoneme sequence, producing natural and detailed motion. The contribution of this work is that speech co-articulation models "learned" from real human motion data can be used to generate natural-looking speech motion while preserving the expressiveness of the animation via keyframing control. Because the learned model is compact, the approach can also be applied effectively to interactive applications.
Keywords
computational geometry; computer animation; face recognition; image motion analysis; interpolation; knowledge acquisition; learning (artificial intelligence); solid modelling; speech processing; speech synthesis; visual databases; dynamic programming; facial motion capture database; human motion data; interpolation; keyframing control; mouth geometry motion; phoneme sequence; realistic speech animation synthesis; speech coarticulation model learning; speech motion; text-to-speech system; Cellular phones; Facial animation; Geometry; Interpolation; Mouth; Personal digital assistants; Shape; Smoothing methods; Spatial databases; Speech synthesis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Graphics International 2005
ISSN
1530-1052
Print_ISBN
0-7803-9330-9
Type
conf
DOI
10.1109/CGI.2005.1500361
Filename
1500361