DocumentCode :
1848456
Title :
Synthesizing speech animation by learning compact speech co-articulation models
Author :
Deng, Zhigang ; Lewis, J.P. ; Neumann, Ulrich
Author_Institution :
Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA
fYear :
2005
fDate :
22-24 June 2005
Firstpage :
19
Lastpage :
25
Abstract :
While speech animation fundamentally consists of a sequence of phonemes over time, sophisticated animation requires smooth interpolation and co-articulation effects, where the preceding and following phonemes influence the shape of a phoneme. Co-articulation has been approached in speech animation research in several ways, most often by simply smoothing the mouth geometry motion over time. Data-driven approaches tend to generate realistic speech animation, but they need to store a large facial motion database, which is not feasible for real-time gaming and interactive applications on platforms such as PDAs and cell phones. In this paper we show that an accurate speech co-articulation model with compact size can be learned from facial motion capture data. An initial phoneme sequence is generated automatically from text-to-speech (TTS) systems. Then, our learned co-articulation model is applied to the resulting phoneme sequence, producing natural and detailed motion. The contribution of this work is that speech co-articulation models "learned" from real human motion data can be used to generate natural-looking speech motion while preserving the expressiveness of the animation via keyframing control. In addition, this approach can be effectively applied to interactive applications due to its compact size.
Keywords :
computational geometry; computer animation; face recognition; image motion analysis; interpolation; knowledge acquisition; learning (artificial intelligence); solid modelling; speech processing; speech synthesis; visual databases; dynamic programming; facial motion capture database; human motion data; interpolation; keyframing control; mouth geometry motion; phoneme sequence; realistic speech animation synthesis; speech coarticulation model learning; speech motion; text-to-speech system; Cellular phones; Facial animation; Geometry; Interpolation; Mouth; Personal digital assistants; Shape; Smoothing methods; Spatial databases; Speech synthesis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Graphics International 2005
ISSN :
1530-1052
Print_ISBN :
0-7803-9330-9
Type :
conf
DOI :
10.1109/CGI.2005.1500361
Filename :
1500361