Regional Information Center for Science and Technology (RICeST) - Synthesizing speech animation by learning compact speech co-articulation models

DocumentCode :

1848456

Title :

Synthesizing speech animation by learning compact speech co-articulation models

Author :

Deng, Zhigang ; Lewis, J.P. ; Neumann, Ulrich

Author_Institution :

Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA

fYear :

2005

fDate :

22-24 June 2005

Firstpage :

Lastpage :

Abstract :

While speech animation fundamentally consists of a sequence of phonemes over time, sophisticated animation requires smooth interpolation and co-articulation effects, where the preceding and following phonemes influence the shape of a phoneme. Co-articulation has been approached in speech animation research in several ways, most often by simply smoothing the mouth geometry motion over time. Data-driven approaches tend to generate realistic speech animation, but they need to store a large facial motion database, which is not feasible for real-time gaming and interactive applications on platforms such as PDAs and cell phones. In this paper we show that an accurate, compact speech co-articulation model can be learned from facial motion capture data. An initial phoneme sequence is generated automatically by a text-to-speech (TTS) system. Then, our learned co-articulation model is applied to the resulting phoneme sequence, producing natural and detailed motion. The contribution of this work is that speech co-articulation models "learned" from real human motion data can be used to generate natural-looking speech motion while preserving the expressiveness of the animation via keyframing control. At the same time, the approach can be applied effectively to interactive applications because the model is compact.

Keywords :

computational geometry; computer animation; face recognition; image motion analysis; interpolation; knowledge acquisition; learning (artificial intelligence); solid modelling; speech processing; speech synthesis; visual databases; dynamic programming; facial motion capture database; human motion data; interpolation; keyframing control; mouth geometry motion; phoneme sequence; realistic speech animation synthesis; speech coarticulation model learning; speech motion; text-to-speech system; Cellular phones; Facial animation; Geometry; Interpolation; Mouth; Personal digital assistants; Shape; Smoothing methods; Spatial databases; Speech synthesis;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Graphics International 2005

ISSN :

1530-1052

Print_ISBN :

0-7803-9330-9

Type :

conf

DOI :

10.1109/CGI.2005.1500361

Filename :

1500361

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1848456