DocumentCode :
2872011
Title :
Triphone based unit selection for concatenative visual speech synthesis
Author :
Huang, Fu Jie ; Cosatto, Eric ; Graf, Hans Peter
Author_Institution :
AT&T Labs-Research, USA
Volume :
2
fYear :
2002
fDate :
13-17 May 2002
Abstract :
Concatenative visual speech synthesis selects frames from a large recorded video database of mouth shapes to generate photo-realistic talking head sequences. The synthesized sequence must exhibit precise lip-sound synchronization and smooth articulation. The selection process for finding the best lip shapes has been computationally expensive [1], limiting the speed of the synthesis to far less than real time. In this paper, we propose a rapid unit selection approach based on triphone units. Experiments show that this algorithm can make the synthesis, excluding the rendering, 50 times faster than real time on a standard desktop PC. We also developed a metric to test the quality of the synthesis objectively, and show that this measurement is consistent with subjective measurement results.
Keywords :
Computers; Manuals; Nonvolatile memory; Speech; Speech processing; Switches; Visualization
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Conference_Location :
Orlando, FL, USA
ISSN :
1520-6149
Print_ISBN :
0-7803-7402-9
Type :
conf
DOI :
10.1109/ICASSP.2002.5745033
Filename :
5745033