DocumentCode :
542710
Title :
Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-D head model
Author :
Morishima, Shigeo ; Ogata, Shin ; Murai, Kazumasa ; Nakamura, Satoshi
Author_Institution :
Faculty of Engineering, Seikei University, 3-3-1, Kichijoji-Kitamachi, Musashino city, Tokyo, 180-8633, Japan
Volume :
2
fYear :
2002
fDate :
13-17 May 2002
Abstract :
Speech-to-speech translation has been studied to realize natural human communication beyond language barriers. Toward further multi-modal natural communication, visual information such as face and lip movements will be necessary. In this paper, we introduce a multi-modal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's facial speech motion while synchronizing it to the translated speech. To retain the speaker's facial expression, we replace only the image of the speech organs with a synthesized one generated from a three-dimensional wire-frame model that is adaptable to any speaker. Our approach enables image synthesis and translation with an extremely small database. We conduct a subjective evaluation by connected-digit discrimination using data with and without audio-visual lip synchronicity. The results confirm the sufficient quality of the proposed audio-visual translation system.
Keywords :
Adaptation model; Data models; Face; Manuals; Mouth; Web sites;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location :
Orlando, FL, USA
ISSN :
1520-6149
Print_ISBN :
0-7803-7402-9
Type :
conf
DOI :
10.1109/ICASSP.2002.5745053
Filename :
5745053