• DocumentCode
    2989873
  • Title

    Development of text-to-audiovisual speech synthesis to support interactive language learning on a mobile device

  • Author

    Wai-Kim Leung ; Ka-Wa Yuen ; Ka-Ho Wong ; Meng, Hsiang-Yun

  • Author_Institution
    Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Hong Kong, China
  • fYear
    2013
  • fDate
    2-5 Dec. 2013
  • Firstpage
    583
  • Lastpage
    588
  • Abstract
    We have developed a distributed text-to-audiovisual-speech synthesizer (TTAVS) to support interactivity in computer-aided pronunciation training (CAPT) on a mobile platform. The TTAVS generates audiovisual corrective feedback based on mispronunciations detected in the second-language learner's speech. Our approach encodes key visemes in SVG format, compresses them with GZIP, and transmits them to the client, where the browser performs real-time morphing to render the visual speech. We have also developed a TTAVS animation player that plays the audio and visual speech synchronously and offers play/pause/resume user controls. Evaluation shows that, vis-à-vis our original approach of generating an Ogg video on the server and streaming it to the client, the newly proposed approach achieves a significant reduction (66%) in the average size of the output files transmitted from the server to the client and an 83% reduction in client waiting time, while preserving image quality.
  • Keywords
    audio streaming; audio-visual systems; client-server systems; computer animation; computer based training; interactive video; mobile computing; natural language processing; rendering (computer graphics); speech synthesis; video streaming; GZIP; Ogg video generation; Ogg video streaming; SVG format compression; TTAVS animation player; audio speech rendering; audiovisual corrective feedback; browser; client waiting time reduction; client-server system; computer aided pronunciation training; distributed TTAVS; image quality preservation; interactive language learning; mispronunciation detection; mispronunciation generation; mobile device; play-pause-resume; real-time morphing; second language learner speech; text-to-audiovisual speech synthesis; user control; vis-à-vis; visual speech rendering; Animation; Generators; Servers; Speech; Speech synthesis; Streaming media; Visualization; computer aided-pronunciation training system (CAPT); language learning; visual speech synthesizer;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th International Conference on
  • Conference_Location
    Budapest
  • Print_ISBN
    978-1-4799-1543-9
  • Type

    conf

  • DOI
    10.1109/CogInfoCom.2013.6719170
  • Filename
    6719170