HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training

Author

Yishuang Ning ; Zhiyong Wu ; Jia Jia ; Fanbo Meng ; Helen Meng ; Lianhong Cai

Author_Institution

Shenzhen Key Lab. of Inf. Sci. & Technol., Tsinghua Univ., Shenzhen, China

fYear

2015

fDate

19-24 April 2015

Firstpage

4934

Lastpage

4938

Abstract

This paper investigates incorporating hidden Markov model (HMM) based emphatic speech synthesis, used for audio exaggeration, into an audio-visual speech synthesis framework that provides corrective feedback in computer-aided pronunciation training (CAPT). To improve the voice quality of the synthetic emphatic speech, the paper proposes a new method for HMM training. In this method, the contextual questions used for decision tree building are extended with emphasis-related information. HMMs are then trained on a small-scale emphatic corpus together with a large-scale neutral corpus. The emphatic corpus ensures the quality of the emphatic speech segments, whereas the neutral corpus further improves the quality of both the non-emphatic and the emphatic segments. Finally, emphatic speech synthesis is achieved by extending Flite+hts_engine. Experimental results show that the method can synthesize high-quality emphatic speech and make the feedback more discriminatively perceptible.

Keywords

audio-visual systems; decision trees; feedback; hidden Markov models; speech synthesis; CAPT; Flite+hts_engine; HMM-based emphatic speech synthesis; audio exaggeration; audio-visual speech synthesis framework; computer-aided pronunciation training; corrective feedback; decision tree; emphatic speech segmentation quality; hidden Markov model; large scale neutral corpus; small scale emphatic corpus; voice quality improvement; Acoustics; Context; Context modeling; Hidden Markov models; Speech; Speech synthesis; Training; computer-aided pronunciation training (CAPT); emphatic speech synthesis; hidden Markov model (HMM)

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178909

Filename

7178909