Regional Information Center for Science and Technology - A DNN-based acoustic modeling of tonal language and its application to Mandarin pronunciation training

DocumentCode :

178688

Title :

A DNN-based acoustic modeling of tonal language and its application to Mandarin pronunciation training

Author :

Wenping Hu ; Yao Qian ; Frank K. Soong

Author_Institution :

Univ. of Sci. & Technol. of China, Hefei, China

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

3206

Lastpage :

3210

Abstract :

In this paper we investigate a Deep Neural Network (DNN) based approach to acoustic modeling of tonal languages and assess its speech recognition performance with different features and modeling techniques. Mandarin Chinese, the most widely spoken tonal language, is chosen for testing tone-related ASR performance. Furthermore, the DNN-trained, tone-sensitive model is evaluated on automatic detection of mispronunciation among L2 Mandarin learners. The best DNN-HMM acoustic model of tonal syllables (initial and tonal final), trained with embedded F0 features, outperforms the baseline DNN system trained on 39 MFCC features: it achieves relative tone error rate reductions of 32% and 35% and relative tonal syllable error rate reductions of 20% and 23%, for female and male speakers, respectively. On a speech database of L2 Mandarin learners (native speakers of European languages), our DNN-HMM system lowers the equal error rate in detecting mispronunciations by 2% absolute, from 27.5% to 25.5%, compared with the baseline system.
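
The abstract quotes two kinds of figures: relative reductions for the tone and tonal syllable error rates, and an equal error rate change stated with both endpoints (27.5% to 25.5%). The short sketch below, which is illustrative only and not taken from the paper, shows how the two measures differ for the quoted equal error rates. The function names `absolute_reduction` and `relative_reduction` are hypothetical.

```python
def absolute_reduction(baseline: float, improved: float) -> float:
    """Difference between two error rates, in percentage points."""
    return baseline - improved


def relative_reduction(baseline: float, improved: float) -> float:
    """Reduction expressed as a percentage of the baseline error rate."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - improved) / baseline


# Equal error rates quoted in the abstract, in percent.
baseline_eer, dnn_eer = 27.5, 25.5

abs_gain = absolute_reduction(baseline_eer, dnn_eer)
rel_gain = relative_reduction(baseline_eer, dnn_eer)
print(f"absolute: {abs_gain:.1f} points, relative: {rel_gain:.1f}%")
# prints "absolute: 2.0 points, relative: 7.3%"
```

Read this way, the 2% figure for mispronunciation detection is an absolute difference in percentage points, whereas the tone and tonal syllable figures are stated as relative reductions.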

Keywords :

natural languages; neural nets; speaker recognition; ASR performance; DNN-HMM system; DNN-based acoustic modeling; DNN-trained tone-sensitive model; L2 Mandarin learners; MFCC features; Mandarin Chinese; Mandarin pronunciation training; automatic mispronunciation detection; deep neural network approach; embedded F0 features; relative tonal syllable error rate reduction; speech database; speech recognition performance; tonal language; Error analysis; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition; Training; Acoustic Model; Computer-Aided Pronunciation Training; Deep Neural Network; F0; Mandarin;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6854192

Filename :

6854192

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=178688