Regional Information Center for Science and Technology - Reliable Accent-Specific Unit Generation With Discriminative Dynamic Gaussian Mixture Selection for Multi-Accent Chinese Speech Recognition

DocumentCode :

79795

Title :

Reliable Accent-Specific Unit Generation With Discriminative Dynamic Gaussian Mixture Selection for Multi-Accent Chinese Speech Recognition

Author :

Chao Zhang ; Yi Liu ; Yunqing Xia ; Xuan Wang ; Chin-Hui Lee

Author_Institution :

Eng. Dept., Univ. of Cambridge, Cambridge, UK

Volume :

Issue :

fYear :

2013

fDate :

Oct. 2013

Firstpage :

2073

Lastpage :

2084

Abstract :

In this paper, we propose a discriminative dynamic Gaussian mixture selection (DGMS) strategy to generate reliable accent-specific units (ASUs) for multi-accent speech recognition. Time-aligned phone recognition is used to generate the ASUs that model accent variations explicitly and accurately. DGMS reconstructs and adjusts a pre-trained set of hidden Markov model (HMM) state densities to build dynamic observation densities for each input speech frame. A discriminative minimum classification error criterion is adopted to optimize the sizes of the HMM state observation densities with a genetic algorithm (GA). To the authors' knowledge, the discriminative optimization for DGMS is the first proposed approach to discriminative training of discrete variables. We found that the proposed framework is able to cover more multi-accent changes, thus reducing some of the performance loss in pruned beam search, without increasing the model size of the original acoustic model set. Evaluation on three typical Chinese accents, Chuan, Yue and Wu, shows that our approach outperforms traditional acoustic model reconstruction techniques with syllable error rate reductions of 8.0%, 5.5% and 5.0%, respectively, while maintaining good performance on standard Putonghua speech.
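
The idea of building a frame-dependent observation density from a pre-trained HMM state density can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the selection rule, so the sketch assumes that the k best-scoring components of a diagonal-covariance Gaussian mixture are kept for each frame. The function names, the parameter k, and the toy values are all hypothetical. In the paper the per-state sizes are optimized with a GA under a minimum classification error criterion; here k is simply passed in.

```python
import math


def component_log_scores(frame, weights, means, variances):
    """Return log(weight * diagonal-Gaussian density) for each component."""
    scores = []
    for w, mu, var in zip(weights, means, variances):
        log_density = 0.0
        for x, m, v in zip(frame, mu, var):
            log_density += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        scores.append(math.log(w) + log_density)
    return scores


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def dynamic_state_log_likelihood(frame, weights, means, variances, k):
    """Score one frame using only the k best-scoring mixture components.

    Assumption: top-k selection stands in for the selection rule, which the
    abstract does not specify. Weights are not renormalized after selection.
    """
    scores = component_log_scores(frame, weights, means, variances)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:k]
    return logsumexp([scores[i] for i in selected]), sorted(selected)


# Toy one-dimensional state density with three components (made-up values).
weights = [0.5, 0.3, 0.2]
means = [[0.0], [1.0], [5.0]]
variances = [[1.0], [1.0], [1.0]]
frame = [0.9]

ll_top2, kept = dynamic_state_log_likelihood(frame, weights, means, variances, k=2)
ll_full, _ = dynamic_state_log_likelihood(frame, weights, means, variances, k=3)
```

With k equal to the full mixture size the score reduces to the ordinary mixture log-likelihood, and a smaller k can only lower it, since components are dropped without renormalizing the weights.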

Keywords :

Gaussian processes; genetic algorithms; hidden Markov models; speech recognition; ASU; Chuan accent; DGMS; GA; HMM state observation density; Wu accent; Yue accent; acoustic model set; discrete variable; discriminative dynamic Gaussian mixture selection; discriminative minimum classification error criterion; discriminative optimization; genetic algorithm; hidden Markov model; multiaccent Chinese speech recognition; reliable accent-specific unit generation; standard Putonghua speech; syllable error rate reduction; time-aligned phone recognition; Accented speech recognition; accent-specific unit; dynamic Gaussian mixture selection (DGMS); genetic algorithm

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2013.2265087

Filename :

6521341

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=79795