Title :
Multilevel sampling and aggregation for discriminative training
Author :
Yunxin Zhao ; Tuo Zhao ; Xin Chen
Author_Institution :
Dept. of Comput. Sci., Univ. of Missouri, Columbia, MO, USA
Abstract :
We propose to use data sampling in the extended Baum-Welch (EBW) algorithm for maximum mutual information (MMI) based estimation of speech acoustic models. To improve model robustness, we randomize the configurations of the sampled training sets and aggregate the numerator and denominator sufficient statistics. We further combine data-sampling-based ensemble acoustic modeling with the data-sampling-based EBW, forming a two-level data sampling mechanism for acoustic model training. In experiments on a telehealth conversational speech recognition task, the two-level data sampling mechanism gave a statistically significant absolute word accuracy gain of 3.56% over the conventional MMI baseline, corresponding to a 19.44% relative word error rate reduction.
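The idea of aggregating numerator and denominator statistics over randomly sampled training sets can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single one-dimensional Gaussian mean, a hypothetical per-utterance statistics layout (`num_occ`, `num_x`, `den_occ`, `den_x`), sampling without replacement, and a smoothing constant set to a multiple of the denominator occupancy. The function name, the sampling fraction, the number of sampled sets, and the constant `d_const` are all illustrative assumptions, not values from the paper.

```python
import random


def ebw_mean_update(utterances, mu, n_sets=4, fraction=0.5, d_const=2.0, seed=0):
    """One EBW-style update of a 1-D Gaussian mean from sampled training sets.

    Each utterance is a dict of precomputed statistics (hypothetical layout):
      num_occ, num_x: numerator occupancy and first-order sum
      den_occ, den_x: denominator occupancy and first-order sum
    """
    rng = random.Random(seed)
    size = max(1, int(round(fraction * len(utterances))))

    num_occ = num_x = den_occ = den_x = 0.0
    for _ in range(n_sets):
        # Draw one sampled training set without replacement (assumed scheme).
        subset = rng.sample(utterances, size)
        # Aggregate numerator and denominator statistics across sampled sets.
        for u in subset:
            num_occ += u["num_occ"]
            num_x += u["num_x"]
            den_occ += u["den_occ"]
            den_x += u["den_x"]

    # Smoothing term D, here a multiple of the denominator occupancy
    # (a common heuristic, assumed for this sketch).
    d = d_const * den_occ
    denom = num_occ - den_occ + d
    if denom <= 0.0:
        raise ValueError("smoothing constant too small for a valid update")

    # Standard EBW mean update applied to the aggregated statistics.
    return (num_x - den_x + d * mu) / denom
```

With identical utterances whose numerator statistics sit at 1.0 and whose denominator statistics sit at 0.0, a starting mean of 0.5 moves toward the numerator data, as expected of a discriminative update.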
Keywords :
acoustic signal processing; estimation theory; signal sampling; speech recognition; telemedicine; acoustic model training; data sampling based EBW; data sampling based ensemble acoustic modeling; denominator sufficient statistics; discriminative training; extended Baum-Welch algorithm; maximum mutual information based estimation; multilevel aggregation; multilevel sampling; numerator sufficient statistics; sampled training sets; speech acoustic models; telehealth conversational speech recognition task; two-level data sampling mechanism; Accuracy; Acoustics; Computational modeling; Data models; Hidden Markov models; Maximum likelihood estimation; Training; data sampling; ensemble acoustic model
Conference_Title :
2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Conference_Location :
Singapore
DOI :
10.1109/ISCSLP.2014.6936677