Title :
The NAIST ASR system for the 2015 Multi-Genre Broadcast challenge: On combination of deep learning systems using a rank-score function
Author :
Quoc Truong Do;Michael Heck;Sakriani Sakti;Graham Neubig;Tomoki Toda;Satoshi Nakamura
Author_Institution :
Augmented Human Communication Laboratory, Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
Abstract :
The Multi-Genre Broadcast challenge is an official challenge of the IEEE Automatic Speech Recognition and Understanding Workshop. This paper presents NAISTs contribution to the premiere of this challenge. The presented speech-to-text system for English makes use of various front-ends (e.g., MFCC, i-vector and FBANK), DNN acoustic models and several language models for decoding and rescoring (N-gram, RNNLM). Subsets of the training data with varying sizes were evaluated with respect to the overall training quality. Two speech segmentation systems were developed for the challenge, based on DNNs and GMM-HMMs. Recognition was performed in three stages: Decoding, lattice rescoring and system combination. This paper focuses on the system combination experiments and presents a rank-score based system weighting approach, which gave better performance compared to a normal system combination strategy. The DNN based ASR system trained on MFCC + i-vector features with the sMBR training criterion gives the best performance of 27.8% WER, and thus significantly outperforms the baseline DNN-HMM sMBR yielding 33.7% WER.
Keywords :
"Training","Decoding","Speech","Mel frequency cepstral coefficient","Standards","Data models"
Conference_Titel :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
DOI :
10.1109/ASRU.2015.7404858