Title :
Standalone training of context-dependent deep neural network acoustic models
Author :
Zhang, Chenghui ; Woodland, Philip C.
Author_Institution :
Eng. Dept., Cambridge Univ., Cambridge, UK
Abstract :
Recently, context-dependent (CD) deep neural network (DNN) hidden Markov models (HMMs) have been widely used as acoustic models for speech recognition. However, the standard method for building such models requires target training labels from a system based on HMMs with Gaussian mixture model output distributions (GMM-HMMs). In this paper, we introduce a method for training state-of-the-art CD-DNN-HMMs without relying on such a pre-existing system. This is achieved in two steps: first, a context-independent (CI) DNN is trained iteratively from word transcriptions; then, the equivalent output distributions of the untied CD-DNN-HMM states are clustered using a decision-tree-based state-tying approach. Experiments on the Wall Street Journal corpus show that the resulting system gives word error rates (WERs) comparable to those of CD-DNN-HMMs built using GMM-HMM alignments and state clustering.
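The second step described above, clustering the untied CD-DNN-HMM states with a phonetic decision tree, can be illustrated with a small sketch. The Python example below is a toy illustration only and is not taken from the paper: the state labels, the question set, and the occupancy-weighted entropy split criterion are all assumptions chosen to show the general idea of top-down tying driven by per-state statistics (such as averaged CI-DNN output posteriors), with synthetic data standing in for real accumulators.

```python
# Toy sketch of decision-tree state tying driven by per-state statistics.
# Everything here (state names, question set, entropy-gain criterion) is
# illustrative and NOT taken from the paper; it only mimics the top-down
# splitting idea on synthetic counts and posterior vectors.
import numpy as np

rng = np.random.default_rng(0)

# Untied triphone states: (left context, centre phone, right context, HMM state index)
states = [("ih", "t", "ax", 2), ("ih", "t", "iy", 2), ("ao", "t", "ax", 2),
          ("ao", "t", "iy", 2), ("s", "t", "ax", 2), ("s", "t", "iy", 2)]

# Per-state statistics: an occupancy count and an averaged posterior vector
# (in a standalone recipe these would be accumulated from the trained CI-DNN).
dim = 8
occ = {s: rng.integers(50, 500) for s in states}
post = {s: rng.dirichlet(np.ones(dim)) for s in states}

# Hypothetical phonetic questions about the left/right context.
questions = {
    "L_Vowel": lambda s: s[0] in {"ih", "ao", "iy", "ax"},
    "R_Vowel": lambda s: s[2] in {"ih", "ao", "iy", "ax"},
    "L_Fricative": lambda s: s[0] in {"s", "f", "sh"},
    "R_ax": lambda s: s[2] == "ax",
}

def impurity(cluster):
    """Occupancy-weighted entropy of the merged posterior of a cluster."""
    total = sum(occ[s] for s in cluster)
    merged = sum(occ[s] * post[s] for s in cluster) / total
    return -total * np.sum(merged * np.log(merged + 1e-12))

def best_split(cluster):
    """Pick the question giving the largest impurity reduction."""
    base, best = impurity(cluster), (None, 0.0)
    for name, q in questions.items():
        yes = [s for s in cluster if q(s)]
        no = [s for s in cluster if not q(s)]
        if yes and no:
            gain = base - impurity(yes) - impurity(no)
            if gain > best[1]:
                best = ((name, yes, no), gain)
    return best

def tie(cluster, min_gain=5.0):
    """Recursively split; each leaf becomes one tied CD state."""
    split, gain = best_split(cluster)
    if split is None or gain < min_gain:
        return [cluster]
    name, yes, no = split
    return tie(yes, min_gain) + tie(no, min_gain)

for i, leaf in enumerate(tie(states)):
    print(f"tied state {i}: {leaf}")
```

In the paper itself the clustering operates on the equivalent output distributions of the untied CD-DNN-HMM states; here random Dirichlet vectors stand in for those statistics so the sketch runs on its own.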
Keywords :
Gaussian processes; acoustic analysis; decision trees; hidden Markov models; iterative methods; learning (artificial intelligence); mixture models; neural nets; speech recognition; CD-DNN-HMM training; GMM-HMM; Gaussian mixture model output distributions; WER; Wall Street Journal corpus; context-dependent deep neural network acoustic models; context-dependent deep neural network hidden Markov models; decision tree based state tying approach; speech recognition; target training labels; word error rates; Acoustics; Decision trees; Hidden Markov models; Neural networks; Speech recognition; Training; Vectors;
Conference_Title :
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854674