Title :
Refinements of regression-based context-dependent modelling of deep neural networks for automatic speech recognition
Author :
Guangsen Wang ; Khe Chai Sim
Author_Institution :
School of Computing, National University of Singapore, Singapore
Abstract :
The data sparsity problem of context-dependent (CD) acoustic modelling with deep neural networks (DNNs) in speech recognition is commonly addressed by using decision tree state clusters as the training targets. As a result, the CD states within a cluster cannot be distinguished during decoding. This issue, referred to as the clustering problem, is not explicitly addressed in the current literature. In our previous work, a regression-based CD-DNN framework was proposed to address both the data sparsity and clustering problems. This paper investigates several refinements to the regression-based CD-DNN framework, including two more representative state approximation schemes and the incorporation of sequential learning. The two approximations are derived from statistics collected from the training data. Sequential learning is applied to both the broad phone DNN detectors and the regression NN. The proposed refinements are evaluated on a broadcast news transcription task. For the cross-entropy trained systems, both approximations perform consistently better than our previous work. Consistent performance gains over the corresponding cross-entropy trained systems are also observed when sequential learning is applied to both the baseline CD-DNN and the regression model.
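Illustrative sketch (not part of the original record): the abstract gives no implementation details, but the keywords ("Logistic Regression", "Articulatory Features") suggest the regression step can be pictured as a logistic-regression-style mapping from broad phone DNN detector outputs to posteriors over the untied CD states. The minimal Python/NumPy sketch below shows only this general data flow; the dimensions, the single softmax layer, and all variable names (detector_posteriors, W, b) are assumptions for illustration, not the authors' actual model, and the random inputs stand in for trained detector and regression parameters.

import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical dimensions: 3 broad phone detectors with 4 broad classes each,
# mapped to posteriors over 25 untied CD states.
rng = np.random.default_rng(0)
num_frames, num_detectors, classes_per_detector, num_cd_states = 5, 3, 4, 25
det_dim = num_detectors * classes_per_detector

# Stand-ins for per-frame broad phone detector posteriors (in practice these
# would come from trained broad phone DNN detectors).
detector_logits = rng.normal(size=(num_frames, num_detectors, classes_per_detector))
detector_posteriors = softmax(detector_logits).reshape(num_frames, det_dim)

# Regression model (here a single softmax layer, i.e. multinomial logistic
# regression) mapping detector outputs to untied CD state posteriors.
W = rng.normal(scale=0.1, size=(det_dim, num_cd_states))
b = np.zeros(num_cd_states)

cd_state_posteriors = softmax(detector_posteriors @ W + b)
print(cd_state_posteriors.shape)  # (5, 25): per-frame scores for untied CD states

In the actual framework, the detector outputs and the regression weights would be trained (e.g. with cross-entropy and then a sequential criterion); the sketch uses random stand-ins purely to show how cluster-level ambiguity could be resolved into per-CD-state scores.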
Keywords :
decision trees; learning (artificial intelligence); neural nets; regression analysis; speech recognition; CD acoustic modelling; baseline CD-DNN; broad phone DNN detectors; broadcast news transcription task; clustering problem; context-dependent acoustic modelling; cross-entropy trained systems; data sparsity problem; decision tree state clusters; deep neural networks; regression-based CD-DNN framework; representative state approximation schemes; sequential learning; speech recognition; training targets; Approximation methods; Detectors; Hidden Markov models; Mathematical model; Neural networks; Speech recognition; Training; Articulatory Features; Canonical State Modelling; Context Dependent Modelling; Deep Neural Network; Logistic Regression; Sequential Learning;
Conference_Title :
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854155