DocumentCode :
672376
Title :
Context-dependent modelling of deep neural network using logistic regression
Author :
Guangsen Wang ; Khe Chai Sim
Author_Institution :
Comput. Sci. Dept., Nat. Univ. of Singapore, Singapore, Singapore
fYear :
2013
fDate :
8-12 Dec. 2013
Firstpage :
338
Lastpage :
343
Abstract :
The data sparsity problem of context-dependent acoustic modelling in automatic speech recognition is addressed by using the decision tree state clusters as the training targets in the standard context-dependent (CD) deep neural network (DNN) systems. As a result, the CD states within a cluster cannot be distinguished during decoding. This problem, referred to as the clustering problem, is not explicitly addressed in the current literature. In this paper, we formulate the CD DNN as an instance of the canonical state modelling technique based on a set of broad phone classes to address both the data sparsity and the clustering problems. The triphone is clustered into multiple sets of shorter biphones using broad phone contexts to address the data sparsity issue. A DNN is trained to discriminate the biphones within each set. The canonical states are represented by the concatenated log posteriors of all the broad phone DNNs. Logistic regression is used to transform the canonical states into the triphone state output probability. Clustering of the regression parameters is used to reduce model complexity while still achieving unique acoustic scores for all possible triphones. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD DNN significantly outperforms the standard CD DNN. The best system provides a 2.7% absolute WER reduction compared to the best standard CD DNN system.
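Note: the abstract describes a pipeline in which the log posteriors of several broad phone DNNs are concatenated into a canonical state vector, which logistic regression then maps to triphone state output probabilities. The Python sketch below illustrates that data flow only. The function names, toy dimensions and weights are hypothetical, and the softmax (multinomial) form of the regression is an assumption, since the abstract does not give the exact parameterisation or the regression parameter clustering scheme.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def canonical_state(dnn_outputs):
    """Concatenate the log posteriors of every broad phone DNN into one vector.

    dnn_outputs holds one list of raw output scores per (hypothetical) DNN.
    """
    vec = []
    for scores in dnn_outputs:
        vec.extend(log_softmax(scores))
    return vec

def triphone_state_posteriors(canonical, weights, biases):
    """Softmax regression from the canonical state to triphone state posteriors.

    Assumption: one weight row and one bias per triphone state (or per cluster
    of regression parameters); the paper's actual form may differ.
    """
    logits = [sum(w * x for w, x in zip(row, canonical)) + b
              for row, b in zip(weights, biases)]
    return [math.exp(lp) for lp in log_softmax(logits)]

# Toy example: two broad phone DNNs with 3 and 2 biphone outputs respectively.
dnn_outputs = [[2.0, 0.5, -1.0], [0.1, 0.3]]
canonical = canonical_state(dnn_outputs)  # 5-dimensional canonical state

# Made-up regression parameters for three triphone states.
weights = [[0.2, -0.1, 0.0, 0.5, -0.3],
           [-0.4, 0.3, 0.1, 0.0, 0.2],
           [0.0, 0.0, 0.0, 0.0, 0.0]]
biases = [0.0, 0.1, -0.1]
probs = triphone_state_posteriors(canonical, weights, biases)
```

In a hybrid decoder these posteriors would typically be converted to scaled likelihoods before use; that step is omitted here because the abstract does not describe it.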
Keywords :
decision trees; neural nets; pattern clustering; regression analysis; speech recognition; absolute WER reduction; acoustic scores; automatic speech recognition; biphones; broad phone DNN systems; broadcast news transcription task; canonical state modelling; clustering problem; context dependent acoustic modelling; data sparsity problem; decision tree state clusters; decoding; logistic regression; model complexity; regression based CD DNN; standard CD DNN system; standard context dependent deep neural network; triphone state output probability; triphones; Context; Context modeling; Decoding; Detectors; Hidden Markov models; Training; Vectors; Articulatory Features; Canonical State Modelling; Context-Dependent Modelling; Deep Neural Network; Logistic Regression;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location :
Olomouc
Type :
conf
DOI :
10.1109/ASRU.2013.6707753
Filename :
6707753