Regional Information Center for Science and Technology - Building context-dependent DNN acoustic models using Kullback-Leibler divergence-based state tying

DocumentCode :

730713

Title :

Building context-dependent DNN acoustic models using Kullback-Leibler divergence-based state tying

Author :

Gosztolya, Gabor ; Grosz, Tamas ; Toth, Laszlo ; Imseng, David

Author_Institution :

MTA-SZTE Res. Group on Artificial Intell., Szeged, Hungary

fYear :

2015

fDate :

19-24 April 2015

Firstpage :

4570

Lastpage :

4574

Abstract :

Deep neural network (DNN) based speech recognizers have recently replaced Gaussian mixture model (GMM) based systems as the state of the art. HMM/DNN systems have kept many refinements of the HMM/GMM framework, even though some of these may be suboptimal for them. One such example is the creation of context-dependent tied states, for which an efficient decision tree state tying method exists. The tied states used to train DNNs are usually obtained with this same tying algorithm, even though it is based on the likelihoods of Gaussians. In this paper, we investigate an alternative state clustering method that uses the Kullback-Leibler (KL) divergence of DNN output vectors to build the decision tree. It has already been applied successfully within the framework of KL-HMM systems, and here we show that it is also beneficial for HMM/DNN hybrids. In a large-vocabulary recognition task, we report a 4% relative word error rate reduction using this state clustering method.
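The abstract does not spell out the splitting criterion, so the following Python code is only a minimal sketch of what KL divergence-based decision tree splitting could look like. It assumes that each frame of a context-dependent state is described by a DNN output (posterior) vector, that a tree node's cost is the summed KL divergence from each frame vector to the node's mean vector, and that the phonetic question giving the largest cost reduction is chosen. The function names (`kl_divergence`, `node_cost`, `best_split`), the toy phone contexts, and the question set are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Context = Tuple[str, str]  # hypothetical (left phone, right phone) context


def kl_divergence(p: Vector, q: Vector, eps: float = 1e-10) -> float:
    """KL(p || q) for discrete distributions; eps guards against division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise arithmetic mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def node_cost(vectors: List[Vector]) -> float:
    """Summed KL divergence from each frame vector to the node's mean vector.

    Assumption: the mean is used as the node representative because it
    minimizes sum_t KL(z_t || y) over all distributions y.
    """
    if not vectors:
        return 0.0
    centroid = mean_vector(vectors)
    return sum(kl_divergence(v, centroid) for v in vectors)


def best_split(
    frames: List[Tuple[Context, Vector]],
    questions: Dict[str, Callable[[Context], bool]],
) -> Optional[Tuple[str, float]]:
    """Return (question name, cost reduction) of the best question, or None."""
    parent = node_cost([v for _, v in frames])
    best: Optional[Tuple[str, float]] = None
    for name, question in questions.items():
        yes = [v for ctx, v in frames if question(ctx)]
        no = [v for ctx, v in frames if not question(ctx)]
        if not yes or not no:
            continue  # a question that does not split the node is useless
        gain = parent - node_cost(yes) - node_cost(no)
        if best is None or gain > best[1]:
            best = (name, gain)
    return best


# Toy usage: hypothetical contexts with 3-class posterior vectors.
frames = [
    (("a", "b"), [0.8, 0.1, 0.1]),
    (("a", "c"), [0.7, 0.2, 0.1]),
    (("m", "b"), [0.1, 0.1, 0.8]),
    (("n", "c"), [0.2, 0.1, 0.7]),
]
questions = {
    "left_is_vowel": lambda ctx: ctx[0] in {"a", "e"},
    "right_is_b": lambda ctx: ctx[1] == "b",
}
print(best_split(frames, questions))
```

Using the mean vector as each node's representative means a split can never increase the total cost, because each child's own mean fits its frames at least as well as the parent's mean does. A complete tree builder would presumably apply such a split step recursively and stop once the gain or the number of frames in a node falls below some threshold.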

Keywords :

Gaussian distribution; acoustic signal processing; decision trees; hidden Markov models; learning (artificial intelligence); pattern clustering; speech recognition; DNN output vectors; KL-HMM systems; Kullback-Leibler divergence-based state tying method; context-dependent DNN acoustic models; decision tree state tying method; deep neural network based speech recognizers; relative word error rate reduction; state clustering method; vocabulary recognition task; Artificial neural networks; Context; Hidden Markov models; Speech; Kullback-Leibler divergence; Speech recognition; deep neural networks; state tying;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location :

South Brisbane, QLD

Type :

conf

DOI :

10.1109/ICASSP.2015.7178836

Filename :

7178836

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=730713