Exploiting sparseness in deep neural networks for large vocabulary speech recognition

Author

Yu, Dong ; Seide, Frank ; Li, Gang ; Deng, Li

Author_Institution

Microsoft Res., Redmond, WA, USA

fYear

2012

fDate

25-30 March 2012

Firstpage

4409

Lastpage

4412

Abstract

Recently, we developed context-dependent deep neural network (DNN) hidden Markov models for large vocabulary speech recognition. While they reduce errors by 33% compared to their discriminatively trained Gaussian-mixture counterparts on the Switchboard benchmark task, DNNs require many more parameters. In this paper, we report our recent work on improving the generalization, model size, and computation speed of DNNs by exploiting parameter sparseness. We formulate the goal of enforcing sparseness as soft regularization and convex constraint optimization problems, and propose solutions under the stochastic gradient ascent setting. We also propose novel data structures that exploit the random sparseness patterns to reduce model size and computation time. The proposed solutions have been evaluated on the voice-search and Switchboard datasets. They decreased the number of nonzero connections to one third while reducing the error rate by 0.2-0.3% compared with the fully connected model on both datasets. The nonzero connections were further reduced to only 12% and 19% on the two respective datasets without sacrificing speech recognition performance. Under these conditions, the model size is reduced to 18% and 29%, and the computation to 14% and 23%, respectively, on these two datasets.
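
The abstract does not spell out the sparsification procedure or the storage layout, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes plain magnitude-based pruning as a stand-in for the regularized and constrained formulations, and a hypothetical row-wise index/value layout as a stand-in for the proposed data structures. All function names and the byte sizes (4-byte values, 2-byte column indices) are illustrative assumptions. Under those assumed byte sizes, the index overhead alone would make a matrix with 12% or 19% nonzero weights occupy about 18% or 29% of the dense size, which is in line with the figures quoted above, although the abstract does not confirm this layout.

```python
def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but (roughly) the largest-magnitude entries.

    `weights` is a dense matrix given as a list of rows. Ties at the
    threshold are all kept, so slightly more than the requested
    fraction can survive. (Assumed stand-in for the paper's method.)
    """
    magnitudes = sorted((abs(w) for row in weights for w in row), reverse=True)
    n_keep = max(1, int(round(keep_fraction * len(magnitudes))))
    threshold = magnitudes[n_keep - 1]
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def to_sparse_rows(weights):
    """Store each row as (column indices, nonzero values). Hypothetical layout."""
    return [
        ([j for j, w in enumerate(row) if w != 0.0],
         [w for w in row if w != 0.0])
        for row in weights
    ]


def sparse_matvec(sparse_rows, x):
    """Multiply the sparse matrix by vector x, touching only nonzero weights."""
    return [sum(v * x[j] for j, v in zip(cols, vals)) for cols, vals in sparse_rows]


def dense_matvec(weights, x):
    """Reference dense matrix-vector product for comparison."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


def relative_size(nonzero_fraction, value_bytes=4, index_bytes=2):
    """Sparse storage size as a fraction of dense storage.

    Each stored weight costs one value plus one column index; the byte
    sizes are assumptions, not taken from the source.
    """
    return nonzero_fraction * (value_bytes + index_bytes) / value_bytes
```

The sparse product visits only the stored entries, so both memory and multiply-add counts scale with the number of nonzero connections plus the indexing overhead, which is why the reported size and computation ratios are somewhat higher than the fraction of nonzero weights.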

Keywords

convex programming; gradient methods; neural nets; random processes; speech recognition; stochastic processes; vocabulary; convex constraint optimization problem; data structure; deep neural network; large vocabulary speech recognition; model size reduction; nonzero connection; parameter sparseness; random sparseness pattern; soft regularization; stochastic gradient ascent; switchboard dataset; voice-search dataset; Computational modeling; Data structures; Hidden Markov models; Indexes; Speech; Speech recognition; Training; deep belief networks; deep neural networks; sparseness

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6288897

Filename

6288897