Multiframe deep neural networks for acoustic modeling

Author

Vanhoucke, V.; Devin, M.; Heigold, Georg

Author_Institution

Google, Inc., Mountain View, CA, USA

fYear

2013

Firstpage

7582

Lastpage

7585

Abstract

Deep neural networks have been shown to perform very well as acoustic models for automatic speech recognition. Compared to Gaussian mixtures, however, they tend to be computationally very expensive, which makes them challenging to use in real-time applications. One key advantage of such neural networks is their ability to learn from very long observation windows of up to 400 ms. Given this long temporal context, it is natural to ask whether neural networks can be run at a lower frame rate than the typical 10 ms, and whether doing so brings computational benefits. This paper describes a method of tying the neural network parameters over time that achieves performance comparable to the typical frame-synchronous model while reducing the computational cost of the neural network activations by up to 4x.

Keywords

Gaussian processes; neural nets; speech recognition; Gaussian mixtures; acoustic modeling; automatic speech recognition; computational cost; frame synchronous model; multiframe deep neural networks; Acoustics; Complexity theory; Computational modeling; Context; Error analysis; Hidden Markov models; Neural networks; deep neural networks

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6639137

Filename

6639137