• DocumentCode
    3163556
  • Title

    Exploiting sparseness in deep neural networks for large vocabulary speech recognition

  • Author

    Yu, Dong ; Seide, Frank ; Li, Gang ; Deng, Li

  • Author_Institution
    Microsoft Res., Redmond, WA, USA
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4409
  • Lastpage
    4412
  • Abstract
    Recently, we developed context-dependent deep neural network (DNN) hidden Markov models for large vocabulary speech recognition. While reducing errors by 33% compared to its discriminatively trained Gaussian-mixture counterpart on the Switchboard benchmark task, the DNN requires many more parameters. In this paper, we report our recent work on DNN for improved generalization, model size, and computation speed by exploiting parameter sparseness. We formulate the goal of enforcing sparseness as soft regularization and convex constraint optimization problems, and propose solutions under the stochastic gradient ascent setting. We also propose novel data structures to exploit the random sparseness patterns to reduce model size and computation time. The proposed solutions have been evaluated on the voice-search and Switchboard datasets. They have decreased the number of nonzero connections to one third while reducing the error rate by 0.2-0.3% over the fully connected model on both datasets. The nonzero connections have been further reduced to only 12% and 19% on the two respective datasets without sacrificing speech recognition performance. Under these conditions we can reduce the model size to 18% and 29%, and computation to 14% and 23%, respectively, on these two datasets.
  • Keywords
    convex programming; gradient methods; neural nets; random processes; speech recognition; stochastic processes; vocabulary; convex constraint optimization problem; data structure; deep neural network; large vocabulary speech recognition; model size reduction; nonzero connection; parameter sparseness; random sparseness pattern; soft regularization; stochastic gradient ascent; switchboard dataset; voice-search dataset; Computational modeling; Data structures; Hidden Markov models; Indexes; Speech; Speech recognition; Training; deep belief networks; deep neural networks; sparseness; speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2012.6288897
  • Filename
    6288897