• DocumentCode
    179592
  • Title
    Data augmentation for deep neural network acoustic modeling
  • Author
    Cui, Xiaodong ; Goel, Vikas ; Kingsbury, Brian
  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    5582
  • Lastpage
    5586
  • Abstract
    Data augmentation using label-preserving transformations has been shown to be effective in training neural networks to make invariant predictions. In this paper, we focus on data augmentation approaches to acoustic modeling using deep neural networks (DNNs) for automatic speech recognition (ASR). We first investigate a modified version of a previously studied approach using vocal tract length perturbation (VTLP) and then propose a novel data augmentation approach based on stochastic feature mapping (SFM) in a speaker-adaptive feature space. Experiments were conducted on the Bengali and Assamese limited language packs (LLPs) from the IARPA Babel program. Improved recognition performance was observed after both cross-entropy (CE) and state-level minimum Bayes risk (sMBR) training of DNN models.
  • Keywords
    Bayes methods; acoustic analysis; entropy; feature extraction; learning (artificial intelligence); neural nets; risk analysis; speech recognition; ASR; Assamese limited language packs; Bengali limited language packs; DNN models; IARPA Babel program; LLP; MBR training; SFM; VTLP; automatic speech recognition; data augmentation approach; deep neural network acoustic modeling; label preserving transformations; neural network training; speaker adaptive feature space; state-level minimum Bayes risk training; stochastic feature mapping; vocal tract length perturbation; Acoustics; Data models; Hidden Markov models; Neural networks; Speech; Training; Training data; automatic speech recognition; data augmentation; deep neural networks; stochastic feature mapping; vocal tract length perturbation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854671
  • Filename
    6854671