• DocumentCode
    730719
  • Title
    Speaker adaptive training for deep neural networks embedding linear transformation networks
  • Author
    Ochiai, Tsubasa ; Matsuda, Shigeki ; Watanabe, Hideyuki ; Lu, Xugang ; Hori, Chiori ; Katagiri, Shigeru
  • Author_Institution
    Nat. Inst. of Inf. & Commun. Technol., Kyoto, Japan
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4605
  • Lastpage
    4609
  • Abstract
    Recently, a novel speaker adaptation method was proposed that applies the Speaker Adaptive Training (SAT) concept to a speech recognizer consisting of a Deep Neural Network (DNN) and a Hidden Markov Model (HMM), and its utility was demonstrated. This method implements the SAT scheme by allocating one Speaker Dependent (SD) module per training speaker to one of the intermediate layers of the front-end DNN. It then jointly optimizes the SD modules and the rest of the network, which is shared by all speakers. In this paper, we propose an improved version of this SAT-based adaptation scheme for a DNN-HMM recognizer. Our new training scheme adopts a Linear Transformation Network (LTN) as the SD module. Using an LTN leads to more appropriate regularization in both the SAT and adaptation stages, because it replaces the empirically selected network anchorage used for regularization in the preceding SAT-DNN-HMM with a SAT-optimized anchorage. We evaluate the effectiveness of our proposed method on TED Talks corpus data. Our experimental results show that a speaker-adapted recognizer using our method achieves a significant word error rate reduction of 9.2 points from a baseline SI-DNN recognizer and also consistently outperforms speaker-adapted recognizers derived from the preceding SAT-based DNN-HMM.
  • Keywords
    hidden Markov models; neural nets; speaker recognition; LTN; SAT-DNN-HMM; SD module; SI-DNN recognizer; deep neural network embedding linear transformation network; front-end DNN intermediate layer; hidden Markov model; speaker adaptive training; speaker dependent module; speaker-adapted recognizer; word error rate reduction; Acoustics; Adaptation models; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; Deep Neural Network; Linear Transformation Network; Speaker Adaptive Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178843
  • Filename
    7178843