• DocumentCode
    3484629
  • Title

    Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription

  • Author

    Seide, Frank ; Li, Gang ; Chen, Xie ; Yu, Dong

  • Author_Institution
    Microsoft Res. Asia, Beijing, China
  • fYear
    2011
  • fDate
    11-15 Dec. 2011
  • Firstpage
    24
  • Lastpage
    29
  • Abstract
    We investigate the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective. Recently, we had shown that for speaker-independent transcription of phone calls (NIST RT03S Fisher data), CD-DNN-HMMs reduced the word error rate by as much as one third, from 27.4% (obtained by discriminatively trained Gaussian-mixture HMMs with HLDA features) to 18.5%, using 300+ hours of training data (Switchboard), 9000+ tied triphone states, and up to 9 hidden network layers.
  • Keywords
    Gaussian processes; hidden Markov models; neural nets; speaker recognition; speech synthesis; Gaussian-mixture HMM; context-dependent deep neural networks; conversational speech transcription; hidden Markov model; speaker-independent transcription; speech-to-text transcription; tied triphone states; word error rate; Accuracy; Adaptation models; Feature extraction; Hidden Markov models; Training; Transforms; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
  • Conference_Location
    Waikoloa, HI
  • Print_ISBN
    978-1-4673-0365-1
  • Electronic_ISBN
    978-1-4673-0366-8
  • Type

    conf

  • DOI
    10.1109/ASRU.2011.6163899
  • Filename
    6163899