• DocumentCode
    3752216
  • Title
    Deep neural network based acoustic model using speaker-class information for short time utterance
  • Author
    Hiroshi Seki;Kazumasa Yamamoto;Seiichi Nakagawa
  • Author_Institution
    Toyohashi University of Technology, Aichi, Japan
  • fYear
    2015
  • Firstpage
    1222
  • Lastpage
    1225
  • Abstract
    In speech recognition, it is preferable not to assume details of a target user, such as specific age and gender. However, speaker independence is one of the factors that degrade ASR performance. In this work, we propose a speaker adaptation method for recognizing short utterances. There have been several studies on speaker-independent DNN-HMMs in which an i-vector is computed and this additional information is combined with the acoustic features. However, it is difficult to calculate an i-vector accurately or to apply speaker adaptation (e.g., fMLLR) when the utterance is short (from 0.5 s). In our approach, we calculate a similarity score between the speaker classes and the target utterance and use speaker-class information configured in advance. As a precondition, we restrict the available time period to the first 50 frames of each utterance for the recognition of short utterances. In experiments, we obtained a 4.0% relative WER improvement compared with a conventional DNN-HMM.
  • Keywords
    "Training data","Acoustics","Hidden Markov models","Speech recognition","Speech","Data models","Databases"
  • Publisher
    IEEE
  • Conference_Titel
    Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
  • Type
    conf
  • DOI
    10.1109/APSIPA.2015.7415467
  • Filename
    7415467
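
The abstract describes scoring a target utterance against speaker classes prepared in advance, using only the first 50 frames. The record does not say how that similarity is computed, so the following is only a minimal sketch under stated assumptions: each speaker class is represented by a hypothetical centroid vector, the utterance is summarized by the mean of its first 50 feature frames, and cosine similarity stands in for the paper's unspecified score. The names `mean_vector`, `cosine`, `class_similarities`, and `best_class` are illustrative and do not come from the paper.

```python
import math

# The abstract limits the usable span to the first 50 frames per utterance.
MAX_FRAMES = 50


def mean_vector(frames):
    """Average a non-empty list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity; an assumed stand-in for the paper's score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_similarities(frames, class_centroids):
    """Score the first MAX_FRAMES frames against each class centroid."""
    head = mean_vector(frames[:MAX_FRAMES])
    return {name: cosine(head, c) for name, c in class_centroids.items()}


def best_class(frames, class_centroids):
    """Return the name of the speaker class with the highest score."""
    scores = class_similarities(frames, class_centroids)
    return max(scores, key=scores.get)
```

In a setup like the one the abstract outlines, the resulting scores (or the selected class) could be supplied to the acoustic model alongside the acoustic features. How the paper actually combines them is not stated in this record.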