• DocumentCode
    590871
  • Title
    Speaking rate dependent multiple acoustic models using continuous frame rate normalization
  • Author
    Sung Min Ban ; Hyung Soon Kim
  • Author_Institution
    Pusan Nat. Univ., Busan, South Korea
  • fYear
    2012
  • fDate
    3-6 Dec. 2012
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    This paper proposes a speech recognition method that uses multiple speaking-rate-dependent acoustic models. Multiple acoustic models are generated for various speaking rates, and the model best matched to the speaking rate of the test data is selected and used for recognition. To simulate the various speaking rates, we use a variable frame shift size that depends on the speaking rate of each utterance, instead of applying a fixed frame shift size to all training utterances. Continuous frame rate normalization (CFRN) is applied to each training utterance to control the frame shift size. Experimental results show that the proposed method outperforms both the baseline and conventional CFRN on test utterances.
  • Keywords
    speech recognition; CFRN; continuous frame rate normalization; speaking rate dependent multiple acoustic models; test utterances; training utterances; variable frame shift size; Acoustics; Data models; Hidden Markov models; Speech; Speech recognition; Training; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
  • Conference_Location
    Hollywood, CA
  • Print_ISBN
    978-1-4673-4863-8
  • Type
    conf
  • Filename
    6412018