Speaking rate dependent multiple acoustic models using continuous frame rate normalization

Author

Sung Min Ban ; Hyung Soon Kim

Author_Institution

Pusan Nat. Univ., Busan, South Korea

fYear

2012

fDate

3-6 Dec. 2012

Firstpage

1

Lastpage

4

Abstract

This paper proposes a speech recognition method based on speaking rate dependent multiple acoustic models. Multiple acoustic models are trained, each for a different speaking rate, and the model that best matches the speaking rate of the test data is selected for recognition. To simulate the various speaking rates needed for these models, we vary the frame shift size according to the speaking rate of each utterance instead of applying a fixed frame shift size to all training utterances. Continuous frame rate normalization (CFRN) is applied to each training utterance to control its frame shift size. Experimental results show that the proposed method outperforms both the baseline and conventional CFRN on the test utterances.
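The two steps named in the abstract, a per-utterance frame shift and rate-based model selection, can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption for illustration: the inverse-proportional scaling rule, the 10 ms base shift, the clamping limits, the nearest-rate selection rule, and the function names `frame_shift_ms` and `select_model` are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only; the scaling rule and all constants are assumptions,
# not the formulation used by the authors.

def frame_shift_ms(utt_rate, target_rate, base_shift_ms=10.0,
                   min_shift_ms=5.0, max_shift_ms=15.0):
    """Return a frame shift (ms) that makes an utterance spoken at
    utt_rate look like target_rate when measured in units per frame.

    Units per frame = rate * shift, so matching target_rate * base_shift_ms
    gives shift = base_shift_ms * target_rate / utt_rate (assumed rule).
    The result is clamped to [min_shift_ms, max_shift_ms].
    """
    if utt_rate <= 0 or target_rate <= 0:
        raise ValueError("speaking rates must be positive")
    shift = base_shift_ms * target_rate / utt_rate
    return min(max(shift, min_shift_ms), max_shift_ms)


def select_model(test_rate, model_rates):
    """Return the index of the model whose target speaking rate is
    closest to the estimated rate of the test utterance (assumed rule)."""
    if not model_rates:
        raise ValueError("model_rates must not be empty")
    return min(range(len(model_rates)),
               key=lambda i: abs(model_rates[i] - test_rate))


if __name__ == "__main__":
    # Hypothetical rates in phones per second.
    model_rates = [8.0, 10.0, 14.0]
    # Frame shifts for one training utterance under each target rate.
    print([frame_shift_ms(12.0, r) for r in model_rates])
    # Model chosen for a test utterance estimated at 13 phones per second.
    print(select_model(13.0, model_rates))
```

Under these assumptions, a fast utterance receives a shorter frame shift (more frames per phone) when it is mapped to a slower target rate, and each target rate yields one training set and hence one acoustic model.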

Keywords

speech recognition; CFRN; continuous frame rate normalization; speaking rate dependent multiple acoustic models; test utterances; training utterances; variable frame shift size; Acoustics; Data models; Hidden Markov models; Speech; Training; Training data

fLanguage

English

Publisher

IEEE

Conference_Title

Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific

Conference_Location

Hollywood, CA

Print_ISBN

978-1-4673-4863-8

Type

conf

Filename

6412018