DocumentCode
590871
Title
Speaking rate dependent multiple acoustic models using continuous frame rate normalization
Author
Sung Min Ban ; Hyung Soon Kim
Author_Institution
Pusan Nat. Univ., Busan, South Korea
fYear
2012
fDate
3-6 Dec. 2012
Firstpage
1
Lastpage
4
Abstract
This paper proposes a speech recognition method that uses multiple speaking rate dependent acoustic models. Multiple acoustic models are trained, each corresponding to a different speaking rate, and the model best matched to the speaking rate of the test data is selected for recognition. To simulate the various speaking rates, we vary the frame shift size according to the speaking rate of each utterance instead of applying a fixed frame shift size to all training utterances. Continuous frame rate normalization (CFRN) is applied to each training utterance to control the frame shift size. Experimental results show that the proposed method outperforms both the baseline and conventional CFRN on the test utterances.
Keywords
speech recognition; CFRN; continuous frame rate normalization; speaking rate dependent multiple acoustic models; test utterances; training utterances; variable frame shift size; Acoustics; Data models; Hidden Markov models; Speech; Speech recognition; Training; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
Conference_Location
Hollywood, CA
Print_ISBN
978-1-4673-4863-8
Type
conf
Filename
6412018