DocumentCode :
3752216
Title :
Deep neural network based acoustic model using speaker-class information for short time utterance
Author :
Hiroshi Seki;Kazumasa Yamamoto;Seiichi Nakagawa
Author_Institution :
Toyohashi University of Technology, Aichi, Japan
fYear :
2015
Firstpage :
1222
Lastpage :
1225
Abstract :
In speech recognition, it is preferable not to assume details of the target user, such as age or gender. However, speaker independence is one of the factors that degrades ASR performance. In this work, we propose a speaker adaptation method for recognizing short utterances. Several studies on speaker-independent DNN-HMMs compute an i-vector and combine this additional information with the acoustic features. However, it is difficult to estimate an i-vector accurately or to apply speaker adaptation (e.g., fMLLR) when the utterance is short (from 0.5 s). In our approach, we calculate a similarity score between each speaker class and the target utterance and make use of speaker-class information configured in advance. As a precondition for recognizing short utterances, we restrict the usable portion of each utterance to its first 50 frames. In experiments, we obtained a 4.0% relative WER improvement over a conventional DNN-HMM.
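The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how speaker classes are represented, which similarity measure is used, or how the scores enter the network, so everything below is an assumption made for illustration: each speaker class is a hypothetical centroid vector, similarity is cosine similarity against the mean of the first 50 frames, and the resulting scores are appended to every frame's feature vector. The function names are hypothetical as well.

```python
import math

MAX_FRAMES = 50  # the abstract limits scoring to the first 50 frames


def mean_vector(frames):
    """Average a non-empty list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def class_scores(frames, centroids):
    """Score the utterance against each speaker class.

    Assumption: a class is a centroid vector and the score is the cosine
    similarity to the mean of the first MAX_FRAMES frames.
    """
    head = mean_vector(frames[:MAX_FRAMES])
    return [cosine(head, c) for c in centroids]


def augment(frames, scores):
    """Append the speaker-class scores to every frame's feature vector.

    Assumption: the scores are fed to the acoustic model as extra inputs.
    """
    return [list(f) + list(scores) for f in frames]
```

For example, `class_scores(frames, centroids)` yields one score per speaker class, and `augment(frames, scores)` widens each frame by that many dimensions before it would be passed to an acoustic model.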
Keywords :
"Training data","Acoustics","Hidden Markov models","Speech recognition","Speech","Data models","Databases"
Publisher :
IEEE
Conference_Title :
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
Type :
conf
DOI :
10.1109/APSIPA.2015.7415467
Filename :
7415467