DocumentCode
3752216
Title
Deep neural network based acoustic model using speaker-class information for short time utterance
Author
Hiroshi Seki;Kazumasa Yamamoto;Seiichi Nakagawa
Author_Institution
Toyohashi University of Technology, Aichi, Japan
fYear
2015
Firstpage
1222
Lastpage
1225
Abstract
In speech recognition, it is preferable not to assume details about the target user, such as specific age or gender. However, speaker independence is one of the factors that degrades ASR performance. In this work, we propose a speaker adaptation method for recognizing short utterances. Several studies on speaker-independent DNN-HMMs compute an i-vector and combine this additional information with the acoustic features. However, it is difficult to estimate an i-vector accurately, or to apply speaker adaptation (e.g., fMLLR), when the utterance is short (from 0.5 s). In our approach, we calculate a similarity score between each speaker class and the target utterance, and use speaker-class information configured in advance. As a precondition, we restrict the available time period to the first 50 frames of each utterance in order to target the recognition of short utterances. In experimental tests, we obtained a 4.0% relative WER improvement over a conventional DNN-HMM.
Keywords
"Training data","Acoustics","Hidden Markov models","Speech recognition","Speech","Data models","Databases"
Publisher
IEEE
Conference_Titel
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
Type
conf
DOI
10.1109/APSIPA.2015.7415467
Filename
7415467