DocumentCode :
672328
Title :
Speaker adaptation of neural network acoustic models using i-vectors
Author :
Saon, George ; Soltau, Hagen ; Nahamoo, David ; Picheny, Michael
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
2013
fDate :
8-12 Dec. 2013
Firstpage :
55
Lastpage :
59
Abstract :
We propose to adapt deep neural network (DNN) acoustic models to a target speaker by supplying speaker identity vectors (i-vectors) as input features to the network in parallel with the regular acoustic features for ASR. For both training and test, the i-vector for a given speaker is concatenated to every frame belonging to that speaker and changes across different speakers. Experimental results on the 300-hour Switchboard corpus show that DNNs trained on speaker-independent features and i-vectors achieve a 10% relative improvement in word error rate (WER) over networks trained on speaker-independent features only. These networks are comparable in performance to DNNs trained on speaker-adapted features (with VTLN and FMLLR), with the advantage that only one decoding pass is needed. Furthermore, networks trained on speaker-adapted features and i-vectors achieve a 5-6% relative improvement in WER after Hessian-free sequence training over networks trained on speaker-adapted features only.
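The input construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `append_ivectors` and `relative_wer_improvement`, the feature dimensions, and all numeric values are assumptions chosen for illustration, and none of the numbers come from the paper. The sketch only shows the idea that one fixed i-vector per speaker is appended to every frame of that speaker, and how a relative WER improvement is computed.

```python
from typing import Dict, List, Sequence


def append_ivectors(
    frames: Sequence[Sequence[float]],
    speaker_ids: Sequence[str],
    ivectors: Dict[str, Sequence[float]],
) -> List[List[float]]:
    """Concatenate each frame's acoustic features with its speaker's i-vector."""
    if len(frames) != len(speaker_ids):
        raise ValueError("need exactly one speaker id per frame")
    # The i-vector is constant for all frames of one speaker and
    # changes only when the speaker changes.
    return [list(f) + list(ivectors[s]) for f, s in zip(frames, speaker_ids)]


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy data (hypothetical): three 2-dimensional frames, two speakers,
# and 2-dimensional i-vectors.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
speaker_ids = ["a", "a", "b"]
ivectors = {"a": [0.1, 0.2], "b": [0.3, 0.4]}

network_inputs = append_ivectors(frames, speaker_ids, ivectors)
```

In this toy setup each network input has four dimensions: two acoustic features followed by the two i-vector components of the frame's speaker. Because the same function is applied at training and test time, no separate adaptation pass over the features is needed in the sketch.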
Keywords :
learning (artificial intelligence); neural nets; speech recognition; ASR; DNN; FMLLR; Switchboard 300 hours corpus; VTLN; WER; acoustic features; deep neural network acoustic models; hessian-free sequence training; i-vectors; speaker adaptation; speaker independent features; speaker-adapted features; word error rate; Acoustics; Feature extraction; Hidden Markov models; Neural networks; Training; Training data; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location :
Olomouc
Type :
conf
DOI :
10.1109/ASRU.2013.6707705
Filename :
6707705