Regional Information Center for Science and Technology (RICeST) - Speaker adaptation of neural network acoustic models using i-vectors

DocumentCode :

672328

Title :

Speaker adaptation of neural network acoustic models using i-vectors

Author :

Saon, George ; Soltau, Hagen ; Nahamoo, David ; Picheny, Michael

Author_Institution :

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

fYear :

2013

fDate :

8-12 Dec. 2013

Firstpage :

Lastpage :

Abstract :

We propose to adapt deep neural network (DNN) acoustic models to a target speaker by supplying speaker identity vectors (i-vectors) as input features to the network in parallel with the regular acoustic features for ASR. For both training and test, the i-vector for a given speaker is concatenated to every frame belonging to that speaker and changes across different speakers. Experimental results on a 300-hour Switchboard corpus show that DNNs trained on speaker-independent features and i-vectors achieve a 10% relative improvement in word error rate (WER) over networks trained on speaker-independent features only. These networks are comparable in performance to DNNs trained on speaker-adapted features (with VTLN and FMLLR), with the advantage that only one decoding pass is needed. Furthermore, networks trained on speaker-adapted features and i-vectors achieve a 5-6% relative improvement in WER after Hessian-free sequence training over networks trained on speaker-adapted features only.
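
The frame-level concatenation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `append_ivectors`, the speaker labels, and the dimensions (40-dimensional acoustic frames, 100-dimensional i-vectors) are assumptions chosen only for illustration, and the random arrays stand in for real features.

```python
import numpy as np


def append_ivectors(frames, speaker_ids, ivectors):
    """Concatenate each acoustic frame with the i-vector of its speaker.

    frames:      array of shape (num_frames, feat_dim)
    speaker_ids: one speaker label per frame
    ivectors:    dict mapping speaker label -> array of shape (ivec_dim,)
    Returns an array of shape (num_frames, feat_dim + ivec_dim).
    """
    # Look up the (constant) i-vector of the speaker who produced each frame.
    per_frame = np.stack([ivectors[s] for s in speaker_ids])
    # Place the i-vector next to the regular acoustic features.
    return np.concatenate([frames, per_frame], axis=1)


# Toy data: all sizes and labels here are hypothetical.
rng = np.random.default_rng(0)
frames = rng.standard_normal((5, 40))
speaker_ids = ["spk_a", "spk_a", "spk_a", "spk_b", "spk_b"]
ivectors = {
    "spk_a": rng.standard_normal(100),
    "spk_b": rng.standard_normal(100),
}

augmented = append_ivectors(frames, speaker_ids, ivectors)
```

In this sketch the appended block is identical for every frame of one speaker and differs between speakers, so the network input carries the same speaker information in training and in test.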

Keywords :

learning (artificial intelligence); neural nets; speech recognition; ASR; DNN; FMLLR; Switchboard 300 hours corpus; VTLN; WER; acoustic features; deep neural network acoustic models; hessian-free sequence training; i-vectors; speaker adaptation; speaker independent features; speaker-adapted features; word error rate; Acoustics; Feature extraction; Hidden Markov models; Neural networks; Training; Training data; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location :

Olomouc

Type :

conf

DOI :

10.1109/ASRU.2013.6707705

Filename :

6707705

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=672328