DocumentCode :
3744829
Title :
Time delay deep neural network-based universal background models for speaker recognition
Author :
David Snyder;Daniel Garcia-Romero;Daniel Povey
Author_Institution :
Center for Language and Speech Processing & Human Language Technology Center of Excellence, The Johns Hopkins University, Baltimore, MD 21218, USA
fYear :
2015
Firstpage :
92
Lastpage :
97
Abstract :
Recently, deep neural networks (DNNs) have been incorporated into i-vector-based speaker recognition systems, where they have significantly improved state-of-the-art performance. In these systems, a DNN is used to collect sufficient statistics for i-vector extraction. In this study, the DNN is a recently developed time delay deep neural network (TDNN) that has achieved promising results in large-vocabulary continuous speech recognition (LVCSR) tasks. We believe that the TDNN-based system achieves the best reported results on SRE10; it obtains a 50% relative improvement over our GMM baseline in terms of equal error rate (EER). For some applications, the computational cost of a DNN is high. Therefore, we also investigate a lightweight alternative in which a supervised GMM is derived from the TDNN posteriors. This method maintains the speed of the traditional unsupervised GMM but achieves a 20% relative improvement in EER.
Keywords :
"Speaker recognition","Feature extraction","Delay effects","Training","Neural networks","Computational modeling","Acoustics"
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type :
conf
DOI :
10.1109/ASRU.2015.7404779
Filename :
7404779