DocumentCode :
117977
Title :
Unsupervised speaker adaptation of DNN-HMM by selecting similar speakers for lecture transcription
Author :
Mimura, Masato ; Kawahara, Tatsuya
Author_Institution :
Acad. Center for Comput. & Media Studies, Kyoto Univ., Kyoto, Japan
fYear :
2014
fDate :
9-12 Dec. 2014
Firstpage :
1
Lastpage :
4
Abstract :
Unsupervised speaker adaptation of deep neural networks (DNNs) is investigated for lecture transcription tasks, in which a single speaker gives a long speech and speaker adaptation is therefore important. The proposed method selects speakers from the training database who are similar to the test speaker, and their data are used to retrain the baseline DNN. Several speaker characteristic features are defined for the speaker similarity measure. The feature based on a Universal Background Model (UBM) and principal component analysis (PCA) achieves the best performance, yielding a significant improvement over the baseline DNN and over the adapted GMM-HMM system. Combining the method with a naive adaptation method that uses the initial ASR hypothesis of the test data achieves an additional improvement.
Keywords :
audio databases; neural nets; principal component analysis; speaker recognition; DNN-HMM; Deep Neural Network; PCA; UBM; lecture transcription; selecting similar speakers; test data; test speaker; training database; universal background model; unsupervised speaker adaptation; Accuracy; Adaptation models; Databases; Hidden Markov models; Speech; Speech recognition; Training
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location :
Siem Reap
Type :
conf
DOI :
10.1109/APSIPA.2014.7041567
Filename :
7041567