DocumentCode
117977
Title
Unsupervised speaker adaptation of DNN-HMM by selecting similar speakers for lecture transcription
Author
Mimura, Masato ; Kawahara, Tatsuya
Author_Institution
Acad. Center for Comput. & Media Studies, Kyoto Univ., Kyoto, Japan
fYear
2014
fDate
9-12 Dec. 2014
Firstpage
1
Lastpage
4
Abstract
Unsupervised speaker adaptation of a Deep Neural Network (DNN) is investigated for lecture transcription tasks, in which a single speaker gives a long speech and speaker adaptation is therefore important. The proposed method selects speakers from the training database who are similar to the test speaker, and their data are used to retrain the baseline DNN. Several speaker characteristic features are defined for the speaker similarity measure. The feature based on a Universal Background Model (UBM) and principal component analysis (PCA) achieves the best performance, yielding a significant improvement over the baseline DNN and also over the adapted GMM-HMM system. When the method is combined with a naive adaptation method that uses the initial ASR hypothesis of the test data, an additional improvement is achieved.
Keywords
audio databases; neural nets; principal component analysis; speaker recognition; DNN-HMM; Deep Neural Network; PCA; UBM; lecture transcription; selecting similar speakers; test data; test speaker; training database; universal background model; unsupervised speaker adaptation; Accuracy; Adaptation models; Databases; Hidden Markov models; Speech; Speech recognition; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location
Siem Reap
Type
conf
DOI
10.1109/APSIPA.2014.7041567
Filename
7041567