• DocumentCode
    117977
  • Title
    Unsupervised speaker adaptation of DNN-HMM by selecting similar speakers for lecture transcription
  • Author
    Mimura, Masato; Kawahara, Tatsuya
  • Author_Institution
    Acad. Center for Comput. & Media Studies, Kyoto Univ., Kyoto, Japan
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Unsupervised speaker adaptation of a Deep Neural Network (DNN) is investigated for lecture transcription tasks, in which a single speaker gives a long speech and speaker adaptation is therefore important. The proposed method selects speakers similar to the test data (test speaker) from the training database, and their data are used to retrain the baseline DNN. Several speaker characteristic features are defined for the speaker similarity measure. The feature based on a Universal Background Model (UBM) and principal component analysis (PCA) achieves the best performance, resulting in a significant improvement over the baseline DNN and also over the adapted GMM-HMM system. The method is combined with a naive adaptation method using the initial ASR hypothesis of the test data, and an additional improvement is achieved.
  • Keywords
    audio databases; neural nets; principal component analysis; speaker recognition; DNN-HMM; Deep Neural Network; PCA; UBM; lecture transcription; selecting similar speakers; test data; test speaker; training database; universal background model; unsupervised speaker adaptation; Accuracy; Adaptation models; Databases; Hidden Markov models; Speech; Speech recognition; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
  • Conference_Location
    Siem Reap
  • Type
    conf
  • DOI
    10.1109/APSIPA.2014.7041567
  • Filename
    7041567