  • DocumentCode
    3585017
  • Title
    Semi-supervised DNN training in meeting recognition
  • Author
    Pengyuan Zhang; Yulan Liu; Thomas Hain
  • Author_Institution
    Key Lab. of Speech Acoust. & Content Understanding, IACAS, Beijing, China
  • fYear
    2014
  • Firstpage
    141
  • Lastpage
    146
  • Abstract
    Training acoustic models for ASR requires large amounts of labelled data, which is costly to obtain. Hence it is desirable to make use of unlabelled data. While unsupervised training can give gains for standard HMM training, it is more difficult to make use of unlabelled data for discriminative models. This paper explores semi-supervised training of Deep Neural Networks (DNNs) on a meeting recognition task. We first analyse the impact of imperfect transcription on DNN and ASR performance. As labelling error is the source of the problem, we investigate two options to reduce it: selecting data with fewer errors, and changing the dependence on noise by reducing label precision. Both confidence-based data selection and label resolution change are explored in the context of two scenarios, with matched and unmatched unlabelled data. We introduce improved DNN-based confidence score estimators and show their performance on data selection for both scenarios. Confidence score based data selection was found to yield up to 14.6% relative WER reduction, while a better balance between label resolution and recognition hypothesis accuracy allowed a further relative WER reduction of 16.6% in the mismatched scenario.
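    A minimal illustrative sketch (not from the paper) of the confidence-based data selection step described in the abstract: automatically transcribed utterances whose confidence score falls below a threshold are discarded before being pooled with the manually labelled data for DNN training. The Utterance class, select_by_confidence, and the 0.7 threshold are hypothetical names chosen for this example.

```python
# Hypothetical sketch (not from the paper) of confidence-based data
# selection for semi-supervised DNN acoustic model training.
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    """An automatically transcribed utterance from the unlabelled pool."""
    features: List[List[float]]  # acoustic feature frames
    hyp_labels: List[int]        # frame labels from the seed recogniser's 1-best hypothesis
    confidence: float            # per-utterance confidence score in [0, 1]


def select_by_confidence(pool: List[Utterance],
                         threshold: float = 0.7) -> List[Utterance]:
    """Keep only utterances whose confidence meets the threshold; the
    survivors would be pooled with the labelled data for DNN training."""
    return [u for u in pool if u.confidence >= threshold]


# Usage in a hypothetical training pipeline:
#   selected = select_by_confidence(unlabelled_pool, threshold=0.7)
#   train_dnn(labelled_data + selected)
```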
  • Keywords
    Gaussian processes; hidden Markov models; learning (artificial intelligence); mixture models; neural nets; speech recognition; ASR; WER reduction; confidence score estimators; data selection; deep neural networks; meeting recognition task; semi-supervised DNN training; standard HMM training; training acoustic models; unmatched unlabelled data; unsupervised training; confidence selection; semi-supervised acoustic model training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2014 IEEE
  • Type
    conf
  • DOI
    10.1109/SLT.2014.7078564
  • Filename
    7078564