• DocumentCode
    1757366
  • Title

    Maximum F1-Score Discriminative Training Criterion for Automatic Mispronunciation Detection

  • Author

    Hao Huang; Haihua Xu; Xianhui Wang; Wushour Silamu

  • Author_Institution
    Sch. of Inf. Sci. & Eng., Xinjiang Univ., Urumqi, China
  • Volume
    23
  • Issue
    4
  • fYear
    2015
  • fDate
    April 2015
  • Firstpage
    787
  • Lastpage
    797
  • Abstract
    We carry out an in-depth investigation of a newly proposed Maximum F1-score Criterion (MFC) discriminative training objective function for Goodness of Pronunciation (GOP) based automatic mispronunciation detection that uses Gaussian mixture model-hidden Markov models (GMM-HMMs) as acoustic models. The MFC formulation seeks to optimize the F1-score directly by converting the non-differentiable F1-score function into a continuous objective function to facilitate optimization. We present a model-space training algorithm for MFC that uses update equations of the extended Baum-Welch form, derived with the weak-sense auxiliary function method. We then present MFC-based feature-space discriminative training: we train a matrix that projects posteriors of Gaussians to a normal-size feature space and add the projected features to traditional spectral features. Mispronunciation detection experiments show that MFC-based model-space training and feature-space training are both effective in improving F1-score and other commonly used evaluation metrics. MFC training in both the feature space and the model space outperforms either model-space or feature-space training alone, and is about 11.6% better than the maximum likelihood (ML) trained baseline in terms of F1-score. Further, we review and compare mispronunciation detection results obtained with MFC and with some traditional training criteria that minimize word error rate in speech recognition. The experimental analysis and comparison provide useful insight into the correlations between F1-score maximization and optimization of these training criteria.
  • Keywords
    Gaussian processes; hidden Markov models; learning (artificial intelligence); matrix algebra; mixture models; speech recognition; Gaussian mixture model-hidden Markov model; MFC based model-space training; extended Baum-Welch form; feature-space training; goodness of pronunciation based automatic mispronunciation detection; maximum F1-score discriminative training criterion; model-space training algorithm; nondifferentiable F1-score function; speech recognition; weak-sense auxiliary function method; Acoustics; Hidden Markov models; Linear programming; Mathematical model; Speech; Speech processing; Training; Automatic mispronunciation detection; F1-score; computer-assisted language learning; discriminative training; feature extraction;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2015.2409733
  • Filename
    7055841