• DocumentCode
    734996
  • Title
    Improving HMM/DNN in ASR of under-resourced languages using probabilistic sampling
  • Author
    Meixu Song ; Qingqing Zhang ; Jielin Pan ; Yonghong Yan
  • Author_Institution
    Key Lab. of Speech Acoust. & Content Understanding, Beijing, China
  • fYear
    2015
  • fDate
    12-15 July 2015
  • Firstpage
    20
  • Lastpage
    24
  • Abstract
    In HMM/DNN automatic speech recognition (ASR) systems, DNNs model the posterior probabilities of triphone states. However, triphone states are unevenly distributed in the training data. In this situation, the training algorithm tends to converge to a local optimum that favors states with rich data over states with poor data. This imbalance in the training data degrades ASR performance, especially for under-resourced languages. To deal with this issue, we explore a resampling technique called “probabilistic sampling”, which can be seen as a linear smoothing between the original sampling and uniform sampling. The effectiveness of probabilistic sampling was evaluated in two under-resourced ASR experiments. With probabilistic sampling, the first experiment achieved a 6.3% relative phone error rate (PER) reduction compared to the conventional DNN baseline; the second experiment used a shared-hidden-layer multilingual DNN as the baseline and obtained a 4.9% relative PER reduction.
  • Keywords
    hidden Markov models; learning (artificial intelligence); neural nets; probability; signal sampling; smoothing methods; speech recognition; ASR system; HMM-DNN automatic speech recognition system; PER reduction; deep neural network; linear smoothing; phone error rate; posterior probability; probabilistic sampling; resampling technique; shared-hidden-layer multilingual DNN; training algorithm; training data imbalance; triphone state; underresourced language; uniform sampling; Acoustics; Hidden Markov models; Probabilistic logic; Speech; Speech recognition; Training; Training data; Automatic speech recognition; HMM/DNN hybrid; probabilistic sampling; under-resourced languages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on
  • Conference_Location
    Chengdu
  • Type
    conf
  • DOI
    10.1109/ChinaSIP.2015.7230354
  • Filename
    7230354
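
The abstract describes probabilistic sampling as a linear smoothing between the original sampling and uniform sampling over triphone states. A minimal sketch of that idea follows, assuming the common formulation P(c) = lam * (1/K) + (1 - lam) * prior(c), where K is the number of classes; the record itself does not give the formula, and the function names, the `lam` parameter, and the two-stage draw (class first, then a frame within the class) are illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import Counter, defaultdict


def smoothed_class_probs(labels, lam):
    """Interpolate between empirical class priors (lam=0) and uniform (lam=1).

    Assumed formulation: P(c) = lam * (1/K) + (1 - lam) * prior(c).
    """
    counts = Counter(labels)
    total = len(labels)
    k = len(counts)
    return {c: lam / k + (1.0 - lam) * n / total for c, n in counts.items()}


def probabilistic_sample(labels, lam, num_samples, seed=0):
    """Return frame indices drawn by class-then-frame sampling.

    A class is drawn from the smoothed distribution, then one frame is
    drawn uniformly from the frames carrying that class label.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    probs = smoothed_class_probs(labels, lam)
    classes = list(probs)
    weights = [probs[c] for c in classes]
    chosen = rng.choices(classes, weights=weights, k=num_samples)
    return [rng.choice(by_class[c]) for c in chosen]


if __name__ == "__main__":
    # Hypothetical toy data: state "a" has 90 frames, state "b" has 10.
    toy_labels = ["a"] * 90 + ["b"] * 10
    for lam in (0.0, 0.5, 1.0):
        print(lam, smoothed_class_probs(toy_labels, lam))
```

With `lam=0` the class distribution matches the original data; with `lam=1` every state is equally likely to be drawn, so rare states are oversampled. Intermediate values trade off between the two.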