Title :
Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages
Author :
Cui, Xiaodong ; Xue, Jian ; Chen, Xin ; Olsen, Peder A. ; Dognin, Pierre L. ; Chaudhari, Upendra V. ; Hershey, John R. ; Zhou, Bowen
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
This paper proposes an acoustic modeling approach based on bootstrap and restructuring to deal with data sparsity in low-resourced languages. The goal of the approach is to improve the statistical reliability of acoustic modeling for automatic speech recognition (ASR) under the speed, memory, and response latency requirements of real-world applications. In this approach, randomized hidden Markov models (HMMs) estimated from bootstrapped training data are aggregated for reliable sequence prediction. The aggregation yields an HMM with superior prediction capability at the cost of a substantially larger model size. For practical use, the aggregated HMM is restructured by Gaussian clustering followed by model refinement. The restructuring aims to reduce the aggregated HMM to a desirable model size while keeping its performance close to that of the original aggregated HMM. To that end, various Gaussian clustering criteria and model refinement algorithms are investigated in the full covariance model space before conversion to the diagonal covariance model space in the final stage of restructuring. Large vocabulary continuous speech recognition (LVCSR) experiments on Pashto and Dari show that acoustic models obtained by the proposed approach yield performance superior to the conventional training procedure with almost the same run-time memory consumption and decoding speed.
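The bootstrap-and-restructure idea described in the abstract can be illustrated with a minimal sketch: pool Gaussians estimated from bootstrap resamples into a large mixture, then greedily merge the closest pair (by symmetric KL divergence) until a target size is reached. This is a simplified, hypothetical rendering, not the paper's implementation: it uses diagonal covariances throughout (the paper clusters in the full covariance space before diagonalizing), one Gaussian per resample instead of full HMM training, and function names (`bootstrap_aggregate`, `restructure`) invented for illustration.

```python
import numpy as np

def kl_diag(m1, v1, m2, v2):
    # KL divergence between two diagonal-covariance Gaussians
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def merge(w1, m1, v1, w2, m2, v2):
    # Moment-matched merge of two weighted Gaussians
    # (preserves the pair's overall mean and covariance)
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return w, m, v

def bootstrap_aggregate(data, n_boot, rng):
    # Fit one Gaussian per bootstrap resample; pool with uniform weights
    comps = []
    for _ in range(n_boot):
        sample = data[rng.integers(0, len(data), size=len(data))]
        comps.append((1.0 / n_boot,
                      sample.mean(axis=0),
                      sample.var(axis=0) + 1e-6))  # floor variances
    return comps

def restructure(comps, target_size):
    # Greedily merge the closest pair (symmetric KL) down to target_size
    comps = list(comps)
    while len(comps) > target_size:
        best = None
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                _, mi, vi = comps[i]
                _, mj, vj = comps[j]
                d = kl_diag(mi, vi, mj, vj) + kl_diag(mj, vj, mi, vi)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = merge(*comps[i], *comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)]
        comps.append(merged)
    return comps

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=(200, 2))
pool = bootstrap_aggregate(data, n_boot=10, rng=rng)   # large aggregated mixture
small = restructure(pool, target_size=3)               # compact restructured mixture
```

In the paper's setting the pooled components come from many randomized HMMs trained on bootstrap resamples, and the merge criterion operates per HMM state; the sketch above only shows the pool-then-cluster mechanics on a single pool.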
Keywords :
Gaussian processes; acoustic signal processing; decoding; hidden Markov models; pattern clustering; prediction theory; reliability; speech recognition; Gaussian clustering criteria; HMM; automatic speech recognition; bootstrap; data sparsity; decoding speed; diagonal covariance model space; full covariance model space; hidden Markov acoustic modeling; large vocabulary continuous speech recognition experiment; low-resourced language; model refinement algorithm; prediction capability; randomized hidden Markov model; response latency requirement; run-time memory consumption; sequence prediction reliability; statistical reliability; Acoustics; Computational modeling; Data models; Hidden Markov models; Speech; Training; Training data; Bagging; bootstrap and restructuring; hidden Markov model (HMM); large vocabulary continuous speech recognition (LVCSR); low-resourced language;
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASL.2012.2199982