Title :
Training data selection based on context-dependent state matching
Author_Institution :
Google Inc., New York, NY, USA
Abstract :
In this paper we construct a data set for semi-supervised acoustic model training by selecting spoken utterances from a massive collection of anonymized Google Voice Search utterances. Semi-supervised training usually retains high-confidence utterances, which are presumed to have an accurate hypothesized transcript, a necessary condition for successful training. However, selecting only high-confidence utterances can restrict the diversity of the resulting data set. We propose a constraint requiring that the distribution of context-dependent state symbols, obtained by forced alignment of the hypothesized transcripts, match a reference distribution estimated from a curated development set. Large-scale Voice Search recognition experiments show that the resulting training set outperforms random selection of high-confidence utterances.
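The abstract does not say how the distribution-matching constraint is enforced. The following is a minimal sketch that assumes a hypothetical greedy procedure. First, filter utterances by a confidence threshold. Then, at each step, add the utterance whose force-aligned context-dependent (CD) state symbols bring the selected state histogram closest, in KL divergence, to the reference distribution. The function names, the additive smoothing `eps`, the `min_confidence` threshold, and the utterance `budget` are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter


def kl_divergence(ref, counts, eps=1e-6):
    """KL(ref || q), where q is the additively smoothed distribution of CD state counts.

    ref: dict mapping CD state symbol -> reference probability (from a dev set).
    counts: Counter of CD state symbols in the currently selected utterances.
    """
    total = sum(counts.values()) + eps * len(ref)
    kl = 0.0
    for state, p in ref.items():
        if p > 0:
            q = (counts.get(state, 0) + eps) / total
            kl += p * math.log(p / q)
    return kl


def select_utterances(candidates, ref, min_confidence, budget):
    """Hypothetical greedy selection of high-confidence utterances by state-distribution matching.

    candidates: list of (utt_id, confidence, cd_state_sequence) tuples, where the
        state sequence is assumed to come from forced alignment of the hypothesis.
    budget: maximum number of utterances to select.
    """
    # Keep only utterances whose hypothesized transcript is trusted.
    pool = [(u, Counter(states)) for u, c, states in candidates if c >= min_confidence]
    selected, counts = [], Counter()
    while pool and len(selected) < budget:
        # Pick the candidate that minimizes divergence from the reference after adding it.
        best_idx, best_kl = 0, math.inf
        for i, (_, utt_counts) in enumerate(pool):
            kl = kl_divergence(ref, counts + utt_counts)
            if kl < best_kl:
                best_idx, best_kl = i, kl
        utt_id, utt_counts = pool.pop(best_idx)
        selected.append(utt_id)
        counts += utt_counts
    return selected
```

For example, with a reference that is balanced over two states, the greedy step prefers an utterance that fills the under-represented state over a second copy of an already well-covered one. The low-confidence `u3` is never considered:

```python
ref = {"a": 0.5, "b": 0.5}
candidates = [
    ("u1", 0.90, ["a", "a", "a", "a"]),
    ("u2", 0.95, ["b", "b", "b", "b"]),
    ("u3", 0.30, ["a", "b"]),
    ("u4", 0.90, ["a", "a", "a", "a"]),
]
print(select_utterances(candidates, ref, min_confidence=0.8, budget=2))
```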
Keywords :
speech recognition; training; context-dependent state matching; curated development set; google voice search utterances; high-confidence utterance selection; hypothesized transcript matches; large scale voice search recognition; reference distribution estimation; running forced alignment; semisupervised acoustic model training; spoken utterances selection; training data selection; Acoustics; Google; Hidden Markov models; Mobile communication; Speech; Speech processing; Training; data selection; semi-supervised training;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854214