Regional Information Center for Science and Technology - Training data selection based on context-dependent state matching

DocumentCode :

178735

Title :

Training data selection based on context-dependent state matching

Author :

Siohan, Olivier

Author_Institution :

Google Inc., New York, NY, USA

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

3316

Lastpage :

3319

Abstract :

In this paper we construct a data set for semi-supervised acoustic model training by selecting spoken utterances from a massive collection of anonymized Google Voice Search utterances. Semi-supervised training usually retains high-confidence utterances, which are presumed to have an accurate hypothesized transcript, a necessary condition for successful training. Selecting high-confidence utterances can, however, restrict the diversity of the resulting data set. We propose a constraint requiring that the distribution of context-dependent state symbols, obtained by forced alignment of the hypothesized transcript, match a reference distribution estimated from a curated development set. The quality of the resulting training set is evaluated in large-scale Voice Search recognition experiments, where it outperforms random selection of high-confidence utterances.
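
The abstract does not say how the distribution-matching constraint is enforced, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes a greedy procedure that keeps high-confidence utterances and repeatedly adds the one that brings the selected set's context-dependent state histogram closest, in KL divergence, to the reference distribution. The names `kl_divergence`, `select_utterances`, `min_confidence`, and `budget` are hypothetical.

```python
import math
from collections import Counter


def kl_divergence(ref, counts, eps=1e-9):
    """KL(ref || selected), where the selected histogram is normalized and smoothed.

    ref:    dict mapping state id -> reference probability (assumed to sum to 1)
    counts: Counter mapping state id -> count in the selected set
    """
    total = sum(counts.values())
    kl = 0.0
    for state, p in ref.items():
        if p <= 0.0:
            continue
        # Additive smoothing keeps q strictly positive for unseen states.
        q = (counts.get(state, 0) + eps) / (total + eps * len(ref))
        kl += p * math.log(p / q)
    return kl


def select_utterances(candidates, ref, min_confidence, budget):
    """Greedy selection (an assumption of this sketch, not taken from the source).

    candidates: list of (utt_id, confidence, state_ids), where state_ids is the
                context-dependent state sequence from forced alignment.
    Returns the ids of up to `budget` selected utterances.
    """
    # Step 1: keep only high-confidence utterances.
    pool = [(utt_id, Counter(states))
            for utt_id, conf, states in candidates
            if conf >= min_confidence]

    selected, counts = [], Counter()
    # Step 2: add the utterance that minimizes divergence from the reference.
    while pool and len(selected) < budget:
        best = min(range(len(pool)),
                   key=lambda i: kl_divergence(ref, counts + pool[i][1]))
        utt_id, hist = pool.pop(best)
        selected.append(utt_id)
        counts += hist
    return selected
```

This greedy loop is quadratic in the pool size, so it would not scale as written to a massive utterance collection; it is meant only to make the matching idea concrete.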

Keywords :

speech recognition; training; context-dependent state matching; curated development set; google voice search utterances; high-confidence utterance selection; hypothesized transcript matches; large scale voice search recognition; reference distribution estimation; running forced alignment; semisupervised acoustic model training; spoken utterances selection; training data selection; Acoustics; Google; Hidden Markov models; Mobile communication; Speech; Speech processing; Training; data selection; semi-supervised training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6854214

Filename :

6854214

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=178735