DocumentCode :
3166786
Title :
Distributed discriminative language models for Google voice-search
Author :
Jyothi, Preethi ; Johnson, Leif ; Chelba, Ciprian ; Strope, Brian
Author_Institution :
Ohio State Univ., Columbus, OH, USA
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
5017
Lastpage :
5020
Abstract :
This paper considers large-scale linear discriminative language models trained using a distributed perceptron algorithm. The algorithm is implemented efficiently using a MapReduce/SSTable framework. This work also introduces the use of large amounts of unsupervised data (confidence-filtered Google voice-search logs) in conjunction with a novel training procedure that regenerates word lattices for the given data using a weaker acoustic model than the one used to generate the unsupervised transcriptions for the logged data. We observe small but statistically significant improvements in recognition performance after reranking N-best lists from a standard Google voice-search data set.
Keywords :
data loggers; information retrieval; natural language processing; perceptrons; search engines; speech recognition; unsupervised learning; Google voice-search data set; MapReduce-SSTable framework; confidence filtered Google voice-search log; data logging; distributed perceptron algorithm; large-scale linear discriminative language model; unsupervised transcription; weaker acoustic model; word lattice regeneration; Data models; Error analysis; Google; Hidden Markov models; Lattices; Training; Training data; Discriminative language models; Distributed Perceptron; MapReduce;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6289047
Filename :
6289047