• DocumentCode
    3166786
  • Title
    Distributed discriminative language models for Google voice-search
  • Author
    Jyothi, Preethi; Johnson, Leif; Chelba, Ciprian; Strope, Brian
  • Author_Institution
    Ohio State Univ., Columbus, OH, USA
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    5017
  • Lastpage
    5020
  • Abstract
    This paper considers large-scale linear discriminative language models trained using a distributed perceptron algorithm. The algorithm is implemented efficiently using a MapReduce/SSTable framework. This work also introduces the use of large amounts of unsupervised data (confidence-filtered Google voice-search logs) in conjunction with a novel training procedure that regenerates word lattices for the given data with a weaker acoustic model than the one used to generate the unsupervised transcriptions for the logged data. We observe small but statistically significant improvements in recognition performance after reranking N-best lists of a standard Google voice-search data set.
  • Keywords
    data loggers; information retrieval; natural language processing; perceptrons; search engines; speech recognition; unsupervised learning; Google voice-search data set; MapReduce-SSTable framework; confidence filtered Google voice-search log; data logging; distributed perceptron algorithm; large-scale linear discriminative language model; unsupervised transcription; weaker acoustic model; word lattice regeneration; Data models; Error analysis; Google; Hidden Markov models; Lattices; Training; Training data; Discriminative language models; Distributed Perceptron; MapReduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289047
  • Filename
    6289047