Title :
Topic n-gram count language model adaptation for speech recognition
Author :
Haidar, Md Akmal; O'Shaughnessy, D.
Author_Institution :
INRS-EMT, Montreal, QC, Canada
Abstract :
We introduce novel language model (LM) adaptation approaches using the latent Dirichlet allocation (LDA) model. Observed n-grams in the training set are assigned to topics using soft and hard clustering. In soft clustering, each n-gram is distributed across topics such that its counts summed over all topics equal its global count in the training set: the normalized topic weights of the n-gram are multiplied by the global n-gram count to form the topic n-gram counts for the respective topics. In hard clustering, each n-gram is assigned to a single topic, selected by the maximum topic weight for the n-gram, with the maximum fraction of the global n-gram count for that topic. The topic n-gram count LMs are built from the respective topic n-gram counts and adapted using the topic weights of a development test set. We compute the average of two confidence measures: the probability of a word given a topic and the probability of a topic given a word. The average is taken over the words in an n-gram to form the topic weights of that n-gram, and over the words in the development test set to form the topic weights of that set. Our approaches outperform some traditional approaches on the WSJ corpus.
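The soft and hard count assignment described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the authors' implementation: the function names, the toy probabilities, and the choice to use only the probability of a word given a topic as the confidence measure (the abstract does not say how the two measures are combined). The hard-clustering function follows one reading of the abstract, in which the selected topic receives the maximum fraction of the global count and all other topics receive zero.

```python
from typing import Dict, List, Sequence


def ngram_topic_weights(
    ngram: Sequence[str], p_word_given_topic: List[Dict[str, float]]
) -> List[float]:
    """Average P(word | topic) over the words of the n-gram, then normalize over topics."""
    num_topics = len(p_word_given_topic)
    raw = [
        sum(p_word_given_topic[k].get(word, 0.0) for word in ngram) / len(ngram)
        for k in range(num_topics)
    ]
    total = sum(raw)
    if total == 0.0:
        # Assumption: fall back to uniform weights when no word is known to any topic.
        return [1.0 / num_topics] * num_topics
    return [value / total for value in raw]


def soft_topic_counts(global_count: float, weights: Sequence[float]) -> List[float]:
    """Soft clustering: split the global count across topics; the parts sum to the global count."""
    return [global_count * w for w in weights]


def hard_topic_counts(global_count: float, weights: Sequence[float]) -> List[float]:
    """Hard clustering (one reading): only the top-weighted topic keeps its fraction of the count."""
    best = max(range(len(weights)), key=lambda k: weights[k])
    return [global_count * weights[k] if k == best else 0.0 for k in range(len(weights))]


# Hypothetical two-topic example; these probabilities are made up, not taken from the paper.
P_WORD_GIVEN_TOPIC = [
    {"stock": 0.3, "market": 0.2},  # topic 0
    {"stock": 0.1, "market": 0.1},  # topic 1
]

weights = ngram_topic_weights(("stock", "market"), P_WORD_GIVEN_TOPIC)
soft = soft_topic_counts(7, weights)
hard = hard_topic_counts(7, weights)
print("weights:", weights)
print("soft counts:", soft)
print("hard counts:", hard)
```

In this sketch the soft counts for an n-gram always add up to its global count, while the hard counts keep a single nonzero entry. A topic-specific LM would then be estimated from each topic's column of counts.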
Keywords :
pattern clustering; probability; speech recognition; LDA; LM adaptation approach; WSJ corpus; confidence measure; global n-gram count; hard clustering; latent Dirichlet allocation; normalized topic weight; soft clustering; topic N-gram count language model; training set; Adaptation models; Computational modeling; Interpolation; Mathematical model; Training; Weight measurement; language model adaptation; topic mixtures
Conference_Title :
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4673-5125-6
Electronic_ISBN :
978-1-4673-5124-9
DOI :
10.1109/SLT.2012.6424216