DocumentCode :
730833
Title :
Document-specific context PLSA language model for speech recognition
Author :
Haidar, Md Akmal ; O'Shaughnessy, Douglas
Author_Institution :
INRS-EMT, Montreal, QC, Canada
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
5326
Lastpage :
5330
Abstract :
In this paper, we introduce a document-specific context probabilistic latent semantic analysis (DCPLSA) model for speech recognition. This is an extension of a CPLSA model [1] where the probability of a word is conditioned only on topics. The CPLSA model uses bigram counts, that is, the number of appearances of each bigram in the corpus. These counts are the sums of the bigram counts over different documents, in which the bigrams could appear to describe different topics. We address this problem in the CPLSA model and introduce the document-specific CPLSA (DCPLSA) model, where the probability of a word is conditioned on both topic and document. We carried out experiments on a continuous speech recognition (CSR) task using the Wall Street Journal (WSJ) corpus and found that the proposed DCPLSA approach yields significant reductions in both perplexity and word error rate (WER) over the other approaches used in the literature.
Keywords :
probability; speech recognition; Wall Street Journal corpus; bigram counts; continuous speech recognition; document-specific context PLSA language model; document-specific context probabilistic latent semantic analysis model; word error rate; Adaptation models; Computational modeling; Context; Context modeling; Mathematical model; Speech recognition; Training; Topic models; bigram PLSA models; context-based PLSA language model; speech recognition; statistical language model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178988
Filename :
7178988