DocumentCode :
2660015
Title :
Latent Dirichlet language model for speech recognition
Author :
Chien, Jen-Tzung ; Chueh, Chuang-Hua
Author_Institution :
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
201
Lastpage :
204
Abstract :
Latent Dirichlet allocation (LDA) has been successfully applied to document modeling and classification. LDA calculates the document probability under a bag-of-words scheme, without considering word order. The model discovers topic structure at the document level, which differs from the word-prediction task in speech recognition. In this paper, we present a new latent Dirichlet language model (LDLM) for modeling word sequences. A new Bayesian framework is introduced that merges Dirichlet priors to characterize the uncertainty of the latent topics of n-gram events, and a robust topic-based language model is established accordingly. In the experiments, we implement LDLM for continuous speech recognition and obtain better performance than the language model based on probabilistic latent semantic analysis (PLSA).
Keywords :
Bayes methods; speech recognition; text analysis; Bayesian framework; continuous speech recognition; latent Dirichlet allocation; latent Dirichlet language model; topic-based language model; word sequence modeling; Bayesian methods; Linear discriminant analysis; Merging; Natural languages; Predictive models; Probability; Robustness; Speech analysis; Uncertainty; Bayes procedures; clustering methods; smoothing methods
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location :
Goa
Print_ISBN :
978-1-4244-3471-8
Electronic_ISBN :
978-1-4244-3472-5
Type :
conf
DOI :
10.1109/SLT.2008.4777875
Filename :
4777875