Title :
Unsupervised language model adaptation via topic modeling based on named entity hypotheses
Author :
Liu, Yang ; Liu, Feifan
Author_Institution :
Univ. of Texas at Dallas, Richardson, TX
fDate :
March 31, 2008 - April 4, 2008
Abstract :
Language model (LM) adaptation is often achieved by combining a generic LM with a topic-specific model that is more relevant to the target document. Unlike previous work on unsupervised LM adaptation, in this paper we propose to leverage named entity (NE) information for topic analysis and LM adaptation. We investigate two topic modeling approaches, latent Dirichlet allocation (LDA) and clustering, and propose a new mixture topic model for LDA-based LM adaptation. Our experiments on N-best list rescoring show that this new adaptation framework, which uses NE information and topic analysis, outperforms the baseline generic N-gram LM in a state-of-the-art Mandarin recognition system.
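The combination of a generic LM with a topic-specific model mentioned in the abstract is commonly realized as linear interpolation, and the result can then be used to rescore an N-best list. The sketch below illustrates that general idea only; it is not the paper's method. It assumes unigram models stored as plain dictionaries (the paper works with N-gram LMs), and the names `interpolate`, `sentence_logprob`, `rescore_nbest`, the weight `lam`, and the probability `floor` are all hypothetical choices made for illustration.

```python
import math


def interpolate(p_generic, p_topic, lam):
    """Linear interpolation: lam * P_topic + (1 - lam) * P_generic."""
    return lam * p_topic + (1.0 - lam) * p_generic


def sentence_logprob(words, generic_lm, topic_lm, lam, floor=1e-6):
    """Log-probability of a word sequence under the interpolated model.

    Assumption: unigram models given as {word: probability} dicts;
    unseen words receive a small floor probability.
    """
    total = 0.0
    for w in words:
        p = interpolate(generic_lm.get(w, floor), topic_lm.get(w, floor), lam)
        total += math.log(p)
    return total


def rescore_nbest(nbest, generic_lm, topic_lm, lam=0.3, lm_weight=1.0):
    """Pick the best hypothesis from an N-best list.

    nbest: list of (words, acoustic_log_score) pairs.
    Returns the word list of the highest-scoring hypothesis.
    """
    def score(hyp):
        words, acoustic = hyp
        lm_score = sentence_logprob(words, generic_lm, topic_lm, lam)
        return acoustic + lm_weight * lm_score

    return max(nbest, key=score)[0]


# Toy data (invented for illustration, not from the paper).
generic_lm = {"bank": 0.01, "river": 0.01, "loan": 0.01}
topic_lm = {"bank": 0.2, "loan": 0.2}  # a hypothetical "finance" topic model
nbest = [(["river", "bank"], -1.0), (["loan", "bank"], -1.0)]
best = rescore_nbest(nbest, generic_lm, topic_lm, lam=0.5)
```

With equal acoustic scores, the topic model shifts probability mass toward "loan", so the second hypothesis is preferred once `lam` is greater than zero. In practice the interpolation weight would be tuned on held-out data.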
Keywords :
natural language processing; speech recognition; unsupervised learning; Mandarin recognition system; N-best list rescoring; clustering; latent Dirichlet allocation (LDA); named entity hypotheses; topic analysis; topic modeling; unsupervised language model adaptation; Adaptation model; Broadcasting; Hidden Markov models; Information analysis; Linear discriminant analysis; Natural languages; Performance analysis; Speech analysis; Text categorization; language model adaptation; named entities; rescoring
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2008.4518761