Title :
Bayesian nonparametric language models
Author :
Ying-Lan Chang ; Jen-Tzung Chien
Author_Institution :
Dept. of Electr. & Comput. Eng., Nat. Chiao Tung Univ., Hsinchu, China
Abstract :
Backoff smoothing and topic modeling are crucial issues in n-gram language modeling. This paper presents a Bayesian nonparametric learning approach to tackle these two issues. We develop a topic-based language model in which the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model reduces to the hierarchical PY (HPY) LM when the topic information is disregarded, and further to the modified Kneser-Ney (MKN) LM when the Bayesian treatment is also disregarded. In the experiments, the proposed THPY-LM outperforms state-of-the-art methods using MKN-LM and HPY-LM.
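The power-law behavior attributed to the PY process in the abstract can be illustrated with a small simulation. The following is a minimal sketch of the standard Chinese restaurant construction of a single Pitman-Yor process, not the authors' THPY-LM; the function name `pitman_yor_crp` and its parameters `discount` and `strength` are hypothetical names chosen for illustration.

```python
import random


def pitman_yor_crp(num_customers, discount, strength, seed=0):
    """Seat customers one by one; return the list of table sizes.

    Illustrative sketch only (names are assumptions, not from the paper).
    With n customers seated at K tables, the next customer
      - opens a new table with probability (strength + discount * K) / (strength + n)
      - joins table k with probability (size_k - discount) / (strength + n)
    """
    rng = random.Random(seed)
    table_sizes = []
    for n in range(num_customers):
        num_tables = len(table_sizes)
        new_table_mass = strength + discount * num_tables
        # Total unnormalized mass equals strength + n.
        r = rng.random() * (strength + n)
        if not table_sizes or r < new_table_mass:
            table_sizes.append(1)
            continue
        r -= new_table_mass
        for i, size in enumerate(table_sizes):
            r -= size - discount
            if r < 0:
                table_sizes[i] += 1
                break
        else:
            # Guard against floating-point round-off at the upper edge.
            table_sizes[-1] += 1
    return table_sizes


if __name__ == "__main__":
    # discount = 0 corresponds to a Dirichlet process (few, large tables);
    # a positive discount yields many small tables, i.e. a heavier tail.
    dp_tables = pitman_yor_crp(2000, discount=0.0, strength=1.0, seed=1)
    py_tables = pitman_yor_crp(2000, discount=0.8, strength=1.0, seed=1)
    print("tables with discount 0.0:", len(dp_tables))
    print("tables with discount 0.8:", len(py_tables))
```

Under these assumptions, a positive discount makes the number of occupied tables grow polynomially rather than logarithmically with the number of customers, which is the kind of heavy-tailed behavior observed in word frequencies.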
Keywords :
Bayes methods; computational linguistics; nonparametric statistics; smoothing methods; Bayesian nonparametric language models; Bayesian nonparametric learning approach; Bayesian treatment; MKN LM; PY process; Pitman-Yor process; THPY-LM; backoff smoothing; hierarchical Dirichlet process; hierarchical PY LM; infinite language models; model selection problem; modified Kneser-Ney LM; n-gram language model; nonparametric priors; power-law behavior; topic information; topic modeling; topic-based hierarchical PY language model; topic-based language model; Bayesian methods; Computational modeling; Context; Data models; Smoothing methods; Speech; Speech recognition; Bayesian nonparametrics; backoff smoothing; language model; topic model;
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on
Conference_Location :
Kowloon
Print_ISBN :
978-1-4673-2506-6
Electronic_ISBN :
978-1-4673-2505-9
DOI :
10.1109/ISCSLP.2012.6423460