Title :
Efficient Estimation of Language Model Statistics of Spontaneous Speech Via Statistical Transformation Model
Author :
Akita, Yuya ; Kawahara, Tatsuya
Author_Institution :
Academic Center for Computing and Media Studies, Kyoto University
Abstract :
One of the most significant problems in language modeling of spontaneous speech, such as meetings and lectures, is that only a limited amount of matched training data, i.e., faithful transcripts for the relevant task domain, is available. In this paper, we propose a novel transformation approach to estimate language model statistics of spontaneous speech from a document-style text database, which is often available on a large scale. The proposed statistical transformation model is designed to model characteristic linguistic phenomena in spontaneous speech and to estimate their occurrence probabilities. These contextual patterns and probabilities are derived from a small parallel corpus in which faithful transcripts are aligned with their document-style texts. To realize wide coverage and reliable estimation, a model based on part-of-speech (POS) is also prepared to provide a back-off scheme from a word-based model. The approach has been successfully applied to estimating the language model for National Congress meetings from their minutes archives, and a significant reduction in test-set perplexity is achieved.
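The abstract describes the transformation only in outline. As a rough illustration, and not the authors' formulation, the Python sketch below models one hypothetical phenomenon (insertion of a filler token `uh` between two words). It estimates the insertion probability from a toy parallel corpus, backs off from word contexts to POS contexts when a word context has been seen fewer than `min_count` times, and converts document-style bigram counts into expected spontaneous-style counts. The filler token, the tag set, the `min_count` threshold, the toy data, and the expected-count rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical filler token: one illustrative spontaneous-speech phenomenon (assumption).
FILLER = "uh"


def estimate_insertion_stats(parallel, pos_of):
    """Count filler insertions per (left, right) context from a toy parallel corpus.

    Simplifying assumption: deleting FILLER tokens from a spontaneous-style
    sentence yields its aligned document-style sentence, so alignment is trivial.
    Returns two tables mapping a context to [insertions, occurrences]:
    one keyed by word pairs, one keyed by POS pairs.
    """
    word_ctx = defaultdict(lambda: [0, 0])
    pos_ctx = defaultdict(lambda: [0, 0])
    for spont in parallel:
        doc = [w for w in spont if w != FILLER]
        gaps_with_filler = set()
        next_doc_index = 0
        for w in spont:
            if w == FILLER:
                # The filler sits in the gap just before doc[next_doc_index].
                gaps_with_filler.add(next_doc_index)
            else:
                next_doc_index += 1
        # Gap j lies between doc[j - 1] and doc[j]; sentence-edge fillers are ignored.
        for j in range(1, len(doc)):
            hit = 1 if j in gaps_with_filler else 0
            for table, key in (
                (word_ctx, (doc[j - 1], doc[j])),
                (pos_ctx, (pos_of[doc[j - 1]], pos_of[doc[j]])),
            ):
                table[key][0] += hit
                table[key][1] += 1
    return word_ctx, pos_ctx


def insertion_prob(a, b, word_ctx, pos_ctx, pos_of, min_count=2):
    """P(filler between a and b): word context if seen often enough, else POS back-off."""
    hits, total = word_ctx.get((a, b), (0, 0))
    if total >= min_count:
        return hits / total
    hits, total = pos_ctx.get((pos_of[a], pos_of[b]), (0, 0))
    return hits / total if total else 0.0


def transform_bigrams(doc_bigrams, word_ctx, pos_ctx, pos_of):
    """Turn document-style bigram counts into expected spontaneous-style counts."""
    spont = Counter()
    for (a, b), c in doc_bigrams.items():
        p = insertion_prob(a, b, word_ctx, pos_ctx, pos_of)
        # With probability 1 - p the bigram is kept unchanged...
        spont[(a, b)] += c * (1 - p)
        if p > 0:
            # ...otherwise a filler splits it into (a, FILLER) and (FILLER, b).
            spont[(a, FILLER)] += c * p
            spont[(FILLER, b)] += c * p
    return spont


# Toy data invented for illustration; not taken from the paper.
POS_OF = {"the": "DET", "a": "DET", "bill": "N", "budget": "N", "passed": "V", "is": "V"}
PARALLEL = [
    ["the", "uh", "bill", "passed"],
    ["the", "bill", "passed"],
    ["a", "uh", "budget", "is"],
]

word_ctx, pos_ctx = estimate_insertion_stats(PARALLEL, POS_OF)
doc_counts = Counter({("the", "bill"): 10, ("a", "bill"): 3})
print(transform_bigrams(doc_counts, word_ctx, pos_ctx, POS_OF))
```

In this toy run, ("the", "bill") has been seen twice in the parallel data, so its word-level estimate of 0.5 is used, whereas ("a", "bill") never occurs there and falls back to the DET-N estimate of 2/3. Splitting each count in proportion to p is just one simple choice of expected-count rule; the paper's own estimation formula is not reproduced here.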
Keywords :
language translation; natural languages; speech processing; statistical analysis; back-off scheme; document-style text database; language model statistics; linguistic phenomena; part-of-speech; spontaneous speech; statistical machine translation; statistical transformation model; test-set perplexity; word-based model; Automatic speech recognition; Databases; Differential equations; Large-scale systems; Minutes; Natural languages; Probability; Speech recognition; Statistics; Training data;
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
Print_ISBN :
1-4244-0469-X
DOI :
10.1109/ICASSP.2006.1660204