DocumentCode
3124217
Title
Phrase-based data selection for language model adaptation in spoken language translation
Author
Shixiang Lu ; Wei Wei ; Xiaoyin Fu ; Lichun Fan ; Bo Xu
Author_Institution
Interactive Digital Media Technol. Res. Center, Inst. of Autom., Beijing, China
fYear
2012
fDate
5-8 Dec. 2012
Firstpage
193
Lastpage
196
Abstract
In this paper, we propose an unsupervised phrase-based data selection model that addresses the problem of selecting non-domain-specific language model (LM) training data to build an adapted LM. In a spoken language translation (SLT) system, we aim to find the LM training sentences that are similar to the translation task. Compared with traditional bag-of-words models, the phrase-based data selection model is more effective because it captures contextual information by modeling the selection of a phrase as a whole, rather than the selection of single words in isolation. Large-scale experimental results demonstrate that our approach significantly outperforms state-of-the-art approaches on both LM perplexity and translation performance.
Keywords
language translation; speech processing; LM training sentence; SLT system; language model adaptation; no-domain-specific language model; spoken language translation system; unsupervised phrase-based data selection model; Adaptation models; Context modeling; Data models; Speech; Speech recognition; Training; Training data; contextual information; language model adaptation; phrase-based data selection; spoken language translation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on
Conference_Location
Kowloon
Print_ISBN
978-1-4673-2506-6
Electronic_ISBN
978-1-4673-2505-9
Type
conf
DOI
10.1109/ISCSLP.2012.6423483
Filename
6423483