DocumentCode :
1787085
Title :
Using topic models in domain adaptation
Author :
Zahabi, Samira Tofighi ; Bakhshaei, Somayeh ; Khadivi, Shahram
Author_Institution :
HLT Lab., Amirkabir Univ. of Technol., Tehran, Iran
fYear :
2014
fDate :
9-11 Sept. 2014
Firstpage :
539
Lastpage :
543
Abstract :
An important property of a corpus is its domain. Usually, the quality of an SMT system trained on an in-domain corpus increases when out-of-domain sentences are added to its training corpus. In this paper we show that out-of-domain corpora may also contain sentences that are suitable for improving the quality of the in-domain corpus. These sentences contain words and phrases that occur in in-domain corpora, so their context is more similar to the context of the in-domain parallel corpus and far from the context of the out-of-domain parallel corpora. We propose a method based on topic models to extract sentences from out-of-domain parallel corpora whose context is similar to that of the in-domain parallel corpus, and we use the extracted sentences to train an SMT system. Finally, we show that the BLEU score of the system output increases by about 4.69% when this extra information is added to its training corpus.
Keywords :
language translation; natural language processing; BLEU score; SMT system; in-domain parallel corpus; out-of-domain sentences; sentence extraction; statistical machine translation; topic models; Adaptation models; Computational modeling; Context; Context modeling; Equations; Mathematical model; Training; Natural Language Processing; Topic Model; Translation Model; domain adaptation
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Telecommunications (IST), 2014 7th International Symposium on
Conference_Location :
Tehran
Print_ISBN :
978-1-4799-5358-5
Type :
conf
DOI :
10.1109/ISTEL.2014.7000763
Filename :
7000763