DocumentCode :
1908657
Title :
Findings and Considerations in Active Learning Based Framework for Resource-Poor SMT
Author :
Jinhua Du ; Meng Zhang
Author_Institution :
Sch. of Autom. & Inf. Eng., Xi'an Univ. of Technol., Xi'an, China
fYear :
2013
fDate :
17-19 Aug. 2013
Firstpage :
95
Lastpage :
98
Abstract :
Active learning (AL) for resource-poor SMT is an efficient and feasible way to acquire high-quality parallel data and improve translation quality. This paper first studies two mainstream sentence selection algorithms, Geom-phrase and Geom n-gram, and then proposes a sentence perplexity based selection method. Comparison experiments on Chinese-English NIST data yield some important findings, such as the impact of sentence length on AL performance. Accordingly, a preprocessing strategy is presented that filters the original monolingual corpus to obtain higher-information sentences. Experimental results on the preprocessed data show that the performance of all three selection algorithms improves significantly compared to the results on the original data.
Keywords :
language translation; learning (artificial intelligence); natural language processing; AL performance; Chinese-English NIST data; Geom n-gram; Geom-phrase; active learning based framework; high-quality parallel data; higher-information sentences; mainstream sentence selection algorithms; monolingual corpus; preprocessed data; preprocessing strategy; resource-poor SMT; sentence perplexity based selection method; translation quality; Algorithm design and analysis; Educational institutions; Measurement; NIST; Probability; System performance; Training; Active Learning; Data Preprocessing; High-information Sentence Selection; Statistical Machine Translation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing (IALP), 2013 International Conference on
Conference_Location :
Urumqi
Type :
conf
DOI :
10.1109/IALP.2013.28
Filename :
6646012