DocumentCode :
2910589
Title :
Mining Parallel Data from Comparable Corpora via Triangulation
Author :
Do, Thi-Ngoc-Diep ; Castelli, Eric ; Besacier, Laurent
Author_Institution :
MICA Center, Grenoble INP, Hanoi, Vietnam
fYear :
2011
fDate :
15-17 Nov. 2011
Firstpage :
185
Lastpage :
188
Abstract :
This paper improves an unsupervised method for extracting parallel sentence pairs from a comparable corpus by triangulating through a third language. The underlying method, proposed in earlier work, is based on cross-language information retrieval with an iterative process and requires no additional parallel data. It has been validated on Vietnamese-French and Vietnamese-English bilingual data. In this paper, we use triangulation through a third language to improve the parallel data mining process: English is used when mining Vietnamese-French parallel data, and French is used when mining Vietnamese-English parallel data. Experiments show that triangulation improves both the quality of the extracted data and the quality of the resulting translation system.
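The abstract does not specify how the third language enters the mining process. As an illustration only, the Python sketch below shows one plausible way a pivot language could act as a filter on candidate pairs: both sides of a Vietnamese-French candidate are rendered in English, and the pair is kept only if the two English renderings agree. The toy translation tables, the token-overlap F1 score, the function names `overlap_f1` and `triangulation_filter`, and the 0.5 threshold are all hypothetical assumptions, not the authors' method.

```python
from collections import Counter
from typing import Callable, List, Tuple


def overlap_f1(a: str, b: str) -> float:
    """Bag-of-words F1 between two lowercased, whitespace-tokenized sentences."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    common = sum((ta & tb).values())  # multiset intersection: min count per token
    if common == 0:
        return 0.0
    precision = common / sum(ta.values())
    recall = common / sum(tb.values())
    return 2 * precision * recall / (precision + recall)


def triangulation_filter(
    pairs: List[Tuple[str, str]],
    to_pivot_src: Callable[[str], str],
    to_pivot_tgt: Callable[[str], str],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> List[Tuple[str, str, float]]:
    """Keep (src, tgt) pairs whose pivot-language renderings agree enough."""
    kept = []
    for src, tgt in pairs:
        score = overlap_f1(to_pivot_src(src), to_pivot_tgt(tgt))
        if score >= threshold:
            kept.append((src, tgt, score))
    return kept


# Hypothetical toy "translators" into the English pivot (stand-ins for real MT).
VI_TO_EN = {"tôi yêu Hà Nội": "i love hanoi", "trời mưa": "it is raining"}
FR_TO_EN = {"j'aime Hanoï": "i love hanoi", "il fait beau": "the weather is nice"}

# Candidate Vietnamese-French pairs, e.g. as returned by a retrieval step.
pairs = [("tôi yêu Hà Nội", "j'aime Hanoï"), ("trời mưa", "il fait beau")]

result = triangulation_filter(
    pairs,
    to_pivot_src=lambda s: VI_TO_EN.get(s, ""),
    to_pivot_tgt=lambda s: FR_TO_EN.get(s, ""),
)
print(result)  # only the first pair survives: its English renderings match exactly
```

In this toy run the second pair shares only the token "is" in English (F1 = 2/7), so it falls below the threshold and is discarded. A real system would use full translation models and a stronger similarity measure; the sketch only shows where a pivot language could add an independent agreement check.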
Keywords :
data mining; information retrieval; iterative methods; language translation; natural language processing; Vietnamese-English bilingual data; Vietnamese-French bilingual data; comparable corpora; cross-language information retrieval; iterative process; machine translation; parallel data mining; parallel sentence pair extraction; translation system quality; triangulation; unsupervised method; Computational linguistics; Data mining; Information filters; Noise measurement; Training; comparable corpus; extracting parallel sentence pairs; triangulation method;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing (IALP), 2011 International Conference on
Conference_Location :
Penang
Print_ISBN :
978-1-4577-1733-8
Type :
conf
DOI :
10.1109/IALP.2011.57
Filename :
6121499