Regional Information Center for Science and Technology - Extracting parallel phrases from comparable corpora

DocumentCode :

172529

Title :

Extracting parallel phrases from comparable corpora

Author :

Jiexin Zhang ; Hailong Cao ; Tiejun Zhao

Author_Institution :

Sch. of Computer Sci. & Technol., Harbin Inst. of Technol., Harbin, China

fYear :

2014

fDate :

20-22 Oct. 2014

Firstpage :

166

Lastpage :

169

Abstract :

State-of-the-art statistical machine translation (SMT) models are trained on parallel corpora. However, traditional SMT loses its power for language pairs with few bilingual resources. This paper proposes a novel method that treats phrase extraction as a classification task. We first automatically generate the training and testing phrase pairs for the classifier. Then, we train an SVM classifier that determines whether a phrase pair is parallel or non-parallel. The proposed approach is evaluated on a Chinese-English translation task. Experimental results show that the precision of the classifier on the test sets is above 70% and the accuracy is above 98%. The quality of the extracted data is also evaluated by measuring its impact on the performance of a state-of-the-art SMT system built with a small parallel corpus. It shows better results than the baseline system.
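
The abstract reports both precision and accuracy for the parallel/non-parallel classifier, and the two can differ widely when non-parallel pairs dominate the candidate set. The following is a minimal sketch of how the two metrics are computed from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the function name `precision_and_accuracy` is an assumption, not the authors' code.

```python
def precision_and_accuracy(tp, fp, tn, fn):
    """Return (precision, accuracy) from confusion-matrix counts.

    tp: parallel pairs correctly labeled parallel
    fp: non-parallel pairs wrongly labeled parallel
    tn: non-parallel pairs correctly labeled non-parallel
    fn: parallel pairs wrongly labeled non-parallel
    """
    predicted_positive = tp + fp
    total = tp + fp + tn + fn
    # Guard against division by zero when nothing is predicted positive
    # or when the input is empty.
    precision = tp / predicted_positive if predicted_positive else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, accuracy


# Hypothetical counts (NOT from the paper): 10,000 candidate phrase pairs,
# of which only 200 are truly parallel, so the classes are heavily imbalanced.
tp, fp, tn, fn = 150, 50, 9750, 50

precision, accuracy = precision_and_accuracy(tp, fp, tn, fn)
print(f"precision={precision:.2f} accuracy={accuracy:.2f}")
```

With these illustrative counts, the many correctly rejected non-parallel pairs lift accuracy well above precision, which is one way figures such as "precision above 70%" and "accuracy above 98%" can coexist.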

Keywords :

language translation; natural language processing; pattern classification; performance evaluation; support vector machines; Chinese-English translation; SMT; SVM classifier; bilingual resources; classification task; comparable corpora; language pairs; parallel corpora; parallel phrases; performance evaluation; phrase extraction; statistical machine translation model; testing phrase pair; training phrase pair; translation task; Computational linguistics; Data mining; Feature extraction; Support vector machines; Testing; Training; Training data; Statistical Machine Translation; Support Vector Machine; classification; comparable corpus;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Asian Language Processing (IALP), 2014 International Conference on

Conference_Location :

Kuching

Type :

conf

DOI :

10.1109/IALP.2014.6973501

Filename :

6973501

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=172529