Extracting Parallel Fragments from Comparable Documents Using a Feature-Based Method

Authors

Z. Rahimi (Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran); M. H. Samani (Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran); S. Khadivi (Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran)

Keywords

Comparable Corpora, Parallel Fragments, Machine Translation

Publication year (Solar Hijri calendar)

1392

Conference title

International Symposium on Artificial Intelligence and Signal Processing

Document language

Latin script (English)

English abstract

We present a novel method for extracting parallel sub-sentential fragments from comparable corpora. The method extracts bilingual sentence fragments from noisy sentence pairs. We define a similarity measure between bilingual sentence fragments as a linear combination of several new features, including fragment length, LLR score, properties of the alignment path within the block, and the translation coverage fraction. This makes it possible to extract useful machine translation training data from comparable corpora that contain no parallel sentence pairs. Evaluations indicate that the proposed method is efficient: it outperforms existing comparable systems in precision and recall, and it also improves the performance of a statistical machine translation system.
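As a rough illustration of the similarity measure described in the abstract, the sketch below scores a candidate fragment pair as a weighted sum of the four named features. The abstract does not give feature definitions, scaling, or weights, so the class name `FragmentFeatures`, the function `similarity`, the assumption that each feature is already a number in a comparable range, and all weight values are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class FragmentFeatures:
    """Feature values for one candidate bilingual fragment pair.

    All four fields are assumed to be precomputed, normalized numbers;
    how the paper actually computes them is not stated in the abstract.
    """
    length: float          # fragment length
    llr_score: float       # LLR score of the fragment pair
    alignment_path: float  # score summarizing the alignment path in the block
    coverage: float        # translation coverage fraction


# Hypothetical weights for illustration only; the abstract reports none.
WEIGHTS = FragmentFeatures(length=0.1, llr_score=0.3,
                           alignment_path=0.2, coverage=0.4)


def similarity(f: FragmentFeatures, w: FragmentFeatures = WEIGHTS) -> float:
    """Linear combination of the features: sum of weight * feature value."""
    return (w.length * f.length
            + w.llr_score * f.llr_score
            + w.alignment_path * f.alignment_path
            + w.coverage * f.coverage)


# Example call with made-up feature values.
candidate = FragmentFeatures(length=0.5, llr_score=0.8,
                             alignment_path=0.6, coverage=0.9)
score = similarity(candidate)
```

In a setup like this, a fragment pair would be kept only if its score exceeds some threshold; how such a threshold and the weights would be chosen is not specified in the abstract.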

Country

Iran

Number of pages: 2

From page

To page

Link to this document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=36&DC=276816