• Conference record number
    3540
  • Article title

    Extracting Parallel Fragments from Comparable Documents Using a Feature-Based Method

  • Author/Authors
    Z. Rahimi (Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran); M.H. Samani (Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran); S. Khadivi (Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran)
  • Keywords
    Comparable Corpora, Parallel Fragments, Machine Translation
  • Publication year
    1392
  • Conference title
    International Conference on Artificial Intelligence and Signal Processing
  • Document language
    Latin script (English)
  • Abstract (English)
    We present a novel method for extracting parallel sub-sentential fragments from comparable corpora. The method extracts bilingual sentence fragments from noisy sentence pairs. We define a similarity measure between bilingual sentence fragments as a linear combination of several new features, including fragment length, log-likelihood ratio (LLR) score, properties of the alignment path within the block, and the translation coverage fraction. This makes it possible to extract useful machine translation training data from comparable corpora that contain no parallel sentence pairs. Evaluations indicate that the proposed method is efficient: it outperforms existing similar systems in precision and recall, and it also helps improve the performance of a statistical machine translation system.
  • Country
    Iran
  • Number of pages
    8
  • From page
    1
  • To page
    8