• DocumentCode
    2347463
  • Title

    Generating english-persian parallel corpus using an automatic anchor finding sentence aligner

  • Author

    Yazdchi, Meisam Vosoughpour ; Faili, Heshaam

  • Author_Institution
    Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
  • fYear
    2010
  • fDate
    21-23 Aug. 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The more we can enlarge a parallel bilingual corpus, the more we have made it effective and powerful. Providing such corpora demands special efforts both in seeking for as much already translated texts as possible and also in designing appropriate sentence alignment algorithms with as less time complexity as possible. In this paper, we propose algorithms for sentence aligning of two Persian-English texts in linear time complexity and with a surprisingly high accuracy. This linear time-complexity is achieved through our new language-independent anchor finding algorithm which enables us to align as a big parallel text as a whole book in a single attempt and with a high accuracy. As far as we know, this project is the first automatic construction of an English-Persian parallel sentence-level corpus.
  • Keywords
    computational complexity; natural language processing; text analysis; English-Persian parallel corpus; automatic anchor finding sentence aligner; linear time complexity; sentence alignment algorithms;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-6896-6
  • Type

    conf

  • DOI
    10.1109/NLPKE.2010.5587769
  • Filename
    5587769