DocumentCode
2347463
Title
Generating English-Persian parallel corpus using an automatic anchor finding sentence aligner
Author
Yazdchi, Meisam Vosoughpour ; Faili, Heshaam
Author_Institution
Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
fYear
2010
fDate
21-23 Aug. 2010
Firstpage
1
Lastpage
6
Abstract
The larger a parallel bilingual corpus is, the more effective and powerful it becomes. Building such corpora takes two kinds of effort: collecting as many already-translated texts as possible, and designing sentence alignment algorithms with the lowest possible time complexity. In this paper, we propose algorithms that sentence-align a pair of Persian-English texts in linear time and with surprisingly high accuracy. This linear time complexity comes from our new language-independent anchor finding algorithm. It lets us align a parallel text as large as a whole book in a single pass, with high accuracy. To the best of our knowledge, this project is the first automatic construction of an English-Persian sentence-level parallel corpus.
Keywords
computational complexity; natural language processing; text analysis; English-Persian parallel corpus; automatic anchor finding sentence aligner; linear time complexity; sentence alignment algorithms
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-6896-6
Type
conf
DOI
10.1109/NLPKE.2010.5587769
Filename
5587769