DocumentCode
573565
Title
State-of-the-art English to Persian Statistical Machine Translation system
Author
Mansouri, Amin ; Faili, Heshaam
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
fYear
2012
fDate
2-3 May 2012
Firstpage
174
Lastpage
179
Abstract
This paper compares several kinds of English-Persian Statistical Machine Translation (SMT) systems. A large parallel corpus containing about 6 million tokens on each side was developed to train the proposed SMT system. During corpus development, a noise filtering system based on a MaxEnt classifier was devised to distinguish correct from incorrect sentence pairs. Using the resulting parallel corpus, a variety of English-to-Persian SMT systems were developed. Several variations on SMT, such as hybrid MT and statistical post-editing MT, are proposed. All systems were tested on two different types of test set: one extracted randomly from the parallel corpus, and the other containing formal English sentences extracted from an English learning book. The results show that a hybrid system, in which SMT is augmented by rule-based detection of English phrasal verbs and Persian compound verbs, improves significantly on the baseline. In addition, the verb-aware SMT system obtains state-of-the-art results on English-Persian translation with respect to the BLEU measure.
Keywords
knowledge based systems; language translation; natural language processing; pattern classification; statistical analysis; BLEU measure; English learning book; English phrasal verb; English-Persian statistical machine translation system; MaxEnt classifier; Persian compound verb; SMT system; hybrid MT system; noisy filtering system; parallel corpus; rule based detection; sentence pairs; statistical post editing MT system; verb-aware SMT system; Compounds; Feature extraction; Filtering; Google; Noise measurement; Training; Hybrid Machine Translation; MaxEnt Classifier; Parallel Corpus; Statistical Machine Translation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Artificial Intelligence and Signal Processing (AISP), 2012 16th CSI International Symposium on
Conference_Location
Shiraz, Fars
Print_ISBN
978-1-4673-1478-7
Type
conf
DOI
10.1109/AISP.2012.6313739
Filename
6313739