DocumentCode :
3490213
Title :
A Hybrid Approach for Word Alignment in English-Hindi Parallel Corpora with Scarce Resources
Author :
Srivastava, Jaideep ; Sanyal, Subrata
Author_Institution :
Information Technology, Indian Institute of Information Technology, Allahabad, India
fYear :
2012
fDate :
13-15 Nov. 2012
Firstpage :
185
Lastpage :
188
Abstract :
This paper presents an approach that improves word alignment for the English-Hindi language pair under scarce resources. We improve the performance of the IBM Model 1 and Model 2 algorithms by applying part-of-speech (POS) tags before computing word alignment probabilities. With IBM Model 1, precision, recall, and F-measure increase by approximately 15%, 11%, and 14%, respectively, and the Alignment Error Rate (AER) decreases by approximately 14%. With IBM Model 2, precision, recall, and F-measure each increase by approximately 6%, and AER decreases by approximately 6%. The experiments are based on the TDIL corpus.
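The abstract does not say exactly how the POS tags are applied before the alignment probabilities are computed. The sketch below shows one plausible reading, not the authors' implementation: IBM Model 1 trained with EM, where a Hindi token may only be generated by the NULL word or by an English token carrying the same coarse tag. The filter rule `compatible`, the helper names `train_model1` and `align`, the universal-style tags, and the romanised `TOY_CORPUS` are all hypothetical. IBM Model 2, which adds position-dependent alignment probabilities, is not shown.

```python
from collections import defaultdict

# Each token is a (word, POS tag) pair; tags are assumed to come from an
# external POS tagger run on both sides of the corpus beforehand.
NULL = ("<NULL>", "NULL")


def compatible(e_tag, f_tag):
    """Hypothetical POS filter: NULL pairs with anything, otherwise tags must match."""
    return e_tag == "NULL" or e_tag == f_tag


def train_model1(corpus, iterations=10, use_pos=True):
    """EM training of IBM Model 1 translation probabilities t(f | e).

    corpus: list of (english_tokens, hindi_tokens) sentence pairs.
    With use_pos=True, incompatible pairs start at zero and therefore stay at zero.
    """
    f_vocab = {f for _, fs in corpus for f in fs}
    t = {}
    for es, fs in corpus:
        for e in [NULL] + es:
            for f in fs:
                if not use_pos or compatible(e[1], f[1]):
                    t[(f, e)] = 1.0 / len(f_vocab)  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in corpus:
            cands = [NULL] + es
            for f in fs:
                # E-step: posterior probability that each English token generated f
                z = sum(t.get((f, e), 0.0) for e in cands)
                if z == 0.0:
                    continue
                for e in cands:
                    c = t.get((f, e), 0.0) / z
                    if c > 0.0:
                        count[(f, e)] += c
                        total[e] += c
        # M-step: renormalise expected counts into t(f | e)
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def align(t, es, fs):
    """Viterbi alignment: link each Hindi token to its most probable English token."""
    cands = [NULL] + es
    links = set()
    for j, f in enumerate(fs):
        best = max(range(len(cands)), key=lambda i: t.get((f, cands[i]), 0.0))
        if best > 0:  # index 0 is NULL, i.e. the Hindi token stays unaligned
            links.add((best - 1, j))
    return links


# Hypothetical toy data in romanised Hindi (not taken from the TDIL corpus).
TOY_CORPUS = [
    ([("big", "ADJ"), ("house", "NOUN")], [("baDA", "ADJ"), ("ghar", "NOUN")]),
    ([("house", "NOUN")], [("ghar", "NOUN")]),
    ([("book", "NOUN")], [("kitAb", "NOUN")]),
]

if __name__ == "__main__":
    t = train_model1(TOY_CORPUS, iterations=10)
    print(sorted(align(t, *TOY_CORPUS[0])))  # → [(0, 0), (1, 1)]
```

A hard filter is the simplest way to let tags shape t(f | e) before EM starts: the filtered pairs never receive probability mass, so the remaining mass concentrates on tag-consistent candidates. Calling `train_model1(..., use_pos=False)` gives plain Model 1 for comparison. A softer variant could scale, rather than zero, the cross-tag probabilities.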
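The reported scores can be read against the standard word-alignment metrics of Och and Ney (2003), assuming the paper uses those definitions. With predicted links A, sure gold links S, and possible gold links P (where S is a subset of P): precision = |A ∩ P| / |A|, recall = |A ∩ S| / |S|, F-measure = 2PR / (P + R), and AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The function name `alignment_scores` is hypothetical.

```python
def alignment_scores(predicted, sure, possible=None):
    """Precision, recall, F-measure and AER over sets of alignment links.

    Links are (english_index, hindi_index) tuples; to score a whole corpus,
    add a sentence id to each tuple so that counts are pooled across sentences.
    If no `possible` set is given, the sure set is used for both (S = P).
    """
    a, s = set(predicted), set(sure)
    p = s | set(possible or ())  # sure links always count as possible links
    precision = len(a & p) / len(a) if a else 0.0
    recall = len(a & s) / len(s) if s else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    aer = (1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
           if a or s else 0.0)
    return precision, recall, f_measure, aer
```

For example, `alignment_scores({(0, 0), (1, 1), (1, 2)}, {(0, 0), (1, 1)})` gives a precision of about 0.667, a recall of 1.0, an F-measure of about 0.8, and an AER of about 0.2. When only a single gold set is available (S = P), AER equals 1 − F by construction.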
Keywords :
language translation; natural language processing; probability; statistical analysis; AER; English-Hindi language pair; English-Hindi parallel corpora; IBM model 1-2 algorithm; POS; TDIL corpus; alignment error rate; increase F-measure; increase precision; increase recall; part-of-speech tag; performance improvement; scarce resources; statistical machine translation; word alignment probability; Computational linguistics; Computational modeling; Error analysis; Hidden Markov models; Information technology; Tagging; Training; POS tagger; Scarce resources; Statistical Machine Translation; Word alignment;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing (IALP), 2012 International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4673-6113-2
Electronic_ISBN :
978-0-7695-4886-9
Type :
conf
DOI :
10.1109/IALP.2012.13
Filename :
6473727