DocumentCode :
3081287
Title :
Exploiting parallel corpus for automatic extraction of multilingual names: Transliteration perspective
Author :
Kundu, Bijoy ; Choudhury, S.K.
Author_Institution :
Language Technol., Centre for Dev. of Adv. Comput., Kolkata, India
fYear :
2012
fDate :
7-9 Dec. 2012
Firstpage :
608
Lastpage :
612
Abstract :
This paper describes a novel approach for extracting multilingual transliteration pairs from an aligned parallel corpus. The approach uses an encoding technique based on “Place and Manner of Articulation”. The Jaccard coefficient is used to measure the distance between the encoded source and target words of candidate transliteration pairs. Applied to the extraction of English-Bangla transliteration pairs, the method achieved 94% accuracy, compared with 59.42% for an Expectation Maximization based word alignment module on the same test data.
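The abstract does not give the encoding itself, so the following is only a minimal sketch of how a Jaccard comparison over articulation-class codes might look. The class table `ARTICULATION_CLASS`, the codes "L", "D", "V", "R", and the function names are hypothetical illustrations and are not taken from the paper; only the Jaccard coefficient, |A ∩ B| / |A ∪ B|, is standard.

```python
# Hypothetical, heavily simplified articulation classes (NOT the paper's table).
# Consonants sharing a place of articulation get the same code in both scripts.
ARTICULATION_CLASS = {
    # labials
    "p": "L", "b": "L", "m": "L", "প": "L", "ব": "L", "ম": "L",
    # dentals / alveolars
    "t": "D", "d": "D", "n": "D", "ত": "D", "দ": "D", "ন": "D",
    # velars
    "k": "V", "g": "V", "ক": "V", "গ": "V",
    # liquids
    "r": "R", "l": "R", "র": "R", "ল": "R",
}


def encode(word):
    """Map a word to its articulation codes, skipping unmapped characters
    (vowels, vowel signs, and anything absent from the toy table)."""
    return [ARTICULATION_CLASS[ch] for ch in word.lower() if ch in ARTICULATION_CLASS]


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    """Distance form of the coefficient: 0.0 for identical sets, 1.0 for disjoint."""
    return 1.0 - jaccard(a, b)


if __name__ == "__main__":
    # A candidate pair scores high when both words encode to the same classes.
    print(jaccard(encode("kolkata"), encode("কলকাতা")))
    print(jaccard_distance(encode("mumbai"), encode("kolkata")))
```

In a sketch like this, a word pair from aligned sentences would be kept as a transliteration candidate when its distance falls below some threshold; the threshold and the choice to compare sets of single codes rather than sequences are assumptions made here for brevity.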
Keywords :
encoding; language translation; English-Bangla transliteration pairs extraction; Jaccard coefficient; Place and Manner of Articulation; automatic extraction; expectation maximization; maximization based word alignment module; multilingual names; multilingual transliteration pairs; parallel corpus; transliteration perspective; Accuracy; Computational linguistics; Computational modeling; Dictionaries; Encoding; Feature extraction; USA Councils; Machine Translation; Place and Manner of Articulation; Transliteration;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
India Conference (INDICON), 2012 Annual IEEE
Conference_Location :
Kochi
Print_ISBN :
978-1-4673-2270-6
Type :
conf
DOI :
10.1109/INDCON.2012.6420690
Filename :
6420690