Title :
Exploiting parallel corpus for automatic extraction of multilingual names: Transliteration perspective
Author :
Kundu, Bijoy; Choudhury, S.K.
Author_Institution :
Language Technology, Centre for Development of Advanced Computing, Kolkata, India
Abstract :
This paper describes a novel approach for extracting multilingual transliteration pairs from an aligned parallel corpus. The approach uses an encoding technique based on “Place and Manner of Articulation”, and the Jaccard coefficient is used to measure the similarity between the encoded source and target words of a candidate transliteration pair. Applied to the extraction of English-Bangla transliteration pairs, the method achieves 94% accuracy, compared with 59.42% for an Expectation Maximization (EM) based word alignment module on the same test data.
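The abstract does not give the encoding tables or the exact extraction rule, so the following is only a minimal Python sketch of the general encode-then-compare idea. Everything in it is a hypothetical choice for illustration, not the authors' scheme: the place/manner code inventory, the English and Bangla character tables, the use of padded code bigrams as the sets compared by the Jaccard coefficient, the 0.5 threshold, and the function names `encode_english`, `encode_bangla`, `jaccard` and `extract_pairs`.

```python
"""Sketch: articulation-class encoding + Jaccard coefficient for
English-Bangla transliteration candidates.

ASSUMPTION: the class inventory and both character tables are hypothetical
simplifications, not the encoding published in the paper.
"""

# Codes are <place><manner>. Place: V=velar, P=palatal, C=coronal (dental,
# alveolar and retroflex merged), L=labial, G=glottal.
# Manner: S=stop/affricate, N=nasal, F=fricative, A=approximant.
EN_DIGRAPHS = {
    "kh": "VS", "gh": "VS", "ch": "PS", "jh": "PS", "th": "CS",
    "dh": "CS", "ph": "LS", "bh": "LS", "sh": "CF",
}
# Vowels and 'y' are absent, so they are skipped (a simplification).
EN_LETTERS = {
    "b": "LS", "c": "VS", "d": "CS", "f": "LS", "g": "VS", "h": "GF",
    "j": "PS", "k": "VS", "l": "CA", "m": "LN", "n": "CN", "p": "LS",
    "q": "VS", "r": "CA", "s": "CF", "t": "CS", "v": "LS", "w": "LA",
    "x": "VS", "z": "PS",
}
# Only Bangla consonants are mapped; vowel signs, hasanta, etc. are skipped.
BN_LETTERS = {
    ch: code
    for chars, code in [
        ("কখগঘ", "VS"), ("ঙ", "VN"), ("চছজঝ", "PS"), ("ঞ", "PN"),
        ("টঠডঢতথদধ", "CS"), ("ণন", "CN"), ("পফবভ", "LS"), ("ম", "LN"),
        ("রল", "CA"), ("শষস", "CF"), ("হ", "GF"),
    ]
    for ch in chars
}


def encode_english(word: str) -> list[str]:
    """Map an English word to codes, matching digraphs before single letters."""
    word = word.lower()
    codes, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in EN_DIGRAPHS:
            codes.append(EN_DIGRAPHS[pair])
            i += 2
        else:
            if word[i] in EN_LETTERS:
                codes.append(EN_LETTERS[word[i]])
            i += 1
    return codes


def encode_bangla(word: str) -> list[str]:
    """Map a Bangla word to codes; unmapped characters are ignored."""
    return [BN_LETTERS[ch] for ch in word if ch in BN_LETTERS]


def jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B| over boundary-padded code bigrams."""
    def grams(codes: list[str]) -> set[tuple[str, str]]:
        padded = ["^"] + codes + ["$"]
        return set(zip(padded, padded[1:]))
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)  # union is never empty due to padding


def extract_pairs(en_tokens: list[str], bn_tokens: list[str],
                  threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """For each English token of an aligned sentence pair, keep the
    best-scoring Bangla token if its score reaches the threshold."""
    pairs = []
    for en in en_tokens:
        en_codes = encode_english(en)
        scored = [(jaccard(en_codes, encode_bangla(bn)), bn) for bn in bn_tokens]
        if not scored:
            continue
        score, best = max(scored, key=lambda t: t[0])  # first maximum wins ties
        if score >= threshold:
            pairs.append((en, best, score))
    return pairs


if __name__ == "__main__":
    print(extract_pairs(["Kolkata", "and", "Choudhury"],
                        ["কলকাতা", "ও", "চৌধুরী"]))
```

Here "Kolkata" and "কলকাতা" both encode to `VS CA VS CS`, and "Choudhury" and "চৌধুরী" both encode to `PS CS CA`, so each pair scores 1.0, while "and" shares at most one bigram with any Bangla token and falls below the threshold. Merging dental, alveolar and retroflex stops into one coronal class is what lets English t/d match both ত and ট; a practical system would need a richer inventory and explicit handling of vowels, nukta forms and conjuncts.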
Keywords :
encoding; language translation; English-Bangla transliteration pairs extraction; Jaccard coefficient; Place and Manner of Articulation; automatic extraction; expectation maximization; expectation maximization based word alignment module; multilingual names; multilingual transliteration pairs; parallel corpus; transliteration perspective; Accuracy; Computational linguistics; Computational modeling; Dictionaries; Feature extraction; USA Councils; Machine Translation; Transliteration
Conference_Title :
India Conference (INDICON), 2012 Annual IEEE
Conference_Location :
Kochi
Print_ISBN :
978-1-4673-2270-6
DOI :
10.1109/INDCON.2012.6420690