Title :
A Word Sense Disambiguation Technique for Sinhala
Author :
Janindu Arukgoda;Vidudaya Bandara;Samiththa Bashani;Vijayindu Gamage;Daya Wimalasuriya
Author_Institution :
Dept. of Comput. Sci. &
Abstract :
Word sense disambiguation is the task of identifying the intended sense of a polysemous word in a given context. There have been many efforts on word sense disambiguation for English, but very few for Sinhala. This paper presents ongoing work on a rule-based word sense disambiguation algorithm that uses the Sinhala WordNet developed at the University of Moratuwa as its basis. This is the first attempt at building such an algorithm for Sinhala. For this task we have implemented the Simplified Lesk algorithm with our own modifications under two assumptions: 'one sense per collocation' and 'one sense per discourse'. We define a window around the target polysemous word and count the words in that window that overlap with each sense of the target word. Since there have not been many significant initiatives on natural language processing applications for Sinhala, critical resources such as functioning morphological analysis tools are not available, making accurate word sense disambiguation an even harder task. Using web articles as the data source, the system attempted to disambiguate 10 instances of polysemous words and achieved a precision of 63% and an F score of 0.63.
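The window-overlap step described in the abstract can be illustrated with a minimal sketch of the basic Simplified Lesk idea. This is not the authors' implementation and does not include their modifications. The function name `simplified_lesk`, the `senses` dictionary, and the English toy data are all hypothetical, standing in for signatures that would come from a resource such as the Sinhala WordNet.

```python
from typing import Dict, List, Set, Tuple


def simplified_lesk(
    tokens: List[str],
    target_index: int,
    senses: Dict[str, Set[str]],
    window: int = 3,
) -> Tuple[str, Dict[str, int]]:
    """Pick the sense whose signature shares the most words with the context window.

    `senses` maps a (hypothetical) sense id to a set of signature words,
    e.g. words taken from a gloss or example sentences.
    """
    # Collect up to `window` tokens on each side of the target word.
    start = max(0, target_index - window)
    left = tokens[start:target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    context = set(left) | set(right)

    # Count the context words that also appear in each sense signature.
    overlaps = {sense: len(context & signature) for sense, signature in senses.items()}

    # Ties resolve to the first sense in insertion order (an assumption of this sketch).
    best = max(overlaps, key=overlaps.get)
    return best, overlaps


# Toy English data, for illustration only.
toy_senses = {
    "bank.finance": {"money", "deposit", "loan", "account"},
    "bank.river": {"river", "water", "shore", "fishing"},
}
toy_tokens = "she went to the bank to deposit money yesterday".split()
best_sense, overlap_counts = simplified_lesk(toy_tokens, 4, toy_senses, window=3)
```

A real system for Sinhala would also need tokenization and some form of morphological normalization before matching, which the abstract notes is not readily available.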
Keywords :
"Context","Semantics","Natural language processing","Computational linguistics","Computers","Knowledge based systems","Machine learning algorithms"
Conference_Title :
Artificial Intelligence with Applications in Engineering and Technology (ICAIET), 2014 4th International Conference on
DOI :
10.1109/ICAIET.2014.42