Title :
Named entity recognition for Sinhala language
Author :
Dahanayaka, J.K. ; Weerasinghe, A.R.
Author_Institution :
School of Computing, University of Colombo, Colombo, Sri Lanka
Abstract :
Named Entity Recognition (NER) is one of the major subtasks that must be solved in most Natural Language Processing tasks. However, building a proper NER system is very challenging, especially for Indic languages such as Sinhala, because of language features such as the absence of capitalization. Since there has been little previous work on NER for Sinhala, the concepts and the necessary resources have to be built from scratch. This paper investigates the effectiveness of data-driven techniques for detecting Named Entities in Sinhala text. Conditional Random Fields (CRF) and Maximum Entropy (ME) models were applied to this task, and the former outperformed the latter in all experiments. The CRF model detects Sinhala Named Entities with very high precision (91.64%) and reasonable recall (69.34%).
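The abstract reports precision and recall but no single combined score. As a rough illustration only, the sketch below computes the balanced F1 (harmonic mean of precision and recall) implied by the two reported CRF figures. It assumes the standard unweighted F1; whether the authors' evaluation was token-level or entity-level is not stated in this record, and the function name f1_score here is a hypothetical helper, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall (assumed standard F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the CRF model.
crf_precision = 0.9164
crf_recall = 0.6934

f1 = f1_score(crf_precision, crf_recall)
print(f"F1 = {f1:.2%}")  # prints: F1 = 78.95%
```

The harmonic mean is pulled toward the lower of the two values, so the comparatively low recall dominates the combined score even though precision is above 90%.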
Keywords :
maximum entropy methods; natural language processing; probability; text analysis; CRF; ME model; NER; Sinhala language; conditional random field; data-driven technique; maximum entropy model; named entity recognition; Named Entity;
Conference_Title :
2014 International Conference on Advances in ICT for Emerging Regions (ICTer)
Print_ISBN :
978-1-4799-7731-4
DOI :
10.1109/ICTER.2014.7083904