DocumentCode :
128334
Title :
Improved Named Entity Tagset for Punjabi Language
Author :
Kaur, Amardeep ; Josan, Gurpreet Singh
Author_Institution :
Comput. Sci. Eng. Dept., Punjabi Univ., Patiala, India
fYear :
2014
fDate :
6-8 March 2014
Firstpage :
1
Lastpage :
5
Abstract :
An annotated corpus plays an important role in developing a machine learning based Named Entity Recognition system. Before creating an annotated corpus, it is important to decide on the Named Entity Tagset to be used. A Named Entity Tagset is defined as a collection of tags or labels, in the form of a scheme, indicating the named entity class to which a word in the text belongs. In this paper we propose an improved Named Entity Tagset of 14 tags for the task of Named Entity Recognition in the Punjabi language. This improvement arose from the challenges faced during the annotation process in our previous research work with 12 tags. In addition, we discuss the importance of, and issues related to, defining a Named Entity Tagset and annotation guidelines. We also discuss various global tagsets found in the literature. We referred to the Extended Named Entity Hierarchy in improving our current tagset.
Keywords :
learning (artificial intelligence); natural language processing; Punjabi language; annotated corpus; annotation guidelines; extended named entity hierarchy; machine learning based named entity recognition system; named entity tagset; Conferences; Context; Geology; Guidelines; Natural language processing; Organizations; Tagging; Named Entity Recognition; Named Entity Tagset; Punjabi; Tagset Design Issues;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Engineering and Computational Sciences (RAECS), 2014 Recent Advances in
Conference_Location :
Chandigarh
Print_ISBN :
978-1-4799-2290-1
Type :
conf
DOI :
10.1109/RAECS.2014.6799638
Filename :
6799638