DocumentCode :
2500908
Title :
Thai named entity recognition based on conditional random fields
Author :
Tirasaroj, Nutcha ; Aroonmanakun, Wirote
Author_Institution :
Dept. of Linguistics, Chulalongkorn Univ., Bangkok, Thailand
fYear :
2009
fDate :
20-22 Oct. 2009
Firstpage :
216
Lastpage :
220
Abstract :
This paper presents Thai named entity recognition (NER) systems based on Conditional Random Fields (CRFs). Previous Thai NER systems have taken word-segmented data as input; none has used syllable-segmented data. Because studies of NER in other languages, such as Chinese, have shown that character-based systems outperform word-based ones, this study investigates whether syllable-segmented input improves Thai NER. To compare a system that takes word-segmented input with one that takes syllable-segmented input, two sets of features are used. The experimental results show that the systems do not perform well enough because few features are used. However, they also reveal that the syllable-based system is slightly better than the word-based one. The corpus, the training data preparation, and a system overview are also described.
Keywords :
data handling; natural language processing; random processes; Thai named entity recognition; conditional random field; natural language processing; syllable-segmented input; word-segmented data; Art; Data mining; Entropy; Graphical models; Labeling; Machine learning; Natural language processing; Natural languages; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing, 2009. SNLP '09. Eighth International Symposium on
Conference_Location :
Bangkok
Print_ISBN :
978-1-4244-4138-9
Electronic_ISBN :
978-1-4244-4139-6
Type :
conf
DOI :
10.1109/SNLP.2009.5340913
Filename :
5340913