DocumentCode
2593071
Title
A comparative study on Thai word segmentation approaches
Author
Haruechaiyasak, Choochart ; Kongyoung, Sarawoot ; Dailey, Matthew
Author_Institution
Nat. Electron. & Comput. Technol. Center (NECTEC), Human Language Technol. Lab. (HLT), Pathumthani
Volume
1
fYear
2008
fDate
14-17 May 2008
Firstpage
125
Lastpage
128
Abstract
In this paper, we analyze and compare various approaches to Thai word segmentation. Word segmentation approaches can be classified into two distinct types: dictionary-based (DCB) and machine learning based (MLB). The DCB approach relies on a set of terms (a dictionary) for parsing and segmenting input texts, whereas the MLB approach relies on a model trained from a corpus using machine learning techniques. We compare two algorithms from the DCB approach, longest matching and maximal matching, and four algorithms from the MLB approach: Naive Bayes (NB), decision tree, support vector machine (SVM), and conditional random field (CRF). In the experimental results, the DCB approach yielded better performance than the NB, decision tree, and SVM algorithms from the MLB approach. However, the best performance was obtained by the CRF algorithm, with a precision and recall of 95.79% and 94.98%, respectively.
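The two dictionary-based algorithms named in the abstract differ in how they resolve ambiguity. The following is a minimal sketch for illustration, not the authors' implementation. Assumptions: the toy dictionary `DICTIONARY` is hypothetical; longest matching is taken to mean greedy left-to-right selection of the longest dictionary word; maximal matching is taken to mean choosing the segmentation with the fewest words (via dynamic programming), with unknown single characters allowed but minimized first, which is one common formulation.

```python
from typing import List, Set, Tuple

# Hypothetical toy dictionary, for illustration only.
DICTIONARY: Set[str] = {"ไป", "หา", "หาม", "เห", "สี", "มเหสี"}


def longest_matching(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right: at each position take the longest dictionary
    word starting there; fall back to a single (unknown) character."""
    max_len = max((len(w) for w in dictionary), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback when no dictionary word starts at i
        for end in range(min(len(text), i + max_len), i, -1):
            if text[i:end] in dictionary:
                match = text[i:end]
                break
        words.append(match)
        i += len(match)
    return words


def maximal_matching(text: str, dictionary: Set[str]) -> List[str]:
    """Dynamic programming over prefixes: minimize (unknown characters,
    word count) lexicographically, i.e. prefer the fewest words."""
    n = len(text)
    max_len = max((len(w) for w in dictionary), default=1)
    inf = (float("inf"), float("inf"))
    best: List[Tuple[float, float]] = [inf] * (n + 1)  # cost of best split of text[:j]
    back = [0] * (n + 1)  # start index of the last word in that split
    best[0] = (0, 0)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            if best[i] == inf:
                continue
            piece = text[i:j]
            if piece in dictionary:
                cand = (best[i][0], best[i][1] + 1)
            elif j - i == 1:
                cand = (best[i][0] + 1, best[i][1] + 1)  # unknown character
            else:
                continue
            if cand < best[j]:
                best[j] = cand
                back[j] = i
    words: List[str] = []
    j = n
    while j > 0:
        words.append(text[back[j]:j])
        j = back[j]
    return words[::-1]


if __name__ == "__main__":
    text = "ไปหามเหสี"
    print(longest_matching(text, DICTIONARY))  # ['ไป', 'หาม', 'เห', 'สี']
    print(maximal_matching(text, DICTIONARY))  # ['ไป', 'หา', 'มเหสี']
```

On this input the greedy choice of the longer word "หาม" forces a four-word split, whereas minimizing the word count recovers the three-word segmentation. This kind of ambiguity is what distinguishes the two DCB algorithms, independent of the MLB methods compared in the paper.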
Keywords
Bayes methods; decision trees; learning (artificial intelligence); natural language processing; support vector machines; Thai word segmentation; conditional random field; decision tree; dictionary based word segmentation; longest-matching algorithms; machine learning based word segmentation; maximal matching; naive Bayes; support vector machine; Decision trees; Dictionaries; Information management; Information retrieval; Laboratories; Machine learning; Machine learning algorithms; Natural languages; Support vector machines; Word segmentation; dictionary-based; machine learning algorithms; morphological analysis; tokenization;
fLanguage
English
Publisher
IEEE
Conference_Title
Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, 2008. ECTI-CON 2008. 5th International Conference on
Conference_Location
Krabi
Print_ISBN
978-1-4244-2101-5
Electronic_ISBN
978-1-4244-2102-2
Type
conf
DOI
10.1109/ECTICON.2008.4600388
Filename
4600388