• DocumentCode
    2593071
  • Title

    A comparative study on Thai word segmentation approaches

  • Author

    Haruechaiyasak, Choochart ; Kongyoung, Sarawoot ; Dailey, Matthew

  • Author_Institution
    Nat. Electron. & Comput. Technol. Center (NECTEC), Human Language Technol. Lab. (HLT), Pathumthani
  • Volume
    1
  • fYear
    2008
  • fDate
    14-17 May 2008
  • Firstpage
    125
  • Lastpage
    128
  • Abstract
    In this paper, we analyze and compare approaches to Thai word segmentation. These approaches fall into two types: dictionary-based (DCB) and machine-learning-based (MLB). The DCB approach relies on a set of terms for parsing and segmenting input text, whereas the MLB approach relies on a model trained on a corpus using machine learning techniques. We compare two DCB algorithms, longest matching and maximal matching, with four MLB algorithms: naive Bayes (NB), decision tree, support vector machine (SVM), and conditional random field (CRF). In the experiments, the DCB approach outperformed the NB, decision tree, and SVM algorithms of the MLB approach. However, the CRF algorithm achieved the best performance, with precision and recall of 95.79% and 94.98%, respectively.
  • Keywords
    Bayes methods; decision trees; learning (artificial intelligence); natural language processing; support vector machines; Thai word segmentation; conditional random field; decision tree; dictionary based word segmentation; longest-matching algorithms; machine learning based word segmentation; maximal matching; naive Bayes; support vector machine; Decision trees; Dictionaries; Information management; Information retrieval; Laboratories; Machine learning; Machine learning algorithms; Natural languages; Support vector machines; Word segmentation; dictionary-based; machine learning algorithms; morphological analysis; tokenization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, 2008. ECTI-CON 2008. 5th International Conference on
  • Conference_Location
    Krabi
  • Print_ISBN
    978-1-4244-2101-5
  • Electronic_ISBN
    978-1-4244-2102-2
  • Type

    conf

  • DOI
    10.1109/ECTICON.2008.4600388
  • Filename
    4600388
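
  • Illustration
    The abstract names longest matching as one of the dictionary-based (DCB) algorithms. The sketch below shows the general idea only, under stated assumptions: it is a generic greedy left-to-right matcher, not the implementation evaluated in the paper. The function name `longest_matching`, the toy dictionary, and the fallback of emitting an unknown character as its own token are all hypothetical choices made for this example.

```python
def longest_matching(text, dictionary):
    """Greedy left-to-right segmentation (illustrative sketch).

    At each position, take the longest dictionary entry that starts there.
    Assumption: a character that starts no dictionary entry is emitted
    as a single-character token so the scan always advances.
    """
    # Longest entry length bounds how far ahead we need to look.
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink the window.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # hypothetical fallback for unknown characters
        tokens.append(match)
        i += len(match)
    return tokens


# Toy dictionary (an assumption for this example, not from the paper).
toy_dictionary = {"ตา", "กลม", "ตาก", "ลม"}
print(longest_matching("ตากลม", toy_dictionary))
```

    With this toy dictionary the greedy choice commits to the longest first match, which also shows why such a strategy can pick an unintended reading when the input is ambiguous.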