• DocumentCode
    130979
  • Title
    A novel unsupervised non-iterative approach to word segmentation
  • Author
    Hanshi Wang ; Haining Xu ; Lizhen Liu ; Wei Song ; Jingli Lu
  • Author_Institution
    Inf. & Eng. Coll., Capital Normal Univ., Beijing, China
  • fYear
    2014
  • fDate
    27-29 June 2014
  • Firstpage
    824
  • Lastpage
    827
  • Abstract
    Word segmentation is the crucial first step of natural language understanding (NLU) for Chinese corpora. Our earlier paper presented "Evaluation Selection and Adjustment" (ESA), an unsupervised approach to word segmentation. In this article, we present a novel non-iterative variant of ESA and compare it with similar methods, obtaining better performance. We also analyze "Balancing" and compare it with "Standardizing", another algorithm, as solutions to the problem of how statistical word segmentation methods should evaluate words of different lengths and compare them with one another. The experimental results show that "Balancing" is more effective than "Standardizing" for this task.
  • Keywords
    feature selection; natural language processing; text analysis; word processing; Chinese corpora; ESA; NLU; evaluation selection and adjustment; natural language understanding; unsupervised noniterative approach; word segmentation; non-iterative; standardizing; unsupervised
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering and Service Science (ICSESS), 2014 5th IEEE International Conference on
  • Conference_Location
    Beijing
  • ISSN
    2327-0586
  • Print_ISBN
    978-1-4799-3278-8
  • Type
    conf
  • DOI
    10.1109/ICSESS.2014.6933693
  • Filename
    6933693