• DocumentCode
    2234050
  • Title

    Analysis of Myanmar Word boundary and segmentation by using Statistical Approach

  • Author

    Aye Myat Mon ; Thein, Myint Myint ; Htay, Su Su ; Phyue, Soe Lai ; Win, Thinn Thinn

  • Author_Institution
    Univ. of Comput. Studies, Mandalay, Myanmar
  • Volume
    5
  • fYear
    2010
  • fDate
    20-22 Aug. 2010
  • Abstract
    This paper proposes a unified approach to Myanmar word analysis using Finite State Automata (FSA), a Rule Based Heuristic Approach and a Statistical Approach. Myanmar text has no inter-word spaces, which makes tokenization difficult. Therefore, we use FSA to recognize words. Segmentation is a major problem because there is no delimiter between words. Errors in segmentation cause subsequent failures in later NLP processes. Segmentation is also an essential preprocessing task for Natural Language Processing applications such as Machine Translation and Information Retrieval. In this system, the Rule Based Heuristic Approach and the Statistical Approach are used with a corpus based dictionary. Evaluation results showed that the method is very effective for the Myanmar language.
  • Keywords
    finite state machines; natural language processing; statistical analysis; word processing; Myanmar language; corpus based dictionary; finite state automata; natural language processing; rule based heuristic approach; statistical approach; word boundary; word segmentation; Entropy; Merging; FSA; Natural Language Processing; Segmentation; Statistical approach; Syllable Merging;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference on
  • Conference_Location
    Chengdu
  • ISSN
    2154-7491
  • Print_ISBN
    978-1-4244-6539-2
  • Type

    conf

  • DOI
    10.1109/ICACTE.2010.5579805
  • Filename
    5579805