• DocumentCode
    1909482
  • Title

    The Comparative Research on the Segmentation Strategies of Tibetan Bounded-Variant Forms

  • Author

    Congjun Long ; Caijun Kang ; Di Jiang

  • Author_Institution
    Nat. Languages Resource Monitoring & Res. Center, Minzu Univ. of China, Beijing, China
  • fYear
    2013
  • fDate
    17-19 Aug. 2013
  • Firstpage
    243
  • Lastpage
    246
  • Abstract
    The segmentation of Tibetan bounded-variant forms (TBVFS) is one of the most foundational tasks in text processing, and its results directly influence word segmentation, POS tagging, syntactic parsing, named entity extraction, and other downstream tasks. At present, segmentation results are unsatisfactory and cannot be applied in practice. In this article, the authors first describe the features and distribution of TBVFS, then test two different segmentation strategies, and conclude that statistics-based morpheme position tagging performs better than rule-based methods. If rules are applied in post-processing to correct some of the remaining mis-segmentations, this kind of segmentation problem can be resolved.
  • Keywords
    natural language processing; statistical analysis; text analysis; TBVFS segmentation strategies; Tibetan bounded-variant form segmentation strategies; morpheme position tagging; named entity extraction; POS tagging; statistics-based methods; syntactic parsing; text processing; word segmentation; Accuracy; Computational linguistics; Context; Educational institutions; Electronic mail; Tagging; Training; Segmentation Strategies; Tibetan; bounded-variant forms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2013 International Conference on
  • Conference_Location
    Urumqi
  • Type

    conf

  • DOI
    10.1109/IALP.2013.75
  • Filename
    6646046