DocumentCode
1909482
Title
The Comparative Research on the Segmentation Strategies of Tibetan Bounded-Variant Forms
Author
Congjun Long ; Caijun Kang ; Di Jiang
Author_Institution
Nat. Languages Resource Monitoring & Res. Center, Minzu Univ. of China, Beijing, China
fYear
2013
fDate
17-19 Aug. 2013
Firstpage
243
Lastpage
246
Abstract
The segmentation of Tibetan bounded-variant forms (TBVFS) is one of the most foundational tasks in text processing, and its results directly influence word segmentation, POS tagging, syntactic parsing, named entity extraction, and so on. At present, segmentation results are unsatisfactory and cannot be applied in practice. In this article, the authors first describe the features of TBVFS and their distributions, then test two different segmentation strategies, and conclude that statistics-based morpheme position tagging performs better than rule-based methods. If some rules are applied in post-processing to correct part of the mistaken segmentations, this kind of segmentation problem can be resolved.
Keywords
natural language processing; statistical analysis; text analysis; TBVFS segmentation strategies; Tibetan bounded-variant form segmentation strategies; morpheme position tagging; named entity extraction; POS tagging; statistics-based methods; syntactic parsing; text processing; word segmentation; Accuracy; Computational linguistics; Context; Educational institutions; Electronic mail; Tagging; Training; Segmentation Strategies; Tibetan; bounded-variant forms
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2013 International Conference on
Conference_Location
Urumqi
Type
conf
DOI
10.1109/IALP.2013.75
Filename
6646046