DocumentCode :
1909482
Title :
The Comparative Research on the Segmentation Strategies of Tibetan Bounded-Variant Forms
Author :
Congjun Long ; Caijun Kang ; Di Jiang
Author_Institution :
Nat. Languages Resource Monitoring & Res. Center, Minzu Univ. of China, Beijing, China
fYear :
2013
fDate :
17-19 Aug. 2013
Firstpage :
243
Lastpage :
246
Abstract :
The segmentation of Tibetan bounded-variant forms (TBVFS) is one of the most foundational tasks in text processing, and its results directly influence word segmentation, POS tagging, syntactic parsing, named entity extraction, and so on. At present, the segmentation results are unsatisfactory and cannot be applied in practice. In this article, the authors first describe the features of TBVFS and their distributions, then test the segmentation results of two different segmentation strategies, and conclude that statistics-based morpheme position tagging performs better than rule-based methods. If rules are used in post-processing to correct some of the mistaken segmentations, this kind of segmentation problem can be resolved.
Keywords :
natural language processing; statistical analysis; text analysis; TBVFS segmentation strategies; Tibetan bounded-variant form segmentation strategies; morpheme position tagging; named entity extraction; POS tagging; statistics-based methods; syntactic parsing; text processing; word segmentation; Accuracy; Computational linguistics; Context; Educational institutions; Electronic mail; Tagging; Training; Segmentation Strategies; Tibetan; bounded-variant forms;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Asian Language Processing (IALP), 2013 International Conference on
Conference_Location :
Urumqi
Type :
conf
DOI :
10.1109/IALP.2013.75
Filename :
6646046