Regional Information Center for Science and Technology - Tibetan Word Segmentation Based on Word-Position Tagging

DocumentCode :

1909456

Title :

Tibetan Word Segmentation Based on Word-Position Tagging

Author :

Caijun Kang ; Di Jiang ; Congjun Long

Author_Institution :

Humanities & Commun. Coll., Shanghai Normal Univ., Shanghai, China

fYear :

2013

fDate :

17-19 Aug. 2013

Firstpage :

239

Lastpage :

242

Abstract :

The main advantage of Tibetan word segmentation based on word-position tagging is that it reduces segmentation errors on unknown words. In this article, the authors extend the usual 4-tag set to a 6-tag set suited to the features of Tibetan characters, use a CRF as the tagging model to train and test on corpus data, and then build post-processing modules to revise the results. The experimental results show that this method performs well and deserves further study, including expanding the corpus and optimizing the tag set and feature templates.
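As an illustration of the word-position idea in the abstract, the following is a minimal sketch of how a segmented sentence can be turned into per-unit position tags and back. It assumes the conventional 4-tag scheme B/M/E/S (begin, middle, end, single), which the abstract calls the "usual 4-tag set" without listing the tags; the paper's 6-tag set is not specified in this record, so it is not reproduced here. The function names and the placeholder units `u1` … `u6` are hypothetical, and the CRF training step itself is omitted.

```python
from typing import List


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Assign a position tag to every unit of every word.

    Assumed scheme (not taken from the paper): B = first unit of a
    multi-unit word, M = interior unit, E = last unit, S = one-unit word.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty words rather than emit a tag for nothing
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(units: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild words from a flat unit sequence and its position tags.

    A word is closed whenever an E or S tag is seen; a dangling
    B/M run at the end is kept as a final word.
    """
    words: List[List[str]] = []
    current: List[str] = []
    for unit, tag in zip(units, tags):
        current.append(unit)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:
        words.append(current)
    return words


# Placeholder units stand in for the tagged units (e.g. syllables).
segmented = [["u1", "u2", "u3"], ["u4"], ["u5", "u6"]]
flat_units = [u for w in segmented for u in w]
gold_tags = words_to_tags(segmented)
restored = tags_to_words(flat_units, gold_tags)
```

In a tagging-based segmenter, a sequence model such as a CRF would predict the tag sequence for unseen text, and a decoding step like `tags_to_words` would turn the predictions into word boundaries. Because boundaries are predicted per unit rather than looked up in a dictionary, words absent from the lexicon can still be segmented.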

Keywords :

natural language processing; statistical analysis; CRF; Tibetan characters; Tibetan word segmentation; conditional random fields; word segmentation error reduction; word-position tagging; Accuracy; Computational linguistics; Dictionaries; Educational institutions; Hidden Markov models; Tagging; Training; Tibetan; tagging model; word-position

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Asian Language Processing (IALP), 2013 International Conference on

Conference_Location :

Urumqi

Type :

conf

DOI :

10.1109/IALP.2013.74

Filename :

6646045

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1909456