DocumentCode :
1909366
Title :
Tibetan Text Classification Based on the Feature of Position Weight
Author :
Hui Cao ; Huiqiang Jia
Author_Institution :
Chinese Nat. Inst. of Inf. Technol., Northwest Univ. for Nat., Lanzhou, China
fYear :
2013
fDate :
17-19 Aug. 2013
Firstpage :
220
Lastpage :
223
Abstract :
Building on a study of Tibetan characters and grammar, this paper investigates term-weighting algorithms for Tibetan text categorization under the vector space model. Taking into account the position at which a Tibetan word appears in a text, it proposes an improved TF-IDF weighting algorithm. Feature words are extracted from Tibetan documents with the χ² (CHI) statistic, and the cosine measure is used to compute the similarity between Tibetan texts and so distinguish similar documents. Tibetan texts are then classified with a linearly separable support vector machine, and the TF-IDF algorithm is compared with the improved TF-IDF algorithm on this classification task. The results show that the improved TF-IDF algorithm achieves better classification performance.
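The abstract does not state the improved weighting formula, so the following is only a minimal sketch of one way a position weight could enter TF-IDF, together with the cosine measure. The position names, the weight values in `POSITION_WEIGHTS`, the helper names, and the toy tokens are all assumptions made for illustration; they are not taken from the paper, and the χ² feature selection and SVM steps are not shown.

```python
import math
from collections import Counter

# Hypothetical position weights: the abstract does not give the paper's values.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_tf(doc):
    """Position-weighted term frequency.

    `doc` maps a position name to a list of tokens; each occurrence of a
    token contributes the weight of the position it appears in.
    """
    tf = Counter()
    for position, tokens in doc.items():
        for tok in tokens:
            tf[tok] += POSITION_WEIGHTS.get(position, 1.0)
    return tf


def idf_table(docs):
    """Plain IDF, log(N / df), for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update({tok for tokens in doc.values() for tok in tokens})
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Sparse vector: position-weighted TF multiplied by IDF."""
    return {t: w * idf.get(t, 0.0) for t, w in weighted_tf(doc).items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus with placeholder tokens (real input would be segmented Tibetan words).
docs = [
    {"title": ["sports"], "body": ["match", "team", "news"]},
    {"title": ["economy"], "body": ["market", "team", "news"]},
    {"title": ["sports"], "body": ["market", "news"]},
]
idf = idf_table(docs)
vectors = [tfidf_vector(d, idf) for d in docs]
```

In this toy corpus the first and third documents share a title word, so `cosine(vectors[0], vectors[2])` comes out larger than `cosine(vectors[0], vectors[1])`; a term that occurs in every document, such as `news`, receives an IDF of zero and contributes nothing to either score.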
Keywords :
natural language processing; pattern classification; statistical analysis; support vector machines; text analysis; χ2 statistical method; CHI statistical method; TF-IDF weighting algorithm; Tibetan characters; Tibetan grammar; Tibetan position information; Tibetan text classification algorithm; Tibetan text similarity calculation; Tibetan word document extraction; cosine method; linear separable support vector machine classification; position weight feature; text categorization weight algorithm; vector space model; Classification algorithms; Feature extraction; Indexes; Support vector machine classification; Text categorization; Vectors; Feature words; Position weight; Support Vector Machine; Text classification; Tibetan;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing (IALP), 2013 International Conference on
Conference_Location :
Urumqi
Type :
conf
DOI :
10.1109/IALP.2013.63
Filename :
6646041