DocumentCode :
2892675
Title :
Chinese Text Classification Using Key Characters String Kernel
Author :
Zheng, Shiqiang ; Yang, Yujiu ; Wu, Haiping ; Liu, Wenhuang
Author_Institution :
Grad. Sch. at Shenzhen, Tsinghua Univ., Shenzhen, China
fYear :
2009
fDate :
12-14 Oct. 2009
Firstpage :
113
Lastpage :
119
Abstract :
Most Chinese text classification methods are based on Chinese word segmentation and the bag-of-words (BOW) model. Classification performance therefore relies largely on segmentation accuracy. Unfortunately, perfect precision and disambiguation in segmentation cannot be achieved. To address this problem, a novel Chinese text classification method using a string kernel is presented. A string kernel computes the similarity of a pair of documents by comparing the substrings they have in common. Experimental results show that our method greatly improves classification on small training data sets. Although the performance of the traditional string kernel is comparable to that of BOW methods on a larger data set, the dimension of the feature space is so high that the computation is very time-consuming. Our proposed key characters string kernel technique addresses both the efficiency and the effectiveness problems. Experiments on a larger data set show that an SVM with the key characters string kernel can achieve superior performance.
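Illustrative note (not part of the original record): the abstract describes a string kernel as a similarity measure based on the substrings two documents share. A minimal sketch of that general idea is the p-spectrum kernel below, which counts contiguous character substrings of length p. It is not the authors' key characters string kernel, whose details this record does not give; the function names and the default p = 2 are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def spectrum(text: str, p: int) -> Counter:
    """Count every contiguous substring of length p in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s: str, t: str, p: int = 2) -> float:
    """Unnormalized p-spectrum kernel: inner product of substring counts."""
    cs, ct = spectrum(s, p), spectrum(t, p)
    # Iterate over the smaller table; a Counter returns 0 for missing keys.
    if len(cs) > len(ct):
        cs, ct = ct, cs
    return float(sum(n * ct[g] for g, n in cs.items()))


def normalized_kernel(s: str, t: str, p: int = 2) -> float:
    """Cosine-normalized kernel, so document length does not dominate."""
    k_st = spectrum_kernel(s, t, p)
    if k_st == 0.0:
        return 0.0
    return k_st / sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))


if __name__ == "__main__":
    # Works directly on characters, so no word segmentation is required.
    print(spectrum_kernel("中文文本分类", "文本分类方法", p=2))
    print(normalized_kernel("中文文本分类", "文本分类方法", p=2))
```

Because the kernel operates on raw characters, it sidesteps word segmentation entirely. A matrix of such kernel values could in principle be passed to an SVM that accepts precomputed kernels, though how the authors did this is not stated in the record.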
Keywords :
classification; natural language processing; support vector machines; text analysis; Chinese text classification; SVM; key characters string Kernel; support vector machine; Data mining; Databases; Frequency; Kernel; Learning systems; Natural languages; Support vector machines; Text categorization; Training data; Web sites; String Kernel; Support Vector Machines; text classification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Semantics, Knowledge and Grid, 2009. SKG 2009. Fifth International Conference on
Conference_Location :
Zhuhai
Print_ISBN :
978-0-7695-3810-5
Type :
conf
DOI :
10.1109/SKG.2009.59
Filename :
5368026