DocumentCode
566902
Title
A novel kernel for text categorization
Author
Zhang, Lujiang ; Hu, Xiaohui
Author_Institution
Sch. of Autom. Sci. & Electr. Eng., Beijing Univ. of Aeronaut. & Astronaut., Beijing, China
Volume
1
fYear
2012
fDate
25-27 May 2012
Firstpage
186
Lastpage
190
Abstract
In this paper we propose a novel kernel for text categorization. The kernel is an inner product in the feature space generated by all word combinations of a specified length. A word combination is a collection of distinct words co-occurring in the same sentence. A word combination of length k is weighted by the k-th root of the product of the inverse document frequencies (IDF) of its words. We present a computationally simple and efficient algorithm to calculate the kernel. In experiments on the 20 Newsgroups dataset, the kernel achieves better performance than the classical word kernel and the word-sequence kernel. We also assess the impact of word combination length on performance.
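The weighting described in the abstract can be illustrated with a minimal brute-force sketch. This is not the paper's efficient algorithm, which the abstract does not detail. Several choices here are assumptions made only for illustration: the function names (`idf_table`, `combination_features`, `word_combination_kernel`), the IDF definition log(N / df), and the convention that a document's feature value for a combination is its weight summed over every sentence containing it.

```python
import math
from collections import Counter
from itertools import combinations


def idf_table(corpus):
    """Assumed IDF: log(N / df). A document is a list of sentences,
    and a sentence is a list of word tokens."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update({w for sent in doc for w in sent})
    return {w: math.log(n_docs / df[w]) for w in df}


def combination_features(doc, idf, k):
    """Map each k-word combination to its accumulated weight in one document.
    Weight of a combination = k-th root of the product of its words' IDFs.
    Summing over sentences is an assumption, not taken from the abstract."""
    feats = Counter()
    for sent in doc:
        # Distinct words only; sorting gives each combination a canonical key.
        words = sorted(w for w in set(sent) if w in idf)
        for combo in combinations(words, k):
            weight = math.prod(idf[w] for w in combo) ** (1.0 / k)
            feats[combo] += weight
    return feats


def word_combination_kernel(doc_a, doc_b, idf, k):
    """Inner product of the two documents' combination feature vectors."""
    fa = combination_features(doc_a, idf, k)
    fb = combination_features(doc_b, idf, k)
    if len(fb) < len(fa):
        fa, fb = fb, fa  # iterate over the smaller feature map
    return sum(v * fb[c] for c, v in fa.items() if c in fb)
```

Explicit enumeration costs on the order of C(n, k) per sentence of n distinct words, so this sketch is only practical for short sentences and small k.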
Keywords
feature extraction; pattern classification; support vector machines; text analysis; Newsgroups dataset; classical word kernel; feature space generation; inverse document frequency; sentence; support vector machine; text categorization kernel; text classification; word combination length; word cooccurrence; word-sequence kernel; Educational institutions; Indexes; Kernel; Machine learning; Semantics; Text categorization; kernel methods; word-combination kernel
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on
Conference_Location
Zhangjiajie
Print_ISBN
978-1-4673-0088-9
Type
conf
DOI
10.1109/CSAE.2012.6272576
Filename
6272576