Title :
A novel kernel for text categorization
Author :
Zhang, Lujiang ; Hu, Xiaohui
Author_Institution :
Sch. of Autom. Sci. & Electr. Eng., Beijing Univ. of Aeronaut. & Astronaut., Beijing, China
Abstract :
In this paper we propose a novel kernel for text categorization. The kernel is an inner product in the feature space generated by all word combinations of a specified length. A word combination is a collection of distinct words co-occurring in the same sentence. A word combination of length k is weighted by the k-th root of the product of the inverse document frequencies (IDF) of its words. We propose a computationally simple and efficient algorithm to calculate this kernel. In experiments on the 20 Newsgroups dataset, the kernel achieves better performance than the classical word kernel and the word-sequence kernel. We also assess the impact of word combination length on performance.
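The feature space described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' efficient algorithm: it enumerates every word combination explicitly, which is only practical for short sentences and small k. The function names (`idf_table`, `combination_features`, `word_combination_kernel`), the IDF formula log(N / df), the summing of weights over sentences, and the zero IDF for unseen words are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter
from itertools import combinations


def idf_table(docs):
    """Assumed IDF: log(N / df). Each doc is a list of sentences (lists of words)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update({w for sent in doc for w in sent})
    return {w: math.log(n / df[w]) for w in df}


def combination_features(doc, k, idf):
    """Map each length-k word combination to its accumulated weight."""
    feats = Counter()
    for sent in doc:
        # Distinct words only; sorting makes each combination a canonical key.
        words = sorted(set(sent))
        for combo in combinations(words, k):
            # Weight: k-th root of the product of the words' IDF values.
            # Words missing from the IDF table get 0.0 (an assumption).
            product = math.prod(idf.get(w, 0.0) for w in combo)
            feats[combo] += product ** (1.0 / k)
    return feats


def word_combination_kernel(doc_a, doc_b, k, idf):
    """Inner product of the two documents' combination feature vectors."""
    fa = combination_features(doc_a, k, idf)
    fb = combination_features(doc_b, k, idf)
    return sum(v * fb[c] for c, v in fa.items() if c in fb)


# Toy corpus: three one-sentence documents.
docs = [[["a", "b", "c"]], [["a", "b"]], [["c", "d"]]]
idf = idf_table(docs)
print(word_combination_kernel(docs[0], docs[1], 2, idf))
```

In the toy corpus, the first two documents share only the combination ("a", "b"), so their kernel value is the square of that combination's weight; the second and third documents share no combination and the kernel is zero.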
Keywords :
feature extraction; pattern classification; support vector machines; text analysis; Newsgroups dataset; classical word kernel; feature space generation; inverse document frequency; sentence; support vector machine; text categorization kernel; text classification; word combination length; word cooccurrence; word-sequence kernel; Educational institutions; Indexes; Kernel; Machine learning; Semantics; Support vector machines; Text categorization; kernel methods; word-combination kernel
Conference_Title :
Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on
Conference_Location :
Zhangjiajie
Print_ISBN :
978-1-4673-0088-9
DOI :
10.1109/CSAE.2012.6272576