DocumentCode :
3096226
Title :
A hybrid algorithm for text classification based on rough set
Author :
Deng, Weibin
Author_Institution :
Key Lab. of Electron. Commerce & Modern Logistics, Chongqing Univ. of Posts & Telecommun., Chongqing, China
Volume :
1
fYear :
2011
fDate :
11-13 March 2011
Firstpage :
406
Lastpage :
410
Abstract :
Text classification has become one of the key subjects in intelligent information processing. Because natural language has complex features, the dimensionality of the feature space is particularly high, and improving classification accuracy is an important and hard problem. Since rough set theory is a useful tool for dealing with uncertain information, this paper proposes a hybrid algorithm for text classification based on rough sets. Rough set theory divides a set into a positive region, a negative region and a boundary region. In the first stage, we therefore use rough sets to divide the documents into certain classes and a doubt set. In the second stage, the documents in the doubt set are classified further, based on the theory of attribute importance degree in the information view of rough sets. We find that most of the documents can be classified with high accuracy in the first stage. Furthermore, the conditional independence assumption of naïve Bayes is relaxed to some extent in the second stage. Simulation results on general data sets, compared with naïve Bayes, support vector machine, and k-nearest neighbor, illustrate the efficiency of this algorithm.
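The three-region split mentioned in the abstract can be illustrated with a minimal sketch of the standard rough set construction. This is not the paper's implementation: the function name `rough_regions`, the toy attributes `goal` and `vote`, and the sample documents are all hypothetical. It assumes that the "doubt set" corresponds to the boundary region, which the abstract suggests but does not state explicitly.

```python
from collections import defaultdict


def rough_regions(objects, condition_attrs, target):
    """Split objects into positive, boundary and negative regions.

    objects: dict mapping object id -> dict of attribute values
    condition_attrs: attribute names that define indiscernibility
    target: set of object ids that belong to the concept (class)
    """
    # Group objects that have identical values on the chosen attributes.
    classes = defaultdict(set)
    for obj_id, attrs in objects.items():
        key = tuple(attrs[a] for a in condition_attrs)
        classes[key].add(obj_id)

    positive, boundary, negative = set(), set(), set()
    for block in classes.values():
        if block <= target:
            # Every indiscernible object is in the class: certain member.
            positive |= block
        elif block & target:
            # Only some are in the class: cannot be decided from attributes.
            boundary |= block
        else:
            # None are in the class: certain non-member.
            negative |= block
    return positive, boundary, negative


# Hypothetical toy data: binary term-presence features per document.
docs = {
    "d1": {"goal": 1, "vote": 0},
    "d2": {"goal": 1, "vote": 0},
    "d3": {"goal": 0, "vote": 1},
    "d4": {"goal": 1, "vote": 1},
    "d5": {"goal": 1, "vote": 1},
}
sports = {"d1", "d2", "d4"}  # assumed class labels for illustration

pos, bnd, neg = rough_regions(docs, ["goal", "vote"], sports)
print(sorted(pos), sorted(bnd), sorted(neg))
```

In this toy example `d4` and `d5` share the same feature values but differ in class, so both fall into the boundary region. Under the assumption above, these are the documents a second-stage classifier would handle.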
Keywords :
natural languages; pattern classification; rough set theory; hybrid algorithm; intelligent information processing; natural language; rough set theory; supported vector machine; text classification; Accuracy; Algorithm design and analysis; Classification algorithms; Feature extraction; Niobium; Support vector machines; Text categorization; KNN; SVM; rough set; text classification; weighted naïve Bayes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Research and Development (ICCRD), 2011 3rd International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-839-6
Type :
conf
DOI :
10.1109/ICCRD.2011.5764046
Filename :
5764046