DocumentCode
3096226
Title
A hybrid algorithm for text classification based on rough set
Author
Deng, Weibin
Author_Institution
Key Lab. of Electron. Commerce & Modern Logistics, Chongqing Univ. of Posts & Telecommun., Chongqing, China
Volume
1
fYear
2011
fDate
11-13 March 2011
Firstpage
406
Lastpage
410
Abstract
Text classification has become one of the key subjects in intelligent information processing. Owing to the complexity of natural language, the feature space is typically very high-dimensional, and improving the accuracy of text classification remains an important and hard problem. Because rough set theory is a useful tool for dealing with uncertain information, this paper proposes a hybrid algorithm for text classification based on rough sets. Rough set theory divides a set into a positive region, a negative region, and a boundary region, so the documents are first divided into certain classes and a doubt set. Then, based on the attribute importance degree theory in the information view of rough sets, the documents in the doubt set are classified further. We find that most of the documents can be classified with high accuracy in the first stage. Furthermore, the conditional independence assumption of naïve Bayes is relaxed to some extent in the second stage. Simulation results on general data sets, compared against naïve Bayes, support vector machine, and k-nearest neighbor classifiers, illustrate the efficiency of the algorithm.
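The region split mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no procedure, so this uses the textbook rough set definitions (lower and upper approximations over equivalence classes). The toy documents, the binary term attributes "goal" and "vote", and the function names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group object ids that share identical values on the given attributes."""
    blocks = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        blocks[key].add(obj_id)
    return list(blocks.values())


def rough_regions(objects, attributes, target):
    """Return the (positive, boundary, negative) regions of `target`.

    positive: union of equivalence classes fully inside the target
    boundary: classes that overlap the target but are not contained in it
    negative: classes that do not touch the target at all
    """
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:   # block is a subset of the target
            lower |= block
        if block & target:    # block intersects the target
            upper |= block
    universe = set(objects)
    return lower, upper - lower, universe - upper


# Hypothetical toy corpus: each document is described by two binary term features.
docs = {
    "d1": {"goal": 1, "vote": 0},
    "d2": {"goal": 1, "vote": 0},
    "d3": {"goal": 0, "vote": 1},
    "d4": {"goal": 1, "vote": 1},
    "d5": {"goal": 1, "vote": 1},
    "d6": {"goal": 0, "vote": 0},
}
sports = {"d1", "d2", "d4"}  # documents labelled with the (assumed) class "sports"

positive, boundary, negative = rough_regions(docs, ["goal", "vote"], sports)
print(sorted(positive), sorted(boundary), sorted(negative))
```

In this sketch, d4 and d5 share the same feature values but carry different labels, so they fall into the boundary region. Under the reading assumed here, that region plays the role of the doubt set that a second-stage classifier would handle.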
Keywords
natural languages; pattern classification; rough set theory; hybrid algorithm; intelligent information processing; natural language; support vector machine; text classification; Accuracy; Algorithm design and analysis; Classification algorithms; Feature extraction; Niobium; Support vector machines; Text categorization; KNN; SVM; rough set; weighted naïve Bayes
fLanguage
English
Publisher
ieee
Conference_Titel
Computer Research and Development (ICCRD), 2011 3rd International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-61284-839-6
Type
conf
DOI
10.1109/ICCRD.2011.5764046
Filename
5764046