Title of article :
Rough set and ensemble learning based semi-supervised algorithm for text classification
Author/Authors :
Shi, Lei and Ma, Xinming and Xi, Lei and Duan, Qiguo and Zhao, Jingying
Issue Information :
Journal with serial issue number, year 2011
Abstract :
Text classification has received increasing attention due to the enormous growth of digital content available online. This paper investigates the design of two-class text classifiers using positive and unlabeled data only. The distinguishing feature of this problem is that there are no labeled negative examples for learning, which makes traditional text classification techniques inapplicable. In this paper, a novel semi-supervised classification algorithm based on tolerance rough sets and ensemble learning is proposed. Tolerance rough set theory is used to approximate the concepts present in documents and to extract an initial set of negative examples. Then, SVM, Rocchio and Naive Bayes algorithms are used as base classifiers to construct an ensemble classifier, which runs iteratively and exploits the margins between positive and negative data to progressively improve the approximation of the negative data. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. An experimental evaluation of different methods is carried out on two common text corpora, i.e., the Reuters-21578 collection and the WebKB collection. The experimental results indicate that the proposed method achieves a significant performance improvement.
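The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not specify: tolerance classes are built from term co-occurrence counts with a hypothetical threshold `theta`; an unlabeled document is taken as an initial negative when its upper approximation shares no term with the upper approximations of the positive documents; and a single Rocchio-style centroid classifier stands in for the SVM, Rocchio and Naive Bayes ensemble. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations
import math


def tolerance_classes(docs, theta):
    """Map each term to the terms that co-occur with it in >= theta documents."""
    cooc = Counter()
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}  # every term tolerates itself
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """All terms whose tolerance class overlaps the document's terms."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def initial_negatives(positive, unlabeled, theta=2):
    """Assumed rule: no overlap with the positive concept means negative."""
    classes = tolerance_classes(positive + unlabeled, theta)
    pos_concept = set()
    for doc in positive:
        pos_concept |= upper_approximation(doc, classes)
    return [i for i, doc in enumerate(unlabeled)
            if not (upper_approximation(doc, classes) & pos_concept)]


def centroid(docs):
    """L2-normalised sum of term counts (a Rocchio-style prototype)."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    norm = math.sqrt(sum(v * v for v in total.values())) or 1.0
    return {t: v / norm for t, v in total.items()}


def cosine_to(doc, proto):
    """Cosine similarity between a document and a normalised prototype."""
    counts = Counter(doc)
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return sum(v * proto.get(t, 0.0) for t, v in counts.items()) / norm


def expand_negatives(positive, unlabeled, negative_idx):
    """Iteratively add unlabeled documents that sit closer to the negatives."""
    negative_idx = set(negative_idx)
    pos_c = centroid(positive)
    while True:
        neg_c = centroid([unlabeled[i] for i in negative_idx])
        new = {i for i, doc in enumerate(unlabeled)
               if i not in negative_idx
               and cosine_to(doc, neg_c) > cosine_to(doc, pos_c)}
        if not new:  # no document changed side: the boundary has settled
            return sorted(negative_idx)
        negative_idx |= new


# Toy data (invented for illustration only).
positive = [["stock", "market", "trade"], ["market", "price", "trade"]]
unlabeled = [["stock", "price", "market"],
             ["football", "match", "goal"],
             ["goal", "match", "team"],
             ["team", "coach", "goal", "market"]]

seed = initial_negatives(positive, unlabeled, theta=2)
final = expand_negatives(positive, unlabeled, seed)
```

On this toy data the rough set step selects the two purely sports-related documents as seed negatives, and the iterative step then adds the mixed document that mentions "market" but lies closer to the negative centroid. In the method described above, a majority vote over the three base classifiers would replace the single centroid comparison.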
Keywords :
Text classification , Rough set , Semi-supervised classification , Ensemble Learning
Journal title :
Expert Systems with Applications