Title :
Improving sparsely labeled text classification with data editing
Author :
Zhang, Xue ; Zhao, Dong-yan ; Xiao, Wang-xin
Author_Institution :
Institute of Computer Science & Engineering, Peking University, Beijing 100871, China
Abstract :
In this paper, an active semi-supervised framework combined with data editing is proposed to improve sparsely labeled text classification. It integrates semi-supervised learning with active learning and makes full use of active learning by fusing it with a data editing technique. The algorithm works iteratively, alternating among the steps of self-labeling, active labeling and editing. Active learning and data editing are designed to cope with training data bias and sparsity. To our knowledge, fusing active learning with data editing to eliminate self-labeled noise is novel. Extensive experiments on several real-world data sets show encouraging results for the proposed framework in sparsely labeled text classification compared with several state-of-the-art methods.
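The abstract describes the procedure only as alternating steps of self-labeling, active labeling and editing. The Python sketch below is a hypothetical illustration of that loop, not the authors' algorithm. It assumes a k-nearest-neighbour base classifier, least-confidence selection for active labeling, a vote-share threshold for self-labeling, and Wilson-style editing, which drops a self-labeled example whose label disagrees with the majority of its k nearest neighbours. All names and parameters (`knn_vote`, `wilson_edit`, `active_semi_supervised`, `k`, `self_threshold`, `queries_per_round`) are illustrative assumptions, and `oracle` stands in for a human annotator.

```python
import math
from collections import Counter

# Hypothetical sketch: kNN base classifier, least-confidence queries and
# Wilson-style editing are assumptions, not the paper's exact method.


def knn_vote(x, train, k, exclude=None):
    """Majority label among the k nearest training examples, plus its vote share."""
    neighbours = sorted(
        ((math.dist(x, xi), yi) for i, (xi, yi) in enumerate(train) if i != exclude),
        key=lambda pair: pair[0],
    )[:k]
    if not neighbours:
        return None, 0.0
    label, votes = Counter(y for _, y in neighbours).most_common(1)[0]
    return label, votes / len(neighbours)


def wilson_edit(train, self_labeled, k):
    """Drop self-labeled examples whose label disagrees with their k nearest neighbours.

    Only indices in `self_labeled` are checked; human-labeled examples are kept.
    Returns the kept training set and the feature vectors of dropped examples.
    """
    kept, dropped = [], []
    for i, (xi, yi) in enumerate(train):
        if i in self_labeled:
            label, _ = knn_vote(xi, train, k, exclude=i)
            if label is not None and label != yi:
                dropped.append(xi)
                continue
        kept.append((xi, yi))
    return kept, dropped


def active_semi_supervised(labeled, unlabeled, oracle, rounds=5, k=3,
                           self_threshold=1.0, queries_per_round=1):
    """Alternate active labeling, self-labeling and editing (illustrative only)."""
    train, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Score every pool example with the current kNN classifier.
        scored = [(x, *knn_vote(x, train, k)) for x in pool]
        scored.sort(key=lambda t: t[2])  # least confident first
        # Active labeling: ask the oracle about the least confident examples.
        queried = [(x, oracle(x)) for x, _, _ in scored[:queries_per_round]]
        rest = scored[queries_per_round:]
        # Self-labeling: accept only predictions at or above the threshold.
        self_labeled = [(x, y) for x, y, c in rest if c >= self_threshold]
        pool = [x for x, _, c in rest if c < self_threshold]
        start = len(train) + len(queried)
        train = train + queried + self_labeled
        # Data editing: remove likely-noisy self-labeled examples and
        # return them to the unlabeled pool for later rounds.
        train, dropped = wilson_edit(train, set(range(start, len(train))), k)
        pool.extend(dropped)
    return train


# Toy usage: two well-separated clusters and one ambiguous point.
labeled = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
           ((10, 10), 1), ((11, 10), 1), ((10, 11), 1)]
unlabeled = [(0.5, 0.5), (10.5, 10.5), (5.0, 5.5)]
oracle = lambda x: 0 if x[0] < 5 else 1  # stands in for a human annotator
result = active_semi_supervised(labeled, unlabeled, oracle)
```

In this toy run the ambiguous point (5.0, 5.5) receives only a 2/3 vote share, so it is sent to the oracle, while the two confident points are self-labeled and survive editing. Returning edited-out examples to the pool rather than discarding them is a design choice of this sketch; it lets a later round reconsider them once the training set has grown.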
Keywords :
Algorithm design and analysis; Classification algorithms; Labeling; Nearest neighbor searches; Support vector machines; Text categorization; Training data; active learning; data editing; semi-supervised learning; sparsely labeled text classification;
Conference_Title :
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location :
Hangzhou, China
Print_ISBN :
978-1-4244-7616-9
DOI :
10.1109/ICISE.2010.5690328