DocumentCode :
2125627
Title :
Improving sparsely labeled text classification with data editing
Author :
Zhang, Xue ; Zhao, Dong-yan ; Xiao, Wang-xin
Author_Institution :
Institute of Computer Science & Engineering, Peking University, Beijing 100871, China
fYear :
2010
fDate :
4-6 Dec. 2010
Firstpage :
3774
Lastpage :
3777
Abstract :
In this paper, an active semi-supervised framework combined with data editing is proposed to improve sparsely labeled text classification. It integrates semi-supervised learning with active learning and makes full use of active learning by fusing it with a data editing technique. The algorithm works iteratively, alternating among the steps of self-labeling, active labeling and editing. Active learning and data editing are designed to cope with bias and sparsity in the training data. To our knowledge, fusing active learning with data editing to eliminate self-labeled noise is novel. Extensive experiments on several real-world data sets show encouraging results for the proposed framework on sparsely labeled text classification compared with several state-of-the-art methods.
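The abstract describes a loop that alternates self-labeling, active labeling and editing, but this record does not specify the classifier, confidence measure, query strategy or editing rule. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes a k-nearest-neighbour classifier with Euclidean distance on dense feature vectors (standing in for text vectors such as TF-IDF), unanimous neighbour agreement as the self-labeling threshold, least-confidence querying of a simulated oracle, and Wilson's edited nearest neighbour rule applied only to self-labeled examples. All function names are hypothetical.

```python
import math
from collections import Counter


def knn_vote(x, train, k):
    """Majority label among the k nearest (x, y) pairs in `train`, plus the
    fraction of those neighbours agreeing with it (used as confidence)."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    label, votes = Counter(y for _, y in nearest).most_common(1)[0]
    return label, votes / len(nearest)


def edit_self_labeled(trusted, self_labeled, k=3):
    """Wilson-style editing restricted to self-labeled examples: drop any whose
    label disagrees with its k nearest neighbours in the rest of the training
    set. Human-labeled (trusted) examples are never removed."""
    kept, removed = [], []
    for i, (x, y) in enumerate(self_labeled):
        others = trusted + self_labeled[:i] + self_labeled[i + 1:]
        pred, _ = knn_vote(x, others, k)
        if pred == y:
            kept.append((x, y))
        else:
            removed.append((x, y))
    return kept, removed


def active_semi_supervised(trusted, unlabeled, oracle, rounds=5, k=3, threshold=1.0):
    """Hypothetical loop alternating self-labeling, active labeling and editing."""
    trusted, pool, self_labeled = list(trusted), list(unlabeled), []
    for _ in range(rounds):
        if not pool:
            break
        train = trusted + self_labeled
        scored = [(x, *knn_vote(x, train, k)) for x in pool]
        query = min(scored, key=lambda s: s[2])[0]  # least confident point
        # 1. Self-labeling: adopt predictions that meet the confidence threshold.
        self_labeled += [(x, y) for x, y, c in scored if c >= threshold and x != query]
        # 2. Active labeling: the least confident point is sent to the oracle.
        trusted.append((query, oracle(query)))
        pool = [x for x, _, c in scored if c < threshold and x != query]
        # 3. Editing: discard likely-noisy self-labels and return them to the pool.
        self_labeled, removed = edit_self_labeled(trusted, self_labeled, k)
        pool += [x for x, _ in removed]
    return trusted, self_labeled


# Toy usage: two 2-D clusters; the oracle simulates a human annotator.
trusted = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
           ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
unlabeled = [(1, 1), (0, 2), (6, 6), (4, 5), (2.4, 3.1)]
oracle = lambda x: "a" if x[0] + x[1] < 5 else "b"
labeled, self_labeled = active_semi_supervised(trusted, unlabeled, oracle)
# labeled[-1] == ((2.4, 3.1), "b"): the oracle overrides the kNN guess "a"
```

Restricting editing to self-labeled examples reflects the abstract's stated goal of eliminating self-labeled noise while keeping human-provided labels fixed. Edited-out points are returned to the unlabeled pool here so that later rounds can relabel them, which is a design assumption of this sketch.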
Keywords :
Algorithm design and analysis; Classification algorithms; Labeling; Nearest neighbor searches; Support vector machines; Text categorization; Training data; active learning; data editing; semi-supervised learning; sparsely labeled text classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location :
Hangzhou, China
Print_ISBN :
978-1-4244-7616-9
Type :
conf
DOI :
10.1109/ICISE.2010.5690328
Filename :
5690328