DocumentCode :
3301712
Title :
Text Classification Based on Transfer Learning and Self-Training
Author :
Zheng, Yabin ; Teng, Shaohua ; Liu, Zhiyuan ; Sun, Maosong
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing
Volume :
3
fYear :
2008
fDate :
18-20 Oct. 2008
Firstpage :
363
Lastpage :
367
Abstract :
Traditional text classification methods make a basic assumption: the training and test sets are homologous. This naive assumption may not hold in the real world, especially in the Web environment. Documents on the Web change from time to time, so a pre-trained model may be out of date when applied to newly emerging documents. However, some information in the training set is nonetheless useful. In this paper we propose a novel method that uses transfer learning to discover the constant common knowledge shared by the training and test sets; a model is then built on this knowledge to fit the distribution of the test set. The model is reinforced iteratively by adding the most confident instances from the unlabeled test set to the training set until convergence, which is a self-training process. A preliminary experiment shows that our method achieves an improvement of approximately 8.92% over the standard supervised-learning method.
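The self-training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the cosine-margin confidence score, the `per_round` and `max_rounds` parameters, and all function names are assumptions chosen for illustration, and the transfer-learning step that extracts common knowledge is not modeled. Convergence is approximated here by stopping when the unlabeled pool is empty.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def train_centroids(docs, labels):
    """Build one averaged term-frequency centroid per class."""
    sums, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        sums.setdefault(label, Counter()).update(tokenize(doc))
    return {
        label: {term: c / counts[label] for term, c in tf.items()}
        for label, tf in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict(centroids, doc):
    """Return (label, confidence); confidence is the top-two similarity margin."""
    vec = Counter(tokenize(doc))
    scores = sorted(
        ((cosine(vec, c), label) for label, c in centroids.items()),
        reverse=True,
    )
    best_score, best_label = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best_label, best_score - runner_up


def self_train(train_docs, train_labels, test_docs, per_round=1, max_rounds=50):
    """Iteratively move the most confident test documents into the training set."""
    docs, labels = list(train_docs), list(train_labels)
    pending = list(range(len(test_docs)))
    assigned = {}
    for _ in range(max_rounds):
        if not pending:
            break  # assumed stopping rule: unlabeled pool exhausted
        centroids = train_centroids(docs, labels)
        ranked = sorted(
            pending,
            key=lambda i: predict(centroids, test_docs[i])[1],
            reverse=True,
        )
        for i in ranked[:per_round]:
            label, _ = predict(centroids, test_docs[i])
            assigned[i] = label
            docs.append(test_docs[i])
            labels.append(label)
            pending.remove(i)
    # Label anything left over with the final model.
    centroids = train_centroids(docs, labels)
    for i in pending:
        assigned[i] = predict(centroids, test_docs[i])[0]
    return [assigned[i] for i in range(len(test_docs))]
```

Calling `self_train(train_docs, train_labels, test_docs)` returns one predicted label per test document. Adding only a few instances per round keeps early mistakes from dominating the model, at the cost of more retraining passes.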
Keywords :
learning (artificial intelligence); text analysis; convergence; self-training; supervised learning; text classification; training set; transfer learning; Automatic testing; Computer science; Convergence; Information science; Intelligent systems; Machine learning; Sun; Support vector machines; Text categorization; Text mining; Self-Training; Text Classification; Transfer Learning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Computation, 2008. ICNC '08. Fourth International Conference on
Conference_Location :
Jinan
Print_ISBN :
978-0-7695-3304-9
Type :
conf
DOI :
10.1109/ICNC.2008.498
Filename :
4667162