DocumentCode :
2348982
Title :
Marine literature categorization based on minimizing the labelled data
Author :
Zhang, Wei ; Wang, Qiuhong ; Deng, Ye ; Du, Ranran
Author_Institution :
Dept. of Comput. Sci. & Technol., Ocean Univ. of China, Qingdao, China
fYear :
2010
fDate :
21-23 Aug. 2010
Firstpage :
1
Lastpage :
6
Abstract :
In marine literature categorization, supervised machine learning methods require a great deal of time to label samples by hand. We therefore use the Co-training method to reduce the number of labelled samples needed to train the classifier. In this paper, we select features only from the text details and add attribute labels to them, which greatly improves the efficiency of text processing. To build two views, we split the features into two parts, each of which forms an independent view: one view consists of the feature set of the abstract, and the other consists of the feature sets of the title, keywords, creator and department. In experiments, the F1 value and error rate of the categorization system reach about 0.863 and 14.26%. These are close to the performance of a supervised classifier (0.902 and 9.13%) trained on more than 1500 labelled samples, whereas the Co-training categorization method uses only one positive sample and one negative sample to train the initial classifier. In addition, we consider incorporating the idea of active learning into the Co-training method.
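The two-view Co-training loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the naive Bayes scorer, the selection rule (each view adds its single most confident pool document per round), the names `ViewNB`, `co_train` and `predict`, and the toy documents are all assumptions made for illustration. Only the split into an abstract view and a title/keywords/creator/department view (called `meta` here) and the one-positive, one-negative seed set come from the abstract.

```python
import math
from collections import Counter

class ViewNB:
    """Tiny multinomial naive Bayes over the tokens of one view (assumed classifier)."""
    def __init__(self, view):
        self.view = view

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.priors = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc[self.view])
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def log_odds(self, doc):
        """Positive favours class 1, negative class 0; magnitude acts as confidence."""
        score = math.log(self.priors[1] / self.priors[0])
        v = len(self.vocab) + 1  # Laplace smoothing, one extra slot for unseen tokens
        n1 = sum(self.counts[1].values()) + v
        n0 = sum(self.counts[0].values()) + v
        for tok in doc[self.view]:
            score += math.log(((self.counts[1][tok] + 1) / n1) /
                              ((self.counts[0][tok] + 1) / n0))
        return score

VIEWS = ("abstract", "meta")  # "meta" stands for title, keywords, creator, department

def co_train(labelled, labels, pool, rounds=10):
    """Grow the labelled set: each view labels its most confident pool document."""
    labelled, labels, pool = list(labelled), list(labels), list(pool)
    for _ in range(rounds):
        for view in VIEWS:
            if not pool:
                break
            clf = ViewNB(view).fit(labelled, labels)
            best = max(pool, key=lambda d: abs(clf.log_odds(d)))
            pool.remove(best)
            labelled.append(best)
            labels.append(1 if clf.log_odds(best) > 0 else 0)
    return [ViewNB(view).fit(labelled, labels) for view in VIEWS]

def predict(classifiers, doc):
    """Combine both views by summing their log-odds."""
    return 1 if sum(c.log_odds(doc) for c in classifiers) > 0 else 0

# Hypothetical toy data: one positive (marine) and one negative seed document.
seeds = [
    {"abstract": ["ocean", "salinity", "current"], "meta": ["marine", "oceanography"]},
    {"abstract": ["stock", "market", "price"], "meta": ["finance", "economics"]},
]
pool = [
    {"abstract": ["salinity", "tide", "ocean"], "meta": ["marine", "coastal"]},
    {"abstract": ["price", "inflation", "market"], "meta": ["economics", "bank"]},
    {"abstract": ["tide", "wave", "current"], "meta": ["coastal", "oceanography"]},
    {"abstract": ["inflation", "bank", "stock"], "meta": ["finance", "bank"]},
]
classifiers = co_train(seeds, [1, 0], pool)
print(predict(classifiers, {"abstract": ["tide", "wave"], "meta": ["coastal"]}))
```

In this sketch the tokens `tide`, `wave` and `coastal` never occur in the seed documents; they become usable only because documents pseudo-labelled through the other view are added to the training set, which is the effect Co-training relies on.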
Keywords :
information retrieval; learning (artificial intelligence); literature; pattern classification; text analysis; word processing; active-learning; cotraining categorization method; labelled data minimization; marine literature categorization; supervised classifier; supervised machine learning method; text processing; Manuals; Support vector machines; Variable speed drives; Co-training; Marine literature categorization; active-learning; two views;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
Type :
conf
DOI :
10.1109/NLPKE.2010.5587847
Filename :
5587847