DocumentCode
2193604
Title
Vote-Based LELC for Positive and Unlabeled Textual Data Streams
Author
Liu, Bo ; Xiao, Yanshan ; Cao, Longbing ; Yu, Philip S.
Author_Institution
Centre for Quantum Comput. & Intell. Syst., Univ. of Technol., Sydney, NSW, Australia
fYear
2010
fDate
13 Dec. 2010
Firstpage
951
Lastpage
958
Abstract
In this paper, we extend the LELC (PU Learning by Extracting Likely Positive and Negative Micro-Clusters) method to handle positive and unlabeled data streams. The proposed approach, called vote-based LELC, works in three steps. First, we extract representative documents from the unlabeled data and assign each document a vote score, which reflects the degree to which the example belongs to its corresponding class. Second, the extracted representative examples, together with their vote scores, are incorporated into the learning phase to build an SVM-based classifier. Third, we use an ensemble classifier to cope with the concept drift that arises in textual data streams. By letting examples contribute differently to the construction of the classifier according to their vote scores, vote-based LELC aims to improve on the performance of LELC. Extensive experiments on textual data streams demonstrate that vote-based LELC outperforms the original LELC method.
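The second and third steps can be illustrated with a minimal sketch. It assumes, hypothetically, that each extracted example's vote score is a weight in [0, 1] that scales its hinge-loss term (as in weighted SVMs), that labels are +1 (likely positive) and -1 (likely negative), and that the ensemble simply averages the decision values of classifiers trained on the most recent data chunks. The names `train_weighted_linear_svm` and `ChunkEnsemble` are illustrative, the vote-score extraction step is not shown, and the paper's exact formulation may differ.

```python
from collections import deque
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_weighted_linear_svm(
    X: List[Vector],
    y: List[int],
    s: List[float],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on a vote-weighted hinge loss.

    Minimizes lam/2 * ||w||^2 + (1/n) * sum_i s_i * max(0, 1 - y_i * (w.x_i + b)),
    where s_i is the (assumed) vote score of example i and y_i is in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi, si in zip(X, y, s):
            if yi * (dot(w, xi) + b) < 1:
                # Hinge term is active; its pull is scaled by the vote score.
                for j in range(d):
                    grad_w[j] -= si * yi * xi[j] / n
                grad_b -= si * yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


class ChunkEnsemble:
    """Keeps the classifiers of the most recent chunks (hypothetical drift handling)."""

    def __init__(self, max_members: int = 3) -> None:
        # deque(maxlen=...) drops the oldest member when a new one is added.
        self.members: deque = deque(maxlen=max_members)

    def update(self, X: List[Vector], y: List[int], s: List[float]) -> None:
        self.members.append(train_weighted_linear_svm(X, y, s))

    def decision(self, x: Vector) -> float:
        # Unweighted average of member decision values.
        return sum(dot(w, x) + b for w, b in self.members) / len(self.members)

    def predict(self, x: Vector) -> int:
        return 1 if self.decision(x) >= 0 else -1
```

Bounding the ensemble with `deque(maxlen=...)` discards the oldest chunk classifier automatically, which is one simple way to let the model follow concept drift; weighting members by their accuracy on the newest chunk is a common alternative.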
Keywords
data structures; pattern classification; support vector machines; text analysis; LELC; SVM-based classifier; document extraction; documents representation; learning; learning by extracting likely positive and negative micro-cluster; unlabeled textual data streams; vote-based LELC; Data Streams; Positive and Unlabeled Learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location
Sydney, NSW
Print_ISBN
978-1-4244-9244-2
Electronic_ISBN
978-0-7695-4257-7
Type
conf
DOI
10.1109/ICDMW.2010.201
Filename
5693398