DocumentCode :
2129583
Title :
One-Class Classification of Text Streams with Concept Drift
Author :
Zhang, Yang ; Li, Xue ; Orlowska, Maria
Author_Institution :
Univ. of Queensland, Brisbane, QLD
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
116
Lastpage :
125
Abstract :
Research on streaming data classification has mostly assumed that data can be fully labelled. However, this assumption is impractical. First, it is impossible to complete the labelling before all the data has arrived. Second, obtaining fully labelled data through manual effort is generally very expensive. Third, user interests may change over time, so labels assigned earlier may be inconsistent with labels assigned later; this is known as concept drift. In this paper, we consider the problem of one-class classification on text streams under concept drift, where a large volume of documents arrives at high speed while user interests and the data distribution change. In this setting, only a small number of positively labelled documents are available for training. We propose a stacking-style ensemble-based approach and compare it with other window-based approaches, such as the single window, fixed window, and full memory approaches. Our experimental results demonstrate that the proposed ensemble approach outperforms all of the compared approaches.
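The three window-based baselines named in the abstract differ only in which past batches of positively labelled documents are used for training. The sketch below illustrates that difference under stated assumptions. The window strategies follow their common definitions: single window uses only the latest batch, fixed window uses the last k batches, and full memory uses every batch. The centroid/cosine one-class scorer, the threshold of 0.3, and all function names are hypothetical stand-ins chosen for illustration. This is not the authors' stacking-style ensemble or their base classifier.

```python
from collections import Counter
import math


def select_training_batches(batches, strategy, window_size=3):
    """Pick the batches used for training under a window strategy.

    `batches` is a list in arrival order; each batch is a list of
    positively labelled documents (plain strings). Hypothetical helper.
    """
    if strategy == "single":
        # Single window: only the most recent batch.
        return batches[-1:]
    if strategy == "fixed":
        # Fixed window: the most recent `window_size` batches.
        return batches[-window_size:]
    if strategy == "full":
        # Full memory: every batch seen so far.
        return list(batches)
    raise ValueError(f"unknown strategy: {strategy!r}")


def centroid(docs):
    """Average term-frequency vector of the positive documents.

    Hypothetical one-class model: documents close to this centroid
    are treated as belonging to the user's class of interest.
    """
    total = Counter()
    for doc in docs:
        total.update(doc.lower().split())
    n = len(docs)
    return {term: count / n for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term->weight mappings."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(batches, strategy, window_size=3):
    """Build the centroid model from the batches the strategy selects."""
    selected = select_training_batches(batches, strategy, window_size)
    return centroid([doc for batch in selected for doc in batch])


def is_positive(model, doc, threshold=0.3):
    """Accept `doc` if it is similar enough to the positive centroid."""
    return cosine(model, Counter(doc.lower().split())) >= threshold


# Toy stream (invented for illustration): the user's interest drifts
# from finance to sport to politics across three batches.
STREAM = [
    ["stock market rises", "market shares up"],
    ["football match tonight", "team wins match"],
    ["election results announced", "votes counted in election"],
]

if __name__ == "__main__":
    for strategy in ("single", "fixed", "full"):
        model = train(STREAM, strategy, window_size=2)
        print(strategy, is_positive(model, "stock market"))
```

On this toy stream, the single and fixed (k = 2) windows reject the outdated finance query "stock market", while full memory still accepts it. This is the trade-off under concept drift that motivates comparing window sizes: short windows adapt quickly but discard history, while full memory keeps stale interests.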
Keywords :
text analysis; data distribution; one-class classification; positively labelled documents; stacking style ensemble-based approach; streaming data classification; text streams; window-based approach; Conferences; Current measurement; Data mining; Feedback; Information retrieval; Information technology; Labeling; Natural languages; Stacking; Text categorization; Concept Drift; One-class Classification; Text Stream;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
Type :
conf
DOI :
10.1109/ICDMW.2008.54
Filename :
4733929