Title :
The PerfSim Algorithm for Concept Drift Detection in Imbalanced Data
Author :
Antwi, D.K. ; Viktor, Herna L. ; Japkowicz, Nathalie
Author_Institution :
Sch. of Electr. Eng. & Comput. Sci., Univ. of Ottawa, Ottawa, ON, Canada
Abstract :
There is currently a surge of interest in adaptive learning algorithms for applications ranging from predicting ozone level peaks and learning stock market indicators to detecting smart phone usage patterns. In such scenarios, detecting change (or drift) in the concept being learned is important to ensure that correct, timely and relevant models are constructed. In addition, such data is often imbalanced and, to further complicate the issue, we are frequently interested in learning the minority class. Ignoring either of these two aspects during learning may lead to unreliable, or even incorrect, models being built. In this research we discuss the interplay between concept drift detection and imbalanced data sets in order to ensure reliable results. We introduce a novel algorithm that, rather than considering a single performance evaluation measure such as accuracy for change detection, considers all the components of a confusion matrix and employs the cosine similarity coefficient. We evaluate our algorithm against a real-world mobile phone database, as well as benchmark datasets, and we compare it with two other state-of-the-art methods. The results indicate that our algorithm detects concept drift reliably and is particularly sensitive to concept drifts occurring in imbalanced data sets. Further, our method is shown to perform very well compared to the other techniques, especially when the drift occurs in the minority class of a class imbalance problem.
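The core idea in the abstract, comparing whole confusion matrices with the cosine similarity coefficient instead of tracking accuracy alone, can be illustrated with a minimal Python sketch. This is not the authors' PerfSim implementation. The function names, the binary-class setup, the batch-to-batch comparison and the `threshold` parameter are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def confusion_vector(y_true, y_pred, positive=1):
    """Flatten a binary confusion matrix into [TP, FP, FN, TN].

    Treating `positive` as the minority class is an assumption of this sketch.
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return [tp, fp, fn, tn]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an empty batch
    return dot / (norm_u * norm_v)


def drift_suspected(reference, current, threshold=0.95):
    """Flag drift when the two confusion vectors are not similar enough.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    return cosine_similarity(reference, current) < threshold
```

Here `reference` and `current` would be the confusion vectors of the same classifier on an earlier and a later batch of the stream. Because the vector keeps true positives and false negatives as separate components, a change confined to the minority class alters the vector's direction even when overall accuracy changes little. How large that change must be before drift is signalled depends on the threshold, which would need to be tuned for the data at hand.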
Keywords :
data handling; learning (artificial intelligence); matrix algebra; mobile computing; PerfSim algorithm; adaptive learning algorithms; benchmarking datasets; change detection; class imbalance problem; concept drift detection; confusion matrix; continuous data streams; cosine similarity coefficient; imbalanced data sets; machine learning; mobile phone database; ozone level peak predictions; smart phone usage pattern detection; stock market indicator learning; Accuracy; Change detection algorithms; Data models; Prediction algorithms; Reliability; Training; Vectors; Concept drift detection; class imbalance; classification; reliable model building
Conference_Title :
Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-5164-5
DOI :
10.1109/ICDMW.2012.122