Title :
Certainty-Enhanced Active Learning for Improving Imbalanced Data Classification
Author :
Fu, JuiHsi ; Lee, SingLing
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chung Cheng Univ., Chiayi, Taiwan
Abstract :
In active learning algorithms, informative samples are usually queried for their true labels according to the disagreement among existing hypotheses. However, we observe that when a streaming dataset has skewed class membership, active learning suffers from the imbalanced data classification problem: the minority class is overwhelmed by the majority class when the hypotheses are generated. In this paper, for each unlabeled sample, we propose to use only the local behavior in its certainty-enhanced neighborhood, rather than the entire dataset, to generate the error-minimization hypotheses. Consequently, the proposed method improves the predictions of the hypotheses and determines the query probabilities properly. Experiments on synthetic and real-world datasets demonstrate the effectiveness of our active learning approach. The results show that the proposed approach decreases the probability of querying a certain (majority) sample and can handle the imbalanced data classification problem in active learning.
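The idea of deriving a query probability from local behavior only can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how the certainty-enhanced neighborhood or the error-minimization hypotheses are built, so the sketch substitutes a plain k-nearest-neighbor vote as the local hypothesis. The function names (`k_nearest`, `query_probability`, `stream_active_learning`), the parameter `k`, the binary 0/1 labels, and the scoring rule are all assumptions made for illustration.

```python
import math
import random


def k_nearest(x, labeled, k):
    """Return the k labeled (point, label) pairs closest to x (Euclidean distance)."""
    return sorted(labeled, key=lambda pair: math.dist(x, pair[0]))[:k]


def query_probability(x, labeled, k=5):
    """Hypothetical local uncertainty score in [0, 1] for binary labels 0/1.

    Only the k nearest labeled neighbors of x are consulted, not the whole
    dataset. The score is 0 when all neighbors agree and 1 when the two
    classes are evenly split.
    """
    neighbors = k_nearest(x, labeled, k)
    if not neighbors:
        return 1.0  # nothing is known locally, so always query
    positives = sum(1 for _, label in neighbors if label == 1)
    p = positives / len(neighbors)
    return 1.0 - abs(2.0 * p - 1.0)


def stream_active_learning(stream, oracle, seed_labeled, k=5, rng=None):
    """Process a stream, querying the oracle with the local query probability."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    labeled = list(seed_labeled)
    queries = 0
    for x in stream:
        if rng.random() < query_probability(x, labeled, k):
            labeled.append((x, oracle(x)))  # pay for a true label
            queries += 1
    return labeled, queries
```

Under these assumptions, a sample lying deep inside a region of agreeing labels (typically the majority class) gets a query probability near zero, while a sample whose neighbors are split between classes is queried more often, which matches the behavior the abstract reports in spirit.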
Keywords :
learning (artificial intelligence); pattern classification; probability; certainty-enhanced active learning; error minimization hypotheses; improving imbalanced data classification; majority class; minority class; query probabilities; skewed class membership; Algorithm design and analysis; Complexity theory; Measurement uncertainty; Minimization; Polynomials; Supervised learning; Support vector machines; Active Learning; Certainty-Enhanced Neighborhood; Imbalanced Data Classification; Lazy Learning; Streaming Datasets;
Conference_Title :
Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4673-0005-6
DOI :
10.1109/ICDMW.2011.43