Title :
Resampling and Cost-Sensitive Methods for Imbalanced Multi-instance Learning
Author :
Xiaoguang Wang ; Xuan Liu ; Nathalie Japkowicz ; S. Matwin
Author_Institution :
Sch. of Electr. Eng. & Comput. Sci., Univ. of Ottawa, Ottawa, ON, Canada
Abstract :
Multi-instance learning operates on a set of bags, each containing many instances, which distinguishes it from standard propositional classification. Our research shows that, as in the single-instance imbalance problem, imbalanced class distributions in multi-instance data significantly degrade the performance of most standard multi-instance algorithms compared with a balanced setting. Because of the inherent differences between multi-instance and single-instance learning, existing solutions for single-instance class imbalance problems do not transfer directly to multi-instance datasets. This is a drawback, as imbalanced multi-instance problems often occur in data mining practice. In this paper, we propose two solution frameworks for class-imbalanced multi-instance datasets. In the first, we explore multi-instance data sampling methods; in the second, we present a novel generalized version of a multi-instance cost-sensitive boosting technique. Experimental results on benchmark and application datasets show that the proposed frameworks are an effective solution to the multi-instance class imbalance problem.
Keywords :
learning (artificial intelligence); pattern classification; sampling methods; application datasets; benchmark datasets; data mining; imbalanced class distribution; imbalanced multi-instance learning method; multi-instance class imbalanced datasets; multi-instance cost-sensitive boosting technique; multi-instance data classification; multi-instance data sampling methods; resampling method; Boosting; Data mining; Educational institutions; Sampling methods; Sonar detection; Training; Class Imbalance; Multi-instance learning;
Conference_Title :
2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW)
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4799-3143-9
DOI :
10.1109/ICDMW.2013.85