Title :
Learning from combination of data chunks for multi-class imbalanced data
Author :
Xu-Ying Liu ; Qian-Qian Li
Author_Institution :
Key Lab. of Comput. Network & Inf. Integration, Southeast Univ., Nanjing, China
Abstract :
Class imbalance is very common in real-world applications. Previous studies have focused on the binary-class imbalance problem, whereas the multi-class imbalance problem is more general and more challenging. Under-sampling is an effective and efficient method for binary-class imbalanced data, but when it is applied to multi-class imbalanced data, many more majority-class examples are discarded, because there are often multiple majority classes and the minority class often has very few examples. To exploit the information contained in the majority-class examples that under-sampling would discard, this paper proposes a method called ChunkCombine. For each majority class, it performs under-sampling multiple times to obtain non-overlapping data chunks, so that together they carry the most information a data sample of the same size can carry. Each data chunk has the same size as the minority class to achieve balance. Every possible combination of the minority class with one data chunk from each majority class then forms a balanced training set, and ChunkCombine uses ensemble techniques to learn from the training sets derived from all such combinations. Experimental results show that it outperforms many other popular methods for multi-class imbalanced data when average accuracy, G-mean and MAUC are used as evaluation measures. In addition, we discuss different evaluation measures and suggest that a multi-class F-measure, Mean F-Measure (MFM), is unsuitable for multi-class imbalanced data in many situations, because it is not consistent with the standard F-measure in the binary-class case and it is close to accuracy.
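The chunking-and-combination procedure described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names (`make_chunks`, `chunk_combine_fit`, `chunk_combine_predict`) and the toy nearest-centroid base learner are assumptions made for a self-contained example; the paper's actual base classifiers and voting scheme may differ.

```python
# Hedged sketch of the ChunkCombine idea: each majority class is split into
# non-overlapping chunks of minority-class size; every combination of one
# chunk per majority class plus the minority class forms a balanced training
# set; an ensemble votes over the resulting base classifiers.
from itertools import product

import numpy as np


def make_chunks(X, chunk_size, rng):
    """Shuffle one majority class and split it into non-overlapping chunks."""
    idx = rng.permutation(len(X))
    n_chunks = len(X) // chunk_size
    return [X[idx[i * chunk_size:(i + 1) * chunk_size]] for i in range(n_chunks)]


class NearestCentroid:
    """Tiny stand-in base learner (the paper uses stronger classifiers)."""

    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        # Squared Euclidean distance to every class centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.labels_[d.argmin(axis=1)]


def chunk_combine_fit(X, y, minority_label, seed=0):
    """Train one base classifier per combination of majority-class chunks."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    size = len(X_min)  # every chunk matches the minority-class size
    majority_labels = [c for c in np.unique(y) if c != minority_label]
    chunk_lists = [make_chunks(X[y == c], size, rng) for c in majority_labels]
    ensemble = []
    # One balanced training set per combination (Cartesian product of chunks).
    for combo in product(*chunk_lists):
        Xt = np.vstack([X_min] + list(combo))
        yt = np.concatenate([np.full(size, minority_label)]
                            + [np.full(size, c) for c in majority_labels])
        ensemble.append(NearestCentroid().fit(Xt, yt))
    return ensemble


def chunk_combine_predict(ensemble, X):
    """Combine the base classifiers by simple majority vote."""
    votes = np.stack([clf.predict(X) for clf in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

With, say, a minority class of 5 examples and majority classes of 15 and 10 examples, the classes yield 3 and 2 chunks respectively, so the ensemble contains 3 × 2 = 6 base classifiers, each trained on a balanced set of 15 examples.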
Keywords :
learning (artificial intelligence); pattern classification; set theory; ChunkCombine method; G-mean; MAUC; average accuracy; binary-class imbalance problem; multiclass AdaBoost classifiers; multiclass F-measure mean F-measure; multiclass imbalanced data; nonoverlapping data chunks; training sets; under-sampling; Accuracy; Boosting; Educational institutions; Feature extraction; Standards; Training;
Conference_Titel :
2014 International Joint Conference on Neural Networks (IJCNN)
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
DOI :
10.1109/IJCNN.2014.6889667