Title :
Class Imbalance Oriented Logistic Regression
Author :
Yadong Dong ; Huaping Guo ; Weimei Zhi ; Ming Fan
Author_Institution :
Sch. of Inf. Eng., ZhengZhou Univ., Zhengzhou, China
Abstract :
Class imbalance is quite common in the real world. Traditional state-of-the-art classifiers do not work well on data sets with an imbalanced class distribution. In this paper, we apply the logistic regression model to the class-imbalance problem and propose a novel algorithm called CILR (Class Imbalance oriented Logistic Regression) to tackle imbalanced data sets. Unlike traditional logistic regression, which optimizes the MLE (maximum likelihood estimation) function, CILR optimizes a proposed objective function based on both MLE and the recall metric. This objective function makes full use of the characteristics of the majority class and the minority class simultaneously, which guarantees that CILR enhances the classification performance of logistic regression on the rare class without decreasing overall accuracy. Experimental results on 16 data sets show that CILR performs significantly better than traditional logistic regression, under-sampled logistic regression, and over-sampled logistic regression.
Keywords :
data handling; logistics; maximum likelihood estimation; pattern classification; regression analysis; CILR performs; MLE function; class imbalance oriented logistic regression; class-imbalance problem; classification performance; imbalanced class distribution; imbalanced data sets; logistic regression model; maximum likelihood estimation function; minority class; over-sampled logistic regression; under-sampled logistic regression; Accuracy; Breast cancer; Ionosphere; Linear programming; Logistics; Maximum likelihood estimation; Measurement; classification; imbalanced data sets; logistic regression; recall;
Conference_Title :
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2014 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4799-6235-8
DOI :
10.1109/CyberC.2014.42