DocumentCode
3114973
Title
A New Performance Measure for Class Imbalance Learning. Application to Bioinformatics Problems
Author
Batuwita, Rukshan ; Palade, Vasile
Author_Institution
Comput. Lab., Univ. of Oxford, Oxford, UK
fYear
2009
fDate
13-15 Dec. 2009
Firstpage
545
Lastpage
550
Abstract
In class imbalance learning, the performance measure used for model selection plays a vital role. Past research has shown that the most widely used performance measure, the overall accuracy of the model, can lead to sub-optimal classification models when learning from imbalanced datasets. To overcome this problem, other performance measures, such as the geometric-mean (Gm) and the F-measure (Fm), have been used for imbalanced dataset learning. Training a classifier on an imbalanced dataset (where the positive class is the minority class) usually produces sub-optimal models with a higher specificity (SP) and a lower sensitivity (SE). By applying class imbalance learning methods, we can often increase the SE at the cost of some SP. In some types of real-world imbalanced classification problems, such as gene-finding problems in bioinformatics, it is important to improve the SE as much as possible while keeping the reduction in SP to a minimum. In this paper, we show that for this type of classification problem the existing performance measures used in class imbalance learning (Gm and Fm) can still result in sub-optimal classification models. To circumvent these problems, we introduce a new performance measure, called the adjusted geometric-mean (AGm). We show, both analytically and empirically on two real-world bioinformatics datasets, that AGm can perform better than the Gm and Fm metrics.
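The baseline measures named in the abstract can be made concrete with their standard confusion-matrix definitions: SE = TP/(TP+FN), SP = TN/(TN+FP), Gm = sqrt(SE x SP), and Fm as the harmonic mean of precision and SE (recall). The sketch below is not taken from the paper: the `Confusion` class, the helper names and the counts (100 positives, 900 negatives) are hypothetical, and AGm is deliberately omitted because the abstract does not give its definition.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical container for binary confusion-matrix counts (positive = minority class)."""
    tp: int
    fn: int
    tn: int
    fp: int


def sensitivity(c: Confusion) -> float:
    # SE: fraction of minority-class (positive) examples predicted correctly.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # SP: fraction of majority-class (negative) examples predicted correctly.
    return c.tn / (c.tn + c.fp)


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def g_mean(c: Confusion) -> float:
    # Gm: geometric mean of SE and SP (symmetric in the two).
    return math.sqrt(sensitivity(c) * specificity(c))


def f_measure(c: Confusion) -> float:
    # Fm: harmonic mean of precision and recall (recall == SE).
    p, r = precision(c), sensitivity(c)
    return 2 * p * r / (p + r)


# Hypothetical models on the same data: 100 positives, 900 negatives.
model_a = Confusion(tp=80, fn=20, tn=810, fp=90)   # SE = 0.8, SP = 0.9
model_b = Confusion(tp=90, fn=10, tn=720, fp=180)  # SE = 0.9, SP = 0.8

for name, m in (("A", model_a), ("B", model_b)):
    print(name, round(g_mean(m), 4), round(f_measure(m), 4))
```

In this toy case Gm gives both models the same score (about 0.8485) because it weights SE and SP symmetrically, while Fm prefers model A (16/27, about 0.593, versus 18/37, about 0.486) because the extra false positives drawn from the large negative class pull precision down. Neither measure separately rewards the SE gain of model B, which is the kind of trade-off the abstract says matters for problems such as gene finding.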
Keywords
bioinformatics; classification; learning (artificial intelligence); adjusted geometric-mean; class imbalance learning; imbalanced dataset learning; suboptimal classification models; Costs; Data processing; Electronic mail; Laboratories; Learning systems; Machine learning; Performance analysis; Predictive models; Proteins; Model Selection; Performance Measures; SVMs
fLanguage
English
Publisher
IEEE
Conference_Title
Machine Learning and Applications, 2009. ICMLA '09. International Conference on
Conference_Location
Miami Beach, FL
Print_ISBN
978-0-7695-3926-3
Type
conf
DOI
10.1109/ICMLA.2009.126
Filename
5381421