DocumentCode :
2843833
Title :
Addressing Data-Complexity for Imbalanced Data-Sets: A Preliminary Study on the Use of Preprocessing for C4.5
Author :
Luengo, Julián ; Fernandez, Alicia ; Herrera, Francisco
Author_Institution :
Dept. of Comput. Sci. & A.I., Univ. of Granada, Granada, Spain
fYear :
2009
fDate :
Nov. 30 2009-Dec. 2 2009
Firstpage :
523
Lastpage :
528
Abstract :
In this work we analyse the behaviour of the C4.5 classification method on a collection of imbalanced data-sets. We use two data complexity metrics, the "maximum Fisher's discriminant ratio" and the "nonlinearity of 1NN classifier", to analyse the effect of preprocessing (oversampling in this case) as a way of dealing with the imbalance problem. To do so, we analyse C4.5 over a wide range of imbalanced data-sets built from real data and try to extract behaviour patterns from the results. We obtain rules that describe both good and bad behaviours of C4.5 when using the original data-sets (no preprocessing) and when applying preprocessing. These rules allow us to determine the effect of preprocessing, to predict the response of C4.5 to preprocessing from the data-set's complexity metrics prior to its application, and thus to establish when preprocessing would be useful.
Keywords :
pattern classification; 1NN classifier; C4.5 classification method; data complexity metrics; imbalanced data sets; maximum Fishers discriminant ratio; Application software; Classification tree analysis; Computer science; Data mining; Decision trees; Density measurement; Geometry; Intelligent systems; Pattern analysis; Topology; C4.5; Classification; Data complexity; Imbalanced Data-sets; Oversampling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Systems Design and Applications, 2009. ISDA '09. Ninth International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-1-4244-4735-0
Electronic_ISBN :
978-0-7695-3872-3
Type :
conf
DOI :
10.1109/ISDA.2009.233
Filename :
5364953