DocumentCode :
2485172
Title :
Mining Data with Rare Events: A Case Study
Author :
Seiffert, Chris ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason ; Napolitano, Amri
Author_Institution :
Florida Atlantic Univ., Boca Raton
Volume :
2
fYear :
2007
fDate :
29-31 Oct. 2007
Firstpage :
132
Lastpage :
139
Abstract :
The performance of classification models can be negatively impacted if the data on which they are trained contains very rare events. While recent research has investigated the issue of class imbalance, few if any studies address issues related to the handling of extreme imbalance (rare events), where the minority class can account for as little as 0.1% of the training data. This work investigates the effect of dataset size and class distribution on classification performance when examples from the minority class are rare. In addition, we compare the performance improvement achieved by acquiring additional examples to that of applying data sampling. Our results demonstrate that data sampling is very effective at alleviating the problem of rare events.
Keywords :
data mining; pattern classification; data sampling; Artificial intelligence; Buildings; Computer science; Costs; Data engineering; Sampling methods; Terminology; Training data; USA Councils
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on
Conference_Location :
Patras
ISSN :
1082-3409
Print_ISBN :
978-0-7695-3015-4
Type :
conf
DOI :
10.1109/ICTAI.2007.71
Filename :
4410370