Regional Information Center for Science and Technology - A novel feature selection technique for highly imbalanced data

DocumentCode :

1842912

Title :

A novel feature selection technique for highly imbalanced data

Author :

Khoshgoftaar, Taghi M. ; Gao, Kehan ; Van Hulse, Jason

Author_Institution :

Florida Atlantic Univ., Boca Raton, FL, USA

fYear :

2010

fDate :

4-6 Aug. 2010

Firstpage :

Lastpage :

Abstract :

Two challenges often encountered in data mining are an excessive number of features in a data set and unequal numbers of examples in the two classes of a binary classification problem. In this paper, we propose a novel approach to feature selection for imbalanced data in the context of software quality engineering. The technique repeatedly applies data sampling followed by feature ranking, and then aggregates the rankings produced across the repetitions. This repetitive feature selection method is compared with two other approaches: one applies a filter-based feature ranking technique alone to the original data, while the other applies data sampling and feature ranking together only once. The empirical validation is carried out on two groups of software data sets. The results demonstrate that the proposed repetitive feature selection method performs significantly better on average than the other two approaches, especially when the data set is highly imbalanced.
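The repetitive procedure summarized in the abstract (sample, rank, repeat, aggregate) can be illustrated with a small sketch. This record does not state which sampling technique, ranking metric, or aggregation rule the authors used, so the code below assumes random undersampling to a 1:1 class ratio, a per-feature ROC AUC filter, and mean-rank aggregation. All function names and parameters (`repetitive_feature_selection`, `n_repeats`, `top_k`, and so on) are hypothetical, and the minority class is assumed to be labeled 1 and to be smaller than the majority class.

```python
import random
from statistics import mean


def auc_score(values, labels):
    """ROC AUC of one feature via pairwise comparison (ties count 0.5).
    0.5 means the feature does not separate the classes."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(X, y):
    """Assumed filter ranker: score each feature by max(AUC, 1 - AUC) and
    return its rank position (1 = best), indexed by feature."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = auc_score([row[j] for row in X], y)
        scores.append(max(a, 1.0 - a))
    order = sorted(range(n_features), key=lambda j: -scores[j])
    ranks = [0] * n_features
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def undersample(X, y, rng):
    """Assumed sampler: keep every minority (y == 1) example plus an equally
    sized random subset of majority (y == 0) examples."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]


def repetitive_feature_selection(X, y, n_repeats=10, top_k=3, seed=0):
    """Hypothetical sketch: sample and rank n_repeats times, then keep the
    top_k features with the best (lowest) mean rank."""
    rng = random.Random(seed)
    all_ranks = [rank_features(*undersample(X, y, rng)) for _ in range(n_repeats)]
    n_features = len(X[0])
    mean_ranks = [mean(r[j] for r in all_ranks) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: mean_ranks[j])
    return order[:top_k]


if __name__ == "__main__":
    # Synthetic data: 10 minority vs. 190 majority examples; feature 0 is
    # informative, features 1-4 are uniform noise.
    data_rng = random.Random(42)
    y = [1] * 10 + [0] * 190
    X = [[label * 10 + data_rng.random()] + [data_rng.random() for _ in range(4)]
         for label in y]
    print(repetitive_feature_selection(X, y, n_repeats=5, top_k=3, seed=1))
```

Under these assumptions, the two baselines in the abstract map onto the same helpers: calling `rank_features(X, y)` directly on the original data corresponds to ranking without sampling, and `n_repeats=1` corresponds to sampling and ranking only once. Averaging ranks over many undersampled subsets is meant to reduce the variance that a single random subset of the majority class introduces.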

Keywords :

data mining; pattern classification; software quality; binary classification problem; data sampling; feature selection technique; filter-based feature ranking technique; highly imbalanced data; repetitive feature selection method; repetitive process; software data sets; software quality engineering; Analysis of variance; Measurement; Niobium; Radio frequency; Software quality; Support vector machines; Training data;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Reuse and Integration (IRI), 2010 IEEE International Conference on

Conference_Location :

Las Vegas, NV

Print_ISBN :

978-1-4244-8097-5

Type :

conf

DOI :

10.1109/IRI.2010.5558961

Filename :

5558961

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1842912