Title :
How the Choice of Wrapper Learner and Performance Metric Affects Subset Evaluation
Author :
Wald, Randall ; Khoshgoftaar, Taghi M. ; Napolitano, Antonio
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
Abstract :
Due to the widespread problem of high dimensionality (datasets with many features/independent attributes), feature selection has become an important research topic in many areas of machine learning. One form of feature selection, wrapper-based subset evaluation, has been the focus of a moderate amount of research, because its use of classification learners to search for optimal feature subsets has the potential to remove redundant features and to find subsets which directly improve classification performance. However, while the choice of learner to use within the wrapper framework has previously been studied, no paper has thoroughly investigated the role of the performance metric used within the wrapper process. Especially with imbalanced data (data where one class predominates over the others), traditional metrics such as accuracy can give a misleading view of how many instances from each class are mislabeled. While it seems intuitive that metrics which take class balance into account will affect the chosen features, no previous study has investigated this effect directly. In the present work, we test five different learners and five different performance metrics within the wrapper framework and use a newly-proposed variant of the Tanimoto Index to evaluate the similarity among the different choices of learner and metric while all other factors are held constant, using two datasets from the domain of social network profile mining. We find that while the Best Arithmetic Mean and Best Geometric Mean metrics (both of which compute the stated means of the True Positive Rate and True Negative Rate) are somewhat similar, they are still quite distinct, and no other metrics are particularly similar to one another. The five learners were also found to produce extremely dissimilar feature subsets. Thus, we show that the choice of both learner and metric has a major effect on which features are selected through wrapper-based feature selection.
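For intuition only, the sketch below (not from the paper) illustrates the two balanced metrics the abstract names and the standard Tanimoto (Jaccard) index over feature subsets; the authors' newly-proposed variant of the Tanimoto Index is not reproduced here, and all function names and values are illustrative assumptions.

```python
import math

def arithmetic_mean(tpr: float, tnr: float) -> float:
    """Arithmetic mean of True Positive Rate and True Negative Rate."""
    return (tpr + tnr) / 2.0

def geometric_mean(tpr: float, tnr: float) -> float:
    """Geometric mean of True Positive Rate and True Negative Rate."""
    return math.sqrt(tpr * tnr)

def tanimoto_index(subset_a: set, subset_b: set) -> float:
    """Standard Tanimoto/Jaccard similarity between two feature subsets
    (the paper uses a variant of this index, not shown here)."""
    if not subset_a and not subset_b:
        return 1.0  # treat two empty subsets as identical
    return len(subset_a & subset_b) / len(subset_a | subset_b)

# Example: on imbalanced data a majority-class classifier can score high
# accuracy while these balanced metrics remain low.
print(arithmetic_mean(tpr=0.10, tnr=0.99))                      # ~0.545
print(geometric_mean(tpr=0.10, tnr=0.99))                       # ~0.315
print(tanimoto_index({"f1", "f2", "f3"}, {"f2", "f3", "f4"}))   # 0.5
```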
Keywords :
data mining; learning (artificial intelligence); pattern classification; Tanimoto Index; arithmetic mean; classification learners; feature selection; geometric mean metrics; imbalanced data; machine learning; optimal feature subsets; performance metric; social network profile mining; subset evaluation; wrapper learner; wrapper process; Buildings; Feature extraction; Indexes; Measurement; Stability criteria; Support vector machines; Twitter; Wrapper feature selection; imbalanced data; performance metrics; similarity;
Conference_Titel :
Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on
Conference_Location :
Herndon, VA
Print_ISBN :
978-1-4799-2971-9
DOI :
10.1109/ICTAI.2013.70