Regional Information Center for Science and Technology - Filter- and wrapper-based feature selection for predicting user interaction with Twitter bots

DocumentCode :

1831855

Title :

Filter- and wrapper-based feature selection for predicting user interaction with Twitter bots

Author :

Wald, Randall ; Khoshgoftaar, Taghi ; Napolitano, Antonio

Author_Institution :

Florida Atlantic Univ., Boca Raton, FL, USA

fYear :

2013

fDate :

14-16 Aug. 2013

Firstpage :

416

Lastpage :

423

Abstract :

High dimensionality (the presence of too many features) is a problem that plagues many datasets, including those mined from personality profiles. Feature selection can reduce the number of features, and many strategies have been proposed for selecting the most important features from a larger group. Feature rankers compute a metric for each feature individually and return the top-scoring features for a given subset size; filter-based subset evaluation applies statistical analysis to whole subsets; and wrapper-based subset selection builds classification models on candidate subsets to decide which features are most important for model building. Although all three approaches have been discussed in the literature, relatively little work compares them directly with one another. In the present study, we do precisely this, considering feature ranking, filter-based subset evaluation, and wrapper-based subset selection (along with no feature selection) on two datasets for predicting user interaction with bots on Twitter. For the two subset-based techniques, we consider two search techniques (Best First and Greedy Stepwise) to build the subsets, while for feature ranking we use a single ranker, ROC (area under the Receiver Operating Characteristic curve), chosen for its excellent performance in previous work. Six learners are used to build models with the selected features. We find that feature ranking consistently performs well, giving the best results for four of the six learners on both datasets. In addition, every technique other than feature ranking performs worse than no feature selection for four of the six learners. This leads us to recommend feature ranking over more complex subset evaluation techniques.
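The abstract contrasts ranking features one at a time with searching over feature subsets using a learner. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. It assumes numeric features and binary labels (1 = the user interacted with a bot). Per-feature AUC uses the Mann-Whitney pairwise formulation. Ranking by max(AUC, 1 - AUC), so that inversely related features also score highly, is our own assumption. The wrapper scores subsets by leave-one-out 1-nearest-neighbor accuracy, a stand-in for the study's six learners. All function names and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence


def auc_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of one feature used directly as a score for the positive class (label 1).

    Mann-Whitney formulation: the fraction of (positive, negative) pairs in which
    the positive instance has the higher value; ties count as one half.
    """
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features_by_roc(X: List[List[float]], y: List[int], k: int) -> List[int]:
    """Feature ranking: score each column on its own, return the k best indices."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        auc = auc_score([row[j] for row in X], y)
        # Assumption: an inversely related feature (AUC near 0) is as useful
        # as a directly related one, so rank by max(AUC, 1 - AUC).
        scores.append(max(auc, 1.0 - auc))
    # sorted() is stable, so equal scores keep their original column order.
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k]


def loo_1nn_accuracy(X: List[List[float]], y: List[int], features: List[int]) -> float:
    """Wrapper evaluator (stand-in learner): leave-one-out 1-NN accuracy on a subset."""
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for m, xm in enumerate(X):
            if m == i:
                continue  # hold out instance i
            d = sum((xi[f] - xm[f]) ** 2 for f in features)
            if d < best_d:  # strict '<': the first nearest neighbor wins ties
                best_d, best_label = d, y[m]
        correct += int(best_label == y[i])
    return correct / len(X)


def greedy_stepwise_wrapper(n_features: int,
                            evaluate: Callable[[List[int]], float]) -> List[int]:
    """Forward greedy stepwise search: add the single best feature each round,
    stopping as soon as no addition improves the evaluator's score."""
    selected: List[int] = []
    best_score = float("-inf")
    while True:
        best_j, best_candidate = None, float("-inf")
        for j in range(n_features):
            if j in selected:
                continue
            s = evaluate(selected + [j])
            if s > best_candidate:  # ties keep the lower feature index
                best_j, best_candidate = j, s
        if best_j is None or best_candidate <= best_score:
            return selected
        selected.append(best_j)
        best_score = best_candidate


# Hypothetical toy data: column 0 separates the classes, column 1 is noise,
# column 2 separates them in the inverse direction.
TOY_X = [[1, 5, 12], [2, 1, 11], [3, 4, 10],
         [10, 2, 3], [11, 6, 2], [12, 3, 1]]
TOY_Y = [0, 0, 0, 1, 1, 1]

print(rank_features_by_roc(TOY_X, TOY_Y, 2))  # [0, 2]
print(greedy_stepwise_wrapper(3, lambda s: loo_1nn_accuracy(TOY_X, TOY_Y, s)))  # [0]
```

On the toy data, the ranker returns both separating columns because it scores each feature in isolation, so it does not notice that column 2 is redundant given column 0. The greedy wrapper stops after column 0, because no added feature can raise the evaluator's score. Best First search, the other strategy named in the abstract, differs from Greedy Stepwise in that it keeps a queue of previously expanded subsets and can backtrack after non-improving steps. It is not shown here.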

Keywords :

data mining; social networking (online); ROC; Twitter bots; best first search technique; feature ranking; filter-based feature selection; filter-based subset evaluation; greedy stepwise search technique; user interaction prediction; wrapper-based feature selection; wrapper-based subset selection; Data models; Feature extraction; Measurement; Pragmatics; Support vector machines; Twitter; feature selection; interaction; replies; social bots

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Reuse and Integration (IRI), 2013 IEEE 14th International Conference on

Conference_Location :

San Francisco, CA

Type :

conf

DOI :

10.1109/IRI.2013.6642501

Filename :

6642501

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1831855