DocumentCode :
1831855
Title :
Filter- and wrapper-based feature selection for predicting user interaction with Twitter bots
Author :
Wald, Randall ; Khoshgoftaar, Taghi ; Napolitano, Antonio
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2013
fDate :
14-16 Aug. 2013
Firstpage :
416
Lastpage :
423
Abstract :
High dimensionality (the presence of too many features) is a problem which plagues many datasets, including mining from personality profiles. Feature selection can be used to reduce the number of features, and many strategies have been proposed to help select the most important features from a larger group. Feature rankers will produce a metric for each feature and return the best for a given subset size, while filter-based subset evaluation will perform statistical analysis on whole subsets and wrapper-based subset selection will use classification models with chosen features to decide which are most important for model-building. While all three approaches have been discussed in the literature, relatively little work compares all three with one another directly. In the present study, we do precisely this, considering feature ranking, filter-based subset evaluation, and wrapper-based subset selection (along with no feature selection) on two datasets based on predicting interaction with bots on Twitter. For the two subset-based techniques, we consider two search techniques (Best First and Greedy Stepwise) to build the subsets, while we use one feature ranker (ROC) chosen for its excellent performance in previous works. Six learners are used to build models with the selected features. We find that feature ranking consistently performs well, giving the best results for four of the six learners on both datasets. In addition, all of the techniques other than feature ranking perform worse than no feature selection for four of six learners. This leads us to recommend the use of feature ranking over more complex subset evaluation techniques.
Keywords :
data mining; social networking (online); ROC; Twitter bots; best first search technique; feature ranking; filter based feature selection; filter-based subset evaluation; greedy stepwise search technique; user interaction prediction; wrapper-based feature selection; wrapper-based subset selection; Data models; Feature extraction; Measurement; Niobium; Pragmatics; Support vector machines; Twitter; feature selection; interaction; replies; social bots
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration (IRI), 2013 IEEE 14th International Conference on
Conference_Location :
San Francisco, CA
Type :
conf
DOI :
10.1109/IRI.2013.6642501
Filename :
6642501