DocumentCode
245002
Title
Towards Scalable and Accurate Online Feature Selection for Big Data
Author
Kui Yu ; Xindong Wu ; Wei Ding ; Jian Pei
Author_Institution
Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
fYear
2014
fDate
14-17 Dec. 2014
Firstpage
660
Lastpage
669
Abstract
Feature selection is important in many big data applications. There are at least two critical challenges. First, in many applications, the dimensionality is extremely high, in the millions, and keeps growing. Second, feature selection has to be highly scalable, preferably in an online manner such that each feature can be processed in a sequential scan. In this paper, we develop SAOLA, a Scalable and Accurate OnLine Approach for feature selection. With a theoretical analysis of a lower bound on the pairwise correlations between features in the currently selected feature subset, SAOLA employs novel online pairwise comparison techniques to address the two challenges and maintain a parsimonious model over time. An empirical study using a series of real benchmark data sets shows that SAOLA is scalable on data sets of extremely high dimensionality and has superior performance over state-of-the-art feature selection methods.
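As a rough illustration of the online pairwise-comparison idea summarized above, the sketch below scans features one at a time and keeps a small subset by comparing each new feature only against the features already selected. It is not the authors' SAOLA implementation: the correlation measure (absolute Pearson correlation), the relevance threshold `delta`, the exact redundancy conditions, and the names `abs_pearson` and `online_select` are assumptions chosen for illustration; the paper's actual correlation measures and rules may differ.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def online_select(
    stream: Iterable[Tuple[str, Sequence[float]]],
    target: Sequence[float],
    delta: float = 0.1,  # hypothetical relevance threshold
) -> List[str]:
    """Process (name, column) pairs in a single sequential scan.

    Hypothetical SAOLA-style rules (an assumption, not the paper's exact test):
    - drop the new feature if some selected feature is at least as relevant to
      the target and correlates with the new feature at least as strongly as
      the new feature correlates with the target;
    - evict a selected feature if the new feature is strictly more relevant and
      correlates with it at least as strongly as it correlates with the target.
    """
    selected: Dict[str, Sequence[float]] = {}
    relevance: Dict[str, float] = {}
    for name, col in stream:
        r_new = abs_pearson(col, target)
        if r_new < delta:  # weakly relevant: discard immediately
            continue
        redundant = False
        for other in list(selected):  # copy keys so eviction is safe
            r_pair = abs_pearson(col, selected[other])
            if relevance[other] >= r_new and r_pair >= r_new:
                redundant = True  # new feature adds nothing beyond `other`
                break
            if r_new > relevance[other] and r_pair >= relevance[other]:
                del selected[other], relevance[other]  # `other` is now redundant
        if not redundant:
            selected[name] = col
            relevance[name] = r_new
    return list(selected)


if __name__ == "__main__":
    target = [1, 2, 3, 4, 5, 6]
    stream = [
        ("w", [1, 2, 3, 4, 6, 5]),       # relevant, later superseded
        ("x1", [1, 2, 3, 4, 5, 6]),      # perfectly relevant
        ("x2", [2, 4, 6, 8, 10, 12]),    # duplicate of x1 up to scale
        ("noise", [1, 0, -1, -1, 0, 1]), # uncorrelated with the target
    ]
    print(online_select(stream, target))  # ['x1']
```

Each arriving feature is compared only with the current subset, so memory stays proportional to the number of selected features rather than to the full dimensionality; this is the property that makes a single-pass scheme attractive when features number in the millions.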
Keywords
Big Data; data mining; SAOLA; feature selection; pairwise correlation; scalable and accurate online approach; Accuracy; Correlation; Markov processes; Redundancy; Search problems; Training; Extremely high dimensionality; Feature redundancy; Online feature selection;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location
Shenzhen
ISSN
1550-4786
Print_ISBN
978-1-4799-4303-6
Type
conf
DOI
10.1109/ICDM.2014.63
Filename
7023383