Title :
Towards Scalable and Accurate Online Feature Selection for Big Data
Author :
Kui Yu ; Xindong Wu ; Wei Ding ; Jian Pei
Author_Institution :
Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
Abstract :
Feature selection is important in many big data applications. There are at least two critical challenges. Firstly, in many applications, the dimensionality is extremely high, in millions, and keeps growing. Secondly, feature selection has to be highly scalable, preferably in an online manner such that each feature can be processed in a sequential scan. In this paper, we develop SAOLA, a Scalable and Accurate On Line Approach for feature selection. With a theoretical analysis on a low bound on the pair wise correlations between features in the currently selected feature subset, SAOLA employs novel online pair wise comparison techniques to address the two challenges and maintain a parsimonious model over time in an online manner. An empirical study using a series of benchmark real data sets shows that SAOLA is scalable on data sets of extremely high dimensionality, and has superior performance over the state-of-the-art feature selection methods.
Keywords :
Big Data; data mining; Big Data; SAOLA; feature selection; pairwise correlation; scalable and accurate online approach; Accuracy; Big data; Correlation; Markov processes; Redundancy; Search problems; Training; Extremely high dimensionality; Feature redundancy; Online feature selection;
Conference_Titel :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4303-6
DOI :
10.1109/ICDM.2014.63