• DocumentCode
    245002
  • Title
    Towards Scalable and Accurate Online Feature Selection for Big Data
  • Author
    Kui Yu ; Xindong Wu ; Wei Ding ; Jian Pei
  • Author_Institution
    Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    660
  • Lastpage
    669
  • Abstract
    Feature selection is important in many big data applications. There are at least two critical challenges. Firstly, in many applications, the dimensionality is extremely high, in the millions, and keeps growing. Secondly, feature selection has to be highly scalable, preferably in an online manner such that each feature can be processed in a sequential scan. In this paper, we develop SAOLA, a Scalable and Accurate OnLine Approach for feature selection. With a theoretical analysis of a lower bound on the pairwise correlations between features in the currently selected feature subset, SAOLA employs novel online pairwise comparison techniques to address the two challenges and maintain a parsimonious model over time in an online manner. An empirical study using a series of benchmark real data sets shows that SAOLA is scalable on data sets of extremely high dimensionality and has superior performance to state-of-the-art feature selection methods.
  • Keywords
    Big Data; data mining; SAOLA; feature selection; pairwise correlation; scalable and accurate online approach; Accuracy; Correlation; Markov processes; Redundancy; Search problems; Training; Extremely high dimensionality; Feature redundancy; Online feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type
    conf
  • DOI
    10.1109/ICDM.2014.63
  • Filename
    7023383