  • DocumentCode
    376
  • Title
    On Similarity Preserving Feature Selection
  • Author
    Zheng Zhao ; Lei Wang ; Huan Liu ; Jieping Ye
  • Author_Institution
    SAS Headquarters, SAS Inst. Inc., Cary, NC, USA
  • Volume
    25
  • Issue
    3
  • fYear
    2013
  • fDate
    March 2013
  • Firstpage
    619
  • Lastpage
    632
  • Abstract
    In the feature selection literature, different criteria have been proposed to evaluate the goodness of features. In our investigation, we notice that a number of existing selection criteria implicitly select features that preserve sample similarity and can be unified under a common framework. We further point out that no feature selection criterion covered by this framework can handle redundant features, a common drawback of these criteria. Motivated by these observations, we propose a new "Similarity Preserving Feature Selection" framework in an explicit and rigorous way. We show, through theoretical analysis, that the proposed framework not only encompasses many widely used feature selection criteria, but also naturally overcomes their common weakness in handling feature redundancy. In developing this new framework, we begin with a conventional combinatorial optimization formulation for similarity preserving feature selection, then extend it with a sparse multiple-output regression formulation to improve its efficiency and effectiveness. A set of three algorithms is devised to efficiently solve the proposed formulations, each of which has its own advantages in terms of computational complexity and selection performance. As exhibited by our extensive experimental study, the proposed framework achieves superior feature selection performance and attractive properties.
  • Keywords
    combinatorial mathematics; computational complexity; feature extraction; optimisation; regression analysis; combinatorial optimization formulation; feature redundancy handling; similarity preserving feature selection criteria; sparse multiple output regression formulation; Algorithm design and analysis; Laplace equations; Optimization; Prediction algorithms; Redundancy; Feature selection; multiple output regression; redundancy removal; similarity preserving; sparse regularization
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2011.222
  • Filename
    6051436