• DocumentCode
    939125
  • Title

    Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy

  • Author

    Peng, Hanchuan ; Long, Fuhui ; Ding, Chris

  • Author_Institution
    Lawrence Berkeley Nat. Lab., California Univ., Berkeley, CA, USA
  • Volume
    27
  • Issue
    8
  • fYear
    2005
  • Firstpage
    1226
  • Lastpage
    1238
  • Abstract
    Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement in feature selection and classification accuracy.
  • Keywords
    feature extraction; pattern classification; statistical analysis; arrhythmia; cancer cell lines; first-order incremental feature selection; handwritten digits; linear discriminant analysis; lymphoma tissues; maximal statistical dependency criterion; minimal-redundancy-maximal-relevance criterion; mutual information criteria; naive Bayes; pattern classification systems; support vector machine; Algorithm design and analysis; Cancer; Costs; Diversity reception; Mutual information; Pattern classification; Performance analysis; Redundancy; Support vector machine classification; Support vector machines; Feature selection; classification; maximal dependency; maximal relevance; minimal redundancy; mutual information; Algorithms; Artificial Intelligence; Cluster Analysis; Computer Simulation; Diagnosis, Computer-Assisted; Humans; Information Storage and Retrieval; Models, Statistical; Neoplasms; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2005.159
  • Filename
    1453511