  • DocumentCode
    2516129
  • Title
    A Hybrid Method for Feature Selection Based on Mutual Information and Canonical Correlation Analysis
  • Author
    Sakar, C. Okan ; Kursun, Olcay
  • Author_Institution
    Dept. of Comput. Eng., Bahcesehir Univ., Istanbul, Turkey
  • fYear
    2010
  • fDate
    23-26 Aug. 2010
  • Firstpage
    4360
  • Lastpage
    4363
  • Abstract
    Mutual Information (MI) is a classical and widely used dependence measure that generally serves well for feature selection. However, it overlooks under-sampled classes and rare but certain relations, which can result in missing relevant features that are highly predictive of variables of interest, such as certain phenotypes or disorders in biomedical research, rare but dangerous factors in ecology, or intrusions in network systems. On the other hand, Kernel Canonical Correlation Analysis (KCCA) is a nonlinear correlation measure that is effective for detecting independence, but its use for feature selection or ranking is limited because its formulation is not intended to measure the amount of information (entropy) of the dependence. In this paper, we propose Predictive Mutual Information (PMI), a hybrid measure of relevance that is not only based on MI but also accounts for the predictability of signals from one another, as in KCCA. We show that PMI has better feature detection capability than MI and KCCA, especially in catching suspicious coincidences that are rare but potentially important, not only for subsequent experimental studies but also for building computational predictive models. This is demonstrated on two toy datasets and a real intrusion detection system dataset.
  • Keywords
    correlation theory; entropy; feature extraction; operating system kernels; security of data; statistical analysis; KCCA; PMI; feature selection algorithm; information entropy; intrusion detection system; kernel canonical correlation analysis; nonlinear correlation measure; predictive mutual information; Accuracy; Biomedical measurements; Correlation; Entropy; Joints; Kernel; Mutual information;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2010 20th International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-7542-1
  • Type
    conf
  • DOI
    10.1109/ICPR.2010.1060
  • Filename
    5597870
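
  • Illustration
    The abstract's claim that MI can overlook rare but certain relations can be illustrated with a small sketch. It is not the paper's PMI measure and does not use the paper's datasets: the data below are invented for illustration, and the helper name mutual_information is hypothetical. It computes a plug-in MI estimate for two binary features against a balanced binary label: one feature fires rarely but always implies the positive class, the other is common but agrees with the label only 60% of the time.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


# Invented toy data (an assumption, not from the paper): 1000 samples, balanced label.
ys = [1] * 500 + [0] * 500

# Rare but certain: the feature is 1 in only 10 samples, and all 10 have label 1.
x_rare = [1] * 10 + [0] * 990

# Common but noisy: the feature equals the label in 600 of 1000 samples.
x_common = [1] * 300 + [0] * 200 + [0] * 300 + [1] * 200

mi_rare = mutual_information(x_rare, ys)
mi_common = mutual_information(x_common, ys)

# Fraction of label-1 samples among those where the rare feature fires.
rare_precision = sum(y for x, y in zip(x_rare, ys) if x == 1) / sum(x_rare)

print(f"MI(rare, y)   = {mi_rare:.4f} bits, precision when active = {rare_precision:.2f}")
print(f"MI(common, y) = {mi_common:.4f} bits")
```

    On this toy data, the noisy feature scores roughly three times higher MI than the rare feature, even though the rare feature predicts the positive class perfectly whenever it is active. An MI-only ranking would therefore place the rare feature last, which is the kind of omission the abstract describes.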