DocumentCode
2516129
Title
A Hybrid Method for Feature Selection Based on Mutual Information and Canonical Correlation Analysis
Author
Sakar, C. Okan ; Kursun, Olcay
Author_Institution
Dept. of Comput. Eng., Bahcesehir Univ., Istanbul, Turkey
fYear
2010
fDate
23-26 Aug. 2010
Firstpage
4360
Lastpage
4363
Abstract
Mutual Information (MI) is a classical and widely used dependence measure that generally serves well as a feature selection criterion. However, this measure overlooks under-sampled classes and rare but certain relations, which can result in missing relevant features that could be very predictive of variables of interest, such as certain phenotypes or disorders in biomedical research, rare but dangerous factors in ecology, intrusions in network systems, etc. On the other hand, Kernel Canonical Correlation Analysis (KCCA) is a nonlinear correlation measure effectively used to detect independence, but its use for feature selection or ranking is limited because its formulation is not intended to measure the amount of information (entropy) of the dependence. In this paper, we propose Predictive Mutual Information (PMI), a hybrid measure of relevance that is not only based on MI but also accounts for the predictability of signals from one another, as in KCCA. We show that PMI has better feature detection capability than MI and KCCA, especially in catching suspicious coincidences that are rare but potentially important, not only for subsequent experimental studies but also for building computational predictive models. This is demonstrated on two toy datasets and a real intrusion detection system dataset.
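The abstract's claim that MI can overlook rare but certain relations can be illustrated with a minimal sketch. The data below is invented for illustration and is not one of the paper's toy datasets; the function name `mutual_information` and the variables `rare_flag` and `noisy_flag` are hypothetical. The sketch uses only the standard plug-in estimate of MI for discrete variables and does not attempt to implement PMI or KCCA, whose formulations are not given in this record.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


# Assumed toy target: two common classes (495 samples each)
# and one rare class (10 samples, i.e. 1% of the data).
labels = [0] * 495 + [1] * 495 + [2] * 10

# Feature that fires exactly on the rare class: a rare but certain relation.
rare_flag = [0] * 990 + [1] * 10

# Feature that separates the two common classes with 80% agreement
# and carries no information about the rare class.
noisy_flag = (
    [0] * 396 + [1] * 99      # class 0: 80% zeros
    + [1] * 396 + [0] * 99    # class 1: 80% ones
    + [0] * 5 + [1] * 5       # class 2: evenly split
)

mi_rare = mutual_information(rare_flag, labels)
mi_noisy = mutual_information(noisy_flag, labels)

print(f"MI(rare_flag;  labels) = {mi_rare:.4f} bits")
print(f"MI(noisy_flag; labels) = {mi_noisy:.4f} bits")
```

Under these assumptions, `rare_flag` identifies the rare class without error, yet its MI with the labels is bounded by the entropy of a 1% event (about 0.08 bits), so an MI-based ranking places it below the noisy feature (about 0.28 bits). This is the kind of under-sampled relation the abstract says MI tends to miss.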
Keywords
correlation theory; entropy; feature extraction; operating system kernels; security of data; statistical analysis; KCCA; PMI; feature selection algorithm; information entropy; intrusion detection system; kernel canonical correlation analysis; nonlinear correlation measure; predictive mutual information; Accuracy; Biomedical measurements; Correlation; Entropy; Joints; Kernel; Mutual information;
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location
Istanbul
ISSN
1051-4651
Print_ISBN
978-1-4244-7542-1
Type
conf
DOI
10.1109/ICPR.2010.1060
Filename
5597870