Title :
Super-CWC and super-LCC: Super fast feature selection algorithms
Author :
Kilho Shin;Tetsuji Kuboyama;Takako Hashimoto;Dave Shepard
Author_Institution :
University of Hyogo, Kobe, Japan
Abstract :
Feature selection is a useful tool for identifying which features, or attributes, of a dataset cause or explain phenomena, and for improving the efficiency and accuracy of learning algorithms that discover such phenomena. Consequently, feature selection has been studied intensively in machine learning research. However, advanced feature selection algorithms that avoid selecting redundant features and can detect interacting features generally require heavy computation and are therefore seldom used for big data analysis. To remove this limitation, we set out to improve the run-time performance of two of the most advanced feature selection algorithms known in the literature, CWC and LCC. We have developed two accurate and extremely fast algorithms, namely Super-CWC and Super-LCC. In experiments with multiple real datasets that are actively studied in big data research, we demonstrate that our algorithms substantially improve on the run-time performance of the original algorithms. For example, on two datasets, one with 15,568 instances and 15,741 features and another with 200,569 instances and 99,672 features, Super-CWC performed feature selection in 1.4 seconds and 405 seconds, respectively. This is a remarkable improvement, because the original algorithms are estimated to need anywhere from several hours to a few tens of days to perform feature selection on the same datasets.
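The abstract does not describe how CWC, LCC, or their Super variants work. As a hedged illustration of why methods that handle redundant and interacting features can be expensive, the sketch below implements a naive consistency-based backward elimination: features are visited in ascending order of symmetrical uncertainty with the class, and a feature is dropped if the remaining features still determine the class on every instance. This is an assumption-laden teaching sketch, not the authors' Super-CWC or Super-LCC; all function names are hypothetical.

```python
from collections import defaultdict
from math import log2
from typing import Hashable, List, Sequence, Tuple

def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a discrete sequence."""
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

def symmetrical_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 0.0 if hx + hy == 0 else 2.0 * mutual_info / (hx + hy)

def is_consistent(rows: Sequence[Tuple], labels: Sequence[Hashable], features: List[int]) -> bool:
    """True if no two instances agree on all `features` but carry different labels."""
    seen = {}
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in features)
        if seen.setdefault(key, y) != y:
            return False
    return True

def consistency_backward_elimination(rows: Sequence[Tuple], labels: Sequence[Hashable]) -> List[int]:
    """Hypothetical naive selector; assumes the full feature set is consistent."""
    n_features = len(rows[0])
    columns = [[row[f] for row in rows] for f in range(n_features)]
    # Visit the least relevant features first (ascending SU with the class).
    order = sorted(range(n_features),
                   key=lambda f: symmetrical_uncertainty(columns[f], labels))
    selected = set(range(n_features))
    for f in order:
        candidate = selected - {f}
        # Each check rescans every instance, which is what makes this slow.
        if is_consistent(rows, labels, sorted(candidate)):
            selected = candidate
    return sorted(selected)

if __name__ == "__main__":
    # Class = f0 XOR f1; f2 is irrelevant noise.
    rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    labels = [a ^ b for a, b, _ in rows]
    print(consistency_backward_elimination(rows, labels))  # [0, 1]
```

In the XOR example, f0 and f1 each have zero symmetrical uncertainty with the class, so a purely relevance-ranked filter could discard them, yet the consistency check keeps both as an interacting pair and drops the irrelevant f2. Because every candidate removal rescans the whole dataset, this naive version costs roughly n·m² operations for n instances and m features, which hints at why such methods struggle at the dataset sizes quoted above.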
Keywords :
"Feature extraction","Big data","Algorithm design and analysis","Machine learning algorithms","Mutual information","Clustering algorithms","Redundancy"
Conference_Title :
2015 IEEE International Conference on Big Data (Big Data)
DOI :
10.1109/BigData.2015.7363742