DocumentCode
1344016
Title
Local-Learning-Based Feature Selection for High-Dimensional Data Analysis
Author
Sun, Yijun ; Todorovic, Sinisa ; Goodison, Steve
Author_Institution
Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA
Volume
32
Issue
9
fYear
2010
Firstpage
1610
Lastpage
1626
Abstract
This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then learn feature relevance globally within the large margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, without making any assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses of the algorithm's sample complexity suggest that the algorithm has a logarithmical sample complexity with respect to the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.
Keywords
computational complexity; data analysis; learning (artificial intelligence); arbitrarily complex nonlinear problem; computational complexity; high dimensional data analysis; local learning based feature selection algorithm; logarithmical sample complexity; machine learning; real world data set; supervised learning; Algorithm design and analysis; Computational complexity; Data analysis; Machine learning; Machine learning algorithms; Microcomputers; Numerical analysis; Sun; Support vector machine classification; Support vector machines; Feature selection; ℓ1 regularization; local learning; logistical regression; sample complexity; Algorithms; Artificial Intelligence; Computer Simulation; Decision Support Techniques; Models, Theoretical; Pattern Recognition, Automated
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/TPAMI.2009.190
Filename
5342431