Regional Information Center for Science and Technology - Bayesian feature selection for classification with possibly large number of classes

Title of article :

Bayesian feature selection for classification with possibly large number of classes

Author/Authors :

Davis, Justin; Pensky, Marianna; Crampton, William

Issue Information :

Serial issue, year 2011

Pages :

From page :

3256

To page :

3266

Abstract :

In what follows, we introduce two Bayesian models for feature selection in high-dimensional data, specifically designed for the purpose of classification. We use two approaches to the problem: one that discards the components that have “almost constant” values (Model 1) and another that retains the components for which variations between the groups are larger than those within the groups (Model 2). We assume that p ≫ n, i.e. the number of components p is much larger than the number of samples n, and that only a few of those p components are useful for subsequent classification. We show that particular cases of the above two models recover familiar variance-based or ANOVA-based component selection. When there are only two classes and the features are a priori independent, Model 2 reduces to the Feature Annealed Independence Rule (FAIR) introduced by Fan and Fan (2008) and can be viewed as a natural generalization of FAIR to the case of L > 2 classes. The performance of the methodology is studied via simulations and using a biological dataset of animal communication signals comprising 43 groups of electric signals recorded from tropical South American electric knife fishes.
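The abstract notes that particular cases of the two models recover classical variance-based and ANOVA-based component selection. The Python sketch below illustrates only those classical, non-Bayesian screening rules, not the authors' Bayesian models. The function names (`variance_screen`, `anova_f_scores`, `anova_screen`, `pooled_t`), the top-`k` cutoff and the toy data are hypothetical choices made for illustration. `variance_screen` plays the role of Model 1 (drop almost-constant components), and `anova_screen` plays the role of Model 2 (keep components whose between-group variation is large relative to their within-group variation).

```python
# Hypothetical illustration of classical screening rules that the abstract
# says are special cases of Models 1 and 2; NOT the paper's Bayesian models.
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # n samples x p components


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def _column(X: Matrix, j: int) -> List[float]:
    return [row[j] for row in X]


def variance_screen(X: Matrix, k: int) -> List[int]:
    """Model 1 analogue: keep the k components with the largest sample
    variance, i.e. discard the (almost) constant ones."""
    variances = []
    for j in range(len(X[0])):
        col = _column(X, j)
        m = _mean(col)
        variances.append(sum((v - m) ** 2 for v in col) / (len(col) - 1))
    return sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)[:k]


def anova_f_scores(X: Matrix, y: Sequence[int]) -> List[float]:
    """One-way ANOVA F statistic per component: between-group variation
    relative to within-group variation (requires n > number of classes)."""
    classes = sorted(set(y))
    n, L = len(y), len(classes)
    scores = []
    for j in range(len(X[0])):
        col = _column(X, j)
        grand = _mean(col)
        between = within = 0.0
        for c in classes:
            group = [v for v, label in zip(col, y) if label == c]
            gm = _mean(group)
            between += len(group) * (gm - grand) ** 2
            within += sum((v - gm) ** 2 for v in group)
        if within > 0:
            scores.append((between / (L - 1)) / (within / (n - L)))
        else:
            # No within-group spread: perfectly separating if the group
            # means differ, otherwise the component carries no information.
            scores.append(math.inf if between > 0 else 0.0)
    return scores


def anova_screen(X: Matrix, y: Sequence[int], k: int) -> List[int]:
    """Model 2 analogue: keep the k components with the largest F statistic."""
    scores = anova_f_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def pooled_t(X: Matrix, y: Sequence[int], j: int) -> float:
    """Pooled-variance two-sample t statistic for component j.
    Raises ValueError unless y contains exactly two classes."""
    a, b = sorted(set(y))
    g1 = [row[j] for row, label in zip(X, y) if label == a]
    g2 = [row[j] for row, label in zip(X, y) if label == b]
    m1, m2 = _mean(g1), _mean(g2)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
    s2 = ss / (len(g1) + len(g2) - 2)
    return (m1 - m2) / math.sqrt(s2 * (1 / len(g1) + 1 / len(g2)))


# Hypothetical toy data: component 0 is constant, component 1 is noise
# shared by both groups, component 2 separates the two groups.
X = [
    [1.0, 0.0, 0.0],
    [1.0, 2.0, 0.1],
    [1.0, -2.0, -0.1],
    [1.0, 0.0, 5.0],
    [1.0, 2.0, 5.1],
    [1.0, -2.0, 4.9],
]
y = [0, 0, 0, 1, 1, 1]

print(variance_screen(X, 2))  # [2, 1]: drops the constant component
print(anova_screen(X, y, 1))  # [2]: keeps the group-separating component
```

With two classes the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic, so ranking components by F is the same as ranking them by |t|, which is the kind of statistic FAIR uses for screening. Fan and Fan's exact standardization may differ from the pooled one assumed here.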

Keywords :

Classification , High-dimensional data , ANOVA , Bayesian feature selection

Journal title :

Journal of Statistical Planning and Inference

Serial Year :

2011

Record number :

2221578

Link To Document :

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=2221578