Title :
Efficient Feature Selection in the Presence of Multiple Feature Classes
Author :
Dhillon, Paramveer S. ; Foster, Dean ; Ungar, Lyle H.
Abstract :
We present an information theoretic approach to feature selection for data whose features fall into feature classes. Feature classes are pervasive in real data. For example, in gene expression data, the genes which serve as features may be divided into classes based on their membership in gene families or pathways. When doing word sense disambiguation or named entity extraction, features fall into classes including adjacent words, their parts of speech, and the topic and venue of the document the word is in. When predictive features occur predominantly in a small number of feature classes, our information theoretic approach significantly improves feature selection. Experiments on real and synthetic data demonstrate substantial improvement in predictive accuracy over the standard L0 penalty-based stepwise and streamwise feature selection methods, as well as over Lasso and Elastic Nets, all of which are oblivious to the existence of feature classes.
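The class-aware coding idea described in the abstract can be sketched as a streamwise selector: a candidate feature is kept only if the bits saved by the reduced residual exceed the bits spent coding its coefficient, with an extra one-time charge the first time a feature class is used. This is an illustrative sketch, not the authors' exact MDL coding scheme; the function names and the `extra_class_bits` charge are assumptions made for the example.

```python
import numpy as np

def rss(X, y):
    # residual sum of squares of the least-squares fit of y on X
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def streamwise_select(X, y, classes, extra_class_bits=4.0):
    """Hypothetical streamwise selection with an MDL-style cost.

    A feature is added if the bits saved by the smaller residual
    exceed the bits needed to code its coefficient plus, the first
    time a class is used, an extra charge for naming that class.
    (Sketch only; not the paper's exact coding scheme.)
    """
    n, p = X.shape
    selected, used_classes = [], set()
    cols = np.ones((n, 1))          # start with intercept only
    best = rss(cols, y)
    for j in range(p):              # features arrive as a stream
        cand = np.hstack([cols, X[:, [j]]])
        new = rss(cand, y)
        # bits saved by the better fit vs. bits spent on the model
        saved = 0.5 * n * np.log2(best / max(new, 1e-12))
        cost = 0.5 * np.log2(n)     # ~cost of coding one coefficient
        if classes[j] not in used_classes:
            cost += extra_class_bits
        if saved > cost:
            selected.append(j)
            used_classes.add(classes[j])
            cols, best = cand, new
    return selected

# Synthetic demo: 20 features in 4 classes; only class 0 is predictive.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
classes = np.repeat(np.arange(4), 5)
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * rng.standard_normal(n)
sel = streamwise_select(X, y, classes)
```

Because the two informative features share a class, the class-naming charge is paid once, while spurious features in unused classes face the higher combined cost — the intuition behind the claimed gains when predictive features concentrate in few classes.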
Keywords :
feature selection; feature extraction; pattern classification; gene expression data; information theoretic approach; multiple feature classes; word sense disambiguation; accuracy; biological information theory; data mining; principal component analysis; proteins; speech; statistics; minimum description length coding;
Conference_Titel :
2008 Eighth IEEE International Conference on Data Mining (ICDM '08)
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3502-9
DOI :
10.1109/ICDM.2008.56