Regional Information Center for Science and Technology - Feature selection with biased sample distributions

DocumentCode :

3079794

Title :

Feature selection with biased sample distributions

Author :

Kamal, A.H.M. ; Zhu, Xingquan ; Pandya, Abhijit ; Hsu, Sam

Author_Institution :

Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA

fYear :

2009

fDate :

10-12 Aug. 2009

Firstpage :

Lastpage :

Abstract :

Feature selection concerns the problem of selecting a number of important features (w.r.t. the class labels) in order to build accurate prediction models. Traditional feature selection methods, however, fail to take the sample distributions into consideration, which may lead to poor predictions for minority class examples. Due to the complexity and cost involved in the data collection process, many applications, such as biomedical research, commonly face biased data collections in which one class of examples (e.g., diseased samples) is significantly smaller than the other classes (e.g., normal samples). For these applications, the minority class examples, such as disease samples, credit card frauds, and network intrusions, are only a small portion of the data collections but deserve full attention for accurate prediction. In this paper, we propose three filtering techniques, higher weight (HW), differential minority repeat (DMR) and balanced minority repeat (BMR), to identify important features from biased data collections. Experimental comparisons with the ReliefF method on five datasets demonstrate the effectiveness of the proposed methods in selecting informative features from data with biased sample distributions.
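The abstract does not describe how HW, DMR, or BMR work, so the sketch below is not the authors' method. It only illustrates the general idea of making a filter-style feature score sensitive to class imbalance. It uses a simplified two-class Relief score (one nearest hit and one nearest miss, Manhattan distance), not the full ReliefF baseline. The function name `relief_scores` and the `class_balanced` option, which weights each instance's update by 1/|its class| so that the minority class contributes as much as the majority class, are hypothetical choices made for illustration.

```python
from collections import Counter


def relief_scores(X, y, class_balanced=False):
    """Simplified two-class Relief feature scores (illustrative sketch only).

    For every instance, find its nearest same-class neighbour (hit) and its
    nearest other-class neighbour (miss), then reward features that differ on
    the miss and penalise features that differ on the hit.

    class_balanced=True is a hypothetical variant: each instance's update is
    weighted by 1/|its class|, so each class has equal total influence.
    """
    n, p = len(X), len(X[0])
    sizes = Counter(y)
    if len(sizes) != 2 or min(sizes.values()) < 2:
        raise ValueError("expects two classes with at least two instances each")

    # Scale every feature to [0, 1] so per-feature differences are comparable;
    # constant features get a span of 1 and therefore always score 0.
    lo = [min(row[j] for row in X) for j in range(p)]
    span = [(max(row[j] for row in X) - lo[j]) or 1.0 for j in range(p)]
    Xs = [[(row[j] - lo[j]) / span[j] for j in range(p)] for row in X]

    def dist(a, b):
        # Manhattan distance between two scaled instances.
        return sum(abs(u - v) for u, v in zip(a, b))

    w = [0.0] * p
    total = 0.0
    for i in range(n):
        others = [k for k in range(n) if k != i]
        # min() returns the lowest index among ties.
        hit = min((k for k in others if y[k] == y[i]), key=lambda k: dist(Xs[i], Xs[k]))
        miss = min((k for k in others if y[k] != y[i]), key=lambda k: dist(Xs[i], Xs[k]))
        c = 1.0 / sizes[y[i]] if class_balanced else 1.0
        for j in range(p):
            w[j] += c * (abs(Xs[i][j] - Xs[miss][j]) - abs(Xs[i][j] - Xs[hit][j]))
        total += c
    # Weighted average of the per-instance updates; each score lies in [-1, 1].
    return [wj / total for wj in w]


# Toy imbalanced data: three majority (class 0) and two minority (class 1) rows.
X = [[0, 0], [0, 1], [0, 0], [1, 0], [1, 1]]
y = [0, 0, 0, 1, 1]
print(relief_scores(X, y))
print(relief_scores(X, y, class_balanced=True))
```

On this toy data both variants give feature 0 a score of 1.0. Feature 1 scores -0.6 in the plain version and about -0.667 in the class-balanced version, because the two minority rows, which both penalise feature 1, now carry as much total weight as the three majority rows. When the classes are equal in size, the two variants coincide.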

Keywords :

feature extraction; learning (artificial intelligence); pattern classification; sampling methods; statistical distributions; accurate prediction model; balanced minority repeat filtering technique; biased sample distribution; biomedical research; credit card fraud; differential minority repeat filtering technique; disease sample; feature selection; higher weight filtering technique; machine learning; minority class example; network intrusion; pattern classification; Australia; Cancer; Computer science; Costs; Credit cards; Data mining; Diseases; Filtering; Filters; Predictive models; Classification; biased sample distributions; feature selection; imbalanced data;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Reuse & Integration, 2009. IRI '09. IEEE International Conference on

Conference_Location :

Las Vegas, NV

Print_ISBN :

978-1-4244-4114-3

Electronic_ISBN :

978-1-4244-4116-7

Type :

conf

DOI :

10.1109/IRI.2009.5211613

Filename :

5211613

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3079794