Statistical bias and variance of gene selection and cross validation methods: A case study on hypertension prediction

Author

Gormez, Zeliha ; Kursun, Olcay ; Sertbas, Ahmet ; Aydin, Nizamettin ; Seker, Huseyin

Author_Institution

Comput. Eng. Dept., Univ. of Istanbul, Istanbul, Turkey

fYear

2012

fDate

5-7 Jan. 2012

Firstpage

616

Lastpage

619

Abstract

In exploratory association studies of genes with certain diseases, a single gene or a small number of genes (features) related to the disease are selected from among the many thousands investigated. We investigate the statistical bias and variance of simple yet common feature selection algorithms (correlation-based and mutual-information-based) using well-known cross-validation methods (leave-one-out and k-fold) in a gene-finding study for hypertension prediction. Our findings show that the selected genes differ across methods and across cross-validation runs, for both single-gene selection and gene subset selection.
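
The selection instability described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the data are synthetic noise, the function names (`top_gene_by_correlation`, `kfold_selections`) are hypothetical, and only correlation-based single-gene selection is shown (mutual information is omitted). The idea is to repeat the selection on each cross-validation training split and count how many different genes end up being chosen.

```python
import numpy as np

def top_gene_by_correlation(X, y):
    """Index of the feature with the largest absolute Pearson correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns (or constant y) get correlation 0
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    return int(np.argmax(np.abs(r)))

def kfold_selections(X, y, k, rng):
    """Run single-gene selection on each of the k training splits.

    With k equal to the number of samples this is leave-one-out.
    """
    perm = rng.permutation(len(y))
    folds = np.array_split(perm, k)
    selected = []
    for held_out in folds:
        train = np.setdiff1d(perm, held_out)
        selected.append(top_gene_by_correlation(X[train], y[train]))
    return selected

# Synthetic stand-in data (assumption): 40 samples, 500 "genes", random labels.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 500
X = rng.normal(size=(n_samples, n_genes))
y = rng.integers(0, 2, size=n_samples).astype(float)

loo = kfold_selections(X, y, k=n_samples, rng=rng)
five_fold = kfold_selections(X, y, k=5, rng=rng)
print("distinct genes selected, leave-one-out:", len(set(loo)))
print("distinct genes selected, 5-fold:", len(set(five_fold)))
```

A count of distinct selected genes greater than one indicates that the chosen gene depends on which samples fall in the training split, which is the kind of variance the abstract refers to.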

Keywords

learning (artificial intelligence); medical computing; statistical analysis; cross validation methods; feature selection algorithms; gene subset selection; hypertension prediction; single gene selection; statistical bias; statistical variance; prediction algorithms

fLanguage

English

Publisher

IEEE

Conference_Titel

Biomedical and Health Informatics (BHI), 2012 IEEE-EMBS International Conference on

Conference_Location

Hong Kong

Print_ISBN

978-1-4577-2176-2

Electronic_ISBN

978-1-4577-2175-5

Type

conf

DOI

10.1109/BHI.2012.6211658

Filename

6211658