DocumentCode :
2442879
Title :
Estimating the statistical significance of classifiers by varying the number of genes
Author :
Maglietta, Rosalia ; Piepoli, A. ; D'Addabbo, A. ; Cotugno, R. ; Pesole, Graziano ; Liuni, S. ; Savino, M. ; Carella, M. ; Perri, F. ; Ancona, N.
Author_Institution :
ISSIA-CNR, Bari
fYear :
2006
fDate :
28-30 May 2006
Firstpage :
109
Lastpage :
110
Abstract :
We present a statistically well-founded method to construct cancer predictors using gene expression profiles. This methodology is applied to a new microarray data set extracted from 25 patients affected by colon cancer. In particular, we address two precise questions: how many gene expression levels are correlated with the pathology, and how many are sufficient for an accurate classification? The proposed method answers these questions while avoiding the potential pitfalls hidden in the analysis of microarray data. We evaluated the generalization error, estimated through the Leave-K-Out Cross Validation error, of two different classification schemes by varying the number of selected genes. We found that Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers, using the whole gene set, have error rates of e = 14% (p = 0.023) and e = 11% (p = 0.016), respectively. Concerning the number of genes, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. Statistical significance was measured using a permutation test.
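The significance estimate described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid rule stands in for the RLS and SVM classifiers, the leave-k-out splits are drawn at random (the abstract does not say how they are chosen), gene selection is omitted, and the `(count + 1) / (n_perm + 1)` p-value is one common convention rather than something stated in the record. All function and parameter names are hypothetical.

```python
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT RLS or SVM): pick the class whose mean is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # sorted() makes tie-breaking deterministic.
    return min(sorted(centroids), key=lambda c: dist2(centroids[c], x))


def leave_k_out_error(X, y, k, n_splits, rng):
    """Average test error over n_splits random splits that hold out k samples each (k < len(y))."""
    n = len(y)
    errors = 0
    total = 0
    for _ in range(n_splits):
        test_idx = set(rng.sample(range(n), k))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if nearest_centroid_predict(train_X, train_y, X[i]) != y[i]:
                errors += 1
            total += 1
    return errors / total


def permutation_p_value(X, y, k, n_splits, n_perm, seed=0):
    """Return (observed error, p-value): how often shuffled labels do at least as well."""
    rng = random.Random(seed)
    observed = leave_k_out_error(X, y, k, n_splits, rng)
    count = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any link between profiles and labels
        if leave_k_out_error(X, shuffled, k, n_splits, rng) <= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

A small p-value indicates that the observed cross-validation error is unlikely to arise when expression profiles and labels are unrelated. Repeating the call on progressively smaller gene subsets would mirror the "varying the number of genes" analysis, under the same assumptions.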
Keywords :
DNA; biology computing; cancer; least squares approximations; pattern classification; statistical analysis; support vector machines; DNA microarray data; RLS; SVM; cancer predictors; colon cancer; gene expression profiles; leave-k-out cross validation error; regularized least squares classifier; statistical significance; support vector machine classifiers; Cancer; Colon; Data analysis; Data mining; Gene expression; Least squares methods; Pathology; Resonance light scattering; Support vector machine classification; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Genomic Signal Processing and Statistics, 2006. GENSIPS '06. IEEE International Workshop on
Conference_Location :
College Station, TX
Print_ISBN :
1-4244-0384-7
Electronic_ISBN :
1-4244-0385-5
Type :
conf
DOI :
10.1109/GENSIPS.2006.353180
Filename :
4161801