Title :
Derivation of minimum best sample size from microarray data sets: A Monte Carlo approach
Author :
Bi, Chengpeng ; Becker, Mara ; Leeder, Steve
Author_Institution :
Div. of Clinical Pharmacology, Univ. of Missouri, Kansas City, MO, USA
Abstract :
NCBI has accumulated a large repository of microarray data sets, the Gene Expression Omnibus (GEO). GEO is a valuable resource for pursuing a wide range of biological and pathological questions. The question we ask here is: given a set of gene signatures and a classifier, what is the best minimum sample size at which a clinical microarray study can effectively distinguish different types of patient response to a therapeutic drug? The question is difficult to answer because the sample size of most microarray experiments stored in GEO is very limited. This paper presents a Monte Carlo approach that estimates the best minimum microarray sample size by simulation from the available data sets. A Support Vector Machine (SVM) is used as the classifier to compute prediction accuracy for different sample sizes. A logistic function is then fitted to the relationship between sample size and accuracy, from which a theoretical minimum sample size can be derived.
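The pipeline outlined in the abstract (Monte Carlo resampling at several sample sizes, a classifier scored on held-out samples, then a logistic fit that is inverted for a target accuracy) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the SVM so that the sketch needs only the Python standard library, the logistic curve is assumed to have the form acc(n) = 1 / (1 + exp(-(b0 + b1*n))) and is fitted by least squares on the log-odds, and all function names, the synthetic data, and the target accuracy are hypothetical.

```python
import math
import random
import statistics


def logit(p, eps=1e-6):
    """Log-odds of p, clamped away from 0 and 1 so the result stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def centroid_accuracy(samples, train_idx, test_idx):
    """Nearest-centroid stand-in for the SVM: fit on train_idx, score on test_idx."""
    centroids = {}
    for label in (0, 1):
        rows = [samples[i][0] for i in train_idx if samples[i][1] == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    hits = 0
    for i in test_idx:
        x, y = samples[i]
        dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c))
                for lab, c in centroids.items()}
        hits += min(dist, key=dist.get) == y
    return hits / len(test_idx)


def monte_carlo_accuracy(samples, n_per_class, n_trials, rng):
    """Mean held-out accuracy over random training subsets of a given size."""
    by_class = {lab: [i for i, s in enumerate(samples) if s[1] == lab]
                for lab in (0, 1)}
    accs = []
    for _ in range(n_trials):
        train = [i for lab in (0, 1)
                 for i in rng.sample(by_class[lab], n_per_class)]
        chosen = set(train)
        test = [i for i in range(len(samples)) if i not in chosen]
        accs.append(centroid_accuracy(samples, train, test))
    return statistics.fmean(accs)


def fit_logistic(sizes, accs):
    """Least-squares line through (n, logit(acc)); returns (b0, b1)."""
    ys = [logit(a) for a in accs]
    mx, my = statistics.fmean(sizes), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in sizes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sizes, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def min_sample_size(b0, b1, target):
    """Smallest integer n with fitted accuracy >= target, or None if b1 <= 0."""
    if b1 <= 0:
        return None
    return math.ceil((logit(target) - b0) / b1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic two-class data for illustration only (not GEO data).
    samples = [([rng.gauss(0.8 * lab, 1.0) for _ in range(5)], lab)
               for lab in (0, 1) for _ in range(40)]
    sizes = [2, 4, 8, 16]  # training samples per class
    accs = [monte_carlo_accuracy(samples, n, 100, rng) for n in sizes]
    b0, b1 = fit_logistic(sizes, accs)
    print("estimated minimum size per class:", min_sample_size(b0, b1, 0.80))
```

In this sketch the sample size is counted per class, and the fitted curve is forced to approach an accuracy of 1 as n grows; a logistic form with a free upper asymptote may describe real data better, but which form the paper uses is not stated in the abstract.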
Keywords :
Monte Carlo methods; biology computing; genetics; pattern classification; support vector machines; GEO; Monte Carlo approach; NCBI; gene expression omnibus; logistic function; microarray data sets; minimum best sample size; support vector machine; therapeutic drug; Accuracy; Logistics; Mathematical model; Testing; Training
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2011 IEEE Symposium on
Conference_Location :
Paris
Print_ISBN :
978-1-4244-9896-3
DOI :
10.1109/CIBCB.2011.5948461