Author_Institution :
Dept. of Stat., Seoul Nat. Univ., Seoul, South Korea
Abstract :
A common goal of microarray analysis is the identification of differentially expressed genes (DEGs) between groups, which usually involves two-group comparisons. Many statistical methods have been developed toward this end, such as the t-test and the permutation test. In some cases, more than two groups of interest are compared, for example, when identifying DEGs across three or four stages of cancer or across stages of the cell cycle. Several statistical approaches are available for such multi-group analyses, including analysis of variance (ANOVA) models. When the groups are ordered, we hypothesized that statistical methods designed to use the order information would provide higher power. Although some methods are available for ordered group comparisons, they have rarely been applied to the analysis of microarray data. In this paper, we consider various statistical tests for identifying DEGs in comparisons involving more than two groups with ordered information (e.g., cancer stage and cell cycle data). We first consider a constrained ANOVA (CANOVA) model, which extends the standard ANOVA model (which ignores order information), and then employ a proportional odds (PO) model, which extends a general logit model. Finally, a simple correlation-based approach is considered. Through extensive simulation studies, we evaluated the performance of the CANOVA, PO, and correlation approaches by comparing their sizes and powers. The three approaches were then applied to real microarray data from The Cancer Genome Atlas (TCGA). We specifically focused on the acute myeloid leukemia (AML) mRNA microarray data set and used the risk categories derived from cytogenetic analyses (good, intermediate, and poor) as the ordered group information for AML. To identify the genes related to these risk categories, we selected 25 good, 25 intermediate, and 25 poor samples from the TCGA data set.
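The abstract does not say how the correlation-based approach is computed. As a minimal sketch only, the code below assumes it amounts to correlating a gene's expression with integer scores for the ordered groups (0 < 1 < 2) and assessing significance by permutation. The function names, the Pearson statistic, the scoring, and the simulated data are all assumptions for illustration, not the authors' procedure.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def trend_permutation_pvalue(expr, scores, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between
    expression values and ordinal group scores (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(pearson(expr, scores))
    shuffled = list(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between group order and expression
        if abs(pearson(expr, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Simulated toy data (assumption): 25 samples in each of three ordered groups.
data_rng = random.Random(1)
scores = [0] * 25 + [1] * 25 + [2] * 25
trend_gene = [data_rng.gauss(0.8 * s, 1.0) for s in scores]  # mean rises with group
null_gene = [data_rng.gauss(0.0, 1.0) for _ in scores]       # no group effect

p_trend = trend_permutation_pvalue(trend_gene, scores)
p_null = trend_permutation_pvalue(null_gene, scores)
print(f"trend gene p = {p_trend:.4f}, null gene p = {p_null:.4f}")
```

In a genome-wide setting the same test would be run once per gene, and the resulting p-values would then need a multiple-testing adjustment, which this sketch omits.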
Keywords :
RNA; bioinformatics; cancer; cellular biophysics; genetics; genomics; lab-on-a-chip; molecular biophysics; statistical analysis; AML mRNA microarray data set; TCGA; The Cancer Genome Atlas; acute myeloid leukemia; analysis of variance models; cell cycle; constraint ANOVA models; cytogenetic analysis; differentially expressed gene identification; general logit model; multigroup analyses; ordinal phenotypes; permutation test; proportional odds model; statistical methods; t-test; Analysis of variance; Analytical models; Bioinformatics; Cancer; Correlation; Data analysis; ANOVA; acute myeloid leukemia (AML); baseline category model; constrained ANOVA; correlation; differentially expressed genes (DEG); microarray; ordinal restriction; proportional odds;