Title of article :
Identifying Gene Signature in RNA Sequencing Multiple Sclerosis Data
Author/Authors :
Kenarangi, Taiebe (Department of Biostatistics and Epidemiology, University of Social Welfare and Rehabilitation Sciences); Bakhshi, Enayatolah (Department of Biostatistics and Epidemiology, University of Social Welfare and Rehabilitation Sciences); Inanloo Rahatloo, Kolsoum (Department of Cell and Molecular Biology, School of Biology, College of Science, University of Tehran); Biglarian, Akbar (Department of Biostatistics and Epidemiology, Social Determinants of Health Research Center, University of Social Welfare and Rehabilitation Sciences)
Abstract :
Objectives: Multiple sclerosis (MS) is a complex disease of the central nervous system that results from a combination of genetic predisposition and a nongenetic trigger. This study aims to identify gene signatures in MS RNA sequencing (RNA-seq) data using a Pareto optimization algorithm. Methods: This case-control study involved 50 samples (25 MS patients and 25 age-matched healthy individuals), whose expression profiles (GSE123496) were obtained from the National Center for Biotechnology Information Gene Expression Omnibus database. We used Pareto-optimal cluster size identification to find the gene signatures in the RNA-seq data. After prefiltering and normalizing the data, we used the Limma package to find the differentially expressed genes (DEGs). The Pareto-optimal cluster size for these DEGs was then determined using the multi-objective optimization for collecting cluster alternatives technique. Afterward, the RNA-seq data were clustered via k-means with the selected cluster size. The best cluster was chosen as the signature by calculating the mean of the pairwise Spearman correlation coefficients (SCCs) across all genes in the module. All analyses were performed in R, version 4.1.1, in a virtual environment with 100 GB of RAM. Results: In total, 960 DEGs were identified by the Limma analysis, of which 720 were up-regulated and 240 were down-regulated. Six Pareto-optimal clusters were obtained. The two clusters with the highest mean SCC (0.88 and 0.74, respectively) were chosen as the gene signatures. Discussion: A total of 9 metabolic prognostic genes and 3 biological pathways were identified. These may provide stronger prognostic information for MS patients.
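The cluster-scoring step in the Methods (mean pairwise SCC across the genes in a module) can be illustrated with a minimal sketch. This is not the authors' R code: the function names `mean_pairwise_spearman` and `score_clusters`, the synthetic expression matrix, and the cluster labels are all hypothetical, and the ranking assumes continuous values with no ties.

```python
import numpy as np


def mean_pairwise_spearman(expr):
    """Mean Spearman correlation over all distinct gene pairs.

    expr: 2-D array, rows = genes, columns = samples.
    Assumption: values are continuous, so ranks are computed
    without any correction for ties.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[0]
    if n_genes < 2:
        return float("nan")  # no gene pairs to correlate
    # Rank each gene across samples; Spearman = Pearson on ranks.
    ranks = expr.argsort(axis=1).argsort(axis=1).astype(float)
    corr = np.corrcoef(ranks)
    # Keep each unordered pair once (strict upper triangle).
    upper = np.triu_indices(n_genes, k=1)
    return float(corr[upper].mean())


def score_clusters(expr, labels):
    """Return {cluster label: mean pairwise SCC} for each cluster."""
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    return {
        int(c): mean_pairwise_spearman(expr[labels == c])
        for c in np.unique(labels)
    }


# Synthetic data (hypothetical): cluster 0 holds three genes that rise
# together across 20 samples, cluster 1 holds three unrelated noise genes.
rng = np.random.default_rng(0)
trend = np.linspace(0.0, 1.0, 20)
coherent = np.vstack([trend * scale for scale in (1.0, 2.0, 3.0)])
noisy = rng.normal(size=(3, 20))
expr = np.vstack([coherent, noisy])
labels = np.array([0, 0, 0, 1, 1, 1])  # stand-in for k-means output

scores = score_clusters(expr, labels)
best = max(scores, key=scores.get)  # cluster with the highest mean SCC
```

In the study's workflow the labels would come from k-means run on the DEGs at the Pareto-optimal cluster size; here they are fixed by hand to keep the sketch self-contained.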
Keywords :
Multiple sclerosis , Gene signature , K-means , Pareto optimal clustering , RNA-seq
Journal title :
Iranian Rehabilitation Journal (IRJ)