Title of article :
Penalized Regression Versus Random Forest Model in Analyzing High Dimensional Proteomic Data: Diagnosis of IgA Nephropathy
Author/Authors :
Almasi, Afshin Department of Biostatistics and Epidemiology - School of Public Health - Kermanshah University of Medical Sciences, Kermanshah
Kalantari, Shiva Chronic Kidney Disease Research Center - Labbafinejad Hospital - Shahid Beheshti University of Medical Sciences, Tehran
Hashemian, Amirhossein Department of Biostatistics and Epidemiology - School of Public Health - Kermanshah University of Medical Sciences, Kermanshah
Mohammadi Majd, Tahereh Department of Biostatistics and Epidemiology - School of Public Health - Kermanshah University of Medical Sciences, Kermanshah
Abstract :
Background: Immunoglobulin A nephropathy (IgAN) is a chronic renal disease and the most prevalent glomerulonephritis worldwide. To model a large number of extracted biomarkers and identify those most strongly associated with IgAN, the researchers compared 2 penalized regression methods, LASSO and MCP logistic regression, with the random forest method. All of these methods are appropriate for high-dimensional, low-sample-size problems.
Methods: The urinary protein profiles of both groups comprised 493 proteins. Data were obtained from a case group (13 patients with IgAN) and a control group (8 healthy individuals) using nanoscale liquid chromatography with tandem mass spectrometry. The Mann-Whitney test was used for univariate analysis, and LASSO, MCP, and random forest were used for multivariate analysis to evaluate the simultaneous effect of biomarkers on IgAN in a high-dimensional, low-sample-size setting. All statistical analyses were performed in R 3.3.2.
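To illustrate how the two penalties named above differ, the following is a minimal sketch of their univariate thresholding rules for a single standardized predictor in the least-squares case. It is not the authors' analysis, which used penalized logistic regression in R. The function names and the default gamma = 3.0 are illustrative assumptions.

```python
def lasso_threshold(z, lam):
    """Soft-thresholding: univariate LASSO solution for a standardized predictor.

    z is the unpenalized (least-squares) coefficient, lam the penalty level.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_threshold(z, lam, gamma=3.0):
    """Firm-thresholding: univariate MCP solution (assumes gamma > 1).

    Small coefficients are set to zero as in LASSO, but large ones
    (|z| > gamma * lam) are left unshrunk.
    """
    if gamma <= 1.0:
        raise ValueError("gamma must be greater than 1")
    if abs(z) <= gamma * lam:
        return lasso_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to show the three regimes.
    for z in (0.5, 2.0, 4.0):
        print(z, lasso_threshold(z, 1.0), mcp_threshold(z, 1.0))
```

Both rules set small coefficients exactly to zero, which is what makes variable selection possible when the number of proteins far exceeds the number of samples. MCP relaxes the shrinkage on large coefficients, so it tends to keep fewer variables with larger estimated effects.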
Results: Although the Mann-Whitney test showed that 144 of the 493 proteins differed significantly between the 2 groups, LASSO, MCP, and random forest selected only 7, 3, and 5 biomarkers, respectively, as effective factors in IgAN. The most effective biomarkers were SULF2 (OR = 0.28) and ALBU (OR = 2.66) in LASSO, A1AT (OR = 73.7) in MCP, and GOLM1 and IBP7 in the random forest method.
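The odds ratios reported above relate to logistic regression coefficients through OR = exp(beta). The sketch below converts the reported values back to the coefficient scale. The helper names are illustrative assumptions, and the scale of the protein measurements is not stated in this abstract.

```python
import math


def odds_ratio_to_coef(odds_ratio):
    """Logistic regression coefficient (log-odds) implied by an odds ratio."""
    return math.log(odds_ratio)


def coef_to_odds_ratio(coef):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coef)


if __name__ == "__main__":
    # Odds ratios as reported in the Results paragraph.
    reported = {
        "SULF2 (LASSO)": 0.28,
        "ALBU (LASSO)": 2.66,
        "A1AT (MCP)": 73.7,
    }
    for name, odds_ratio in reported.items():
        print(f"{name}: OR = {odds_ratio}, beta = {odds_ratio_to_coef(odds_ratio):.3f}")
```

An odds ratio below 1 corresponds to a negative coefficient, meaning higher values of the biomarker go with lower odds of IgAN in the fitted model. An odds ratio above 1 corresponds to a positive coefficient.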
Conclusions: Because all 3 models correctly differentiated all the IgAN patients from the controls, the researchers suggest these models for high-dimensional, low-sample-size datasets.
Keywords :
Diagnosis , IgA Nephropathy , LASSO , MCP , Random Forest , Biomarker
Journal title :
Astroparticle Physics