Keywords:
Confidence interval, Lasso, p-value, Post-selection inference, Significance testing.
Abstract:
In recent years, a great deal of interest has focused on conducting inference on the parameters in a linear model in the high-dimensional setting. We review the challenges and two main ideas behind the recently proposed methods, one that has focused on inference based on a sub-model selected by the lasso, and the other that has focused on inference using a debiased version of the lasso estimator. We then consider a simple and very naive two-step procedure for this task, in which we (i) fit a lasso model in order to obtain a subset of the variables; and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot make use of the standard statistical inference tools for the resulting least squares model (such as confidence intervals and p-values), since we peeked at the data twice: once in running the lasso, and again in fitting the least squares model. However, we show that under a certain set of assumptions, with high probability, the set of variables selected by the lasso is deterministic. Consequently, the naive two-step approach can yield confidence intervals that have asymptotically correct coverage, as well as p-values with proper Type-I error control. Furthermore, this two-step approach unifies two existing camps of work on high-dimensional inference.