DocumentCode
2888122
Title
Early stopping for non-parametric regression: An optimal data-dependent stopping rule
Author
Raskutti, Garvesh ; Wainwright, Martin J. ; Yu, Bin
Author_Institution
Dept. of Stat., UC Berkeley, Berkeley, CA, USA
fYear
2011
fDate
28-30 Sept. 2011
Firstpage
1318
Lastpage
1325
Abstract
The goal of non-parametric regression is to estimate an unknown function f* based on n i.i.d. observations of the form yi = f* (xi) + wi, where {wi}in=1 are additive noise variables. Simply choosing a function to minimize the least-squares loss 1/2n Σi=1n (yi - f(xi))2 will lead to "overfitting", so that various estimators are based on different types of regularization. The early stopping strategy is to run an iterative algorithm such as gradient descent for a fixed but finite number of iterations. Early stopping is known to yield estimates with better prediction accuracy than those obtained by running the algorithm for an infinite number of iterations. Although bounds on this prediction error are known for certain function classes and step size choices, the bias-variance tradeoffs for arbitrary reproducing kernel Hilbert spaces (RKHSs) and arbitrary choices of step-sizes have not been well-understood to date. In this paper, we derive upper bounds on both the L2(Pn) and L2(P) error for arbitrary RKHSs, and provide an explicit and easily computable data-dependent stopping rule. In particular, it depends only on the sum of step-sizes and the eigenvalues of the empirical kernel matrix for the RKHS. For Sobolev spaces and finite-rank kernel classes, we show that our stopping rule yields estimates that achieve the statistically optimal rates in a minimax sense.
Keywords
Hilbert spaces; eigenvalues and eigenfunctions; gradient methods; least squares approximations; matrix algebra; regression analysis; Sobolev spaces; additive noise variables; early stopping strategy; empirical kernel matrix eigenvalues; finite-rank kernel classes; function estimation; gradient descent; iterative algorithm; least-squares loss minimization; nonparametric regression; optimal data-dependent stopping rule; reproducing kernel Hilbert spaces; step-sizes; Complexity theory; Eigenvalues and eigenfunctions; Hafnium; Hilbert space; Iterative methods; Kernel; Upper bound;
fLanguage
English
Publisher
ieee
Conference_Titel
Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on
Conference_Location
Monticello, IL
Print_ISBN
978-1-4577-1817-5
Type
conf
DOI
10.1109/Allerton.2011.6120320
Filename
6120320
Link To Document