Title :
Towards stochastic conjugate gradient methods
Author :
Schraudolph, Nicol N. ; Graepel, Thore
Author_Institution :
Inst. of Computational Sci., Eidgenössische Tech. Hochschule, Zürich, Switzerland
Abstract :
The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. We explore a number of ways to adapt ideas from conjugate gradients to the stochastic setting, using fast Hessian-vector products to obtain curvature information cheaply. In our benchmark experiments the resulting highly scalable algorithms converge about an order of magnitude faster than ordinary stochastic gradient descent.
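The abstract's key ingredient, cheap curvature information via fast Hessian-vector products, is commonly obtained with Pearlmutter's trick: Hv = d/dr ∇f(w + rv) evaluated at r = 0, computable at roughly the cost of one gradient. Below is a minimal sketch (not the authors' code) using forward-over-reverse automatic differentiation in JAX; the least-squares objective, synthetic mini-batch, and single curvature-scaled step are illustrative assumptions, not the paper's algorithm.

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        # Least-squares loss on one mini-batch -- a stand-in objective.
        return 0.5 * jnp.mean((x @ w - y) ** 2)

    def hvp(w, v, x, y):
        # Hessian-vector product without forming the Hessian:
        # differentiate the gradient along direction v (cost ~ one gradient).
        grad_fn = lambda w: jax.grad(loss)(w, x, y)
        return jax.jvp(grad_fn, (w,), (v,))[1]

    # One curvature-scaled step on a stochastic mini-batch.
    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (32, 10))   # synthetic mini-batch of inputs
    w_true = jnp.arange(10.0)
    y = x @ w_true                          # synthetic targets
    w = jnp.zeros(10)

    g = jax.grad(loss)(w, x, y)             # stochastic gradient
    d = -g                                  # search direction (steepest descent here)
    curv = d @ hvp(w, d, x, y)              # curvature along d
    alpha = -(g @ d) / curv                 # exact minimizer along d for a quadratic loss
    w = w + alpha * d

In spirit this mirrors the abstract's approach: the step length along a search direction is set from a mini-batch Hessian-vector product rather than an explicit line search, which is what makes the scheme viable when only stochastic gradient estimates are available.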
Keywords :
Hessian matrices; conjugate gradient methods; optimisation; stochastic processes; curvature information; fast Hessian-vector products; gradient descent; highly scalable algorithms; large deterministic systems optimization; stochastic approximation; stochastic conjugate gradient methods; Convergence; Costs; Gradient methods; Iterative algorithms; Iterative methods; Least squares methods; Newton method; Optimization methods; Recursive estimation; Stochastic processes
Conference_Title :
Proceedings of the 9th International Conference on Neural Information Processing (ICONIP '02), 2002
Print_ISBN :
981-04-7524-1
DOI :
10.1109/ICONIP.2002.1198180