DocumentCode
746902
Title
Dynamics of Learning in Multilayer Perceptrons Near Singularities
Author
Cousseau, Florent ; Ozeki, Tomoko ; Amari, Shun-Ichi
Author_Institution
Brain Sci. Inst., Amari Unit for Math. Neurosci., RIKEN, Wako
Volume
19
Issue
8
fYear
2008
Firstpage
1313
Lastpage
1328
Abstract
Learning in the multilayer perceptron is known to be very slow, with the dynamics often trapped in a "plateau." It has recently been understood that this is due to singularities in the parameter space of perceptrons, in which the trajectories of learning are drawn. From the point of view of information geometry, the space is Riemannian, and it contains singular regions where the Riemannian metric, that is, the Fisher information matrix, degenerates. This paper analyzes the dynamics of learning in a neighborhood of the singular regions when the true teacher machine lies at the singularity. We give explicit asymptotic analytical solutions (trajectories) for both the standard gradient (SGD) and natural gradient (NGD) methods. For the SGD method, it is clearly shown that the plateau phenomenon appears in a neighborhood of the critical regions, where the dynamical behavior is extremely slow. The analysis of the NGD method is much more difficult because the inverse of the Fisher information matrix diverges. We overcome this difficulty by introducing the "blow-down" technique used in algebraic geometry. The NGD method works efficiently: the state converges directly and very quickly to the true parameters, whereas it staggers under the SGD method. The analytical results are compared with computer simulations and show good agreement. The effects of singularities on learning are thus qualitatively clarified for both the SGD and NGD methods.
Keywords
gradient methods; learning (artificial intelligence); multilayer perceptrons; Fisher information matrix; NGD; Riemannian metric; SGD; blow-down technique; learning dynamics; multilayer perceptrons; natural gradient methods; standard gradient methods; Dynamics of learning; multilayer perceptrons; natural gradient (NGD) learning; singularity; standard gradient (SGD) learning; Algorithms; Artificial Intelligence; Computer Simulation; Models, Theoretical; Neural Networks (Computer); Pattern Recognition, Automated;
fLanguage
English
Journal_Title
Neural Networks, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9227
Type
jour
DOI
10.1109/TNN.2008.2000391
Filename
4539808