Title :
Information geometry of adaptive systems
Author :
Amari, Shun-Ichi ; Ozeki, Tomoko ; Park, Hyeyoung
Author_Institution :
RIKEN, Inst. of Phys. & Chem. Res., Saitama, Japan
Abstract :
An adaptive system works in a stochastic environment, so its behavior is represented by a probability distribution, e.g., a conditional probability density of the output given the input. Information geometry is a powerful tool for studying the intrinsic geometry of parameter spaces of probability distributions. This article investigates the local Riemannian metric and the topological singular structures of the parameter spaces of hierarchical systems such as multilayer perceptrons. The natural gradient learning method is introduced for such systems; its learning dynamics are ideal in that they are free of plateau phenomena. We explain the reason in terms of the topological structure of the singularities that exist in hierarchical systems. We mostly use multilayer perceptrons as examples, but the geometrical structure is common to many hierarchical systems, such as Gaussian mixtures of density functions and ARMA models of time series. Singularities are ubiquitous in hierarchical systems. At a singularity, the Fisher information metric degenerates and parameter estimators do not follow a Gaussian distribution. This implies that the Cramer-Rao paradigm does not hold. Model selection is an important subject in hierarchical systems. However, model selection criteria such as AIC and MDL are derived under the Cramer-Rao paradigm, so these criteria call for further modification. This study is a first step toward analyzing the singular structures of the parameter space and their relation to the dynamical behavior of learning.
Keywords :
Gaussian processes; adaptive systems; autoregressive moving average processes; hierarchical systems; information theory; learning systems; multilayer perceptrons; parameter space methods; probability; time series; AIC; ARMA models; Cramer-Rao paradigm; Fisher information metric; Gaussian mixtures; MDL; adaptive systems; conditional probability density; density functions; dynamical learning behavior; hierarchical systems; information geometry; local Riemannian metric; model selection; multilayer perceptrons; natural gradient learning method; parameter spaces; probability distribution; stochastic environment; time series; topological singular structures; topological structures; Adaptive systems; Density functional theory; Extraterrestrial measurements; Hierarchical systems; Information geometry; Learning systems; Multilayer perceptrons; Probability distribution; Solid modeling; Stochastic systems;
Conference_Title :
Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000
Conference_Location :
Lake Louise, Alta.
Print_ISBN :
0-7803-5800-7
DOI :
10.1109/ASSPCC.2000.882438