Author_Institution :
Dept. of Comput. Sci., Chinese Univ. of Hong Kong, Shatin, Hong Kong
Abstract :
In this paper, we extend Bayesian-Kullback Ying-Yang (BKYY) learning into a much broader Bayesian Ying-Yang (BYY) learning system by using different separation functionals instead of only the Kullback divergence, and elaborate on the power of BYY learning as a general learning theory for parameter learning, scale selection, structure evaluation, regularization and sampling design. Improved criteria are proposed for selecting the number of densities in finite mixtures and Gaussian mixtures, the number of clusters in MSE clustering, the subspace dimension in PCA-related methods, the number of expert nets in the mixture-of-experts model and its alternative, and the number of basis functions in RBF nets. Three categories of non-Kullback separation functionals, namely the convex divergence, the Lp divergence and the decorrelation index, are suggested for BYY learning as alternatives to the Kullback-divergence-based learning models, and some of their properties are discussed. As examples, the EM algorithms for the finite mixture, the mixture of experts and its alternative model are derived under the convex divergence.
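For orientation, the separation functionals contrasted above can be sketched in a standard reference form (used here only as an illustration, not necessarily the paper's exact definitions). The Kullback divergence between a Yang density p and a Ying density q is

\[ \mathrm{KL}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx, \]

and one standard convex-divergence family, the Csiszár f-divergence with convex f satisfying f(1) = 0, is

\[ D_f(p \,\|\, q) = \int q(x) \, f\!\left(\frac{p(x)}{q(x)}\right) dx, \]

which recovers the Kullback divergence for f(u) = u log u.

As a concrete baseline for the EM algorithms mentioned above, the following is a minimal sketch of standard (Kullback-based) EM for a one-dimensional Gaussian mixture; all identifiers are illustrative rather than the paper's notation, and the convex-divergence variants derived in the paper generalize this baseline.

# Minimal, standard (Kullback-based) EM sketch for a 1-D Gaussian
# mixture -- the baseline that convex-divergence EM variants generalize.
# All names are illustrative, not the paper's notation.
import numpy as np

def em_gmm_1d(x, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialize mixing weights, means and variances.
    pi = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, np.var(x))
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Usage: fit two components to a synthetic two-mode sample.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 0.5, 500)])
print(em_gmm_1d(x, k=2))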
Keywords :
Bayes methods; learning systems; maximum entropy methods; neural nets; Bayesian Ying-Yang learning system; EM algorithms; Gaussian mixtures; Kullback divergence; RBF neural nets; convex divergence; decorrelation index; finite mixture; maximum output entropy; parameter learning; sampling design; scale selection; separation functionals; structure evaluation; Bayesian methods; Computer science; Decorrelation; Density functional theory; Information processing; Kernel; Learning systems; Principal component analysis; Sampling methods; Training data