DocumentCode :
21560
Title :
Learning the Information Divergence
Author :
Dikmen, Onur ; Yang, Zhirong ; Oja, Erkki
Author_Institution :
Dept. of Inf. & Comput. Sci., Aalto Univ., Espoo, Finland
Volume :
37
Issue :
7
fYear :
2015
fDate :
July 1, 2015
Firstpage :
1442
Lastpage :
1454
Abstract :
Information divergence, which measures the difference between two nonnegative matrices or tensors, has found use in a variety of machine learning problems, including Nonnegative Matrix/Tensor Factorization, Stochastic Neighbor Embedding, topic models, and Bayesian network optimization. The success of such a learning task depends heavily on choosing a suitable divergence. A large variety of divergences have been suggested and analyzed, but very few results are available for an objective choice of the optimal divergence for a given task. Here we present a framework that facilitates automatic selection of the best divergence among a given family, based on standard maximum likelihood estimation. We first propose an approximated Tweedie distribution for the β-divergence family; selecting the best β then becomes a machine learning problem solved by maximum likelihood. Next, we reformulate α-divergence in terms of β-divergence, which enables automatic selection of α by maximum likelihood with reuse of the learning principle for β-divergence. Furthermore, we show the connections between γ- and β-divergences as well as Rényi- and α-divergences, so that our automatic selection framework extends to non-separable divergences. Experiments on both synthetic and real-world data demonstrate that our method can select the information divergence quite accurately across different learning problems and various divergence families.
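To make the selection principle concrete, the following is a minimal sketch (not the authors' code) of maximum-likelihood selection of β. It scores each candidate β with the negative log-likelihood of an approximated Tweedie density with power p = 2 − β, using the standard saddle-point approximation, which may differ in detail from the approximation derived in the paper. The function names, the grid of candidate β values, and the dispersion handling are illustrative assumptions.

import numpy as np

def beta_divergence(x, mu, beta):
    # Elementwise beta-divergence d_beta(x, mu) for x, mu > 0;
    # beta = 1 and beta = 0 are the KL and Itakura-Saito limits.
    if np.isclose(beta, 1.0):
        return x * np.log(x / mu) - x + mu
    if np.isclose(beta, 0.0):
        return x / mu - np.log(x / mu) - 1.0
    return (x**beta / (beta * (beta - 1.0))
            - x * mu**(beta - 1.0) / (beta - 1.0)
            + mu**beta / beta)

def approx_tweedie_nll(x, mu, beta, phi):
    # Negative log-likelihood under the saddle-point Tweedie
    # approximation f(x) ~ (2*pi*phi*x**p)**-0.5 * exp(-dev / (2*phi)),
    # with power p = 2 - beta and unit deviance dev = 2 * d_beta(x, mu).
    p = 2.0 - beta
    dev = 2.0 * beta_divergence(x, mu, beta)
    return np.sum(0.5 * np.log(2.0 * np.pi * phi * x**p) + dev / (2.0 * phi))

def select_beta(x, mu, betas=np.linspace(-1.0, 2.0, 61)):
    # Grid-search ML selection of beta. For each candidate beta the
    # dispersion phi is set to its closed-form ML estimate (the mean
    # unit deviance); the beta with the smallest NLL is returned.
    best_beta, best_nll = None, np.inf
    for beta in betas:
        dev = 2.0 * beta_divergence(x, mu, beta)
        phi = np.mean(dev)
        nll = approx_tweedie_nll(x, mu, beta, phi)
        if nll < best_nll:
            best_beta, best_nll = beta, nll
    return best_beta

In an NMF setting, for example, x would be the observed nonnegative matrix and mu its current reconstruction; the divergence parameter can then be re-selected as the factorization is refined. This sketch scores a fixed mu against a β grid rather than jointly optimizing, which keeps the selection step a plain one-dimensional maximum-likelihood search.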
Keywords :
information theory; learning (artificial intelligence); maximum likelihood estimation; α-divergence; β-divergence family; γ-divergence; Rényi divergence; approximated Tweedie distribution; automatic selection framework; information divergence; learning principle reuse; machine learning problem; non-separable divergences; Approximation methods; Brain modeling; Medals; Standards; Stochastic processes; Tensile stress; Tweedie distribution; maximum likelihood; nonnegative matrix factorization; stochastic neighbor embedding;
fLanguage :
English
Journal_Title :
IEEE Transactions on Pattern Analysis and Machine Intelligence
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2014.2366144
Filename :
6942194