• DocumentCode
    259569
  • Title

    A Cyclic Contrastive Divergence Learning Algorithm for High-Order RBMs

  • Author

    Dingsheng Luo ; Yi Wang ; Xiaoqiang Han ; Xihong Wu

  • Author_Institution
    Key Lab. of Machine Perception, Peking Univ., Beijing, China
  • fYear
    2014
  • fDate
    3-6 Dec. 2014
  • Firstpage
    80
  • Lastpage
    86
  • Abstract
    The Restricted Boltzmann Machine (RBM), a special case of the general Boltzmann Machine and a typical probabilistic graphical model, has attracted much attention in recent years because of its ability to extract features and represent the distribution underlying the training data. The most commonly used algorithm for learning RBMs is Contrastive Divergence (CD), proposed by Hinton, which starts a Markov chain at a data point and runs the chain for only a few iterations to obtain a low-variance estimator. In a high-order RBM, however, the visible layers interact with one another, so the gradient approximation produced by CD learning is usually far from the log-likelihood gradient and may even cause CD learning to fall into an infinite loop with high reconstruction error. In this paper, a new algorithm named Cyclic Contrastive Divergence (CCD) is introduced for learning high-order RBMs. Unlike the standard CD algorithm, CCD updates the parameters according to each visible layer in turn, borrowing the idea of the Cyclic Block Coordinate Descent method. To evaluate the proposed CCD algorithm for learning high-order RBMs, both CCD and standard CD are analyzed theoretically, covering convergence, the upper bound of the estimate, and a comparison of the two biases; the analysis reveals the superiority of CCD learning. Experiments are performed on the MNIST dataset for the handwritten digit classification task. The experimental results show that CCD is more applicable and consistently outperforms standard CD in both convergence speed and performance.
  • Keywords
    Boltzmann machines; Markov processes; approximation theory; feature extraction; gradient methods; handwritten character recognition; learning (artificial intelligence); pattern classification; statistical distributions; CCD learning; CD algorithm; CD learning; MNIST dataset; Markov chain; cyclic block coordinate descent method; cyclic contrastive divergence learning algorithm; data point; feature extraction; gradient approximation; handwritten digit classification task; high-order RBM; log-likelihood gradient; probabilistic graphical model; reconstruction error; restricted Boltzmann machine; training data; variance estimator; Approximation algorithms; Approximation methods; Charge coupled devices; Hidden Markov models; Standards; Topology; Training; Convergence; Cyclic Contrastive Divergence Learning; Gradient Approximation; High-order RBMs; Upper Bound;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2014 13th International Conference on
  • Conference_Location
    Detroit, MI
  • Type

    conf

  • DOI
    10.1109/ICMLA.2014.18
  • Filename
    7033095
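
The abstract describes standard CD as starting a Markov chain at a data point and running it for only a few iterations. The following is a minimal sketch of that idea for an ordinary (first-order) binary RBM with a single Gibbs step (CD-1), written in plain Python. It is not the paper's CCD algorithm and does not model a high-order RBM; the function names, the weight layout `W[i][j]`, the learning rate, and the toy data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_bernoulli(probs, rng):
    # Draw one binary sample per probability.
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v, W, c):
    # P(h_j = 1 | v); W[i][j] links visible unit i to hidden unit j (assumed layout).
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # P(v_i = 1 | h).
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on a single binary vector v0; updates W, b, c in place.

    Returns the squared reconstruction error between v0 and P(v | h0).
    """
    # Positive phase: the chain starts at the data point.
    ph0 = hidden_probs(v0, W, c)
    h0 = sample_bernoulli(ph0, rng)
    # Negative phase: a single Gibbs step instead of running to equilibrium.
    pv1 = visible_probs(h0, W, b)
    v1 = sample_bernoulli(pv1, rng)
    ph1 = hidden_probs(v1, W, c)
    # Approximate gradient: data statistics minus one-step reconstruction statistics.
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return sum((a - p) ** 2 for a, p in zip(v0, pv1))


# Toy usage (hypothetical data): two complementary 6-bit patterns.
rng = random.Random(0)
n_visible, n_hidden = 6, 3
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
for epoch in range(200):
    for v in data:
        err = cd1_update(v, W, b, c, lr=0.1, rng=rng)
```

Per the abstract, CCD differs from this scheme by updating the parameters according to each visible layer in turn, in the spirit of cyclic block coordinate descent; the record gives no further detail, so that variant is not sketched here.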