• DocumentCode
    1622683
  • Title

    Pruning and growing hierarchical mixtures of experts

  • Author

    Waterhouse, S.R.; Robinson, A.J.

  • Author_Institution
    Cambridge Univ., UK
  • fYear
    1995
  • Firstpage
    341
  • Lastpage
    346
  • Abstract
    The 'hierarchical mixture of experts' (HME) is a tree-structured statistical model that is an alternative to multilayer perceptrons. Its training algorithm consists of a number of forward and backward passes through the tree. These are computationally expensive, especially when the trees are large. To reduce the computation, we may either allow the network to find its own structure in a constructive manner (tree growing) or consider only the most likely paths through the tree (path pruning). Pruning keeps the number of parameters constant but considers only the most likely paths through the tree at any time; this leads to significant speedups in training and evaluation. In the growing algorithm, we start with a small tree and apply a splitting criterion based on maximum likelihood to each terminal node. After splitting the best node according to this criterion, we retrain the tree for a set number of iterations, or until there is no further increase in likelihood, at which point the tree is grown again. This results in a flexible architecture which is both faster to train and more efficient in terms of its parameters. To aid the convergence of these algorithms, it is beneficial to introduce regularization into the HME, which stops the evolution of large weights which would otherwise cause branches of the tree to be pinched off. This also aids generalization, as we demonstrate on a toy regression problem. Results for the growing and pruning algorithms show significant speedups over conventional algorithms in discriminating between two interlocking spirals and classifying 8-bit parity patterns.
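    The path-pruning idea described above can be sketched as follows. This is a minimal one-level illustration, not the paper's implementation: gate probabilities below a threshold are dropped and the survivors renormalised, so only the most likely experts need to be evaluated. All names (`prune_paths`, `mixture_output`) and the threshold value are our own assumptions.

    ```python
    import math

    def softmax(scores):
        # Numerically stable softmax over the gating network's scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def prune_paths(gate_probs, threshold=0.05):
        # Keep only paths whose gate probability exceeds the threshold,
        # then renormalise so the kept probabilities sum to one.
        kept = {i: p for i, p in enumerate(gate_probs) if p >= threshold}
        z = sum(kept.values())
        return {i: p / z for i, p in kept.items()}

    def mixture_output(gate_scores, expert_outputs, threshold=0.05):
        # Only the surviving experts contribute -- the source of the
        # speedup in both training and evaluation.
        kept = prune_paths(softmax(gate_scores), threshold)
        return sum(p * expert_outputs[i] for i, p in kept.items())
    ```

    In a full HME the same rule is applied recursively down the tree, so whole subtrees below a pruned branch are skipped.
    
    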
  • Keywords
    convergence; cooperative systems; generalisation (artificial intelligence); hierarchical systems; iterative methods; learning (artificial intelligence); neural net architecture; pattern classification; statistics; tree searching; 8-bit parity pattern classification; backward passes; computational speedup; convergence; flexible architecture; forward passes; generalization; hierarchical mixtures of experts; interlocking spirals; iterative retraining; maximum likelihood; most likely paths; path pruning; regression problem; regularization; splitting criterion; terminal nodes; training algorithm; tree growing; tree-structured statistical model
  • fLanguage
    English
  • Publisher
    iet
  • Conference_Titel
    Fourth International Conference on Artificial Neural Networks, 1995
  • Conference_Location
    Cambridge
  • Print_ISBN
    0-85296-641-5
  • Type

    conf

  • DOI
    10.1049/cp:19950579
  • Filename
    497842