Title :
Pruning and growing hierarchical mixtures of experts
Author :
Waterhouse, S.R. ; Robinson, A.J.
Author_Institution :
Cambridge Univ., UK
Abstract :
The 'hierarchical mixture of experts' (HME) is a tree-structured statistical model that is an alternative to multilayer perceptrons. Its training algorithm consists of a number of forward and backward passes through the tree, which are computationally expensive, especially for large trees. To reduce the computation, we may either allow the network to find its own structure in a constructive manner (tree growing) or consider only the most likely paths through the tree (path pruning). Pruning keeps the number of parameters constant but considers only the most likely paths through the tree at any time; this leads to significant speedups in training and evaluation. In the growing algorithm, we start with a small tree and apply a splitting criterion based on maximum likelihood to each terminal node. After splitting the best node according to this criterion, we retrain the tree for a set number of iterations, or until there is no further increase in likelihood, at which point the tree is grown again. This results in a flexible architecture which is both faster to train and more efficient in terms of its parameters. To aid the convergence of these algorithms, it is beneficial to introduce regularization into the HME, which prevents the growth of large weights that would otherwise cause branches of the tree to be pinched off. Regularization also aids generalization, as we demonstrate on a toy regression problem. Results for the growing and pruning algorithms show significant speedups over conventional algorithms in discriminating between two interlocking spirals and in classifying 8-bit parity patterns.
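The path-pruning idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the tree representation, the linear experts, the softmax gates, and the pruning threshold are all illustrative assumptions. During a recursive forward pass, any subtree whose cumulative gating probability falls below the threshold is skipped, so only the most likely paths through the tree are evaluated.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over gate activations
    e = np.exp(z - np.max(z))
    return e / e.sum()

def hme_predict(x, node, path_prob=1.0, threshold=0.01):
    """Recursive HME forward pass with path pruning (illustrative sketch).

    node is either ('expert', w) for a leaf with linear weights w, or
    ('gate', v, children) for an internal node with gating weights v.
    Subtrees whose cumulative gate probability is below `threshold`
    are skipped entirely, saving their forward-pass cost.
    """
    if node[0] == 'expert':
        w = node[1]
        return path_prob * (w @ x)      # weighted linear expert output
    _, v, children = node
    g = softmax(v @ x)                  # gating probabilities at this node
    y = 0.0
    for g_i, child in zip(g, children):
        p = path_prob * g_i
        if p < threshold:               # prune an unlikely branch
            continue
        y += hme_predict(x, child, p, threshold)
    return y

# toy tree: one softmax gate over two linear experts
x = np.array([1.0, 0.5])
tree = ('gate', np.array([[2.0, 0.0], [-2.0, 0.0]]),
        [('expert', np.array([1.0, 1.0])),
         ('expert', np.array([0.0, 2.0]))])
print(hme_predict(x, tree))
```

Note that in this sketch the surviving gate probabilities are not renormalized after pruning, so the pruned prediction is slightly biased toward zero; with a small threshold the effect is negligible, which is why pruning can speed up both training and evaluation with little loss in accuracy.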
Keywords :
convergence; cooperative systems; generalisation (artificial intelligence); hierarchical systems; iterative methods; learning (artificial intelligence); neural net architecture; pattern classification; statistics; tree searching; 8-bit parity pattern classification; backward passes; computational speedup; convergence; flexible architecture; forward passes; generalization; hierarchical mixtures of experts; interlocking spirals; iterative retraining; maximum likelihood; most likely paths; path pruning; regression problem; regularization; splitting criterion; terminal nodes; training algorithm; tree growing; tree-structured statistical model;
Conference_Titel :
Fourth International Conference on Artificial Neural Networks, 1995
Conference_Location :
Cambridge
Print_ISBN :
0-85296-641-5
DOI :
10.1049/cp:19950579