Regional Information Center for Science and Technology - A probabilistic approach which provides a modular and adaptive neural network architecture for discrimination

Abstract:

Concerns the supervised discrimination of a vector x (∈ R^P) between K classes (C_i; i = 1, ..., K). Discrimination consists of learning a discriminant function from a training set of N examples. In a Bayesian context, the discriminant function is the probability of class C_i given that the pattern to classify is x, denoted P(C_i/x) (or equivalently P(C_i,x)). It is well known that multilayer perceptrons (MLPs) with a single hidden layer are universal classifiers, in the sense that they can approximate decision surfaces of arbitrary complexity provided the number of hidden neurons is large enough. It is sometimes possible to decompose a classification problem that would require a large network into subproblems that are solved efficiently by simple modules (with few or no hidden neurons). Each subproblem corresponds to a cluster within the data set, on which a module acts as an expert. If back-propagation is used to train a single MLP to solve the global discrimination, and thus to handle all of these subproblems, there will generally be strong interference effects, which can lead to slow learning and poor generalization; for these reasons the modular approach seems preferable. A number of authors have suggested using a system composed of several different 'experts', one for each subproblem. The author gives a theoretical justification for this approach by constructing the global discriminant functions P(C_i/x) from the outputs of the 'experts', which perform local discriminations within these clusters. In a Bayesian context, this means that the global discriminant functions P(C_i/x) can be constructed from the discriminant functions of the individual subproblems. Two main hypotheses are posed: the experts have probabilities as outputs, and information about the clusters is available.
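
The construction of global discriminant functions from expert outputs can be illustrated with the law of total probability: if cluster memberships P(cluster j/x) are available and expert j outputs P(C_i/x, cluster j), then P(C_i/x) is the sum over j of P(cluster j/x) P(C_i/x, cluster j). The abstract does not give the author's exact formula, so the Python sketch below is only one plausible reading of it; the function name, argument names, and numbers are hypothetical.

```python
def combine_experts(cluster_posteriors, expert_posteriors):
    """Mix local class posteriors into global ones (illustrative sketch).

    cluster_posteriors[j]   : assumed P(cluster j / x), summing to 1 over j
    expert_posteriors[j][i] : assumed output of expert j, P(C_i / x, cluster j)
    returns                 : list of P(C_i / x), one entry per class
    """
    num_classes = len(expert_posteriors[0])
    global_posteriors = [0.0] * num_classes
    for weight, local in zip(cluster_posteriors, expert_posteriors):
        for i, p in enumerate(local):
            # Law of total probability: weight each expert by its cluster posterior.
            global_posteriors[i] += weight * p
    return global_posteriors


# Hypothetical numbers: two clusters (experts), three classes.
cluster_posteriors = [0.7, 0.3]
expert_posteriors = [
    [0.9, 0.1, 0.0],  # expert for cluster 0
    [0.2, 0.3, 0.5],  # expert for cluster 1
]
posteriors = combine_experts(cluster_posteriors, expert_posteriors)

# Bayes decision rule: choose the class with the largest global posterior.
predicted_class = max(range(len(posteriors)), key=posteriors.__getitem__)
```

Because each expert's outputs and the cluster posteriors both sum to one, the combined values also sum to one, which is why the hypothesis that experts output probabilities matters for this kind of mixing.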
The author looks for appropriate output nonlinearities and for an appropriate criterion for updating the parameters of the neural networks. Two approaches are studied: with or without cooperation between modules during learning.
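
The abstract does not say which output nonlinearity or training criterion the author arrives at. Purely as an illustration of what such a pairing can look like, the Python sketch below uses a softmax output with a cross-entropy criterion, a standard combination that yields probability-valued outputs; it is an assumption made here, not the author's stated choice, and all names and numbers are hypothetical.

```python
import math


def softmax(scores):
    """Map raw output scores to positive values that sum to one."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[target_index])


def output_gradient(probabilities, target_index):
    """Gradient of the cross-entropy with respect to the raw scores."""
    return [
        p - (1.0 if i == target_index else 0.0)
        for i, p in enumerate(probabilities)
    ]


# Hypothetical raw scores of one module for three classes; true class is 0.
scores = [2.0, 0.5, -1.0]
probs = softmax(scores)
loss = cross_entropy(probs, 0)
grad = output_gradient(probs, 0)
```

With this pairing the gradient at the output reduces to the difference between the predicted probabilities and the one-hot target, which keeps the parameter update simple.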