Gradient descent fails to separate

Author

Brady, M. ; Raghavan, R. ; Slawny, J.

Author_Institution

Lockheed Res. & Dev., Palo Alto, CA, USA

fYear

1988

fDate

24-27 July 1988

Firstpage

649

Abstract

In the context of neural network procedures, it is proved that gradient descent on a surface defined by a sum of squared errors can fail to separate families of vectors. Each output is assumed to be a differentiable monotone transformation (typically the logistic) of a linear combination of inputs. Several examples are given of two families of vectors for which a linear combination exists that separates the two families, yet the minimum-cost solution does not yield that combination. The examples include several cases with no local minima, as well as a one-layer system showing local minima with a large basin of attraction. In contrast to the perceptron convergence theorem, which proves that the perceptron learning procedure finds a separating set of weights whenever one exists, there is no convergence theorem for gradient descent that would guarantee correct classification. This result disproves the presumption, made in recent years, that, barring local minima, gradient descent will find the best set of weights for a given problem.
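
The setting described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' construction: the function names, the toy AND dataset, the learning rate, and the step count are all hypothetical choices made here. The sketch minimizes a sum of squared errors over logistic outputs of a linear combination of inputs, then checks whether the resulting weights separate the two families by the sign of the linear combination. It does not reproduce the paper's counterexamples.

```python
import math


def logistic(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def output(w, x):
    # Logistic transformation of a linear combination of the inputs.
    return logistic(sum(wi * xi for wi, xi in zip(w, x)))


def sse_cost(w, X, t):
    # Sum of squared errors between targets and outputs.
    return sum((tp - output(w, xp)) ** 2 for xp, tp in zip(X, t))


def sse_gradient(w, X, t):
    # d/dw (t - y)^2 = -2 (t - y) y (1 - y) x for a logistic output y.
    grad = [0.0] * len(w)
    for xp, tp in zip(X, t):
        y = output(w, xp)
        coeff = -2.0 * (tp - y) * y * (1.0 - y)
        for i, xi in enumerate(xp):
            grad[i] += coeff * xi
    return grad


def gradient_descent(X, t, lr=0.2, steps=5000):
    # Plain batch gradient descent starting from zero weights
    # (lr and steps are illustrative assumptions).
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = sse_gradient(w, X, t)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def separates(w, X, t):
    # True if the sign of the linear combination matches every target.
    for xp, tp in zip(X, t):
        s = sum(wi * xi for wi, xi in zip(w, xp))
        if (s > 0) != (tp == 1):
            return False
    return True


# Hypothetical toy data: logical AND, with a constant bias input of 1.0.
X = [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
t = [0, 0, 0, 1]

w0 = [0.0, 0.0, 0.0]
w = gradient_descent(X, t)
print("initial cost:", sse_cost(w0, X, t))
print("final cost:", sse_cost(w, X, t))
print("separates:", separates(w, X, t))
```

To probe the claim in the abstract, one would replace `X` and `t` with a linearly separable dataset of interest and compare `separates(w, X, t)` for the gradient-descent weights against a weight vector known to separate the data.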

Keywords

neural nets; optimisation; differentiable monotone transformation; gradient descent; neural network; squared errors; Neural networks; Optimization methods

fLanguage

English

Publisher

IEEE

Conference_Titel

IEEE International Conference on Neural Networks, 1988

Conference_Location

San Diego, CA, USA

Type

conf

DOI

10.1109/ICNN.1988.23902

Filename

23902