DocumentCode
3326196
Title
Gradient descent fails to separate
Author
Brady, M. ; Raghavan, R. ; Slawny, J.
Author_Institution
Lockheed Res. & Dev., Palo Alto, CA, USA
fYear
1988
fDate
24-27 July 1988
Firstpage
649
Abstract
In the context of neural network procedures, it is proved that gradient descent on a surface defined by a sum of squared errors can fail to separate families of vectors. Each output is assumed to be a differentiable monotone transformation (typically the logistic) of a linear combination of inputs. Several examples are given of two families of vectors for which a linear combination exists that separates the two families, yet the minimum-cost solution does not yield that combination. The examples include several cases where there are no local minima, as well as a one-layer system showing local minima with a large basin of attraction. In contrast to the perceptron architecture, which is covered by the perceptron convergence theorem, there is no convergence theorem for gradient descent that would ensure correct classification. The theorem disproves the presumption, made in recent years, that barring local minima, gradient descent will find the best set of weights for a given problem.
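The setting described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' construction: the 0/1 targets, the learning rate, the step count, the function names, and the small AND-style data set are all assumptions made for the example, and the sketch does not reproduce the paper's counterexamples. It only places the two procedures side by side (gradient descent on a sum of squared errors through a logistic output, and the perceptron rule) together with a check of whether a weight vector separates the two families.

```python
import math

def logistic(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def net(w, x):
    # Linear combination of the inputs; the last weight acts as a bias.
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

def sse(w, data):
    # Sum of squared errors between logistic outputs and targets.
    return sum((logistic(net(w, x)) - t) ** 2 for x, t in data)

def gradient_descent(data, dim, lr=0.1, steps=2000):
    # Batch gradient descent on the SSE surface (lr and steps are assumed values).
    w = [0.0] * (dim + 1)
    for _ in range(steps):
        grad = [0.0] * (dim + 1)
        for x, t in data:
            y = logistic(net(w, x))
            delta = 2.0 * (y - t) * y * (1.0 - y)  # d(error^2)/d(net)
            for i in range(dim):
                grad[i] += delta * x[i]
            grad[dim] += delta
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def perceptron(data, dim, max_epochs=1000):
    # Classical perceptron rule; stops after an epoch with no mistakes.
    w = [0.0] * (dim + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in data:
            s = 1 if t > 0.5 else -1
            if s * net(w, x) <= 0:
                for i in range(dim):
                    w[i] += s * x[i]
                w[dim] += s
                mistakes += 1
        if mistakes == 0:
            break
    return w

def separates(w, data):
    # True if the sign of the linear combination matches every target class.
    return all((net(w, x) > 0) == (t > 0.5) for x, t in data)

# Hypothetical linearly separable data (logical AND), chosen only for illustration.
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]
w_perc = perceptron(data, dim=2)
w_gd = gradient_descent(data, dim=2)
print(separates(w_perc, data), separates(w_gd, data), sse(w_gd, data))
```

Swapping in other linearly separable families lets one compare whether the minimum-cost weights and the perceptron weights both pass the `separates` check.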
Keywords
neural nets; optimisation; differentiable monotone transformation; gradient descent; neural network; squared errors; Neural networks; Optimization methods
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks, 1988., IEEE International Conference on
Conference_Location
San Diego, CA, USA
Type
conf
DOI
10.1109/ICNN.1988.23902
Filename
23902