• DocumentCode
    3326196
  • Title
    Gradient descent fails to separate
  • Author
    Brady, M.; Raghavan, R.; Slawny, J.
  • Author_Institution
    Lockheed Res. & Dev., Palo Alto, CA, USA
  • fYear
    1988
  • fDate
    24-27 July 1988
  • Firstpage
    649
  • Abstract
    In the context of neural network procedures, it is proved that gradient descent on a surface defined by a sum of squared errors can fail to separate families of vectors. Each output is assumed to be a differentiable monotone transformation (typically the logistic) of a linear combination of inputs. Several examples are given of two families of vectors for which a separating linear combination exists, yet the minimum-cost solution does not yield it. The examples include several cases where there are no local minima, as well as a one-layer system showing local minima with a large basin of attraction. In contrast to the perceptron convergence theorem, which proves that the perceptron procedure finds a separating solution whenever one exists, there is no convergence theorem for gradient descent that would guarantee correct classification. The theorem disproves the presumption made in recent years that, barring local minima, gradient descent will find the best set of weights for a given problem.
  • Keywords
    neural nets; optimisation; differentiable monotone transformation; gradient descent; neural network; squared errors; Neural networks; Optimization methods
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 1988., IEEE International Conference on
  • Conference_Location
    San Diego, CA, USA
  • Type
    conf
  • DOI
    10.1109/ICNN.1988.23902
  • Filename
    23902