DocumentCode :
402901
Title :
The influence of the number of clusters on randomly expanded data sets
Author :
Van Zyl, Jacobus ; Cloete, Ian
Author_Institution :
Sch. of Inf. Technol., Int. Univ. in Germany, Bruchsal, Germany
Volume :
1
fYear :
2003
fDate :
2-5 Nov. 2003
Firstpage :
355
Abstract :
Neural networks have been shown to be capable of learning arbitrary input-output mappings. However, like most machine learning algorithms, neural networks are adversely affected by sparse training sets, especially with respect to generalization performance. Several approaches have been suggested to improve generalization performance when only sparse training data are available, including adding noise to the training data or to the weight updates. One method, by Karystinos and Pados, first clusters the training data and then generates new training data using a probability density function estimated from the clusters. This paper investigates that method further, in particular its sensitivity to the clustering procedure. We investigate the sensitivity to the number of clusters used, the sensitivity to the clustering method (K-means) itself, and the use of the minimum differential entropy as an indicator of a good cluster choice.
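A minimal sketch of the kind of cluster-based data expansion the abstract describes, assuming a Gaussian density fitted per K-means cluster and sampling from the resulting mixture; the function name expand_dataset, its parameters, and the per-cluster Gaussian assumption are illustrative choices, not the authors' exact Karystinos-Pados procedure.
```python
# Sketch (assumed details, not the paper's implementation): expand a sparse
# training set by clustering it with K-means, fitting a Gaussian to each
# cluster, and drawing synthetic samples from the resulting mixture.
import numpy as np
from sklearn.cluster import KMeans


def expand_dataset(X, n_clusters=3, n_new=100, random_state=0):
    """Return n_new synthetic samples drawn from Gaussians fitted per K-means cluster."""
    rng = np.random.default_rng(random_state)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    labels = km.labels_

    # Mixture weights proportional to cluster sizes.
    weights = np.bincount(labels, minlength=n_clusters) / len(X)
    counts = rng.multinomial(n_new, weights)

    new_points = []
    for k, count in enumerate(counts):
        members = X[labels == k]
        mean = members.mean(axis=0)
        # Small diagonal regularization keeps the covariance usable for tiny clusters.
        cov = np.cov(members, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        new_points.append(rng.multivariate_normal(mean, cov, size=count))
    return np.vstack(new_points)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 2))                      # a small, "sparse" training set
    X_expanded = expand_dataset(X, n_clusters=3, n_new=200)
    print(X_expanded.shape)                           # (200, 2)
```
For the entropy-based indicator mentioned in the abstract, the differential entropy of a fitted d-dimensional Gaussian is (d/2)ln(2*pi*e) + (1/2)ln|Sigma|, so candidate cluster counts could be compared via the entropy of the fitted components; the paper's exact criterion may differ from this sketch.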
Keywords :
learning (artificial intelligence); minimum entropy methods; neural nets; set theory; statistical analysis; clustering procedure; machine learning algorithms; minimum differential entropy; neural networks; probability density function; randomly expanded data sets; sparse training data; sparse training sets; Clustering algorithms; Clustering methods; Covariance matrix; Entropy; Information technology; Jacobian matrices; Machine learning algorithms; Neural networks; Probability density function; Training data;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN :
0-7803-8131-9
Type :
conf
DOI :
10.1109/ICMLC.2003.1264501
Filename :
1264501