Abstract:
Machine learning algorithms depend heavily on the data representation, which largely determines their success in terms of accuracy. The autoencoder is a model designed to learn a good representation of the data with the least possible distortion. Furthermore, it has been shown that encouraging sparsity when learning representations can significantly improve performance on classification tasks and also make the feature vectors easier to interpret. One straightforward way for an autoencoder to obtain sparse representations is to impose a sparsity penalty on its overall cost function. Nevertheless, little comparative analysis has been conducted to evaluate which sparsity penalty term works better. In this paper, we adopt the L1-norm, L2-norm and Student-t penalties, which are rarely used to penalise the hidden unit activations, together with the KL-divergence penalty commonly used in the literature. We then present a detailed analysis to evaluate which penalty achieves better results in terms of reconstruction error, sparseness of the representation and classification performance on test datasets. Experiments on the MNIST, CIFAR-10, SVHN, OPTDIGITS and NORB datasets reveal that all of these penalties yield sparse representations and outperform the representations learned by a plain autoencoder in both classification performance and sparseness of the feature vectors. We hope that this topic and these practices will provide insights for future research.
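For concreteness, a sparsity-penalised autoencoder objective of the kind studied here typically takes the general form below. This is a minimal sketch in our own illustrative notation (the abstract itself does not define these symbols): $\lambda$ is the penalty weight, $h(x_i)$ the hidden unit activations for input $x_i$, $\hat{x}_i$ its reconstruction, and $\Omega$ the sparsity penalty being compared.

% Generic sparse-autoencoder cost: average reconstruction error plus a
% weighted sparsity penalty Omega on the hidden activations.
\begin{equation}
  J(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N}
  \Bigl[\, \bigl\lVert x_i - \hat{x}_i \bigr\rVert^{2}
  \;+\; \lambda \,\Omega\!\bigl(h(x_i)\bigr) \Bigr]
\end{equation}

Here $\Omega$ would be instantiated, for example, as the L1 norm $\sum_j |h_j|$, the L2 norm $\sum_j h_j^{2}$, a Student-t style penalty $\sum_j \log\!\bigl(1 + h_j^{2}\bigr)$, or the KL-divergence between each hidden unit's average activation and a small target sparsity level.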