DocumentCode
3751980
Title
Online marginalized linear stacked denoising autoencoders for learning from big data stream
Author
Arif Budiman;Mohamad Ivan Fanany;Chan Basaruddin
Author_Institution
Faculty of Computer Science, University of Indonesia, Depok, West Java, Indonesia
fYear
2015
Firstpage
227
Lastpage
235
Abstract
Big non-stationary data that arrives gradually as a stream is an important challenge when training deep learning machines on big data. In this paper, we focus on a variant of the traditional autoencoder called the Marginalized Linear Stacked Denoising Autoencoder (MLSDA). MLSDA uses a simple linear model: it is faster and uses fewer parameters than the traditional SDA, and it benefits from convex optimization. It is also particularly effective on bag-of-words feature representations. However, the traditional SDA trained with stochastic gradient descent remains more widely adopted in practice, because stochastic gradient descent is inherently an online method, which makes the traditional SDA more scalable for streaming big data. This paper proposes a simple modification of MLSDA that uses accumulated matrix multiplications for online learning. Experimental results show accuracy comparable to the batch version of MLSDA while using fewer computational resources. Online MLSDA improves the scalability of MLSDA for handling streaming big data represented as bag-of-words features in natural language processing, information retrieval, and computer vision.
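The abstract's idea of online learning through accumulated matrix multiplications can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the standard closed-form solution of the marginalized denoising autoencoder (a single linear layer solved from corruption expectations) and assumes the online variant works by accumulating the scatter matrix batch by batch; the class and method names are hypothetical.

```python
import numpy as np

class OnlineMLSDALayer:
    """One linear denoising-autoencoder layer with online updates.
    Sketch only: assumes the standard closed-form mDA solution and
    that the online variant accumulates the scatter matrix X X^T
    mini-batch by mini-batch (names here are hypothetical)."""

    def __init__(self, d, p):
        self.d = d                          # input dimensionality
        self.S = np.zeros((d + 1, d + 1))   # accumulated scatter (with a bias row)
        self.q = np.full(d + 1, 1.0 - p)    # per-feature survival probability
        self.q[-1] = 1.0                    # the bias feature is never corrupted

    def partial_fit(self, X):
        """Absorb one mini-batch X of shape (d, n_batch)."""
        Xb = np.vstack([X, np.ones((1, X.shape[1]))])
        self.S += Xb @ Xb.T                 # the sufficient statistic is additive
        return self

    def solve(self):
        """Closed-form reconstruction weights W of shape (d, d+1)."""
        Q = self.S * np.outer(self.q, self.q)           # E[x~ x~^T], off-diagonal terms
        np.fill_diagonal(Q, self.q * np.diag(self.S))   # diagonal scales by q, not q^2
        P = self.S[:self.d, :] * self.q                 # E[x x~^T]
        return P @ np.linalg.pinv(Q)

    def transform(self, X, W):
        """Hidden representation: squashed denoising reconstruction."""
        Xb = np.vstack([X, np.ones((1, X.shape[1]))])
        return np.tanh(W @ Xb)
```

Because the scatter matrix is additive across mini-batches, solving for W after streaming all chunks reproduces the batch solution, which is consistent with the abstract's claim of similar accuracy at lower computational cost.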
Keywords
"Natural language processing","Graphics processing units","Support vector machines"
Publisher
ieee
Conference_Titel
2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS)
Type
conf
DOI
10.1109/ICACSIS.2015.7415181
Filename
7415181
Link To Document