Regional Information Center for Science and Technology - Sparsity-based representation for categorical data

DocumentCode :

683521

Title :

Sparsity-based representation for categorical data

Author :

Menon, Rajesh ; Nair, S.S. ; Srindhya, K. ; Kaimal, M.R.

Author_Institution :

Dept. of Comput. Sci. & Eng., Amrita Vishwa Vidyapeetham, Kollam, India

fYear :

2013

fDate :

19-21 Dec. 2013

Firstpage :

Lastpage :

Abstract :

Machine learning algorithms have evolved continuously over the past few decades. This is an era of big data, generated by applications in fields such as medicine, the World Wide Web, e-learning, networking, etc. We therefore still need more efficient algorithms that are computationally cost-effective and produce results faster. Sparse representation of data is a major step toward a solution for big data analysis. This paper focuses on algorithms for sparsity-based representation of categorical data. For this, we adopt dictionary learning, a concept from the image and signal processing domain. We have implemented its sparse coding stage, which gives the sparse representation of the data, using Orthogonal Matching Pursuit (OMP) algorithms (both Batch and Cholesky based), and its dictionary update stage using the Singular Value Decomposition (SVD). We also use a preprocessing stage in which the categorical dataset is represented with a vector space model based on the TF-IDF weighting scheme. The paper demonstrates how input data can be decomposed and approximated as a linear combination of a minimum number of elementary columns of a dictionary; the dictionary so formed is a compact representation of the data. Classification or clustering algorithms can then be run on the generated sparse-coded coefficient matrix or on the dictionary. We also compare the dictionary learning algorithm under the different OMP algorithms. The algorithms are analysed, and results are demonstrated with synthetic tests and on real data.
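
The two stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The sparse coding step below is plain OMP with a least-squares re-fit, not the Batch or Cholesky variants the paper compares. The dictionary update is a K-SVD-style rank-1 SVD update, which is an assumption about what "dictionary update stage using the SVD" refers to. The function names `omp` and `update_atom` and all toy data are hypothetical; in the paper's setting the columns of `Y` would be TF-IDF vectors.

```python
import numpy as np

def omp(D, y, k):
    """Plain OMP: greedily pick at most k atoms (columns of D) to approximate y."""
    residual = y.astype(float).copy()
    support = []                 # indices of the atoms selected so far
    coef = np.zeros(0)
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x[support] = coef
    return x

def update_atom(D, X, Y, j):
    """K-SVD-style in-place update of atom j and its coefficients (assumed variant)."""
    users = np.nonzero(X[j, :])[0]          # signals whose code uses atom j
    if users.size == 0:
        return
    X[j, users] = 0.0
    E = Y[:, users] - D @ X[:, users]       # approximation error without atom j
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]                       # best rank-1 fit: new unit-norm atom
    X[j, users] = s[0] * Vt[0, :]           # and its new coefficients

# Toy data (hypothetical): an orthonormal 8x8 dictionary and a 2-sparse signal.
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((8, 8)))
y = 2.0 * D[:, 1] - 3.0 * D[:, 5]
x = omp(D, y, k=2)                          # nonzero only at atoms 1 and 5

# One dictionary-update step on random signals coded with 3 atoms each.
Y = rng.standard_normal((8, 12))
X = np.column_stack([omp(D, Y[:, i], k=3) for i in range(Y.shape[1])])
err_before = np.linalg.norm(Y - D @ X)
j = int(np.argmax(np.count_nonzero(X, axis=1)))   # most frequently used atom
update_atom(D, X, Y, j)
err_after = np.linalg.norm(Y - D @ X)
```

A full dictionary learning loop would alternate the two steps, re-running `omp` on every column of `Y` and then calling `update_atom` for each atom in turn. The rank-1 update never increases the approximation error, because the old atom and its coefficients are themselves one of the rank-1 candidates that the SVD optimizes over.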

Keywords :

Big Data; data structures; encoding; learning (artificial intelligence); matrix algebra; pattern classification; pattern clustering; singular value decomposition; vectors; Big Data analysis; Cholesky based algorithm; OMP algorithms; SVD; TF-IDF weighting scheme; batch based algorithm; categorical dataset; classification algorithm; clustering algorithm; compact data representation; dictionary learning algorithm; dictionary update stage; generated sparse coded coefficient matrix; machine learning; orthogonal matching pursuit algorithms; singular value decomposition; sparse coding stage; sparse data representation; sparsity-based representation; vector space model; Algorithm design and analysis; Dictionaries; Encoding; Matching pursuit algorithms; Matrix decomposition; Sparse matrices; Vectors; OMP; SVD; cholesky decomposition; dictionary learning; sparse coding; sparse representation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Intelligent Computational Systems (RAICS), 2013 IEEE Recent Advances in

Conference_Location :

Trivandrum

Print_ISBN :

978-1-4799-2177-5

Type :

conf

DOI :

10.1109/RAICS.2013.6745450

Filename :

6745450

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=683521