DocumentCode :
702593
Title :
Online sketching of big categorical data with absent features
Author :
Shen, Yanning ; Mardani, Morteza ; Giannakis, Georgios B.
Author_Institution :
ECE Dept., Univ. of Minnesota, Minneapolis, MN, USA
fYear :
2015
fDate :
18-20 March 2015
Firstpage :
1
Lastpage :
6
Abstract :
With the scale of data growing every day, reducing the dimensionality (a.k.a. sketching) of high-dimensional vectors has emerged as a task of increasing importance. Relevant issues to address in this context include the sheer volume of data vectors that may consist of categorical (meaning finite-alphabet) features, the typically streaming format of data acquisition, and the possibly absent features. To cope with these challenges, the present paper brings forth a novel rank-regularized maximum likelihood approach that models categorical data as quantized values of analog-amplitude features with low intrinsic dimensionality. This model along with recent online rank regularization advances are leveraged to sketch high-dimensional categorical data 'on the fly.' Simulated tests with synthetic as well as real-world datasets corroborate the merits of the novel scheme relative to state-of-the-art alternatives.
Keywords :
Big Data; maximum likelihood estimation; pattern classification; analog-amplitude features; big categorical data; intrinsic dimensionality; online data sketching; online rank regularization; rank-regularized maximum likelihood approach; Accuracy; Convergence; Interpolation; Minimization; Motion pictures; Principal component analysis; Runtime; Rank regularization; categorical data; online sketching
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Sciences and Systems (CISS), 2015 49th Annual Conference on
Conference_Location :
Baltimore, MD
Type :
conf
DOI :
10.1109/CISS.2015.7086875
Filename :
7086875