Title :
Tackling the Big Data 4 Vs for anomaly detection
Author :
Camacho, Jorge ; Macia-Fernandez, Gabriel ; Diaz-Verdejo, Jesus ; Garcia-Teodoro, Pedro
Author_Institution :
Dept. of Signal Theor., Telematics & Commun. - OTIC, Univ. of Granada, Granada, Spain
fDate :
April 27 2014-May 2 2014
Abstract :
In this paper, a framework for anomaly detection and forensics in Big Data is introduced. The framework tackles the Big Data 4 Vs: Variety, Veracity, Volume and Velocity. The varied nature of the data sources is treated by transforming the typically unstructured data into a highly dimensional and structured data set. To overcome both the uncertainty (low veracity) and the high dimension introduced, a latent variable method, in particular Principal Component Analysis (PCA), is applied. PCA is well known for its ability to extract information from highly dimensional data sets. However, PCA is limited to data sets that are small in number of observations, though highly multivariate. To handle this limitation, a kernel computation of PCA is employed. This avoids computational problems due to the size (number of observations) of the data sets and allows parallelism. Also, hierarchical models are proposed if dimensionality is extreme. Finally, to handle high velocity in analyzing time series data flows, the Exponentially Weighted Moving Average (EWMA) approach is employed. All these steps are discussed in the paper, and the VAST 2012 mini challenge 2 is used for illustration.
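As a rough illustration of the volume and velocity steps described in the abstract, the sketch below accumulates the cross-product matrix X^T X batch by batch and applies an exponential forgetting factor to older batches. It is a minimal sketch under stated assumptions, not the authors' implementation: the update rule `forgetting * previous + X_t^T X_t`, the function names `cross_product` and `ewma_update`, and the toy data are all hypothetical. The PCA step itself (an eigendecomposition of the accumulated matrix) is omitted.

```python
from typing import List

Matrix = List[List[float]]


def cross_product(batch: Matrix) -> Matrix:
    """Return X^T X for one batch X (rows are observations)."""
    n_vars = len(batch[0])
    xtx = [[0.0] * n_vars for _ in range(n_vars)]
    for row in batch:
        for i in range(n_vars):
            for j in range(n_vars):
                xtx[i][j] += row[i] * row[j]
    return xtx


def ewma_update(previous: Matrix, batch: Matrix, forgetting: float) -> Matrix:
    """Assumed EWMA-style update: forgetting * previous + X_t^T X_t.

    forgetting = 1.0 keeps all history; values below 1.0 down-weight
    older batches.
    """
    current = cross_product(batch)
    n_vars = len(current)
    return [
        [forgetting * previous[i][j] + current[i][j] for j in range(n_vars)]
        for i in range(n_vars)
    ]


# Hypothetical toy stream: two batches of three-variable observations.
batches = [
    [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]],
    [[2.0, 0.0, 1.0]],
]

# The model is a small (variables x variables) matrix, so its size does not
# grow with the number of observations processed.
model = [[0.0] * 3 for _ in range(3)]
for batch in batches:
    model = ewma_update(model, batch, forgetting=1.0)
```

With `forgetting=1.0` the accumulated matrix equals the cross-product of all observations seen so far, which is why the batches could also be processed in parallel and summed. A value below 1.0 lets the model track recent traffic more closely.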
Keywords :
Big Data; digital forensics; firewalls; moving average processes; principal component analysis; time series; Big Data 4 Vs; EWMA approach; PCA; anomaly detection; computational problems; data sources; exponentially weighted moving average approach; forensics; hierarchical models; highly-dimensional structured data set; information extraction; kernel computation; latent variable method; parallelism; principal component analysis; time series data flow analysis; uncertainty problem; unstructured data transformation; variety; velocity; veracity; volume; Big data; Computational modeling; Conferences; Data privacy; Data visualization; Principal component analysis; Security;
Conference_Title :
Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on
Conference_Location :
Toronto, ON
DOI :
10.1109/INFCOMW.2014.6849282