  • DocumentCode
    623993
  • Title
    Big data, deep data, and the effect of system architectures on performance
  • Author
    Kogge, Peter M.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Notre Dame, Notre Dame, IN, USA
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    40
  • Lastpage
    41
  • Abstract
    Summary form only given. “Big Data” traditionally refers to some combination of high volume of data, high velocity of change, and/or wide variety and complexity of the underlying data. Solving such problems has evolved into using paradigms like MapReduce on large clusters of compute nodes. More recently, a growing number of “Deep Data” problems have arisen where it is the relationships between objects, and not necessarily the collections of objects, that are important, and for which the traditional implementation techniques are unsatisfactory. This talk addresses a study of a class of such “challenge problems,” first formulated by David Bayliss of LexisNexis, and examines their execution characteristics on both current and future architectures. The goal is to discover, to at least a first-order approximation, the tall poles preventing a speedup of their solution. A variety of architectures are considered, ranging from standard server blades in large-scale configurations, to emerging variations that leverage simpler and more energy-efficient chip sets, through systems built on 3D chip stacks, and on to new architectures designed from the ground up to “follow the links.” These architectures are considered for two variants of such problems: a traditional partitioned-data approach where data is “pre-boiled” to provide fast response, and one that uses very large graphs in very large shared memories. The results are not necessarily intuitive; the bottlenecks in such problems are not where current systems have the bulk of their capabilities or costs, nor where obvious near-term upgrades will have major effects. Instead, it appears that only highly scalable memory-intensive architectures offer the potential for truly major gains in application performance.
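    As a rough illustration of the two access patterns the abstract contrasts, the Python sketch below (hypothetical names and toy data, not from the talk) pairs a MapReduce-style aggregation, which streams easily partitioned data and is bandwidth-friendly, against a “follow the links” traversal, where every hop is a dependent load into a large shared memory and is therefore bound by memory latency rather than compute.

    # Illustrative sketch only; function names and toy inputs are hypothetical.
    from collections import deque

    def mapreduce_style_count(partitions):
        """Aggregate counts across data partitions (streaming, easily sharded)."""
        counts = {}
        for partition in partitions:          # each partition could live on a different node
            for record in partition:
                counts[record] = counts.get(record, 0) + 1
        return counts

    def follow_the_links(adjacency, source):
        """BFS over an in-memory graph: every hop is a dependent memory access."""
        seen = {source}
        frontier = deque([source])
        while frontier:
            node = frontier.popleft()
            for neighbor in adjacency[node]:  # random access into a large shared memory
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append(neighbor)
        return seen

    partitions = [["a", "b", "a"], ["b", "c"]]
    adjacency = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(mapreduce_style_count(partitions))       # {'a': 2, 'b': 2, 'c': 1}
    print(sorted(follow_the_links(adjacency, 0)))  # [0, 1, 2, 3]

    The first pattern partitions cleanly across cluster nodes; the second generates unpredictable pointer chases over the whole graph, which is why the abstract points to highly scalable memory-intensive architectures as the path to major gains.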
  • Keywords
    approximation theory; data mining; graph theory; shared memory systems; 3D chip stack; MapReduce; big data; deep data problem; first order approximation; graph; partitioned data approach; scalable memory-intensive architecture; server blade; shared memory; system architecture; Abstracts; Computer architecture; Computer science; Data handling; Data storage systems; Educational institutions; Information management; Algorithms; ECL; NORA; Performance; association mining; link discovery; non-obvious relationship analysis; parallel big data systems; performance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 International Conference on Collaboration Technologies and Systems (CTS)
  • Conference_Location
    San Diego, CA, USA
  • Print_ISBN
    978-1-4673-6403-4
  • Type
    conf
  • DOI
    10.1109/CTS.2013.6567201
  • Filename
    6567201