  • DocumentCode
    623993
  • Title
    Big data, deep data, and the effect of system architectures on performance
  • Author
    Kogge, Peter M.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Notre Dame, Notre Dame, IN, USA
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    40
  • Lastpage
    41
  • Abstract
    Summary form only given. “Big Data” traditionally refers to some combination of high volume of data, high velocity of change, and/or wide variety and complexity of the underlying data. Solving such problems has evolved into using paradigms like MapReduce on large clusters of compute nodes. More recently, a growing number of “Deep Data” problems have arisen where it is the relationships between objects, and not necessarily the collections of objects, that are important, and for which the traditional implementation techniques are unsatisfactory. This talk addresses a study of a class of such “challenge problems,” first formulated by David Bayliss of LexisNexis, and examines their execution characteristics on both current and future architectures. The goal is to discover, to at least a first-order approximation, the tall poles preventing a speedup of their solution. A variety of architectures are considered, ranging from standard server blades in large-scale configurations, to emerging variations that leverage simpler and more energy-efficient chip sets, through systems built on 3D chip stacks, and on to new architectures designed from the ground up to “follow the links.” These architectures are considered for two variants of such problems: a traditional partitioned-data approach where data is “pre-boiled” to provide fast response, and one that uses very large graphs in very large shared memories. The results are not necessarily intuitive; the bottlenecks in such problems are not where current systems have the bulk of their capabilities or costs, nor where obvious near-term upgrades will have major effects. Instead, it appears that only highly scalable memory-intensive architectures offer the potential for truly major gains in application performance.
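    As a rough illustration of the two access patterns the abstract contrasts, the Python sketch below (hypothetical names and toy data, not from the talk) pairs a MapReduce-style aggregation, which streams easily partitioned data and is bandwidth-friendly, against a “follow the links” traversal, where every hop is a dependent load into a large shared memory and is therefore bound by memory latency rather than compute.

    # Illustrative sketch only; function names and toy inputs are hypothetical.
    from collections import deque

    def mapreduce_style_count(partitions):
        """Aggregate counts across data partitions (streaming, easily sharded)."""
        counts = {}
        for partition in partitions:          # each partition could live on a different node
            for record in partition:
                counts[record] = counts.get(record, 0) + 1
        return counts

    def follow_the_links(adjacency, source):
        """BFS over an in-memory graph: every hop is a dependent memory access."""
        seen = {source}
        frontier = deque([source])
        while frontier:
            node = frontier.popleft()
            for neighbor in adjacency[node]:  # random access into a large shared memory
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append(neighbor)
        return seen

    partitions = [["a", "b", "a"], ["b", "c"]]
    adjacency = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(mapreduce_style_count(partitions))       # {'a': 2, 'b': 2, 'c': 1}
    print(sorted(follow_the_links(adjacency, 0)))  # [0, 1, 2, 3]

    The first pattern partitions cleanly across cluster nodes; the second generates unpredictable pointer chases over the whole graph, which is why the abstract points to highly scalable memory-intensive architectures as the path to major gains.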
  • Keywords
    approximation theory; data mining; graph theory; shared memory systems; 3D chip stack; MapReduce; big data; deep data problem; first order approximation; graph; partitioned data approach; scalable memory-intensive architecture; server blade; shared memory; system architecture; Abstracts; Computer architecture; Computer science; Data handling; Data storage systems; Educational institutions; Information management; Algorithms; ECL; NORA; Performance; association mining; link discovery; non-obvious relationship analysis; parallel big data systems; performance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 International Conference on Collaboration Technologies and Systems (CTS)
  • Conference_Location
    San Diego, CA, USA
  • Print_ISBN
    978-1-4673-6403-4
  • Type
    conf
  • DOI
    10.1109/CTS.2013.6567201
  • Filename
    6567201