  • DocumentCode
    692912
  • Title
    SIDR: Structure-aware intelligent data routing in Hadoop
  • Author
    Buck, J. ; Watkins, N. ; Levin, Greg ; Crume, Adam ; Ioannidou, Kleoni ; Brandt, Scott ; Maltzahn, Carlos ; Polyzotis, N. ; Torres, Abel
  • Author_Institution
    Dept. of Comput. Sci., Univ. of California-Santa Cruz, Santa Cruz, CA, USA
  • fYear
    2013
  • fDate
    17-22 Nov. 2013
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    The MapReduce framework is being extended to domains quite different from the web applications for which it was designed, including the processing of big structured data such as scientific and financial data. Previous work using MapReduce to process scientific data ignores existing structure when assigning intermediate data and scheduling tasks. In this paper, we present a method for incorporating knowledge of the structure of scientific data and of the executing query into the MapReduce communication model. Built in SciHadoop, a version of the Hadoop MapReduce framework for scientific data, SIDR intelligently partitions and routes intermediate data, allowing it to: remove Hadoop's global barrier and execute Reduce tasks before all Map tasks have completed; minimize intermediate key skew; and produce early, correct results. SIDR executes queries up to 2.5 times faster than Hadoop and 37% faster than SciHadoop, produces initial results with only 6% of the query completed, and produces dense, contiguous output.
  • Keywords
    Internet; distributed processing; scheduling; scientific information systems; MapReduce framework; SIDR; SciHadoop; Web applications; big structured data; financial data; intermediate data; intermediate key skew; scheduling tasks; scientific data; structure-aware intelligent data routing; Abstracts; Shape; Hadoop; MapReduce; Scientific Data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for
  • Conference_Location
    Denver, CO
  • Print_ISBN
    978-1-4503-2378-9
  • Type
    conf
  • DOI
    10.1145/2503210.2503241
  • Filename
    6877506