• DocumentCode
    1914737
  • Title

    Mrs: MapReduce for Scientific Computing in Python

  • Author

    McNabb, Andrew; Lund, James; Seppi, Kevin

  • Author_Institution
    Comput. Sci. Dept., Brigham Young Univ., Provo, UT, USA
  • fYear
    2012
  • fDate
    10-16 Nov. 2012
  • Firstpage
    600
  • Lastpage
    608
  • Abstract
    The MapReduce parallel programming model is designed for large-scale data processing, but its benefits, such as fault tolerance and automatic message routing, are also helpful for computationally intensive algorithms. However, popular MapReduce frameworks such as Hadoop are slow for many scientific applications and are inconvenient on the supercomputers and clusters that are common in research institutions. Mrs is a Python-based MapReduce framework that is well suited for scientific computing. We present comparisons of programs and run scripts to argue that Mrs is more convenient than Hadoop, the most popular MapReduce implementation. We also demonstrate that Mrs outperforms Hadoop for several types of problems that are relevant to scientific computing. In particular, Mrs demonstrates a per-iteration overhead of about 0.3 seconds for Particle Swarm Optimization, while Hadoop takes at least 30 seconds for each MapReduce operation, a difference of two orders of magnitude.
  • Keywords
    parallel programming; particle swarm optimisation; scientific information systems; MapReduce parallel programming model; Mrs framework; Python; computationally-intensive algorithms; large-scale data processing; particle swarm optimization; run scripts; scientific computing; MapReduce; iterative algorithms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion
  • Conference_Location
    Salt Lake City, UT
  • Print_ISBN
    978-1-4673-6218-4
  • Type

    conf

  • DOI
    10.1109/SC.Companion.2012.84
  • Filename
    6495866