Authors:
Yang, Shuo; Spielman, Nicholas D.; Jackson, Jadin C.; Rubin, Brad S.
Abstract:
One of the most crucial challenges in scientific computing is scalability. Hadoop, an open-source implementation of Google's MapReduce parallel programming model, has emerged as a powerful platform for performing large-scale scientific computing at very low cost. In this paper, we explore the use of Hadoop to model large-scale neural networks. A neural network is most naturally modeled as a graph structure with iterative processing. We first present an improved graph algorithm design pattern in MapReduce called Mapper-side Schimmy. Experiments show that applying our design pattern, combined with current best practices, reduces the running time of a simulation on a neural network with 100,000 neurons and 2.3 billion edges by 64%. MapReduce, however, is inherently inefficient for iterative graph processing. To address this limitation of the MapReduce model, we then explore the use of Giraph, an open-source large-scale graph processing framework built on top of Hadoop, to implement graph algorithms with a vertex-centric approach. We show that our Giraph implementation boosted performance by 91% compared to a basic MapReduce implementation and by 60% compared to our improved Mapper-side Schimmy algorithm.
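To make the vertex-centric approach concrete, the following is a minimal Giraph sketch of a neural network simulation of the kind the abstract describes: each vertex models a neuron, each edge value holds a synaptic weight, and messages carry weighted spikes between supersteps. The leaky integrate-and-fire update and the constants DECAY, THRESHOLD, and MAX_STEPS are illustrative assumptions, not the authors' actual neuron model or code.

```java
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.giraph.edge.Edge;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

// Vertex value = membrane potential, edge value = synaptic weight,
// message value = weighted spike input from a presynaptic neuron.
public class NeuronComputation extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  private static final double DECAY = 0.9;      // assumed leak factor
  private static final double THRESHOLD = 1.0;  // assumed firing threshold
  private static final long MAX_STEPS = 100;    // assumed simulation length

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
                      Iterable<DoubleWritable> messages) {
    if (getSuperstep() >= MAX_STEPS) {
      vertex.voteToHalt();
      return;
    }
    // Integrate weighted spikes received from the previous superstep.
    double potential = vertex.getValue().get() * DECAY;
    for (DoubleWritable m : messages) {
      potential += m.get();
    }
    if (potential >= THRESHOLD) {
      // Fire: send a spike scaled by each synaptic weight, then reset.
      for (Edge<LongWritable, FloatWritable> edge : vertex.getEdges()) {
        sendMessage(edge.getTargetVertexId(),
                    new DoubleWritable(edge.getValue().get()));
      }
      potential = 0.0;
    }
    vertex.setValue(new DoubleWritable(potential));
  }
}
```

Because each superstep exchanges messages only along graph edges and keeps vertex state in memory, this formulation avoids the repeated shuffling and rewriting of the full graph structure that an iterative MapReduce job incurs, which is consistent with the performance gap the abstract reports.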
Keywords:
graph theory; neural nets; neurophysiology; parallel programming; public domain software; Giraph; Google; Hadoop; MapReduce parallel programming model; Mapper-side Schimmy algorithm; graph algorithm design pattern; graph structure; iterative graph processing; large-scale neural network modeling; large-scale scientific computing; neural network simulation; open source large-scale graph processing framework; running time; scalability; vertex-centric approach; algorithm design and analysis; best practices; biological neural networks; biomembranes; computational modeling; neurons; synchronization