Abstract :
The MapReduce (MR) framework is a programming environment that facilitates rapid parallel design of applications that process big data. While born in the Cloud arena, numerous other areas are now attempting to utilize it for their big data due to the speed of development. However, for HPC researchers and many others who already utilize centralized storage, MR marks a paradigm shift toward co-located storage and computation resources. In this work I attempt to reach the best of both worlds by exploring how to utilize MR on a network-attached parallel file system. This work is nearly complete and has unearthed key issues that I've subsequently overcome to achieve the desired high throughput. In my poster I describe many of these issues, demonstrate improvements possible with different architectural schemas, and provide reliability and fault-tolerance considerations for this novel combination of Cloud computation and HPC storage.
Conference Title :
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion