Title :
Towards efficient resource management for data-analytic platforms
Author :
Castillo, Claris ; Spreitzer, Mike ; Steinder, Malgorzata
Author_Institution :
T.J. Watson Res. Center, IBM, Hawthorne, NY, USA
Abstract :
We present architectural and experimental work exploring the role of intermediate data handling in the performance of MapReduce workloads. Our findings show that (a) certain jobs are more sensitive to disk cache size than others and (b) this sensitivity is mostly due to the local file I/O for the intermediate data. We also show that a small amount of memory is sufficient, in the normal case, for map workers to hold their intermediate data until it is read. We introduce Hannibal, which exploits this modest memory requirement in a simple and direct way: it holds the intermediate data in application-level memory for precisely the time needed, improving performance when the disk cache is stressed. We have implemented Hannibal and show through experimental evaluation that it can make MapReduce jobs run faster than Hadoop when little memory is available to the disk cache. As a result, Hannibal provides better performance insulation between concurrent jobs.
Keywords :
cache storage; data handling; middleware; public domain software; Hadoop; Hannibal; MapReduce; MapReduce workloads; application-level memory; data-analytic platforms; disk cache size; intermediate data handling; open source middleware; resource management; Context; Insulation; Monitoring; Reliability; Map-Reduce; disk; performance
Conference_Title :
Integrated Network Management (IM), 2011 IFIP/IEEE International Symposium on
Conference_Location :
Dublin
Print_ISBN :
978-1-4244-9219-0
Electronic_ISBN :
978-1-4244-9220-6
DOI :
10.1109/INM.2011.5990676