DocumentCode
3678335
Title
IOSIG+: On the Role of I/O Tracing and Analysis for Hadoop Systems
Author
Bo Feng;Xi Yang;Kun Feng;Yanlong Yin;Xian-He Sun
Author_Institution
Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
fYear
2015
Firstpage
62
Lastpage
65
Abstract
Hadoop, one of the most widely adopted MapReduce frameworks, is inherently data-intensive. Several projects that depend on it, such as Mahout and Hive, inherit this characteristic. Meanwhile, I/O optimization becomes a daunting task, since applications' source code is not always available. I/O traces for Hadoop and its dependents are increasingly important because they can faithfully reveal intrinsic I/O behaviors without requiring knowledge of the source code. Such traces help not only to diagnose system bottlenecks but also to optimize performance further. To achieve this goal, we propose a transparent tracing and analysis tool suite, namely IOSIG+, which can be plugged into a Hadoop system. We make several contributions: 1) we describe our approach to tracing; 2) we release the tracer, which can trace I/O operations without modifying the targets' source code; 3) we adopt several techniques to mitigate the execution overhead introduced at runtime; 4) we create an analyzer, which helps to discover new approaches to addressing I/O problems according to access patterns. The experimental results and analysis confirm its effectiveness, and the observed overhead can be as low as 1.97%.
Keywords
"Java","Throughput","Optimization","Runtime","Tuning","Yarn","Performance evaluation"
Publisher
ieee
Conference_Titel
Cluster Computing (CLUSTER), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/CLUSTER.2015.17
Filename
7307564
Link To Document