• DocumentCode
    3678335
  • Title
    IOSIG+: On the Role of I/O Tracing and Analysis for Hadoop Systems
  • Author
    Bo Feng;Xi Yang;Kun Feng;Yanlong Yin;Xian-He Sun
  • Author_Institution
    Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
  • fYear
    2015
  • Firstpage
    62
  • Lastpage
    65
  • Abstract
    Hadoop, as one of the most widely accepted MapReduce frameworks, is naturally data-intensive. Several projects that depend on it, such as Mahout and Hive, inherit this characteristic. Meanwhile, I/O optimization becomes a daunting task, since applications' source code is not always available. I/O traces for Hadoop and its dependents are increasingly important, because they can faithfully reveal intrinsic I/O behaviors without knowledge of the source code. This method can help not only to diagnose system bottlenecks but also to further optimize performance. To achieve this goal, we propose a transparent tracing and analysis tool suite, namely IOSIG+, which can be plugged into a Hadoop system. We make several contributions: 1) we describe our approach to tracing; 2) we release the tracer, which can trace I/O operations without modifying the targets' source code; 3) we adopt several techniques to mitigate the execution overhead introduced at runtime; and 4) we create an analyzer, which helps to discover new approaches to addressing I/O problems according to access patterns. The experimental results and analysis confirm its effectiveness, and the observed overhead can be as low as 1.97%.
  • Keywords
    "Java","Throughput","Optimization","Runtime","Tuning","Yarn","Performance evaluation"
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2015.17
  • Filename
    7307564