• DocumentCode
    641330
  • Title
    Aggrandizing Hadoop in terms of node Heterogeneity & Data Locality
  • Author
    Sujitha, S.; Jaganathan, Suresh
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Sri Sivasubramania Nadar Coll. of Eng., Chennai, India
  • fYear
    2013
  • fDate
    28-29 March 2013
  • Firstpage
    145
  • Lastpage
    151
  • Abstract
    The growth of data has increased exponentially in recent years. To meet the high storage and processing demands of modern applications, data-center-scale computer systems are built from hundreds, thousands, or even millions of commodity computers connected through a LAN and housed in a data center, at a much larger scale than a traditional computer cluster. Hadoop enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale from a single server to thousands of machines with a very high degree of fault tolerance; Hadoop's strength lies in its ability to detect and handle failures. The original Hadoop native task scheduler implicitly assumes that cluster nodes are homogeneous, and it uses this assumption to identify slow tasks and re-execute them. However, the assumption does not hold when cluster nodes are heterogeneous, since speculatively identifying a slow task then leads to erroneous conclusions. In MapReduce, a sub-task is transferred to a node for execution; if the sub-task's input is not present on that node, it must be transferred from another node in the network, and transferring data takes time and delays execution. In this paper, we propose a methodology for improving Hadoop in terms of heterogeneity and data locality. The performance of the improved version can be measured using these metrics: i) execution time, ii) response time, iii) tasks submitted, iv) times related to jobs, i.e. arrival, start, and completion, v) completed tasks, vi) fairness, vii) locality, and viii) mean completion time.
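    The data-locality behavior the abstract describes can be sketched in a few lines. This is not the authors' implementation, only a minimal illustration under the assumption that the scheduler knows which nodes hold a replica of each input split; all names (`assign_task`, block and node ids) are hypothetical.

    ```python
    def assign_task(task_input_block, block_locations, free_nodes):
        """Return (node, is_local) for the next map sub-task.

        task_input_block: id of the input split the sub-task reads
        block_locations:  dict mapping block id -> set of nodes storing a replica
        free_nodes:       list of nodes currently able to accept a task
        """
        replicas = block_locations.get(task_input_block, set())
        for node in free_nodes:
            if node in replicas:
                return node, True          # data-local: no network transfer needed
        if free_nodes:
            return free_nodes[0], False    # remote: input must be copied over the LAN
        return None, False                 # no free capacity; caller should wait


    # Example: block "b1" is replicated on n1 and n3; only n2 and n3 are free,
    # so a locality-aware choice is n3, avoiding a transfer.
    node, local = assign_task("b1", {"b1": {"n1", "n3"}}, ["n2", "n3"])
    ```

    The locality metric listed among the evaluation criteria is essentially the fraction of sub-tasks for which such a local assignment succeeds.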
  • Keywords
    computer centres; distributed processing; fault tolerant computing; Hadoop; MapReduce; commodity servers; completed task; data center scale computer systems; data locality; distributed processing; execution time; fairness; fault tolerance; large data sets; mean completion time; native task scheduler; node heterogeneity; response time; tasks submitted; time related to jobs; Computers; Monitoring; Big Data; Cloud Computing; Data Computing; Heterogeneity; Schedulers;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Smart Structures and Systems (ICSSS), 2013 IEEE International Conference on
  • Conference_Location
    Chennai
  • Print_ISBN
    978-1-4673-6240-5
  • Type
    conf
  • DOI
    10.1109/ICSSS.2013.6623017
  • Filename
    6623017