Title :
Data locality in Hadoop cluster systems
Author :
Khan, Mahrukh ; Yang Liu ; Maozhen Li
Author_Institution :
Sch. of Eng. & Design, Brunel Univ., Uxbridge, UK
Abstract :
MapReduce has become a major programming model that supports distributed and parallel processing for large-scale data-intensive applications such as Web data mining, network traffic analysis, machine learning, and scientific simulation. Hadoop is the most popular open-source implementation of the MapReduce programming model. In Hadoop, input files are divided into many data blocks, and these blocks are distributed over several nodes in the cluster. To process the data blocks efficiently, Hadoop should provide a scheduling mechanism that enhances system performance in a shared cluster environment. Scheduling problems in Hadoop are mainly caused by data locality issues that arise from limited network bandwidth. After introducing the scheduling issues related to data locality, this paper reviews different data locality aware scheduling algorithms that handle these issues. In addition, the paper evaluates their features, strengths, and weaknesses, and provides some guidelines on how to further improve these scheduling algorithms.
Keywords :
parallel programming; pattern clustering; public domain software; scheduling; Hadoop cluster systems; MapReduce programming model; data block processing; data locality aware scheduling algorithms; data locality issue handling; distributed processing; feature evaluation; input files; large-scale data-intensive applications; limited network bandwidth; open-source implementation; parallel processing; scheduling mechanism; shared cluster environment; strength evaluation; system performance enhancement; weakness evaluation; Clustering algorithms; Delays; Distributed databases; Prefetching; Scheduling; Scheduling algorithms; MapReduce; data locality; job scheduling;
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2014 11th International Conference on
Conference_Location :
Xiamen
Print_ISBN :
978-1-4799-5147-5
DOI :
10.1109/FSKD.2014.6980924