Title :
Optimization Techniques within the Hadoop Eco-system: A Survey
Author :
Rumi, Giulia; Colella, Claudia; Ardagna, Danilo
Author_Institution :
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Abstract :
Nowadays, we live in a digital world that produces data at an impressive speed: data are large, change quickly, and are often too complex to be processed by existing tools. The challenge is to extract knowledge from these data efficiently. MapReduce is a data-parallel programming model for clusters of commodity machines that was created to address this problem. In this paper we provide an overview of the Hadoop ecosystem. We introduce the most significant approaches supporting automatic, on-line resource provisioning. Moreover, we analyse optimization approaches proposed in frameworks built on top of MapReduce, such as Pig and Hive, which point out the importance of scheduling techniques in MapReduce when multiple workflows are executed concurrently. Therefore, we discuss the default Hadoop schedulers along with some enhancements proposed by the research community. The analysis highlights how research contributions try to address common Hadoop weaknesses. Our comparison shows that none of the frameworks surpasses the others, and a fair evaluation is also difficult to perform. The choice of framework must therefore depend on the specific application goal, since no single solution addresses all the issues typical of MapReduce.
Keywords :
data handling; knowledge acquisition; optimisation; parallel programming; pattern clustering; resource allocation; scheduling; Hadoop ecosystem; MapReduce; commodity machines; data parallel programming model; knowledge extraction; on-line resource provisioning; optimization techniques; research community; scheduling techniques; Optimization; Programming; Resource management; Scalability; Scheduling; Time factors; Yarn; Clouds; Design; Performance analysis; Scheduling algorithms
Conference_Title :
2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)
Conference_Location :
Timisoara
Print_ISBN :
978-1-4799-8447-3
DOI :
10.1109/SYNASC.2014.65