Title :
Memoization of Materialization Points
Author :
Hoger, Marek ; Kao, Odej
Author_Institution :
Tech. Universität Berlin, Berlin, Germany
Abstract :
Data streaming frameworks, built to run on large numbers of processing nodes in order to analyze big data, are fault-prone. The large number of nodes and network components that can fail is not the only source of errors. Developing data analysis jobs has the additional disadvantage that errors or wrong assumptions about the input data may only be detected during production runs. This usually leads to re-executing the entire job and re-computing all input data, which can be a tremendous waste of computing time if most of the job's tasks are not affected by the changes and therefore process and produce exactly the same data again. This paper describes an approach that uses materialized intermediate data from previous job executions to reduce the number of tasks that have to be re-executed when a job is updated. Saving intermediate data to disk is a common technique for achieving fault tolerance in data streaming systems. These intermediate results can be reused for memoization to avoid needless re-execution of tasks. We show that memoization can noticeably decrease the runtime of an updated job.
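The abstract describes reusing intermediate results that were materialized for fault tolerance as a memoization cache when a modified job is re-executed. The following is a minimal sketch of that idea, assuming a disk-backed store keyed by a fingerprint of the task code and its input data; the names (MaterializationStore, run_or_reuse, fingerprint) are hypothetical illustrations, not the paper's or any particular framework's API.

```python
# Hypothetical sketch: skip re-execution of a task whose code and input
# are unchanged by reusing its materialized output from a previous run.
import hashlib
import os
import pickle
from collections import Counter


def fingerprint(*parts: bytes) -> str:
    """Stable key derived from the task version and its input data."""
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.hexdigest()


class MaterializationStore:
    """Disk-backed store for intermediate task outputs (materialization points)."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.root, key + ".pkl")

    def get(self, key):
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)
        return None

    def put(self, key, value):
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)


def run_or_reuse(store, task_code: bytes, task_fn, input_data):
    """Re-execute the task only if its code or input changed since the last run."""
    key = fingerprint(task_code, pickle.dumps(input_data))
    cached = store.get(key)
    if cached is not None:          # unchanged task: reuse materialized output
        return cached
    result = task_fn(input_data)    # changed task: compute and materialize
    store.put(key, result)
    return result


if __name__ == "__main__":
    store = MaterializationStore("/tmp/materialization_points")

    def word_count(lines):
        return Counter(w for line in lines for w in line.split())

    data = ["big data streaming", "streaming frameworks"]
    # First call computes and materializes; the second call with the same
    # task version and input returns the stored result without re-execution.
    print(run_or_reuse(store, b"word_count_v1", word_count, data))
    print(run_or_reuse(store, b"word_count_v1", word_count, data))
```

In this sketch, only tasks whose code or upstream data differ from the previous execution would actually run, which mirrors the abstract's claim that unaffected tasks of an updated job can be skipped by memoizing their materialized intermediate results.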
Keywords :
Big Data; fault tolerant computing; data saving; data streaming framework; data streaming systems; disk; fault tolerance; fault-prone; job executions; job tasks; materialization points; materialized intermediate data; memoization; network components; processing nodes; task reduction; task re-execution; updated job runtime; Engines; Fault tolerant systems; Indexes; Runtime; materialization;
Conference_Title :
Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
Conference_Location :
Sydney, NSW
DOI :
10.1109/CSE.2013.186