Title :
Risk Intelligence: Profiting from Uncertainty in Data Processing System
Author :
Si Zheng ; Yunhuai Liu ; Shanshan Li ; Tian He ; Xiangke Liao
Abstract :
Fault tolerance is essential in extreme-scale data processing systems. Proactive fault-tolerance schemes, such as speculative execution in the MapReduce framework, can dramatically improve the response time of job executions when failures become the norm rather than the exception. Efficient proactive fault-tolerance schemes require precise knowledge of task executions, which has been an open challenge for decades. To address this issue, in this paper we design and implement RiskI, which pairs a profile-based prediction algorithm with a risk-aware task assignment algorithm. RiskI accelerates task executions while taking the inherent uncertainty of tasks into account. Our design demonstrates that this inherent uncertainty brings not only great challenges but also new opportunities: with careful design, we can benefit from it. We implement the idea in Hadoop 0.21.0, and the experimental results show that, compared with the traditional LATE algorithm, RiskI improves response time by 46% at the same system throughput.
Keywords :
data handling; public domain software; risk management; software fault tolerance; Hadoop 0.21.0 systems; LATE algorithm; RiskI; extreme-scale data processing systems; proactive fault-tolerance scheme; profile-based prediction algorithm; risk intelligence; risk-aware task assignment algorithm; risk-management; task uncertainty; Algorithm design and analysis; Data processing; Fault tolerance; Fault tolerant systems; Prediction algorithms; Time factors; Uncertainty; MapReduce; data processing systems; fault-tolerance; prediction; task assignment;
Conference_Title :
2013 42nd International Conference on Parallel Processing (ICPP)
Conference_Location :
Lyon
DOI :
10.1109/ICPP.2013.55