Regional Information Center for Science and Technology - A selective checkpointing mechanism for query plans in a parallel database system

DocumentCode :

659431

Title :

A selective checkpointing mechanism for query plans in a parallel database system

Author :

Ting Chen ; Taura, Koichi

Author_Institution :

Univ. of Tokyo, Tokyo, Japan

Year :

2013

Date :

6-9 Oct. 2013

Firstpage :

237

Lastpage :

245

Abstract :

Most existing parallel database systems achieve fault tolerance by aborting unfinished queries upon a failure and restarting the entire query from the beginning. This is inefficient for long-running queries in OLAP workloads. To solve this problem, this paper presents a selective checkpointing mechanism that materializes the outputs of some necessary operators, so that queries can resume from the middle of execution after a failure. Each query is represented by a DAG of relational operators in which data are typically pipelined between operators. The goal of the mechanism is to find a set of operators whose outputs are worth checkpointing in order to minimize the expected runtime of the whole query. It first provides a cost model to estimate the expected runtime of a whole query plan under a given failure probability for each operator. A divide-and-conquer algorithm is then proposed to find a close-to-optimal solution to the problem. The algorithm divides the query plan into subplans with smaller search spaces. For a given query plan with n operators, the algorithm runs in O(n) time. The mechanism is implemented in a shared-nothing parallel database system called ParaLite, which provides a coordination layer to glue many SQLite instances together and parallelizes SQL queries across them. The experimental results indicate that different fault-tolerant strategies affect the overall runtimes of queries. Our selective checkpointing mechanism can choose reasonable operators to be checkpointed and outperforms other fault-tolerant strategies. In addition, the divide-and-conquer algorithm used by our mechanism has a smaller overhead than a brute-force approach while remaining similarly effective.
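The checkpoint-selection problem described in the abstract can be illustrated with a toy sketch. This is not the paper's cost model or its divide-and-conquer algorithm, neither of which is given in this record. It assumes a linear chain of operators rather than a general DAG, assumes a failed attempt costs the full time of the segment being retried, and uses exhaustive search, which is exponential in the number of operators. The names `expected_runtime` and `best_checkpoints`, and the `(runtime, fail_prob, ckpt_cost)` tuple layout, are hypothetical.

```python
from itertools import combinations
from math import prod


def expected_runtime(ops, checkpoints):
    """Expected runtime of a linear operator chain (simplified, assumed model).

    ops: list of (runtime, fail_prob, ckpt_cost) tuples, one per operator.
    checkpoints: set of operator indices whose output is materialized.
    A segment ends at each checkpointed operator and at the last operator.
    Assumption: a segment is retried from its start until it succeeds, and
    every attempt (failed or not) is charged the full segment time.
    """
    total = 0.0
    segment = []
    last = len(ops) - 1
    for i, op in enumerate(ops):
        segment.append(op)
        if i in checkpoints or i == last:
            work = sum(runtime for runtime, _, _ in segment)
            if i in checkpoints:
                work += op[2]  # cost of writing the checkpoint
            success = prod(1.0 - fail for _, fail, _ in segment)
            total += work / success  # expected attempts = 1 / success
            segment = []
    return total


def best_checkpoints(ops):
    """Brute-force search over checkpoint sets (exponential; for illustration)."""
    candidates = range(len(ops) - 1)  # the final output is never checkpointed
    best_cost, best_set = expected_runtime(ops, set()), set()
    for k in range(1, len(ops)):
        for combo in combinations(candidates, k):
            cost = expected_runtime(ops, set(combo))
            if cost < best_cost:
                best_cost, best_set = cost, set(combo)
    return best_cost, best_set


if __name__ == "__main__":
    # Two operators, each 10 time units, 50% failure chance, checkpoint cost 1.
    plan = [(10.0, 0.5, 1.0), (10.0, 0.5, 1.0)]
    print(best_checkpoints(plan))  # (42.0, {0})
```

In this toy plan, running without checkpoints has an expected cost of 80 time units, while checkpointing the first operator's output brings it down to 42, which is the kind of trade-off between checkpoint overhead and recovery cost that the abstract describes.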

Keywords :

SQL; checkpointing; data mining; directed graphs; divide and conquer methods; fault tolerance; parallel databases; probability; query processing; DAG; OLAP workloads; ParaLite; SQL query parallelization; SQLite; brute-force approach; coordination layer; cost model; divide-and-conquer algorithm; failure probability; fault-tolerant strategies; query plan; relational operators; selective checkpointing mechanism; shared-nothing parallel database system; Checkpointing; Database systems; Fault tolerance; Fault tolerant systems; Program processors; Runtime;

Language :

English

Publisher :

IEEE

Conference_Title :

Big Data, 2013 IEEE International Conference on

Conference_Location :

Silicon Valley, CA

Type :

conf

DOI :

10.1109/BigData.2013.6691580

Filename :

6691580

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=659431