Title :
Parallel mining of dependencies
Author :
Garnaud, Eve ; Hanusse, Nicolas ; Maabout, Sofian ; Novelli, Noel
Author_Institution :
LaBRI, Univ. of Bordeaux, Bordeaux, France
Abstract :
The problem of extracting functional dependencies (FDs) from databases has a long history dating back to the 1990s. Even so, efficient solutions are still needed that take into account both hardware evolution, namely the advent of multicore machines, and the amount of data to be mined. In this paper we propose a parallel algorithm which, with small modifications, extracts (i) the minimal keys, (ii) the minimal exact FDs, (iii) the minimal approximate FDs and (iv) the conditional functional dependencies (CFDs) holding in a table. Under some natural conditions, we prove a theoretical speedup of our solution with respect to a baseline algorithm that follows a depth-first search strategy. Since mining most of these dependencies requires a procedure for computing the number of distinct values (NDV), which is a space-consuming operation, we show how sketching techniques for estimating NDV can be used to reduce both memory consumption and communications overhead when considering distributed data, while guaranteeing a certain quality of the result. Our solution is implemented, and experimental results are reported showing the efficiency and scalability of our proposal. Most notably, the theoretical speedups are confirmed by the experiments.
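As an illustration of why NDV is central to the dependencies named in the abstract, the following minimal sketch relies on a standard fact: an exact FD X -> A holds in a table exactly when NDV(X) = NDV(X ∪ {A}), and X is a key exactly when NDV(X) equals the number of rows. This is not the authors' parallel algorithm, and it counts distinct values exactly rather than with sketches; the function names (ndv, fd_holds, minimal_keys) and the toy table are hypothetical, chosen only for this example.

```python
from itertools import combinations


def ndv(rows, attrs):
    """Exact number of distinct values of the projection of rows onto attrs."""
    return len({tuple(row[a] for a in attrs) for row in rows})


def fd_holds(rows, lhs, rhs):
    """lhs -> rhs holds iff adding rhs to lhs creates no new distinct tuples."""
    return ndv(rows, lhs) == ndv(rows, tuple(lhs) + (rhs,))


def minimal_keys(rows, attributes):
    """Naive sequential, level-wise search for minimal keys (illustration only)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for cand in combinations(attributes, size):
            # A superset of an already found key cannot be minimal.
            if any(set(k) <= set(cand) for k in keys):
                continue
            # cand is a key iff its projection has one distinct value per row.
            if ndv(rows, cand) == len(rows):
                keys.append(cand)
    return keys


# Hypothetical toy table, not taken from the paper.
rows = [
    {"id": 1, "city": "Bordeaux", "country": "France"},
    {"id": 2, "city": "Bologna", "country": "Italy"},
    {"id": 3, "city": "Bordeaux", "country": "France"},
    {"id": 4, "city": "Paris", "country": "France"},
]

city_determines_country = fd_holds(rows, ("city",), "country")
country_determines_city = fd_holds(rows, ("country",), "city")
keys = minimal_keys(rows, ("id", "city", "country"))
```

In this toy table, city -> country holds while country -> city does not, and ("id",) is the only minimal key. The exact set-based ndv stores every distinct tuple, which is the memory cost the abstract proposes to reduce through sketch-based estimation.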
Keywords :
multiprocessing systems; parallel algorithms; relational databases; tree searching; CFD; NDV; communications overhead; conditional functional dependencies; database; depth first search strategy; distributed data; functional dependency extraction; memory consumption; minimal approximate functional dependency; minimal exact functional dependency; minimal keys; multicore machine; number of distinct values; parallel algorithm; parallel dependency mining; relational table; sketching technique; Approximation algorithms; Approximation methods; Computational fluid dynamics; Memory management; Parallel algorithms; Program processors;
Conference_Title :
High Performance Computing & Simulation (HPCS), 2014 International Conference on
Conference_Location :
Bologna
Print_ISBN :
978-1-4799-5312-7
DOI :
10.1109/HPCSim.2014.6903725