Title :
An original approach for processing public open data with MapReduce: A case study
Author :
Arres, Billel ; Kabachi, Nadia ; Bentayeb, Fadila ; Boussaid, Omar
Author_Institution :
Univ. Lumiere Lyon 2, Bron, France
Abstract :
Many governments and states are now opening up their public data. However, the volume of this open data is constantly increasing and will soon exceed current processing and storage capacities. The MapReduce paradigm, one of the most widely used parallel programming models, relies on a master-slave architecture to process very large data sets in parallel. In this paper, we propose a parallel approach based on MapReduce to process public open data. As a case study, we apply it to the official data sets from the French Ministry of Communication: we implement a parallel algorithm that ranks national museums and galleries according to their degree of accessibility for people with disabilities. We assess the feasibility of our approach in two respects: performance in terms of execution time, and visualization of the results so that they can be integrated into solutions such as geographic BI. This work can be applied to other cases involving very large data sets.
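The abstract does not give the algorithm itself, so the following is only a minimal single-process sketch of how a map/shuffle/reduce ranking of this kind might look. The record layout (museum, disability category, accessible flag), the sample data, the scoring rule (one point per accommodated category), and all function names are assumptions made for illustration; they are not taken from the paper or from the ministry's data sets.

```python
from collections import defaultdict

# Hypothetical sample records: (museum, disability category, accessible?).
# These values are invented for illustration and are not real data.
RECORDS = [
    ("Museum A", "motor", True),
    ("Museum A", "visual", True),
    ("Museum A", "hearing", False),
    ("Museum B", "motor", True),
    ("Museum B", "visual", False),
    ("Museum C", "motor", True),
    ("Museum C", "visual", True),
    ("Museum C", "hearing", True),
]


def map_record(record):
    """Map step: emit (museum, 1) if the category is accommodated, else (museum, 0)."""
    museum, _category, accessible = record
    yield museum, 1 if accessible else 0


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as a MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_scores(museum, values):
    """Reduce step: sum the per-category flags into one assumed accessibility degree."""
    return museum, sum(values)


def rank_museums(records):
    """Run map, shuffle, and reduce, then sort by descending score (ties by name)."""
    pairs = (pair for record in records for pair in map_record(record))
    grouped = shuffle(pairs)
    scores = [reduce_scores(museum, values) for museum, values in grouped.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for museum, score in rank_museums(RECORDS):
        print(museum, score)
```

In a real deployment the map and reduce functions would run on separate worker nodes under a framework such as Hadoop; here they run sequentially only to show the data flow.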
Keywords :
data handling; parallel algorithms; parallel programming; French Ministry of Communication; MapReduce paradigm; data visualization; disabled people accessibility degree; execution time; geographic BI; master-slave architecture; national gallery ranking; national museum ranking; official data sets; parallel algorithm; parallel processing; parallel programming models; public open data; public open data processing; very-large data sets; Cities and towns; Clustering algorithms; Computer architecture; Distributed databases; Electronic mail; Europe; Mobile communication
Conference_Title :
Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on
DOI :
10.1109/AICCSA.2014.7073190