DocumentCode :
3192593
Title :
Data analytics in the cloud with flexible MapReduce workflows
Author :
Goncalves, Celso ; Assuncao, Luis ; Cunha, Joao Carlos
Author_Institution :
Inst. Super. de Eng. de Lisboa, Lisbon, Portugal
fYear :
2012
fDate :
3-6 Dec. 2012
Firstpage :
427
Lastpage :
434
Abstract :
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially, but others can be executed concurrently or in parallel on clusters, grids, or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. Developing an application with MapReduce requires installing, configuring, or accessing specific frameworks such as Apache Hadoop or Elastic MapReduce in Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore, composing the multiple phases of a data analytic application requires specifying all the phases and their orchestration. The original MapReduce model and environment lack flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. We present experimental results from using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
Keywords :
cloud computing; data analysis; data mining; fault tolerant computing; natural sciences computing; pattern clustering; text analysis; workflow management software; AWARD framework; Amazon Cloud; Amazon EC2; Apache Hadoop; Elastic MapReduce; MapReduce programming model; autonomic workflow activities reconfigurable and dynamic framework; cloud environments; cluster environments; elastic compute cloud; flexible MapReduce workflows; large data set processing; processing phase; scientific workflows; text mining data analytic application; Awards activities; Cloud computing; Computational modeling; Corporate acquisitions; Data models; Programming; Text mining; Cloud; MapReduce; Text Mining; Workflow;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4673-4511-8
Electronic_ISBN :
978-1-4673-4509-5
Type :
conf
DOI :
10.1109/CloudCom.2012.6427527
Filename :
6427527