DocumentCode :
1799815
Title :
Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows
Author :
Mohamed, N. ; Maji, Nabanita ; Jing Zhang ; Timoshevskaya, Nataliya ; Wu-Chun Feng
Author_Institution :
Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA
fYear :
2014
fDate :
24-26 Sept. 2014
Firstpage :
739
Lastpage :
746
Abstract :
The Hadoop framework has gained significant attention from the scientific community due to its applicability to large-scale data analysis in many areas. This analysis often involves multiple stages of processing, which, in turn, constitute a workflow. While some stages of a workflow are mandatory, others are subject to the type of analysis to be done. In addition, a workflow may possess data dependencies between stages that must be enforced, and it may exhibit varying levels of sensitivity. The resources needed for such data analysis can range from a laptop to in-house clusters (or private cloud) to a public cloud. Managing such workflows, while using such a gamut of computing resources, is an unnecessarily arduous task for domain scientists. To address the above challenges, we present Aeromancer, a feature-rich workflow manager for running MapReduce-based workflows that utilizes both client and cloud resources. Aeromancer offers an ensemble of features, including the simultaneous use of client resources (e.g., on-premises clusters) and public cloud resources, automatic data-dependency and data-transfer handling, intra-flow, on-demand cluster provisioning, and support for directed-acyclic graphs (DAGs). To demonstrate its functionality, we apply Aeromancer to several bioinformatics pipelines, as part of a "big data" case study in the life sciences, which seeks to increase the adoption of hybrid computing environments, including the emerging "client cloud" computing model, for running data-intensive workflows.
Keywords :
bioinformatics; client-server systems; cloud computing; data analysis; directed graphs; electronic data interchange; parallel processing; pipeline processing; resource allocation; workflow management software; Aeromancer; DAG; automatic data-dependency; big data; bioinformatics pipelines; client resources; client-cloud computing model; data-intensive workflows; data-transfer handling; directed-acyclic graphs; feature-rich workflow manager; hybrid computing environments; intraflow on-demand cluster provisioning; large-scale MapReduce-based scientific workflow; large-scale data analysis; life sciences; public cloud resources; scientific community; Bioinformatics; Cloud computing; Computer architecture; Data analysis; Instruction sets; Pipelines; Servers; Hadoop; MapReduce; cloud; cluster; hybrid resources; workflow;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2014 IEEE 13th International Conference on
Conference_Location :
Beijing
Type :
conf
DOI :
10.1109/TrustCom.2014.97
Filename :
7011321