• DocumentCode
    2701284
  • Title
    Service-Oriented Reliable Problem Solving Environment for Scientific Computation
  • Author
    Liu, Cancan ; Zhang, Weimin ; Luo, Zhigang ; Liu, Hai ; Xiao, Lin
  • Author_Institution
    Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha
  • fYear
    2008
  • fDate
    9-12 Dec. 2008
  • Firstpage
    421
  • Lastpage
    426
  • Abstract
    Scientific computations are large-scale and long-running, and they execute on a dynamic, unstable grid architecture, so the fault tolerance of a scientific workflow management system is increasingly important. To handle the inevitable failures of workflow activities, we present a three-level recovery strategy. At the service level, a distributed Service Agent (SA) is provided for each activity; it monitors the activity's execution status and implements a retry-based recovery strategy by resubmitting the failed activity multiple times. At the workflow level, the workflow engine implements a replication-based strategy by requesting the Service Factory (SF) to create another service instance on a different node and invoking the new instance as a replacement. At the user level, a user interface lets users handle failures on demand. Finally, we present a reliable Problem Solving Environment (PSE) for the climate domain, called Ensemble Prediction Scientific Workflow (EPSWFlow). This approach seamlessly embeds complex, control-flow-intensive recovery strategies within the dataflow process network, and it makes the prediction process more robust and more reusable.
  • Keywords
    fault tolerant computing; workflow management software; Ensemble Prediction Scientific Workflow; Problem Solving Environment; distributed Service Agent; scientific workflow management system; service-oriented reliable problem solving environment; unsteady grid architecture; workflow engine; Computer architecture; Condition monitoring; Engines; Fault tolerant systems; Grid computing; Large-scale systems; Problem-solving; Production facilities; User interfaces; Workflow management software;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asia-Pacific Services Computing Conference, 2008. APSCC '08. IEEE
  • Conference_Location
    Yilan
  • Print_ISBN
    978-0-7695-3473-2
  • Electronic_ISBN
    978-0-7695-3473-2
  • Type
    conf
  • DOI
    10.1109/APSCC.2008.118
  • Filename
    4780711
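
The three-level recovery strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' EPSWFlow implementation: the function names (`run_with_retries`, `run_activity`), the `invoke` and `ask_user` callbacks, and the representation of nodes as strings are all hypothetical, chosen only to show how a failure might escalate from retry (service level) to replication on a different node (workflow level) to a user decision (user level).

```python
from typing import Callable, List, Optional


class ActivityFailed(Exception):
    """Raised when an activity cannot be completed at a given recovery level."""


def run_with_retries(invoke: Callable[[str], str], node: str, max_retries: int) -> str:
    """Service level (hypothetical): resubmit the failed activity on the same node."""
    last_error: Optional[Exception] = None
    for _ in range(1 + max_retries):  # one initial attempt plus the retries
        try:
            return invoke(node)
        except Exception as exc:  # any failure counts as a failed submission
            last_error = exc
    raise ActivityFailed(f"activity failed on {node}") from last_error


def run_activity(
    invoke: Callable[[str], str],
    nodes: List[str],
    max_retries: int = 2,
    ask_user: Optional[Callable[[List[str]], str]] = None,
) -> str:
    """Escalate through the three levels until one of them yields a result."""
    # Workflow level (hypothetical): each further node stands in for a
    # replacement service instance created on a different node.
    for node in nodes:
        try:
            return run_with_retries(invoke, node, max_retries)
        except ActivityFailed:
            continue
    # User level (hypothetical): hand the failure to the user, if a handler exists.
    if ask_user is not None:
        return ask_user(nodes)
    raise ActivityFailed("all recovery levels exhausted")
```

In this sketch the escalation order is fixed: cheaper recovery (retrying in place) is always exhausted before a replacement instance is tried, and the user is consulted only when every node has failed.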