• DocumentCode
    1605267
  • Title
    Self-Recovering Parallel Applications in Multi-core Systems
  • Author
    Bizot, Gilles ; Avresky, Dimiter ; Chaix, Fabien ; Zergainoh, Nacer-Eddine ; Nicolaidis, Michael
  • Author_Institution
    TIMA Lab., INP, Grenoble, France
  • fYear
    2011
  • Firstpage
    51
  • Lastpage
    58
  • Abstract
    This paper presents a Self-Recovering strategy that dynamically re-maps application tasks on a multi-core system. Based on run-time failure-aware techniques, the strategy guarantees seamless termination and delivery of the expected results despite multiple node and link failures in a 2D mesh topology. A statistical analysis demonstrates that the proposed technique re-maps the tasks of faulty nodes in a bounded number of steps, and the theoretical results have been validated by simulations. The technique bypasses multiple node, router, and link failures with a predictable number of hops. It is also shown that the Motion JPEG-2000 application can be parallelized and formally represented as a Directed Acyclic Graph (DAG). The proposed technique has been validated by simulating a 1000-core system in the presence of up to 10% node and link failures. The technique is therefore shown to be efficient for the seamless execution of parallel streaming applications and to provide an Execution Time Reduction Ratio close to ideal.
  • Keywords
    multiprocessing systems; parallel processing; statistical analysis; 2D mesh topology; Motion JPEG-2000; directed acyclic graph; execution time reduction ratio; link failures; multicore systems; run-time failure aware techniques; self-recovering parallel applications; Fault tolerance; Fault tolerant systems; Heuristic algorithms; Peer to peer computing; Routing; Search problems; Adaptive Fault-Tolerant Routing; Multi-Core Chip; Parallel Streaming Application; Seamless Execution; Self-Recovering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network Computing and Applications (NCA), 2011 10th IEEE International Symposium on
  • Conference_Location
    Cambridge, MA
  • Print_ISBN
    978-1-4577-1052-0
  • Electronic_ISBN
    978-0-7695-4489-2
  • Type
    conf
  • DOI
    10.1109/NCA.2011.14
  • Filename
    6038584