• DocumentCode
    2815034
  • Title

    Detecting anomalies in high-performance parallel programs

  • Author

    Florez, German ; Liu, Zhen ; Bridges, Susan ; Vaughn, Rayford ; Skjellum, Anthony

  • Author_Institution
    Center for Computer Security Research, Mississippi State University, MS, USA
  • Volume
    2
  • fYear
    2004
  • fDate
    5-7 April 2004
  • Firstpage
    30
  • Abstract
    The Message Passing Interface (MPI) is an effective programming technique for implementing parallel programs for distributed computation. As these applications run, several types of irregularities can occur, including those that result from intrusions, user misbehavior, corrupted data, deadlocks, or failure of cluster components. We compare different artificial intelligence (AI) techniques that can be used to implement a lightweight monitoring and detection system for parallel applications on a cluster of Linux workstations. We study the accuracy and performance of deterministic and stochastic algorithms applied to the observed flow of library function calls and OS system calls made by parallel programs written with MPI. We demonstrate that MPI programs can be monitored in real time with high accuracy, in some cases with a 0% false positive rate, and we show that the added computational load on each node is small. Finally, we demonstrate that simple deterministic methods perform poorly when the program flow grows in size and variety, and that more complex methods are required.
  • Keywords
    Unix; application program interfaces; artificial intelligence; deterministic algorithms; hidden Markov models; message passing; neural nets; parallel programming; system monitoring; workstation clusters; Linux workstation clusters; MPI program monitoring; OS system calls; anomaly detection; artificial intelligence techniques; distributed computation; function library; high-performance parallel programs; lightweight monitoring detection system; message passing interface; parallel applications; stochastic algorithms; Artificial intelligence; Computer interfaces; Concurrent computing; Condition monitoring; Distributed computing; Linux; Message passing; Parallel programming; System recovery; Workstations
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on
  • Print_ISBN
    0-7695-2108-8
  • Type

    conf

  • DOI
    10.1109/ITCC.2004.1286585
  • Filename
    1286585