• DocumentCode
    506007
  • Title
    DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements
  • Author
    Gao, Qi; Qin, Feng; Panda, Dhabaleswar K.
  • Author_Institution
    The Ohio State University, Columbus, OH
  • fYear
    2007
  • fDate
    10-16 Nov. 2007
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    While software reliability in large-scale systems is becoming increasingly important, debugging large-scale parallel systems remains a daunting task. This paper proposes an innovative technique that automatically finds hard-to-detect software bugs in parallel programs, bugs that can cause severe problems such as data corruption and deadlocks, by detecting abnormal behavior in their data movements. Based on the observation that data movements in parallel programs typically follow certain patterns, our idea is to extract data movement (DM)-based invariants at program runtime and check for violations of these invariants. Such violations indicate potential bugs, such as data races and memory corruption bugs, that manifest themselves in data movements. We have built a tool, called DMTracker, based on this idea: it automatically extracts DM-based invariants and detects violations of them. Our experiments with two real-world bug cases in MVAPICH/MVAPICH2, a popular MPI library, show that DMTracker can effectively detect them and report abnormal data movements to help programmers quickly diagnose the root causes of the bugs. In addition, DMTracker incurs very low runtime overhead, from 0.9% to 6.0%, in our experiments with High Performance Linpack (HPL) and NAS Parallel Benchmarks (NPB), which indicates that DMTracker can be deployed in production runs.
  • Keywords
    Computer bugs; Data mining; Debugging; Large-scale systems; Libraries; Production; Programming profession; Runtime; Software reliability; System recovery; anomaly detection; bug detection; data movements; parallel programs
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Supercomputing, 2007. SC '07. Proceedings of the 2007 ACM/IEEE Conference on
  • Conference_Location
    Reno, NV, USA
  • Print_ISBN
    978-1-59593-764-3
  • Electronic_ISBN
    978-1-59593-764-3
  • Type
    conf
  • DOI
    10.1145/1362622.1362643
  • Filename
    5348838