• DocumentCode
    3223247
  • Title

    FlowChecker: Detecting Bugs in MPI Libraries via Message Flow Checking

  • Author

    Chen, Zhezhe ; Gao, Qi ; Zhang, Wenbin ; Qin, Feng

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2010
  • fDate
    13-19 Nov. 2010
  • Firstpage
    1
  • Lastpage
    11
  • Abstract
    Many MPI libraries have suffered from software bugs, which severely impact the productivity of a large number of users. This paper presents a new method called FlowChecker for detecting communication-related bugs in MPI libraries. The main idea is to extract program intentions of message passing (MP-intentions), and to check whether these MP-intentions are fulfilled correctly by the underlying MPI libraries, i.e., whether messages are delivered correctly from specified sources to specified destinations. If not, FlowChecker reports the bugs and provides diagnostic information. We have built a FlowChecker prototype on Linux and evaluated it with five real-world bug cases in three widely used MPI libraries: Open MPI, MPICH2, and MVAPICH2. Our experimental results show that FlowChecker effectively detects all five evaluated bug cases and provides useful diagnostic information. Additionally, our experiments with HPL and NPB show that FlowChecker incurs low runtime overhead (0.9-9.7% on three MPI libraries).
  • Keywords
    message passing; program debugging; program diagnostics; FlowChecker; HPL; Linux; MPI libraries; MPICH2; MVAPICH2; NPB; Open MPI; communication related bugs detection; message flow checking; message passing program intentions; software bugs; Buffer storage; Computer bugs; Libraries; Runtime; Semantics; Software; Tracking;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2010 International Conference for
  • Conference_Location
    New Orleans, LA
  • Print_ISBN
    978-1-4244-7557-5
  • Electronic_ISBN
    978-1-4244-7558-2
  • Type

    conf

  • DOI
    10.1109/SC.2010.27
  • Filename
    5644886