Title :
Communicating efficiently on cluster based grids with MPICH-VMI
Author :
Pant, Avneesh ; Jafri, Hassan
Author_Institution :
Nat. Center for Supercomput. Applications, Univ. of Illinois, Urbana-Champaign, Urbana, IL, USA
Abstract :
The emerging infrastructure of computational grids composed of clusters-of-clusters (CoC) interlinked through high-throughput channels promises unprecedented raw compute power for terascale applications. Projects such as the NSF TeraGrid and EU DataGrid deploy CoCs across multiple geographical sites, providing tens of teraflops. Efficient scaling of terascale applications on these grids is a challenge because the heterogeneous resources (operating systems and SANs) present at each site make interoperability among multiple clusters difficult. In addition, due to the enormous disparity in latency and throughput between the channels within a SAN and those interlinking multiple clusters, these CoC grids contain deep communication hierarchies that prohibit efficient scaling of tightly coupled applications. We present the design of a grid-enabled MPI called MPICH-VMI for running terascale applications over CoC based computational grids. MPICH-VMI is based on the MPICH implementation of the MPI 1.1 standard and utilizes a middleware messaging library called the virtual machine interface (VMI). VMI enables MPICH-VMI to communicate over the heterogeneous networks common in CoC based grids. MPICH-VMI also features novel optimizations for hiding the communication hierarchies present in CoC based grids. We also present some preliminary results with MPICH-VMI running on the TeraGrid for MPI benchmarks and applications.
Keywords :
application program interfaces; grid computing; message passing; software libraries; virtual machines; workstation clusters; CoC based computational grids; CoC grids; EU Datagrid; MPI 1.1 standard; MPICH-VMI; NSF Teragrid; SAN; TeraGrid for MPI benchmarks; cluster based grids; cluster interoperability; clusters-of-clusters grid; communication hierarchies; grid-enabled MPI; heterogeneous networks; interlinking multiple clusters; middleware messaging library; multiple geographical sites; virtual machine interface; Bandwidth; Delay; Grid computing; Large-scale systems; Libraries; Operating systems; Storage area networks; Throughput; Virtual machining; Virtual manufacturing;
Conference_Titel :
Cluster Computing, 2004 IEEE International Conference on
Print_ISBN :
0-7803-8694-9
DOI :
10.1109/CLUSTR.2004.1392598