• DocumentCode
    3501593
  • Title

    A Simple Synchronous Distributed-Memory Algorithm for the HPCC RandomAccess Benchmark

  • Author

    Plimpton, Steven J. ; Brightwell, Ron ; Vaughan, Courtenay ; Underwood, K. ; Davis, Michael H.

  • Author_Institution
    Sandia Nat. Labs., Cray Inc., Seattle, WA
  • fYear
    2006
  • fDate
    25-28 Sept. 2006
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    The RandomAccess benchmark as defined by the High Performance Computing Challenge (HPCC) tests the speed at which a machine can update the elements of a table spread across global system memory, as measured in billions (giga) of updates per second (GUPS). The parallel implementation provided by HPCC typically performs poorly on distributed-memory machines, due to updates requiring numerous small point-to-point messages between processors. We present an alternative algorithm which treats the collection of P processors as a hypercube, aggregating data so that larger messages are sent, and routing individual datums through dimensions of the hypercube to their destination processor. The algorithm's computation (the GUP count) scales linearly with P while its communication overhead scales as log2(P), thus enabling better performance on large numbers of processors. The new algorithm achieves a GUPS rate of 19.98 on 8192 processors of Sandia's Red Storm machine, compared to 1.02 for the HPCC-provided algorithm on 10350 processors. We also illustrate how GUPS performance varies with the benchmark's specification of its "look-ahead" parameter. As expected, parallel performance degrades for small look-ahead values, and improves dramatically for large values.
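    The hypercube routing idea described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`route_through_hypercube`, `run_updates`), the block distribution of the table, and the XOR update rule `table[v % table_size] ^= v` are choices made here for illustration, and lists stand in for real message passing.

```python
import random


def route_through_hypercube(pending, num_procs):
    """Route (dest_rank, value) pairs to their destination in log2(P) stages.

    pending[r] is the list of items currently held by simulated rank r.
    Assumption: num_procs is a power of two, so ranks form a hypercube.
    """
    assert num_procs > 0 and num_procs & (num_procs - 1) == 0
    dims = num_procs.bit_length() - 1  # log2(P) hypercube dimensions
    messages = 0
    for d in range(dims):
        bit = 1 << d
        arrived = [[] for _ in range(num_procs)]
        for rank in range(num_procs):
            partner = rank ^ bit  # neighbor across dimension d
            for dest, value in pending[rank]:
                # Forward the item if its destination differs from this
                # rank in bit d; otherwise keep it for the next stage.
                target = partner if (dest ^ rank) & bit else rank
                arrived[target].append((dest, value))
            messages += 1  # one aggregated message per rank per stage
        pending = arrived
    return pending, messages


def run_updates(values_per_rank, table_size):
    """Apply XOR updates to a block-distributed table after routing."""
    num_procs = len(values_per_rank)
    local_size = table_size // num_procs  # table entries owned per rank
    pending = [
        [((v % table_size) // local_size, v) for v in vals]
        for vals in values_per_rank
    ]
    routed, messages = route_through_hypercube(pending, num_procs)
    table = [0] * table_size
    for rank, items in enumerate(routed):
        for dest, v in items:
            assert dest == rank  # every item reached its owner
            table[v % table_size] ^= v
    return table, messages
```

    In this sketch each rank sends one aggregated message per stage, so the message count is P * log2(P) regardless of how many updates are in flight, whereas sending every update directly would cost one small message per update.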
  • Keywords
    benchmark testing; distributed algorithms; distributed memory systems; hypercube networks; message passing; parallel processing; GUP count; HPCC; Hypercube; P processors; RandomAccess benchmark; Sandia's Red Storm machine; communication overhead; distributed-memory machines; global system memory; high performance computing challenge tests; parallel implementation; point-to-point messages; synchronous distributed-memory algorithm; Bandwidth; Benchmark testing; Delay; Filtering; High performance computing; Hypercubes; Laboratories; Routing; Storms; World Wide Web;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing, 2006 IEEE International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1552-5244
  • Print_ISBN
    1-4244-0327-8
  • Electronic_ISBN
    1552-5244
  • Type

    conf

  • DOI
    10.1109/CLUSTR.2006.311859
  • Filename
    4100365