• DocumentCode
    746317
  • Title

    Latency, occupancy, and bandwidth in DSM multiprocessors: a performance evaluation

  • Author

    Chaudhuri, Mainak ; Heinrich, Mark ; Holt, Chris ; Singh, Jaswinder Pal ; Rothberg, Edward ; Hennessy, John

  • Author_Institution
    Comput. Syst. Lab., Cornell Univ., Ithaca, NY, USA
  • Volume
    52
  • Issue
    7
  • fYear
    2003
  • fDate
    7/1/2003
  • Firstpage
    862
  • Lastpage
    880
  • Abstract
    While the desire to use commodity parts in the communication architecture of a DSM multiprocessor offers advantages in cost and design time, the impact on application performance is unclear. We study this performance impact through detailed simulation, analytical modeling, and experiments on a flexible DSM prototype, using a range of parallel applications. We adapt the logP model to characterize the communication architectures of DSM machines. The l (network latency) and o (controller occupancy) parameters are the keys to performance in these machines, with the g (node-to-network bandwidth) parameter becoming important only for the fastest controllers. We show that, of all the logP parameters, controller occupancy has the greatest impact on application performance. Of the two contributions of occupancy to performance degradation (the latency it adds and the contention it induces), it is the contention component that governs performance regardless of network latency, showing a quadratic dependence on o. As expected, techniques to reduce the impact of latency make controller occupancy a greater bottleneck. Surprisingly, the performance impact of occupancy is substantial, even for highly tuned applications and even in the absence of latency hiding techniques. Scaling the problem size is often used as a technique to overcome limitations in communication latency and bandwidth. Through experiments on a DSM prototype, we show that there are important classes of applications for which the performance lost by using higher occupancy controllers cannot be regained easily, if at all, by scaling the problem size.
  • Keywords
    distributed shared memory systems; parallel architectures; performance evaluation; virtual machines; analytical modeling; application performance; bandwidth; controller occupancy; distributed shared memory multiprocessors; experiments; latency; latency hiding techniques; logP model; network latency; node-to-network bandwidth; occupancy; parallel applications; performance evaluation; simulation; Analytical models; Bandwidth; Communication system control; Costs; Degradation; Delay; Performance analysis; Prototypes; Size control; Virtual prototyping;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2003.1214336
  • Filename
    1214336