• DocumentCode
    1293369
  • Title

    Fast parallel sorting under LogP: experience with the CM-5

  • Author

    Dusseau, Andrea C. ; Culler, David E. ; Schauser, Klaus Erik ; Martin, Richard P.

  • Author_Institution
    Comput. Sci. Div., California Univ., Berkeley, CA, USA
  • Volume
    7
  • Issue
    8
  • fYear
    1996
  • fDate
    8/1/1996 12:00:00 AM
  • Firstpage
    791
  • Lastpage
    805
  • Abstract
    In this paper, we analyze four parallel sorting algorithms (bitonic, column, radix, and sample sort) with the LogP model. LogP characterizes the performance of modern parallel machines with a small set of parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). We develop implementations of these algorithms in Split-C, a parallel extension to C, and compare the performance predicted by LogP to actual performance on a CM-5 of 32 to 512 processors for a range of problem sizes. We evaluate the robustness of the algorithms by varying the distribution and ordering of the key values. We also briefly examine the sensitivity of the algorithms to the communication parameters. We show that the LogP model is a valuable guide in the development of parallel algorithms and a good predictor of implementation performance. The model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention. With an empirical model of local processor performance, LogP predictions closely match observed execution times on uniformly distributed keys across a broad range of problem and machine sizes. We find that communication performance is oblivious to the distribution of the key values, whereas the local processor performance is not; some communication phases are sensitive to the ordering of keys due to contention. Finally, our analysis shows that overhead is the most critical communication parameter in the sorting algorithms
  • Keywords
    parallel algorithms; parallel machines; performance evaluation; sorting; CM-5; LogP model; Split-C; bandwidth; bitonic; column; communication latency; communication parameters; data layouts; fast parallel sorting; local processor performance; overhead; parallel algorithms; parallel machines; performance; radix; robustness; sample sort; sorting algorithms; Algorithm design and analysis; Bandwidth; Context modeling; Delay; Network topology; Parallel algorithms; Parallel machines; Predictive models; Processor scheduling; Sorting;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/71.532111
  • Filename
    532111