Title of article :
Scalable Parallel Coset Enumeration: Bulk Definition and the Memory Wall
Author/Authors :
Gene Cooperman; Victor Grinberg
Issue Information :
Journal issue, serial year 2002
Pages :
23
From page :
563
To page :
585
Abstract :
Coset enumeration, like Gröbner basis computation, is a notoriously difficult algorithm to parallelize. We demonstrate a successful shared-memory parallelization achieving a seven-fold speedup on an Origin 2000 CC-NUMA computer using 16 CPUs. We take as a testbed an enumeration of Lyons’s group (8,835,156 cosets). This provides comparability with previous efforts in the literature (Cooperman and Havas, 1997; Havas and Ramsay, 2000), for which the best previous speedup was a factor of 4. The new parallelization depends on two new heuristics: clouds and shallow scan. Clouds is an example of bulk definition of cosets, which is the key to our more efficient parallelization. The parallelization is implemented using TOP-C. By taking advantage of TOP-C’s option to compile for either shared or distributed memory, we also demonstrate the first efficient parallelization of a coset enumeration program using distributed memory. Our faster results expose, for the first time in the context of coset enumeration, the “memory wall”, i.e. the latency barrier of RAM. We verify this memory wall by showing, on an Origin 2000, that a memory-bound parallel program with 64 CPUs doing nothing but randomly accessing RAM achieves at best a speedup of only 3.75 over the single-CPU version. Further, we demonstrate that even sequential versions of coset enumeration programs are memory latency-bound in today’s technology. The lessons of bulk definition and of the memory wall carry over to related algorithms such as Gröbner bases, Knuth-Bendix completion, and other symbolic algebra algorithms with intermediate swell.
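The memory-wall measurement described in the abstract can be illustrated with a small pointer-chasing benchmark. The C/pthreads sketch below is not the authors’ program; it is a minimal illustration, assuming an arbitrary thread count and table size, of a memory-bound parallel workload that does nothing but randomly access RAM, so that its scaling is limited by memory latency rather than by the number of CPUs. One would compile it (e.g. gcc -O2 -pthread), time the run for different values of NTHREADS, and compare against the single-thread time to observe the limited speedup.

/* Hypothetical pointer-chasing benchmark (not taken from the paper): each
 * thread repeatedly follows "next" links through a table far larger than
 * cache, so every step is a dependent load dominated by RAM latency. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS   16              /* illustrative thread count (assumption) */
#define TABLE_SIZE (64*1024*1024)  /* 64M entries (~512 MB): far exceeds cache */
#define STEPS      10000000L       /* dependent loads per thread */

static size_t *table;              /* table[i] holds the index of the next hop */

static void *chase(void *arg) {
    size_t i = (size_t)(uintptr_t)arg % TABLE_SIZE;  /* per-thread start index */
    size_t sum = 0;
    for (long s = 0; s < STEPS; s++) {
        i = table[i];              /* dependent load: latency cannot be hidden */
        sum += i;
    }
    return (void *)sum;            /* keeps the compiler from removing the loop */
}

int main(void) {
    table = malloc(TABLE_SIZE * sizeof *table);
    if (!table) { perror("malloc"); return 1; }

    /* Sattolo's algorithm: build a random single-cycle permutation, so the
     * chase walks the whole table in an unpredictable order. */
    for (size_t i = 0; i < TABLE_SIZE; i++) table[i] = i;
    srand(42);
    for (size_t i = TABLE_SIZE - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = table[i]; table[i] = table[j]; table[j] = tmp;
    }

    pthread_t tid[NTHREADS];
    for (size_t t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, chase, (void *)(uintptr_t)t);
    for (size_t t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    free(table);
    return 0;
}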
Journal title :
Journal of Symbolic Computation
Serial Year :
2002
Record number :
805627