Title :
Parallel multiple precision division by a single precision divisor
Author :
Emmart, Niall ; Weems, Charles
Author_Institution :
Comput. Sci. Dept., Univ. of Massachusetts, Amherst, MA, USA
Abstract :
We report an algorithm for division of a multi-precision integer by a single-precision value using a graphics processing unit (GPU). Our algorithm combines a parallel version of Jebelean's exact division algorithm with a left-to-right algorithm for computing the borrow chain, to relax the requirement of exactness. We also employ Takahashi's recently reported cyclic reduction technique [10] for GPU division to further enhance performance. The result is that our algorithm is asymptotically faster, at O(n/p + log p), than Takahashi's algorithm at O(n/p log p). We report results for dividends with precisions of 1024, 2048, and 4096 bits running on an NVIDIA GTX 480. For non-constant divisors, our algorithm is 20% slower at 1024 bits (due to startup overhead), 40% faster at 2048 bits, and 2.5 times faster at 4096 bits. For division by constants, with precomputed tables, our algorithm is faster at all sizes, with a speedup ranging from 2.3 to 6 times.
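As an illustration of the ideas named in the abstract, the following is a minimal sequential sketch in Python, not the authors' parallel GPU algorithm. It assumes 32-bit words, little-endian word lists, and an odd divisor. The function names (`inverse_mod_base`, `remainder_left_to_right`, `exact_divide`, `divide`) are hypothetical. Exactness is relaxed here by the simple device of computing the remainder left to right and subtracting it before a right-to-left exact division; the paper's borrow-chain method may differ.

```python
W = 32                # assumed word size in bits
BASE = 1 << W
MASK = BASE - 1

def to_limbs(value, count):
    """Split a non-negative integer into `count` little-endian W-bit words."""
    return [(value >> (W * i)) & MASK for i in range(count)]

def from_limbs(limbs):
    """Reassemble little-endian W-bit words into an integer."""
    return sum(limb << (W * i) for i, limb in enumerate(limbs))

def inverse_mod_base(d):
    """Return d^-1 mod 2^W for odd d using Newton (Hensel) iteration."""
    assert d & 1, "this sketch handles odd divisors only"
    x = d                          # d*d = 1 (mod 8), so x is correct to 3 bits
    for _ in range(5):             # each step doubles the number of correct bits
        x = (x * (2 - d * x)) & MASK
    return x

def remainder_left_to_right(limbs, d):
    """Compute the remainder, scanning from the most significant word."""
    r = 0
    for limb in reversed(limbs):
        r = ((r << W) | limb) % d
    return r

def exact_divide(limbs, d):
    """Right-to-left exact division: the value must be a multiple of odd d."""
    d_inv = inverse_mod_base(d)
    quotient = []
    borrow = 0
    for limb in limbs:             # least significant word first
        t = limb - borrow
        borrow_out = 1 if t < 0 else 0
        t &= MASK
        q = (t * d_inv) & MASK     # quotient word: q*d = t (mod 2^W)
        quotient.append(q)
        borrow = ((q * d) >> W) + borrow_out   # high word feeds the next step
    return quotient

def divide(limbs, d):
    """General division by odd single-word d: returns (quotient words, remainder)."""
    r = remainder_left_to_right(limbs, d)
    adjusted = list(limbs)
    borrow, i = r, 0
    while borrow and i < len(adjusted):        # subtract r so the dividend is exact
        t = adjusted[i] - borrow
        adjusted[i] = t & MASK
        borrow = 1 if t < 0 else 0
        i += 1
    return exact_divide(adjusted, d), r
```

Each quotient word in `exact_divide` depends on the previous step only through the borrow value, which is why computing the borrow chain efficiently matters for a parallel version.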
Keywords :
digital arithmetic; graphics processing units; GPU division; Jebelean exact division algorithm; NVIDIA GTX 480; borrow chain; cyclic reduction technique; graphics processing unit; left-to-right algorithm; multiprecision integer; nonconstant divisors; parallel multiple precision division; single precision divisor; Algorithm design and analysis; Cognition; Context; Educational institutions; Graphics processing unit; Indexes; Parallel algorithms
Conference_Title :
High Performance Computing (HiPC), 2011 18th International Conference on
Conference_Location :
Bangalore
Print_ISBN :
978-1-4577-1951-6
Electronic_ISBN :
978-1-4577-1949-3
DOI :
10.1109/HiPC.2011.6152712