Title :
Pushing the Performance Envelope of Modular Exponentiation Across Multiple Generations of GPUs
Author :
Emmart, Niall ; Weems, Charles
Author_Institution :
Sch. of Comput. Sci., Univ. of Massachusetts, Amherst, MA, USA
Abstract :
Multiprecision modular exponentiation is a key operation in popular encryption schemes such as RSA, but is computationally expensive. Contexts such as handling many secure web connections in a server can demand higher rates of exponent operations than a traditional multicore can support. Graphics processors offer an opportunity to accelerate batches of exponent calculations both by executing them in parallel as well as through parallelizing the operations within the multiprecision arithmetic itself. However, obtaining performance close to the theoretical peak can be extremely challenging. Furthermore, each new generation of GPU architecture can require a substantially different approach to achieve maximum performance. In this paper we show how we improve modular exponentiation performance over prior results by at factors ranging from 2.6 to 24, across generations of NVIDIA GPU, from compute capability 1.1 onward. Of particular interest is the parameter space that must be searched to find the optimal configuration of memory layout, launch geometry, and algorithm for each architecture at different problem sizes. Our efforts have resulted in a set of tools for generating library functions in the PTX assembly language and searching to find these optima. From our experience it can be argued that a new programming paradigm is needed to achieve full performance potential on core library components as GPUs evolve through multiple generations.
Keywords :
assembly language; graphics processing units; software libraries; GPU architecture; NVIDIA GPU; PTX assembly language; RSA; compute capability; core library components; encryption schemes; exponent operations; graphics processing unit; graphics processors; launch geometry; library functions; memory layout; multiprecision modular exponentiation performance; multiprocessing arithmetic; optimal configuration; secure Web connections; Computational modeling; Computer architecture; Generators; Graphics processing units; Load modeling; Message systems; Registers; GPU accelerated modular exponentiation; SSL acceleration with GPUs; asymmetric cryptography on GPUs; modular exponentiation;
Conference_Titel :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
DOI :
10.1109/IPDPS.2015.69