Title :
A Warp-Synchronous Implementation for Multiple-Length Multiplication on the GPU
Author :
Takumi Honda;Yasuaki Ito;Koji Nakano
Author_Institution :
Dept. of Inf. Eng., Hiroshima Univ., Higashi-Hiroshima, Japan
Abstract :
Large integers are represented on a computer as multiple-length integers. Multiple-length multiplication is widely used in areas such as scientific computation and cryptographic processing. However, its computational cost is very high because CPUs do not natively support multiple-length integers. In this paper, we present a GPU implementation of bulk multiple-length multiplication. The key idea of our GPU implementation is to adopt warp-synchronous programming: we assign each multiple-length multiplication to one warp, which consists of 32 threads. In parallel processing with multiple threads, it is usually costly to synchronize thread execution and to communicate between threads. In warp-synchronous programming, however, the threads in a warp are synchronized instruction by instruction without any barrier synchronization operations. In addition, inter-thread communication can be performed with warp shuffle functions without accessing shared memory. Experimental results show that our GPU implementation on an NVIDIA GeForce GTX 980 attains a speed-up factor of 62 for 1024-bit multiple-length multiplication over a single-CPU implementation.
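To make the term "multiple-length multiplication" concrete, the following is a minimal sequential sketch in Python of the schoolbook method on fixed-size words. It is not the paper's warp-synchronous GPU kernel. The 32-bit word size, the schoolbook algorithm, and the helper names `to_words`, `from_words`, and `multiply_words` are all assumptions made for illustration; the abstract does not state which word size or multiplication algorithm the authors use.

```python
# Illustrative sketch only: sequential schoolbook multiplication on words.
# Assumption: 32-bit words, so a 1024-bit integer occupies 32 words.
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def to_words(x, n):
    """Split a non-negative integer into n little-endian 32-bit words."""
    return [(x >> (WORD_BITS * i)) & WORD_MASK for i in range(n)]


def from_words(words):
    """Reassemble a little-endian word list into a Python integer."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


def multiply_words(a, b):
    """Schoolbook product of two word lists; returns len(a) + len(b) words."""
    result = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # Partial product plus the word already stored plus incoming carry.
            t = result[i + j] + ai * bj + carry
            result[i + j] = t & WORD_MASK   # low 32 bits stay in place
            carry = t >> WORD_BITS          # high bits propagate upward
        result[i + len(b)] = carry          # this slot is still zero for row i
    return result
```

For example, `multiply_words(to_words(x, 32), to_words(y, 32))` yields the 64 words of the 2048-bit product of two 1024-bit operands. The carry passed from one word position to the next is the kind of value that, in a parallel version, would have to be exchanged between threads.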
Keywords :
"Graphics processing units","Instruction sets","Synchronization","Registers","Kernel","Programming"
Conference_Title :
Computing and Networking (CANDAR), 2015 Third International Symposium on
Electronic_ISBN :
2379-1896
DOI :
10.1109/CANDAR.2015.13