DocumentCode
3757149
Title
A Warp-Synchronous Implementation for Multiple-Length Multiplication on the GPU
Author
Takumi Honda;Yasuaki Ito;Koji Nakano
Author_Institution
Dept. of Inf. Eng., Hiroshima Univ., Higashi-Hiroshima, Japan
fYear
2015
Firstpage
96
Lastpage
102
Abstract
If we process large-integers on the computer, they are represented by multiple-length integer. Multiple-length multiplication is widely used in areas such as scientific computation and cryptography processing. However, the computation cost is very high since CPU does not support a multiple-length integer. In this paper, we present a GPU implementation of bulk multiple-length multiplications. The idea of our GPU implementation is to adopt warp-synchronous programming. We assign each multiple-length multiplication to one warp that consists of 32 threads. In parallel processing using multiple threads, usually, it is costly to synchronize execution of threads and communicate within threads. In warp-synchronous programming, however, execution of threads in a warp can be synchronized instruction by instruction without any barrier synchronous operations. Also, inter-thread communication can be performed by warp shuffle functions without accessing shared memory. The experimental results show that our GPU implementation on NVIDIA GeForce GTX 980 attains a speed-up factor of 62 for 1024-bit multiple-length multiplication over the single CPU implementation.
Keywords
"Graphics processing units","Instruction sets","Synchronization","Registers","Kernel","Programming"
Publisher
ieee
Conference_Titel
Computing and Networking (CANDAR), 2015 Third International Symposium on
Electronic_ISBN
2379-1896
Type
conf
DOI
10.1109/CANDAR.2015.13
Filename
7424695
Link To Document