A Warp-Synchronous Implementation for Multiple-Length Multiplication on the GPU

Author

Takumi Honda;Yasuaki Ito;Koji Nakano

Author_Institution

Dept. of Inf. Eng., Hiroshima Univ., Higashi-Hiroshima, Japan

fYear

2015

Firstpage

96

Lastpage

102

Abstract

Large integers that exceed the machine word size are represented on a computer as multiple-length integers. Multiple-length multiplication is widely used in areas such as scientific computation and cryptographic processing. However, its computational cost is very high because CPUs do not natively support multiple-length integers. In this paper, we present a GPU implementation of bulk multiple-length multiplication. The key idea of our GPU implementation is to adopt warp-synchronous programming: we assign each multiple-length multiplication to one warp, which consists of 32 threads. In parallel processing with multiple threads, it is usually costly to synchronize the execution of threads and to communicate between threads. In warp-synchronous programming, however, the threads in a warp execute in lockstep, instruction by instruction, without any barrier synchronization operations. In addition, inter-thread communication can be performed by warp shuffle functions without accessing shared memory. The experimental results show that our GPU implementation on an NVIDIA GeForce GTX 980 attains a speed-up factor of 62 for 1024-bit multiple-length multiplication over a single-CPU implementation.
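To make the term "multiple-length integer" concrete, the following is a minimal CPU-side sketch of schoolbook multiple-length multiplication in C++. It is not the paper's GPU kernel and does not use warp shuffle functions. The little-endian 32-bit word layout and the names `multiply_words` and `self_check` are assumptions chosen for illustration. With this layout, a 1024-bit operand occupies 32 words and the product occupies 64 words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the paper): multiplies two multiple-length
// integers stored as little-endian arrays of 32-bit words, using the
// schoolbook method. The result has a.size() + b.size() words.
std::vector<std::uint32_t> multiply_words(const std::vector<std::uint32_t>& a,
                                          const std::vector<std::uint32_t>& b) {
    std::vector<std::uint32_t> product(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            // 32x32-bit partial product plus the word already accumulated
            // at this position plus the incoming carry. The sum is at most
            // 2^64 - 1, so it always fits in 64 bits.
            std::uint64_t t = static_cast<std::uint64_t>(a[i]) * b[j]
                            + product[i + j] + carry;
            product[i + j] = static_cast<std::uint32_t>(t);  // low 32 bits
            carry = t >> 32;                                 // high 32 bits
        }
        // This position has not been written by earlier rows, so it can
        // simply receive the final carry of the current row.
        product[i + b.size()] = static_cast<std::uint32_t>(carry);
    }
    return product;
}

// Small sanity check: (2^32 - 1)^2 = 0xFFFFFFFE00000001.
void self_check() {
    std::vector<std::uint32_t> p = multiply_words({0xFFFFFFFFu}, {0xFFFFFFFFu});
    assert(p.size() == 2);
    assert(p[0] == 0x00000001u);
    assert(p[1] == 0xFFFFFFFEu);
}
```

For two n-word operands the inner statement runs n^2 times (1024 word multiplications for 1024-bit operands), and each row depends on the carries of the previous one. This is the kind of work that a parallel implementation has to distribute across threads.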

Keywords

"Graphics processing units","Instruction sets","Synchronization","Registers","Kernel","Programming"

Publisher

IEEE

Conference_Title

Computing and Networking (CANDAR), 2015 Third International Symposium on

Electronic_ISBN

2379-1896

Type

conf

DOI

10.1109/CANDAR.2015.13

Filename

7424695