Title :
Scalable Montgomery Modular Multiplication Architecture with Low-Latency and Low-Memory Bandwidth Requirement
Author :
Wen-Ching Lin ; Jheng-Hao Ye ; Ming-Der Shieh
Author_Institution :
Dept. of Electr. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
Abstract :
Montgomery modular multiplication is widely used in public-key cryptosystems. This work shows how to relax the data dependency in conventional word-based algorithms to maximize the possibility of reusing the current words of variables. With the greatly relaxed data dependency, we then propose a novel scheduling scheme to reduce the number of memory accesses in the developed scalable architecture. Analytical results show that the memory bandwidth requirement of the proposed scalable architecture is almost 1/(w - 1) times that of conventional scalable architectures, where w denotes the word size. The proposed architecture also retains a latency of exactly one cycle between the operations on the same words in two consecutive iterations of the Montgomery modular multiplication algorithm when enough processing elements are employed. Compared with the design in the related work, experimental results demonstrate that the proposed architecture achieves an almost 54 percent reduction in power consumption with no degradation in throughput. The reduced number of memory accesses not only leads to lower power consumption, but also facilitates the design of scalable architectures for any operand precision.
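For readers unfamiliar with the operation the abstract refers to, the following is a minimal software sketch of a generic word-serial Montgomery modular multiplication. It is not the scheduling scheme or hardware architecture proposed in the paper; it only illustrates why each iteration depends on the running sum from the previous one, which is the kind of data dependency the abstract discusses. The function name `montgomery_multiply`, its parameters, and the default word size are assumptions made for illustration.

```python
def montgomery_multiply(x: int, y: int, m: int, w: int = 8) -> int:
    """Return x * y * R^(-1) mod m, with R = 2^(w * e).

    Illustrative sketch only (names and structure are assumptions):
    m must be odd, 0 <= x, y < m, and e is the number of w-bit words in m.
    """
    if m % 2 == 0:
        raise ValueError("modulus must be odd")
    e = (m.bit_length() + w - 1) // w      # number of w-bit words in m
    base = 1 << w
    mask = base - 1
    # m_prime = -m^(-1) mod 2^w, so that (s + q * m) is divisible by 2^w
    m_prime = (-pow(m, -1, base)) % base   # requires Python 3.8+

    s = 0                                  # running partial sum
    for i in range(e):
        x_i = (x >> (w * i)) & mask        # i-th word of x
        s += x_i * y                       # accumulate one partial product
        q = ((s & mask) * m_prime) & mask  # quotient word from the lowest word of s
        s = (s + q * m) >> w               # exact division by 2^w
    if s >= m:                             # s < 2m here, one subtraction suffices
        s -= m
    return s
```

In this sketch the quotient word `q` of iteration i depends on the lowest word of `s` from iteration i - 1. Note that the result carries an extra factor of R^(-1), so a complete modular multiplication normally converts operands into and out of the Montgomery domain.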
Keywords :
matrix multiplication; public key cryptography; scheduling; storage management; greatly relaxed data dependency; low-latency low-memory bandwidth requirement; memory access; operands; power consumption; processing elements; public-key cryptosystems; scalable Montgomery modular multiplication architecture; scalable architecture; scheduling scheme; throughput; word-based algorithms; Algorithm design and analysis; Bandwidth; Equations; Memory management; Scheduling; Strontium; Cryptosystems; Montgomery modular multiplication; VLSI; low-power design
Journal_Title :
Computers, IEEE Transactions on
DOI :
10.1109/TC.2012.218