DocumentCode :
941300
Title :
Effects of Instruction-Set Extensions on an Embedded Processor: A Case Study on Elliptic Curve Cryptography over GF(2^m)
Author :
Bartolini, Sandro ; Branovic, Irina ; Giorgi, Roberto ; Martinelli, Enrico
Author_Institution :
Univ. di Siena, Siena
Volume :
57
Issue :
5
fYear :
2008
fDate :
5/1/2008
Firstpage :
672
Lastpage :
685
Abstract :
Elliptic-Curve cryptography (ECC) is promising for enabling information security in constrained embedded devices. In order to be efficient on a target architecture, ECCs require accurate choice/tuning of the algorithms that perform the underlying mathematical operations. This paper contributes with a cycle-level analysis of the dependencies of ECC performance from the interaction between the features of the mathematical algorithms and the actual architectural and microarchitectural features of an ARM-based Intel XScale processor. Another contribution is the cycle-level analysis of a modified ARM processor that includes a word-level finite field polynomial multiplier (poly_mul) in its data path. This extension constitutes a good trade-off between applicability in a number of contexts, the simplicity of integration within the processor, and performance. This paper points out the most advantageous mix of elliptic curve (EC) parameters both for the standard ARM-based Intel XScale platform and for the one equipped with the poly_mul unit. In particular, the latter case allows for more than 41 percent execution time reduction on the considered benchmarks. Last, this paper investigates the correlation between the possible architectural organizations of a processor equipped with poly_mul unit(s) and EC benchmark performance. For instance, only superscalar pipelines can exploit the features of out-of-order execution and only very complex organizations (for example, four-way superscalar) can exploit a high number of available ALUs. Conversely, we show that there are no benefits in endowing the processor with more than one poly_mul, and we point out a possible trade-off between performance and complexity increase: A two-way in-order/out-of-order pipeline allows +50 percent and +90 percent of Instructions per Cycle (IPC), respectively. Finally, we show that there are no critical constraints on the latency and pipelining capability of the poly_mul unit for the basic EC point multiplication.
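The abstract describes poly_mul only as a word-level finite field polynomial multiplier, so the following Python sketch is an illustration of what such an operation computes, not the paper's design. It assumes a 32-bit word (the ARM word size); the names `poly_mul_word`, `poly_mul_multiword`, `WORD_BITS`, and `WORD_MASK` are hypothetical and chosen for this example. Multiplication over GF(2) is "carry-less": partial products are combined with XOR instead of addition. Reduction modulo the field polynomial of GF(2^m) is deliberately left out.

```python
WORD_BITS = 32                      # assumed word size (not stated in the abstract)
WORD_MASK = (1 << WORD_BITS) - 1


def poly_mul_word(a, b):
    """Carry-less product of two words, each read as a polynomial over GF(2).

    Bit i of an operand is the coefficient of x^i. The result needs up to
    2 * WORD_BITS - 1 bits, so a hardware unit would return two words.
    """
    a &= WORD_MASK
    b &= WORD_MASK
    result = 0
    while b:
        if b & 1:
            result ^= a             # XOR replaces addition: no carries in GF(2)
        a <<= 1
        b >>= 1
    return result


def poly_mul_multiword(x, y):
    """Schoolbook product of two polynomials stored as little-endian word lists.

    Each inner step is one word-level multiply, i.e. the operation a
    poly_mul-style instruction would perform in a single issue.
    """
    out = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            p = poly_mul_word(xi, yj)
            out[i + j] ^= p & WORD_MASK          # low word of the partial product
            out[i + j + 1] ^= p >> WORD_BITS     # high word of the partial product
    return out
```

For example, `poly_mul_word(0b11, 0b11)` multiplies (x + 1) by itself and yields `0b101`, that is x^2 + 1, because the two middle terms cancel under XOR. In pure software the inner loop of `poly_mul_word` costs many shift and XOR instructions per word pair, which is the kind of work a dedicated data-path unit would take over.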
Keywords :
Galois fields; computer architecture; digital arithmetic; embedded systems; instruction sets; public key cryptography; ARM-based Intel XScale processor architecture; Galois field; cycle-level analysis; data path; elliptic-curve cryptography; elliptic-curve point multiplication; embedded processor; information security; instruction-set extension; mathematical algorithm; superscalar pipeline; word-level finite field polynomial multiplier; Algorithm design and analysis; Elliptic curve cryptography; Elliptic curves; Galois fields; Information security; Microarchitecture; Out of order; Performance analysis; Pipeline processing; Polynomials; Cryptography; Elliptic curves; Hardware/software interfaces; Instruction set design; Microprocessor/microcomputer applications; Performance Evaluation; Pipeline processors; Portable devices; Processor Architectures; Public key cryptosystems;
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.2007.70832
Filename :
4358294