• DocumentCode
    941300
  • Title

    Effects of Instruction-Set Extensions on an Embedded Processor: A Case Study on Elliptic Curve Cryptography over GF(2^m)

  • Author

    Bartolini, Sandro ; Branovic, Irina ; Giorgi, Roberto ; Martinelli, Enrico

  • Author_Institution
    Univ. di Siena, Siena
  • Volume
    57
  • Issue
    5
  • fYear
    2008
  • fDate
    5/1/2008 12:00:00 AM
  • Firstpage
    672
  • Lastpage
    685
  • Abstract
    Elliptic-curve cryptography (ECC) is promising for enabling information security in constrained embedded devices. To be efficient on a target architecture, ECC requires careful choice and tuning of the algorithms that perform the underlying mathematical operations. This paper contributes a cycle-level analysis of how ECC performance depends on the interaction between the features of the mathematical algorithms and the actual architectural and microarchitectural features of an ARM-based Intel XScale processor. Another contribution is the cycle-level analysis of a modified ARM processor that includes a word-level finite field polynomial multiplier (poly_mul) in its data path. This extension constitutes a good trade-off between applicability in a number of contexts, simplicity of integration within the processor, and performance. This paper points out the most advantageous mix of elliptic curve (EC) parameters both for the standard ARM-based Intel XScale platform and for the one equipped with the poly_mul unit. In particular, the latter case allows for more than 41 percent execution time reduction on the considered benchmarks. Last, this paper investigates the correlation between the possible architectural organizations of a processor equipped with poly_mul unit(s) and EC benchmark performance. For instance, only superscalar pipelines can exploit the features of out-of-order execution, and only very complex organizations (for example, four-way superscalar) can exploit a high number of available ALUs. Conversely, we show that there are no benefits in endowing the processor with more than one poly_mul, and we point out a possible trade-off between performance and complexity increase: a two-way in-order or out-of-order pipeline allows +50 percent or +90 percent of Instructions per Cycle (IPC), respectively. Finally, we show that there are no critical constraints on the latency and pipelining capability of the poly_mul unit for the basic EC point multiplication.
  • Keywords
    Galois fields; computer architecture; digital arithmetic; embedded systems; instruction sets; public key cryptography; ARM-based Intel XScale processor architecture; Galois field; cycle-level analysis; data path; elliptic-curve cryptography; elliptic-curve point multiplication; embedded processor; information security; instruction-set extension; mathematical algorithm; superscalar pipeline; word-level finite field polynomial multiplier; Algorithm design and analysis; Elliptic curve cryptography; Elliptic curves; Galois fields; Information security; Microarchitecture; Out of order; Performance analysis; Pipeline processing; Polynomials; Cryptography; Elliptic curves; Hardware/software interfaces; Instruction set design; Microprocessor/microcomputer applications; Performance Evaluation; Pipeline processors; Portable devices; Processor Architectures; Public key cryptosystems;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2007.70832
  • Filename
    4358294
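
  • Illustration
    The abstract refers to a word-level finite field polynomial multiplier (poly_mul). The following is a minimal software sketch of what such an operation computes, assuming a 32-bit word and the usual carry-less (XOR-based) product of polynomials over GF(2). The function names, the word size, and the little-endian word layout are assumptions made for illustration; they are not taken from the paper, and the field reduction step is omitted.

```python
def poly_mul(a: int, b: int, word_bits: int = 32) -> int:
    """Carry-less product of two word-sized polynomials over GF(2).

    Bit i of each operand is the coefficient of x^i. The result has
    up to 2 * word_bits - 1 bits. (Hypothetical software model.)
    """
    mask = (1 << word_bits) - 1
    a &= mask
    b &= mask
    result = 0
    while b:
        if b & 1:          # coefficient of the current power of x is 1
            result ^= a    # addition in GF(2) is XOR, so no carries
        a <<= 1            # multiply a by x
        b >>= 1
    return result


def multiword_mul(a_words, b_words, word_bits: int = 32):
    """Schoolbook product of two multi-word GF(2) polynomials.

    Words are little-endian (index 0 holds the lowest coefficients).
    Each partial product is one poly_mul call; the result is NOT
    reduced modulo a field polynomial.
    """
    mask = (1 << word_bits) - 1
    out = [0] * (len(a_words) + len(b_words))
    for i, aw in enumerate(a_words):
        for j, bw in enumerate(b_words):
            p = poly_mul(aw, bw, word_bits)
            out[i + j] ^= p & mask             # low half of partial product
            out[i + j + 1] ^= p >> word_bits   # high half
    return out
```

    For example, poly_mul(0b11, 0b11) gives 0b101, since (x + 1)^2 = x^2 + 1 over GF(2). An element of GF(2^m) that spans several words would be multiplied with multiword_mul and then reduced by the field polynomial in a separate step.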