
Magnus Hirth

Hardware Acceleration of Asymmetric Elliptic Curve Cryptography
Master’s thesis

Master’s thesis in Electronics Systems Design and Innovation


Supervisor: Per Gunnar Kjeldsberg
July 2019
NTNU
Norwegian University of Science and Technology
Faculty of Information Technology and Electrical Engineering
Department of Electronic Systems

Asymmetric cryptography, which is also known as public-key cryptography, provides algorithms for encryption and decryption of data, digital signatures and authentication. Compared with traditional asymmetric techniques, e.g. the RSA algorithm, elliptic curve cryptography (ECC) achieves an equivalent level of security with smaller key sizes, resulting in memory as well as bandwidth savings. Computationally intensive operations like scalar multiplication on elliptic curves are required during the processing of ECC protocols. Using dedicated hardware units for these operations improves execution time in an energy-efficient manner. Most implementations are based on high-end CPUs and GPUs, and their use in mobile devices with limited power resources, such as smartcards, is untested.
This assignment is a continuation of an autumn project focusing on a theoretical and practical study of ECC, including experiments and profiling using Python and C-based code versions. Based on the results from these profiling experiments, this master's thesis will test the hypothesis that a hardware-accelerated ECC implementation, where the entire scalar multiplication operation is optimized to minimize memory transfers, leads to a more energy-efficient yet generic implementation.

NTNU

Abstract
Faculty Name
IE

Master Thesis

Hardware Acceleration of Asymmetric Elliptic Curve Cryptography


by Magnus Hirth

With the great number of mobile, battery-powered and IoT devices being developed, there is a need for efficient, low-energy cryptography. Elliptic curve cryptography (ECC) provides high security with small key sizes, and seems very well suited for use in embedded, low-power systems.
The mathematics of ECC is based on set theory, performing operations on elliptic curves, usually over finite prime fields or binary fields. The security of these mathematical operations is based on the Elliptic Curve Discrete Logarithm Problem.
This thesis has explored how to design a coprocessor for accelerating elliptic curve cryptography, based on the results from a pre-study. The coprocessor designed in the thesis, ECCo, was designed for use with the ARM CM33 processor. The CM33 provides a coprocessor interface for tight integration of coprocessors, which allows instructions to be issued to connected coprocessors from software. This motivated the design of an instruction set for the coprocessor.
For the design in this thesis, the operations of modular addition, modular multiplication and integer division were implemented. The design used for testing consisted of a controller, a register bank and an arithmetic module. A pure software implementation of elliptic curve cryptography, libecc, was compared to the ECCo. Results showed that the hardware-accelerated design performed 3.8x to 27x better than the pure software implementation.
Area estimates of the design were acquired through synthesis using Questasim. The ECCo accounted for 45% of the area when synthesizing ECCo+CM33. The estimates showed that the ECCo area was largely dominated by the divider (73.18% of the total ECCo area), which was implemented using the SystemVerilog division operator, "/", with no optimization during synthesis. However, the atomic operations of ECC, Modular Multiplication and Modular Addition, occupied only 1.97% and 1.92%, respectively.

Preface
This thesis is a continuation of an autumn project which explored how a hardware accelerator for elliptic curve cryptography should be implemented in order to address the shortcomings of elliptic curve cryptography in software. Part of the theory is reused from the project. From here on, the project is referred to as the pre-study.

Contents

Abstract
Preface

1 Introduction
1.1 Asymmetric Cryptography
1.2 Objective and Approach
1.3 Main Contributions
1.4 Structure

2 Background
2.1 Set theory
2.1.1 Finite Field Arithmetic
2.2 Elliptic Curves
2.2.1 EC over $\mathbb{F}_p$
2.2.2 EC over $\mathbb{F}_{2^m}$
2.2.3 Point Arithmetics
2.3 Scalar Multiplication
2.4 Coordinate Systems
2.5 ECC Algorithms
2.6 Tools
2.7 ARM Cortex M33
2.8 Hardware Acceleration
2.9 libecc
2.10 Python

3 Previous Work
3.1 Modular Addition Implementation
3.2 Modular Multiplication Implementation
3.3 FPGA Elliptic Curve Coprocessor
3.4 Pre-Study

4 Methodology and Architecture Design
4.1 ECCo Design
4.2 Choice of Algorithms
4.3 Interpretation of Algorithms
4.4 Test Data
4.5 Verification
4.6 Internal Interfaces
4.7 Area Measurement

5 Implementation
5.1 ECCo Instruction Set
5.2 ECCo Architecture
5.3 Internal Interfaces
5.4 Register Bank
5.5 Arithmetic Module
5.5.1 Negation
5.5.2 Integer Division
5.5.3 Modular Addition
5.5.4 Modular Multiplication
5.5.5 Test Data
5.5.6 Verification - Arithmetic Module
5.6 Controller Module
5.6.1 Verification - Controller Module
5.7 Verification - ECCo
5.8 Software
5.8.1 ECCo Wrapper
5.8.2 Big Number library
5.8.3 Benchmark Software

6 Results
6.1 Speed
6.2 Area

7 Future Work
7.1 Instruction Set Architecture
7.2 Security
7.3 Algorithms

8 Conclusion

A Test Data Python script
B Internal Interfaces SV Code
C Test Data
D ECCo C Wrapper
E ECCo Big Number library
F Benchmark & Test program

List of Abbreviations

CM33 ARM Cortex M33
CP CoProcessor
EC Elliptic Curve
ECC Elliptic Curve Cryptography
ECCo Elliptic curve Cryptography Coprocessor
DMA Direct Memory Access
DSP Digital Signal Processor
DUT Design Under Test
FPU Floating Point Unit
FSM Finite State Machine
ISA Instruction Set Architecture
LSB Least Significant Bit
MA Modular Addition
MM Modular Multiplication
MSB Most Significant Bit
OOP Object Oriented Programming
SIMD Single Instruction Multiple Data
SM Scalar Multiplication
SV SystemVerilog
SVA SystemVerilog Assertions
TLS Transport Layer Security protocol

Chapter 1

Introduction

Today, many mobile and embedded devices are used daily, and the number of such devices is ever increasing. Embedded devices are used in many applications where security is a concern, be it for a company or for personal privacy: in hospitals, smart cards (banking, SIM, access control), mobile phones, Wi-Fi routers, etc. Many of these are battery-powered devices, which in addition to security require low-power solutions. This motivates the exploration of low-power implementations of cryptographic algorithms. A field of cryptography that seems suited for low-power applications is Elliptic Curve Cryptography (ECC), which was introduced in the 1980s by Neal Koblitz [1] and Victor Miller [2]. It has gained popularity for desktop and server use, and many of the algorithms in the Transport Layer Security protocol 1.3 (TLS 1.3) are elliptic curve (EC) algorithms.
In this thesis, a coprocessor for the ARM Cortex-M33 (CM33) that accelerates Elliptic Curve Cryptography (ECC) is designed and tested. The work is a continuation of the autumn project on hardware acceleration of ECC, which concluded that the optimal use of a hardware accelerator was to perform the entire scalar multiplication (SM) operation in hardware. The implementation in this thesis aims at accelerating the entire SM in hardware, taking advantage of the features provided by the coprocessor interface of the CM33.
In this thesis, the term cryptosystem is used as defined in [3]: “A cryptosystem is a general term referring to a set of cryptographic primitives used to provide information security services. Most often the term is used in conjunction with primitives providing confidentiality, i.e., encryption.”
Also, the term big numbers is used to refer to numbers whose bit length is longer than the processor's word length.

1.1 Asymmetric Cryptography


Asymmetric cryptography, also known as public key cryptography, refers to cryptosystems that use key pairs: a public key and a private key. The private key is known only to its owner, while the public key can be obtained by anyone without compromising the security of the system. The private key may be used to create a digital signature of a message, which allows anyone who has both the public key and the message to verify that the message has not been corrupted; alternatively, the private key may be used to decrypt a message that has been encrypted using the public key.
The security of public key cryptosystems relies on the private key being infeasible, though not impossible given unlimited time and resources, for an attacker to compute. That is, public key cryptosystems are computationally secure: it is considered infeasible for an attacker to compute the private key if doing so requires $\approx 10^{100}$ instructions [4].
Another very common type of cryptosystem is symmetric cryptography, which uses a single shared key. Symmetric systems usually require smaller key sizes and have lower power consumption than public key systems [5][6]. Because of this, symmetric cryptosystems are preferred when encrypting large amounts of data, but since the shared key must first be exchanged over a secure channel, it is usually not sufficient to rely solely on symmetric key cryptography. As a possible solution to this, a public key cryptosystem was introduced in 1976 by Whitfield Diffie and Martin E. Hellman [4] which enables two parties to securely agree on a shared key over an insecure channel, thus allowing secure communication through a combination of asymmetric and symmetric cryptosystems.
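To make the key agreement concrete, the following is a minimal Python sketch of the original finite-field Diffie-Hellman exchange. The prime P = 23 and generator G = 5 are toy values chosen purely for illustration (an assumption of this sketch, not parameters from the thesis); real deployments use groups of 2048 bits or more, or elliptic curves as in ECDH.

```python
from typing import Tuple
import secrets

# Toy public parameters (illustrative only): a small prime modulus and a generator.
P = 23
G = 5

def keypair(p: int = P, g: int = G) -> Tuple[int, int]:
    """Return (private, public) with private random in [1, p-2] and public = g^private mod p."""
    private = 1 + secrets.randbelow(p - 2)
    return private, pow(g, private, p)

def shared_secret(own_private: int, peer_public: int, p: int = P) -> int:
    """Both parties arrive at g^(a*b) mod p without ever sending a or b."""
    return pow(peer_public, own_private, p)

if __name__ == "__main__":
    a, A = keypair()   # Alice keeps a secret and sends A over the insecure channel
    b, B = keypair()   # Bob keeps b secret and sends B over the insecure channel
    assert shared_secret(a, B) == shared_secret(b, A)
```

With the textbook choices a = 4 and b = 3, Alice publishes 4, Bob publishes 10, and both compute the shared secret 18. An eavesdropper sees only 23, 5, 4 and 10, and must solve a discrete logarithm to recover a or b.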
This combination of symmetric and asymmetric cryptosystems is now standard, and the TLS 1.3 [7] standard describes a set of cryptosystems to use for secure communication over insecure channels. A number of these systems are public key systems, and the increasing demand for high security without reducing the efficiency of low-power devices, such as IoT [8][9] and mobile devices [10], gives a good incentive to explore the possibilities of accelerating public key cryptosystems.
Furthermore, TLS defines a number of elliptic curve (EC) cryptosystems. EC cryptosystems are systems that use mathematics based on elliptic curves, and they have traits that make them suited for use in resource-limited environments, such as IoT devices. ECC algorithms are often considered safer than their non-EC counterparts [1], and this safety is provided with smaller key sizes. The benefit of smaller key sizes is that less storage is required for the variables of the algorithm and less data needs to be transferred between devices. An efficient implementation of ECC algorithms could potentially benefit IoT devices by reducing power consumption while still maintaining high security.

1.2 Objective and Approach


The objective of this thesis is to explore how to design a coprocessor for accelerating elliptic curve cryptography, based on the conclusion of the pre-study [11]. This thesis describes how such a coprocessor could be implemented, and implements as much of the proposed design as possible. The implemented design should be benchmarked and compared to the performance of a pure software implementation, to show what benefits a coprocessor could provide.
The design approach is to consider multiple possible designs before choosing one that is appropriate for the setup used in this thesis. All modules should be tested separately during the development process, using reliable reference test data generated by software scripts.

1.3 Main Contributions


The main contributions of this thesis are the design of a flexible coprocessor aimed at accelerating elliptic curve cryptography, with the possibility of extending its use to non-EC asymmetric cryptography, and a detailed description of both the design and the design process.
In addition, a generic modular addition algorithm was designed for this thesis, and a C library for big numbers was implemented. The library was designed for use with the elliptic curve coprocessor, supporting conversion to and from string representation and loading/storing to/from coprocessor registers.

1.4 Structure
Chapter 2 presents mathematical and other related background information
necessary for the rest of the thesis. In Chapter 3 previous work relevant
for this thesis is presented. Chapter 4 details the methodology and design
choices of the coprocessor. Chapter 5 describes the implementation details
of the design, and Chapter 6 presents the results of the thesis. Finally, Chap-
ter 7 discusses thoughts on future work on the coprocessor, and Chapter 8
concludes the report.

Chapter 2

Background

This thesis is mainly concerned with elliptic curve cryptography, that is, cryptosystems that use mathematical operations on elliptic curves over finite fields. To give the reader a better understanding of these subjects, this chapter gives a brief introduction to the mathematical field of set theory, focusing on finite fields, and explains the fundamentals of elliptic curves and the arithmetic operations on them. Further, this chapter describes algorithms for implementing modular arithmetic and elliptic curve operations in hardware, which are used later in the implementation of the coprocessor. Lastly, this chapter briefly describes the tools used.

2.1 Set theory


A set is (informally) a collection of objects (or elements). Sets are classified according to their mathematical properties. In this report the sets of interest are the finite fields, also called Galois fields, denoted by $GF(q)$ or $\mathbb{F}_q$. Without going into details, a finite field is a set with a finite number $q$ of elements, where $q = p^k$ ($p$ is prime and $k > 0$), on which addition, subtraction, multiplication and division are defined [12, p.310]. In this thesis we are only interested in finite fields of integers and, in particular, finite fields $\mathbb{F}_q$ containing all integers from 0 up to, but not including, $q$ (with ordinary modular arithmetic, this construction yields a field only when $q$ is prime). For the rest of the thesis all fields will be assumed to be of this kind. These fields can be constructed with the modulo operator, because for any integer $y$, the value $x = y \bmod q$ always lies in the range $0 \le x < q$. A simple example of such a finite field is $\mathbb{F}_7$, shown in Equation 2.1. It is a field with 7 elements, and can be constructed with modulo 7.

$$\mathbb{F}_7 = \{0, 1, 2, 3, 4, 5, 6\} \tag{2.1}$$
If there exists a positive integer $n$ such that $n \cdot a = 0$ for all $a \in \mathbb{F}$, then the smallest such number is called the characteristic of $\mathbb{F}$. If no such number exists, the characteristic of $\mathbb{F}$ is said to be zero [12, p.170]. In our example of $\mathbb{F}_7$ the characteristic is 7, since $7 \cdot a \equiv 0 \pmod{7}$ for all $a \in \mathbb{F}_7$. The characteristic of any finite field $GF(p^k)$ is $p$ [12, p.311]. The size of a field, $q$, is also called the order of the field.
Of particular interest when working with elliptic curves are finite fields where $q = p^1$, called prime fields, and finite fields where $q = 2^k$, called binary fields.

2.1.1 Finite Field Arithmetic


For this report we are only concerned with finite fields, which implies that all arithmetic operations on field elements are, in fact, modular arithmetic operations.
The reader is assumed to have basic knowledge of modular arithmetic, but examples of the basic operations on $\mathbb{F}_7$ are illustrated in Equations 2.2-2.5.

$$4 + 6 = 3 \tag{2.2}$$
$$1 - 5 = 3 \tag{2.3}$$
$$2 \cdot 5 = 3 \tag{2.4}$$
$$5 \cdot 4^{-1} = 3 \tag{2.5}$$

Equations 2.2, 2.4 and 2.5 evaluate to 3 since $10 \equiv 3 \pmod{7}$, and Equation 2.3 evaluates to 3 since $-4 \equiv 3 \pmod{7}$. Equation 2.5 is an example of modular division, which is the most complicated of the four operations. To perform modular division one needs to find the modular inverse of the divisor, which is why modular division is often written as in Equation 2.5, avoiding the division operator, "/", to prevent confusion with integer division [13].
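The four examples can be checked with Python's built-in integer operations. This is only a quick sanity check, not code from the thesis. It relies on two properties of Python: `%` always returns a value in $[0, q)$ for a positive modulus (C's `%` can return a negative result for a negative dividend), and `pow(x, -1, q)` returns the modular inverse (Python 3.8 and later).

```python
q = 7  # order of the field F_7

add = (4 + 6) % q                # Eq. 2.2
sub = (1 - 5) % q                # Eq. 2.3: -4 is mapped into [0, q)
mul = (2 * 5) % q                # Eq. 2.4
div = (5 * pow(4, -1, q)) % q    # Eq. 2.5: multiply by the modular inverse of 4

print(add, sub, mul, div)        # prints: 3 3 3 3
```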
To find the modular inverse of a field element, the Extended Euclidean Algorithm is used [14]. It is an extension of the Euclidean Algorithm, which finds the greatest common divisor of two numbers, $a$ and $b$ [15]. The extended algorithm additionally finds two integers, $x$ and $y$, such that:

$$ax + by = \gcd(a, b) \tag{2.6}$$

For the level of detail needed in this report, it suffices to require that $a$ and $b$ are coprime ($\gcd(a, b) = 1$) and to assign $b = q$, the field size. Equation 2.6 then reads $ax + qy = 1$, and reducing both sides modulo $q$ removes the $qy$ term, which gives Equation 2.7.

$$ax \equiv 1 \pmod{q} \tag{2.7}$$
This allows us to find the inverse $x$ of element $a$ by solving for $x \in \mathbb{F}_q$. In Equation 2.5, $a = 4$ and $q = 7$, so the inverse of 4 is found by solving Equation 2.7 for $x$:

$$4x \equiv 1 \pmod{7} \quad\Rightarrow\quad x = 2 \quad (\text{since } 4 \cdot 2 = 8 \equiv 1 \pmod{7})$$

Equation 2.5 can then be explained by replacing $4^{-1}$ with the modular inverse of 4:

$$5 \cdot 2 = 10 \equiv 3 \pmod{7}$$
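The procedure can be sketched in a few lines of Python. This is the textbook iterative extended Euclidean algorithm, written for clarity as an illustration of Equations 2.6 and 2.7; it is not a description of how the coprocessor computes inverses, and because its running time depends on the inputs it is not suitable where timing side channels matter.

```python
from typing import Tuple

def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)  (Eq. 2.6)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y

def mod_inverse(a: int, q: int) -> int:
    """Return x in [0, q) with a*x ≡ 1 (mod q)  (Eq. 2.7)."""
    g, x, _ = extended_gcd(a % q, q)
    if g != 1:
        raise ValueError(f"{a} has no inverse modulo {q} (gcd = {g})")
    return x % q
```

For the example above, `extended_gcd(4, 7)` returns `(1, 2, -1)`, i.e. $4 \cdot 2 + 7 \cdot (-1) = 1$, so `mod_inverse(4, 7)` is 2.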

2.2 Elliptic Curves


Only elliptic curves over $\mathbb{F}_p$ and $\mathbb{F}_{2^m}$ are presented, as these are the most common in ECC. Details are omitted; only the required conditions and a brief explanation of arithmetic on the curves are given. A more detailed explanation can be found in [16]. The goal of this section is to give an intuitive understanding of what elliptic curves are, and of the difference between continuous and discrete elliptic curves.

2.2.1 EC over $\mathbb{F}_p$
“Let $\mathbb{F}_p$ be a prime finite field so that $p$ is an odd prime number, and let $a, b \in \mathbb{F}_p$ satisfy $4a^3 + 27b^2 \not\equiv 0 \pmod{p}$. Then an elliptic curve $E(\mathbb{F}_p)$ over $\mathbb{F}_p$ defined by the parameters $a, b \in \mathbb{F}_p$ consists of the set of solutions or points $P = (x, y)$ for $x, y \in \mathbb{F}_p$ to the equation:

$$y^2 \equiv x^3 + ax + b \pmod{p} \tag{2.8}$$

together with an extra point $\mathcal{O}$ called the point at infinity.” [16]

[Figure 2.1: Illustration of $y^2 = x^3 - 2x + 1$ with the solutions to Equation 2.8 in $\mathbb{F}_7$ plotted. Plotted points: (0,1), (0,6), (1,0), (3,1), (3,6), (4,1), (4,6), (5,2), (5,5), (6,3), (6,4).]

Figure 2.1 illustrates the elliptic curve $y^2 = x^3 - 2x + 1$ for $x \in [-7, 7]$. The continuous curve, drawn over the real numbers (an infinite field), is the common way to illustrate an elliptic curve. In cryptography, however, finite fields are used, in which case only discrete solutions to the curve equation exist, and for all of them the $x$ and $y$ values must be in $\mathbb{F}_p$.
The discrete solutions to Equation 2.8 are plotted in Figure 2.1, and only the solutions (0, 1) and (1, 0) lie on the continuous curve itself. For the other solutions, the left-hand side or the right-hand side of Equation 2.8 evaluates to a value $\ge 7$, so the two sides are equal only after reduction modulo 7. For example, for (3, 1) the left-hand side is 1, while the right-hand side is $27 - 6 + 1 = 22 \equiv 1 \pmod{7}$.
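The discrete points in Figure 2.1 can be reproduced by brute force: every pair $(x, y) \in \mathbb{F}_7 \times \mathbb{F}_7$ is tested against Equation 2.8. The Python sketch below is only practical for toy fields (cryptographic curves have on the order of $2^{256}$ points) and leaves out the point at infinity $\mathcal{O}$. The coefficient $a = -2$ is passed as is; Python's `%` maps it to the equivalent field element 5.

```python
from typing import List, Tuple

def is_nonsingular(a: int, b: int, p: int) -> bool:
    """Check the curve condition 4a^3 + 27b^2 != 0 (mod p) from the definition above."""
    return (4 * a ** 3 + 27 * b ** 2) % p != 0

def curve_points(a: int, b: int, p: int) -> List[Tuple[int, int]]:
    """Return all affine solutions (x, y) of y^2 ≡ x^3 + a*x + b (mod p), excluding O."""
    return [
        (x, y)
        for x in range(p)
        for y in range(p)
        if (y * y - (x ** 3 + a * x + b)) % p == 0
    ]

points = curve_points(a=-2, b=1, p=7)
print(len(points), points)
```

The eleven points found match those plotted in Figure 2.1; together with $\mathcal{O}$, the curve has 12 points over $\mathbb{F}_7$.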

2.2.2 EC over $\mathbb{F}_{2^m}$


“Let $\mathbb{F}_{2^m}$ be a characteristic 2 finite field, and let $a, b \in \mathbb{F}_{2^m}$ satisfy $b \neq 0$ in $\mathbb{F}_{2^m}$. Then an elliptic curve $E(\mathbb{F}_{2^m})$ over $\mathbb{F}_{2^m}$ defined by the parameters $a, b \in \mathbb{F}_{2^m}$ consists of the set of solutions or points $P = (x, y)$ for $x, y \in \mathbb{F}_{2^m}$ to the equation:

$$y^2 + xy \equiv x^3 + ax^2 + b \pmod{p} \tag{2.9}$$

together with an extra point $\mathcal{O}$ called the point at infinity.” [16]

[Figure 2.2: Illustration of $y^2 + xy = x^3 - 2x^2 + 1$ with the solutions to Equation 2.9 in $\mathbb{F}_7$ plotted. Plotted points: (0,1), (0,6), (1,0), (1,6), (2,2), (2,3), (3,2), (4,1), (4,2), (5,1), (6,4).]

Figure 2.2 illustrates the elliptic curve $y^2 + xy = x^3 - 2x^2 + 1$ for $x \in [-7, 7]$. As in Figure 2.1, both the continuous curve over an infinite field and the discrete solutions to the curve equation are plotted.

2.2.3 Point Arithmetics


In this report, the arithmetic operations of interest on elliptic curves are point addition and point doubling. An intuitive geometric understanding of these operations was provided by Neal Koblitz [1], as illustrated in Figure 2.3.

[Figure 2.3: Illustration of elliptic curve point addition and doubling, with one panel showing $P_3 = P_1 + P_2$ and one showing $P_3 = 2P_1$.]

Let $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ and $P_3 = (x_3, y_3)$ be points on an elliptic curve, where $P_3 = P_1 + P_2$. Draw a line through $P_1$ and $P_2$; their sum $P_3$ is the negative of the third point where this line intersects the curve (for curves of the form in Equation 2.8, its mirror image in the x-axis). For doubling ($P_1 = P_2$), the line is the tangent to the curve at $P_1$.
The following equations follow from this construction, but enough information to prove them is not provided here. For a detailed explanation, see [1].

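$$x_3 \equiv -x_1 - x_2 + \alpha^2 \pmod{p} \tag{2.10}$$
$$y_3 \equiv -y_1 + \alpha(x_1 - x_3) \pmod{p} \tag{2.11}$$

where

$$\alpha = \begin{cases} \dfrac{y_2 - y_1}{x_2 - x_1} & \text{if } P_1 \neq P_2 \\[2ex] \dfrac{3x_1^2 + a}{2y_1} & \text{if } P_1 = P_2 \end{cases} \tag{2.12}$$

In the case of elliptic curves over $\mathbb{F}_{2^m}$, when $P_1 \neq P_2$:

$$x_3 \equiv \alpha^2 + \alpha + x_1 + x_2 + a \pmod{p} \tag{2.13}$$
$$y_3 \equiv \alpha(x_1 + x_3) + x_3 + y_1 \pmod{p} \tag{2.14}$$
$$\alpha = \frac{y_1 + y_2}{x_1 + x_2} \tag{2.15}$$

and when $P_1 = P_2$:

$$x_3 \equiv \alpha^2 + \alpha + a \pmod{p} \tag{2.16}$$
$$y_3 \equiv x_1^2 + (\alpha + 1)x_3 \pmod{p} \tag{2.17}$$
$$\alpha = x_1 + \frac{y_1}{x_1} \tag{2.18}$$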
Note that all of these operations require modular inversion for the divi-
sion in the calculation of α, which is an expensive operation.
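Equations 2.10-2.12 translate almost line by line into code. The Python sketch below covers curves over $\mathbb{F}_p$ in affine coordinates only (not the binary-field formulas 2.13-2.18). Representing $\mathcal{O}$ as `None` is a choice of this sketch, and each modular inversion is an explicit `pow(..., -1, p)` call, which makes the cost of the division in $\alpha$ visible. Note that the coefficient $b$ does not appear in Equations 2.10-2.12, so only $a$ and $p$ are passed.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None represents the point at infinity O

def ec_add(P1: Point, P2: Point, a: int, p: int) -> Point:
    """Add two points on y^2 = x^3 + a*x + b over F_p (Eqs. 2.10-2.12, affine)."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    x1, y1 = P1
    x2, y2 = P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P2 = -P1; also covers doubling a point with y = 0
    if P1 == P2:
        alpha = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p      # tangent slope
    else:
        alpha = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p           # chord slope
    x3 = (alpha * alpha - x1 - x2) % p                              # Eq. 2.10
    y3 = (alpha * (x1 - x3) - y1) % p                               # Eq. 2.11
    return (x3, y3)
```

On the toy curve of Figure 2.1, `ec_add((0, 1), (1, 0), -2, 7)` returns `(0, 6)` and doubling `(0, 1)` gives `(1, 0)`; both results are among the plotted points. Doubling `(1, 0)` returns `None`, since the tangent there is vertical.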

2.3 Scalar Multiplication


The central mathematical operation in all EC cryptosystems is scalar multiplication, that is, multiplying a point on an elliptic curve by a scalar. There are multiple algorithms for performing a scalar multiplication. Most of them are based on the observation that any multiplication of a point by a scalar can be expressed as a combination of point additions and doublings, e.g. $11P = P + 2(P + 2(2P))$. There are many optimized algorithms for this, and in many applications it is desirable to use algorithms with a constant execution time, because an execution time that depends on the bits of the scalar can leak the secret through timing side channels. In this thesis, however, a basic algorithm with varying execution time is presented.
Algorithm 1 displays the pseudocode for this algorithm, called Double-and-add (left-to-right).

Algorithm 1 Double-and-add (left-to-right) [17]
INPUT: Base point $P \in E(\mathbb{F})$, scalar $k = (k_{t-1}, \ldots, k_0)_2$
OUTPUT: Point $Q = k \cdot P$
1: $R_0 \leftarrow \infty$; $R_1 \leftarrow P$
2: for $i$ from $t - 1$ downto 0 do
3:     $R_0 \leftarrow 2R_0$
4:     if $k_i = 1$ then
5:         $R_0 \leftarrow R_0 + R_1$
6:     end if
7: end for
8: $Q \leftarrow R_0$

In this algorithm, $P$ is the base point on the curve, which is multiplied by the scalar $k$, and $Q$ is the resulting point on the curve; $t$ is the bit length of $k$. Algorithm 1 iterates through all the bits of $k$, starting from the left (the most significant bit). First, $R_0$ is set to the point at infinity and $R_1$ to the base point $P$. In each iteration it performs a point doubling of $R_0$ (doubling the point at infinity returns the point at infinity), and if the current bit $k_i$ is 1, the point addition of $R_0$ and $R_1$ is stored in $R_0$ (adding the point at infinity and a point $P$ returns $P$).
This algorithm performs $t$ point doublings and, in the worst case, $t$ point additions.
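Algorithm 1 can be written generically over any additive group. In the Python sketch below, the group operation and its neutral element are passed in as arguments (a design choice of this sketch, not of the thesis), so the same loop can be exercised with plain integers or connected to an elliptic curve point addition such as Equations 2.10-2.12. Like the pseudocode, it branches on each bit of $k$ and therefore does not run in constant time.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def double_and_add(k: int, P: T, add: Callable[[T, T], T], identity: T) -> T:
    """Left-to-right double-and-add (Algorithm 1): return k*P for k >= 0.

    `add` is the group operation and `identity` its neutral element (the point
    at infinity for elliptic curves). Doubling is computed as add(R0, R0).
    """
    R0, R1 = identity, P                 # step 1
    for bit in bin(k)[2:]:               # bits k_{t-1} ... k_0, most significant first
        R0 = add(R0, R0)                 # step 3: R0 <- 2*R0
        if bit == "1":
            R0 = add(R0, R1)             # step 5: R0 <- R0 + R1
    return R0                            # step 8
```

With integer addition as the group law, `double_and_add(11, 1, lambda u, v: u + v, 0)` returns 11. For $k = 11 = 1011_2$ the loop visits $P, 2P, 5P, 11P$ after each bit, which mirrors the decomposition $11P = P + 2(P + 2(2P))$.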

2.4 Coordinate Systems


Elliptic curves are often represented using affine coordinates, $(x, y)$, as we have done so far, but several other coordinate systems with different properties are available. The purpose of using a different coordinate system is usually to increase performance. Computation time is compared between coordinate systems by counting how many inversions (I), multiplications (M) and squarings (S) an addition or doubling operation requires. From Equations 2.10, 2.11 and 2.12 we see that in affine coordinates ($\mathcal{A}$) the computation times are $t(\mathcal{A} + \mathcal{A}) = I + 2M + S$ and $t(2\mathcal{A}) = I + 2M + 2S$ [18].
An alternative coordinate representation often used in practice is projective coordinates ($\mathcal{P}$). Here a point $P$ is represented by a tuple $(X, Y, Z)$, where $x = X/Z$ and $y = Y/Z$. Using projective coordinates, the computation times are $t(\mathcal{P} + \mathcal{P}) = 12M + 2S$ and $t(2\mathcal{P}) = 7M + 5S$ [18]. The main motivation for using projective coordinates is the reduced computation time: the point operations need no inversion, which is an expensive operation, as noted in Chapter 2.2.
There are other common alternatives for coordinates, as described in [3,
p.86] and [18], but they will not be discussed here.
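The relation between the two representations can be illustrated with a small Python sketch (not taken from the thesis). Converting from affine to projective form is free, since $Z = 1$ can be chosen, and any scaled triple $(\lambda X, \lambda Y, \lambda Z)$ with $\lambda \neq 0$ represents the same point. Only the final conversion back to affine coordinates needs an inversion, which is why a whole scalar multiplication in projective coordinates can get by with a single inversion at the end instead of one per point operation.

```python
from typing import Tuple

def to_projective(x: int, y: int) -> Tuple[int, int, int]:
    """Embed an affine point by choosing Z = 1 (no field operations needed)."""
    return (x, y, 1)

def to_affine(X: int, Y: int, Z: int, p: int) -> Tuple[int, int]:
    """Map a projective point (X, Y, Z) with Z != 0 to affine (X/Z, Y/Z) over F_p."""
    z_inv = pow(Z % p, -1, p)      # the single inversion, deferred to the very end
    return (X * z_inv % p, Y * z_inv % p)
```

For example, $(15, 6, 3)$ is the point $(5, 2)$ scaled by $\lambda = 3$, and `to_affine(15, 6, 3, 7)` recovers `(5, 2)`.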

2.5 ECC Algorithms


Elliptic curve cryptography is commonly used for handshakes and digital signatures, for example in the Transport Layer Security (TLS) protocol 1.3 [7]. To put the use of scalar multiplication in ECC into perspective, this section outlines the Elliptic Curve Digital Signature Algorithm (ECDSA) [19].
The two parties involved will be referred to as Alice and Bob [20]. Alice's private and public keys are $d_A$ and $Q_A = d_A G$, respectively, and Bob's are $d_B$ and $Q_B = d_B G$. For all ECC algorithms, Alice and Bob have to agree on a set of domain parameters, $D$. In the case of $\mathbb{F}_p$ these parameters are $D = (q, a, b, G, n, h)$, where:

q The field order (the number of elements in the field; see Chapter 2.1).

a, b The elliptic curve coefficients (see Equation 2.8).

G The base point on the curve.

n The order of $G$: the smallest positive integer such that $n \cdot G = \mathcal{O}$.

h The cofactor, $h = \#E(\mathbb{F}_q)/n$, where $\#E(\mathbb{F}_q)$ is the number of points on the curve.

For $\mathbb{F}_{2^m}$ the parameters are $D = (m, f(x), a, b, G, n, h)$, where $f(x)$ is an irreducible binary polynomial of degree $m$ specifying the representation of $\mathbb{F}_{2^m}$.

Algorithm 2 ECDSA signature generation [19]
INPUT: Domain parameters $D$, private key $d$ and message $m$
OUTPUT: Signature $(r, s)$
1: Select a random $k \in [1, n - 1]$
2: Compute $kG = (x, y)$
3: Compute $r = x \bmod n$. If $r = 0$ then go to step 1
4: Compute $e = H(m)$
5: Compute $s = k^{-1}(e + dr) \bmod n$. If $s = 0$ then go to step 1
6: Return $(r, s)$

If Alice wants to send a message to Bob with a digital signature that lets him verify that the message has not been corrupted during sending, she can use ECDSA, as shown in Algorithm 2. First, a random number $k$ is multiplied with the base point $G$, and the $x$-coordinate of the result is used to compute $r$, the first of the two parts of the signature. Then, a hash function $H$ is used to compute a hash $H(m)$ of the message. A cryptographic hash function is a one-way function: given only the hash value, it is computationally infeasible to find the message. The hash, Alice's private key, $r$ and $k^{-1}$ are then used to produce the second part of the signature, $s$.

Algorithm 3 ECDSA signature verification [19]
INPUT: Domain parameters $D$, public key $Q$, message $m$ and signature $(r, s)$
OUTPUT: Acceptance or rejection of the signature
1: Verify that $r$ and $s$ are integers in the interval $[1, n - 1]$. If any verification fails then return(“Reject the signature”).
2: Compute $e = H(m)$
3: Compute $w = s^{-1} \bmod n$
4: Compute $u_1 = ew \bmod n$ and $u_2 = rw \bmod n$
5: Compute $X = u_1 G + u_2 Q$
6: If $X = \infty$ then reject the signature
7: Convert the x-coordinate of $X$ to an integer $\bar{x}$ and compute $v = \bar{x} \bmod n$
8: If $v = r$ then accept the signature; otherwise reject it

When Bob receives the message and the signature from Alice, he can use Algorithm 3 to verify that the message has not been corrupted during sending and that it is exactly the message Alice signed. The proof of the verification is out of scope for this thesis, but note that verification requires two scalar multiplications, while signature generation requires one.
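To tie Algorithms 2 and 3 to the scalar multiplication of Section 2.3, the following is a compact and deliberately naive ECDSA sketch in Python. It assumes the secp256k1 curve ($a = 0$, $b = 7$, parameters as published in SEC 2) and SHA-256 purely as familiar examples; neither choice comes from the thesis. It uses affine coordinates and the variable-time double-and-add of Algorithm 1, so it leaks timing information and is for illustration only, never for protecting real keys.

```python
import hashlib
import secrets
from typing import Optional, Tuple

# secp256k1 domain parameters, used here only as a familiar example curve.
P_FIELD = 2**256 - 2**32 - 977
A, B = 0, 7
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

Point = Optional[Tuple[int, int]]  # None is the point at infinity

def ec_add(P1: Point, P2: Point) -> Point:
    """Affine point addition/doubling over F_p (Eqs. 2.10-2.12)."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None
    if P1 == P2:
        alpha = (3 * x1 * x1 + A) * pow(2 * y1 % P_FIELD, -1, P_FIELD) % P_FIELD
    else:
        alpha = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (alpha * alpha - x1 - x2) % P_FIELD
    return (x3, (alpha * (x1 - x3) - y1) % P_FIELD)

def scalar_mult(k: int, P: Point) -> Point:
    """Left-to-right double-and-add (Algorithm 1)."""
    R = None
    for bit in bin(k)[2:]:
        R = ec_add(R, R)
        if bit == "1":
            R = ec_add(R, P)
    return R

def hash_to_int(message: bytes) -> int:
    # e = H(m); SHA-256 output has the same bit length (256) as N, so no truncation is needed.
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(d: int, message: bytes) -> Tuple[int, int]:
    e = hash_to_int(message)                       # step 4, hoisted out of the retry loop
    while True:
        k = 1 + secrets.randbelow(N - 1)           # step 1: random k in [1, n-1]
        x, _ = scalar_mult(k, G)                   # step 2: kG = (x, y)
        r = x % N                                  # step 3
        if r == 0:
            continue
        s = pow(k, -1, N) * (e + d * r) % N        # step 5
        if s != 0:
            return r, s                            # step 6

def verify(Q: Point, message: bytes, signature: Tuple[int, int]) -> bool:
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):            # step 1
        return False
    e = hash_to_int(message)                       # step 2
    w = pow(s, -1, N)                              # step 3
    u1, u2 = e * w % N, r * w % N                  # step 4
    X = ec_add(scalar_mult(u1, G), scalar_mult(u2, Q))  # step 5
    if X is None:                                  # step 6
        return False
    return X[0] % N == r                           # steps 7-8

if __name__ == "__main__":
    d = 1 + secrets.randbelow(N - 1)               # Alice's private key d_A
    Q = scalar_mult(d, G)                          # Alice's public key Q_A = d_A * G
    sig = sign(d, b"hello Bob")
    assert verify(Q, b"hello Bob", sig)
```

Nearly all of the running time of both functions is spent in `scalar_mult`: one call per signing attempt and two per verification. This is the workload that the coprocessor in this thesis aims to accelerate.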
Relating to the TLS 1.3 [7] standard: ECDH [4][21] is often used to let Alice and Bob agree on a shared symmetric key, along with an ECDSA signature which verifies that the key exchange has not been tampered with during transmission.

2.6 Tools
For simulation and synthesis, the tool Questasim [22] is used. Questasim is developed by Mentor [23]. It is a high-performance tool supporting simulation, debugging and functional coverage for hardware description languages such as VHDL [24], Verilog [25] and SystemVerilog [26], including SystemVerilog's object-oriented features and SystemVerilog Assertions (SVA).

2.7 ARM Cortex M33


The Cortex-M33 [27] (CM33) is a processor developed by ARM [28]. It uses the ARMv8-M [29] instruction set architecture and is designed for embedded applications, offering low power consumption while still providing efficient security and debug capabilities. It contains features such as an FPU and DSP instructions with SIMD support.
The CM33 also features a coprocessor interface, which allows for tight
integration of coprocessors and accelerators with the CM33. The coproces-
sors are accessible from software using assembly instructions provided in
the ARMv8-M instruction set [29]:

CDP, CDP2 Coprocessor data processing instructions.

MCR, MCR2 32-bit data transfer to the coprocessor.

MRC, MRC2 32-bit data transfer to the CM33.

MCRR, MCRR2 64-bit data transfer to the coprocessor.

MRRC, MRRC2 64-bit data transfer to the CM33.
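Since MCR and MRC move only 32 bits per instruction (and MCRR and MRRC 64 bits), a big number such as a 256-bit field element has to be split into words before it can reach a coprocessor register. The hedged Python model below shows this splitting. The least-significant-word-first order and the function names are assumptions of this sketch, not the register layout used by the thesis's big number library.

```python
from typing import List

WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1

def to_words(value: int, n_words: int) -> List[int]:
    """Split a non-negative big number into 32-bit words, least significant word first."""
    if value < 0 or value >> (WORD_BITS * n_words):
        raise ValueError("value does not fit in the requested number of words")
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(n_words)]

def from_words(words: List[int]) -> int:
    """Reassemble a big number from 32-bit words, least significant word first."""
    value = 0
    for i, word in enumerate(words):
        value |= (word & WORD_MASK) << (WORD_BITS * i)
    return value
```

For a 256-bit operand, `to_words(x, 8)` yields eight words, i.e. eight MCR transfers (or four MCRR transfers) per operand in each direction. This per-operand transfer overhead is part of why the pre-study favoured keeping the entire scalar multiplication inside the coprocessor.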

2.8 Hardware Acceleration


Hardware acceleration is a method for speeding up computations by using specialized hardware designed for a specific task, which often supplements a general-purpose CPU [30]. A very common application of hardware acceleration is the graphics processing unit (GPU), found in virtually every desktop computer. Other areas where hardware acceleration is common are AI and neural networks and, relevant to this thesis, cryptography. The security of cryptosystems is based on mathematics that often requires heavy computation, which can usually benefit greatly from dedicated hardware.

2.9 libecc
libecc [31] is a library that implements EC mathematics hierarchically, as illustrated in Figure 2.4. It provides separate modules for natural number arithmetic, finite field arithmetic (Chapter 2.1), elliptic curve operations (Chapter 2.2), hardcoded curve parameters, and an implementation of the ECDSA algorithm (Chapter 2.5). As Figure 2.4 also shows, it provides implementations of the required hash functions, self-tests and some utilities, which are not described here (see [31] for details).

+-------------------------+
|EC*DSA signature         |
|algorithms               | <------------------+
|(ISO 14888-3)            |                    |
+-----------+-------------+                    |
            ^                                  |
            |                                  |
+-----------+-------------+        +-----------+------------+
|Curves (SECP, Brainpool, |        |          Hash          |
|FRP, ...)                |        |       functions        |
|                         |        |                        |
+-----------+-------------+        +------------------------+
            ^                     @@@@@@@@@@@@@@@@@@@@@@@@@@@@
            |                     @{Useful auxiliary modules}@
+-----------+-------------+       @+------------------------+@
| Elliptic curves         |       @|         Utils          |@
| core (scalar mul, ...)  |       @+------------------------+@
+-----------+-------------+       @|     Sig Self tests     |@
            ^                     @|    Arith Self tests    |@
            |                     @|     User Examples      |@
            |                     @+------------------------+@
            |                     @|     External deps      |@
+-----------+-------------+       @+------------------------+@
| Fp finite fields        |       @|   LibECC conf files    |@
| arithmetic              |       @+------------------------+@
+-----------+-------------+       @|        Scripts         |@
            ^                     @+------------------------+@
            |                     @@@@@@@@@@@@@@@@@@@@@@@@@@@@
+-----------+-------------+        +------------------------+
| NN natural              | <------+ Machine related        |
| numbers arithmetic      |        | (words, ...)           |
+-------------------------+        +------------------------+

FIGURE 2.4: libecc architecture [31]
libecc does not implement arbitrary-precision arithmetic; instead, it implements finite field and point arithmetic on big numbers up to a maximum integer width, which is fixed at compile time. It uses projective coordinates, performs no dynamic memory allocation, and is written without any dependencies, not even the standard C library (libc).

2.10 Python
Python [32] is an interpreted, general-purpose programming language with dynamic type checking. Python has several features that make it flexible and easy to use; for example, Python integers have an unlimited range [33], which makes handling big numbers trivial. Internally, Python represents big numbers as an array of fixed-size digits, but this is hidden from the programmer. Python also supports object-oriented programming.
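As a sketch of the fixed-size representation mentioned above, the helpers below (hypothetical names `to_words` and `from_words`) split a Python integer into little-endian 32-bit words and rebuild it. The 32-bit word size is an assumption chosen to match a 32-bit processor; CPython's own internal digit size is different (it is reported by `sys.int_info.bits_per_digit`, typically 30 bits). The 256-bit NIST P-256 field prime is used only as an example value.

```python
from typing import List

# NIST P-256 field prime, used only as an example of a 256-bit number.
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1
WORD_BITS = 32  # assumed word size of a 32-bit processor


def to_words(value: int, word_bits: int = WORD_BITS) -> List[int]:
    """Split a non-negative integer into little-endian fixed-size words."""
    if value < 0:
        raise ValueError("only non-negative values are supported")
    mask = (1 << word_bits) - 1
    words = [value & mask]
    value >>= word_bits
    while value:
        words.append(value & mask)
        value >>= word_bits
    return words


def from_words(words: List[int], word_bits: int = WORD_BITS) -> int:
    """Rebuild an integer from little-endian fixed-size words."""
    value = 0
    for word in reversed(words):
        value = (value << word_bits) | word
    return value


words = to_words(P256)
print(len(words), [hex(w) for w in words])
```

Python’s unlimited integers make such a round trip trivial to check, which is also what makes the language convenient for generating test data for fixed-width hardware.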

Chapter 3

Previous Work

This chapter presents existing algorithms for hardware implementations of modular addition and modular multiplication. Thorough explanations and proofs of correctness for these algorithms are not provided; see their respective references for details.
An FPGA implementation of ECC coprocessors is then presented, and finally the results from the pre-study are summarized.

3.1 Modular Addition Implementation


Modular addition (MA) is the operation of calculating S = A + B (mod n). When signed numbers are represented in 2’s complement, the same operation serves for both addition and subtraction.
A straightforward way of implementing MA is to assume that 0 ≤ A, B < n and use Algorithm 4 [34]. Depending on the timing constraints and the critical path through the addition and subtraction on lines 1 and 2, this algorithm may be performed in a single cycle with minimal control logic.

Algorithm 4 Modular Addition Algorithm
INPUT: Addends A & B, modulo n
OUTPUT: Sum S
1: Compute $S' = A + B$
2: Compute $S'' = S' - n$
3: if $S'' \ge 0$ then
4: $S = S''$
5: else
6: $S = S'$
7: end if

The operations on lines 1 and 2 are ordinary addition and subtraction; the subtraction requires the 2’s complement of n, which must either be calculated during operation or be precomputed and supplied as an input to the HW module. Algorithm 4 is restricted to non-negative addends smaller than n.
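The following Python sketch mirrors Algorithm 4 line by line (the function name is hypothetical). In hardware, the test on line 3 would simply be the sign, or borrow, of the subtraction on line 2, so both candidate results can be computed in parallel and selected with a multiplexer.

```python
def mod_add_alg4(a: int, b: int, n: int) -> int:
    """Algorithm 4: one addition, one trial subtraction, one selection.

    Assumes 0 <= a, b < n, so a + b < 2n and one subtraction suffices.
    """
    if not (0 <= a < n and 0 <= b < n):
        raise ValueError("Algorithm 4 requires 0 <= a, b < n")
    s1 = a + b                    # line 1: S' = A + B
    s2 = s1 - n                   # line 2: S'' = S' - n
    return s2 if s2 >= 0 else s1  # lines 3-7: select by the sign of S''
```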
Another method was proposed in [35]. Let $n < 2^k$ and $m = 2^k - n$, where k may be the word size of the system. It is assumed that $A, B < 2^k$. Modular addition can then be computed as in Algorithm 5.

Algorithm 5 Omura’s Method, Modular Addition Algorithm
INPUT: Addends A & B
OUTPUT: Sum S
1: Compute $S' = A + B$
2: if there is a carry then
3: $S = S' + m$
4: else
5: $S = S'$
6: end if

The value of m must either be computed during operation or be precomputed and supplied as an input to the HW module. The additions on lines 1 and 3 are ordinary additions. If there is no carry, the result is A + B, which may be larger than n, in which case it is reduced later. If there is a carry, the carry bit is discarded, which implies that $S' = A + B - 2^k$. The correctness of the algorithm then follows from:

$$S = S' + m = (A + B - 2^k) + (2^k - n) = A + B - n$$

Omura’s algorithm is still restricted to non-negative numbers, but accepts addends greater than the modulo.
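The sketch below models Algorithm 5 on k-bit words in Python (the function name is hypothetical). One detail is worth noting: when $A + B \ge 2^k + n$, the corrected value $S' + m$ can itself overflow k bits (e.g. k = 4, n = 9, A = B = 15 gives $S' + m = 21$), so the sketch repeats the correction while a carry occurs, each repetition effectively subtracting n. The result is congruent to A + B modulo n and fits in k bits, but, as noted above, it is not necessarily reduced below n.

```python
def omura_mod_add(a: int, b: int, n: int, k: int) -> int:
    """Omura-style modular addition on k-bit words (sketch of Algorithm 5).

    Returns a k-bit value congruent to a + b modulo n. The value is not
    necessarily below n; it may be reduced later.
    """
    word = 1 << k
    if not (0 < n < word and 0 <= a < word and 0 <= b < word):
        raise ValueError("requires 0 < n < 2^k and 0 <= a, b < 2^k")
    m = word - n              # precomputed m = 2^k - n
    s = a + b                 # line 1: adder output including carry-out
    while s >= word:          # carry-out set (repeated, since S' + m may carry)
        s = (s - word) + m    # drop the carry and add m: net effect s - n
    return s
```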

3.2 Modular Multiplication Implementation


Modular multiplication (MM) is the operation of calculating P = A · B (mod n).
There are many algorithms for performing MM, many of which rely on alternative number representations for higher efficiency, such as Montgomery modular multiplication [34].
An intuitive way of calculating MM is the multiply-and-divide method [34], illustrated in Algorithm 6.

Algorithm 6 Multiply and Divide Algorithm
INPUT: Multiplicand A, multiplier B, modulo n
OUTPUT: Product P
1: $P' = A \cdot B$
2: $P = P' \bmod n$
3: return P

This is, however, not an efficient implementation. The intermediate product $P'$ must be twice as wide as A and B to avoid overflow, and the reduction of $P'$ modulo n requires an efficient division circuit, which adds considerable complexity to the design. Unless the full product $P'$ is needed, an interleaving algorithm is usually preferable.

A basic interleaving algorithm is presented in Algorithm 7, where A and B are k-bit numbers with $0 \le A, B < n$, and $A_i$ and $B_i$ denote their i-th bits.

Algorithm 7 Modular Multiplication Interleaving Algorithm
INPUT: Multiplicand A, multiplier B, modulo n
OUTPUT: Product P
1: P = 0
2: for i = 0 to k − 1 do
3: $P = 2 \cdot P + A \cdot B_{k-1-i}$
4: $P = P \bmod n$
5: end for
6: return P

Since A, B, P < n, it follows that

$$2P + A \cdot B_j \le 2(n - 1) + (n - 1) = 3n - 3$$

Thus, at most two subtractions of n are needed to reduce P to $0 \le P < n$, which means the modulo operation on line 4 may be implemented as conditional subtractions.
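A minimal Python sketch of Algorithm 7, assuming $0 \le A, B < n < 2^k$ (the function name is hypothetical). The reduction on line 4 is written as at most two conditional subtractions, which is sufficient by the bound above.

```python
def interleaved_mod_mul(a: int, b: int, n: int, k: int) -> int:
    """Sketch of Algorithm 7: MSB-first interleaved modular multiplication.

    Assumes 0 <= a, b < n < 2^k.
    """
    if not (0 <= a < n and 0 <= b < n and n < (1 << k)):
        raise ValueError("requires 0 <= a, b < n < 2^k")
    p = 0
    for i in range(k):
        bit = (b >> (k - 1 - i)) & 1   # B_{k-1-i}: scan B from the MSB
        p = 2 * p + (a if bit else 0)  # line 3: shift and add partial product
        for _ in range(2):             # line 4: p <= 3n - 3, so at most
            if p >= n:                 # two conditional subtractions
                p -= n
    return p
```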
Another efficient modular multiplication algorithm was proposed by Pe-
ter Montgomery in [36]. The result from the Montgomery algorithm is

P = A · B · r −1 (mod n)
where A, B < n and gcd(n, r) = 1. This adds overhead by requiring con-
version of the result. The number of bits in A or B is less than k, and we take
r = 2k [34]. The multiplication is shown in Algorithm 8.

Algorithm 8 Montgomery Modular Multiplication Algorithm
INPUT: Multiplicand A, multiplier B, modulo n
OUTPUT: Product $P = A \cdot B \cdot r^{-1} \pmod{n}$
1: P = 0
2: for i = 0 to k − 1 do
3: $P = P + A_i \cdot B$
4: if P is odd then
5: $P = P + n$
6: end if
7: $P = P/2$
8: end for
9: return P

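Here, the division on line 7 is just a right shift, since P is always even at that point, and the operations on lines 3 and 5 can be combined: the LSB of $P + A_i \cdot B$ can be calculated before the full sum on line 3 is computed, so it is known in advance whether n must be added.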
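The following Python sketch of Algorithm 8 assumes an odd n and $0 \le A, B < n < 2^k$ (the function name is hypothetical). Algorithm 8 as listed leaves P in the range [0, 2n), so the sketch adds the usual final conditional subtraction to return a fully reduced result; the tests check it against $A \cdot B \cdot 2^{-k} \bmod n$.

```python
def montgomery_mod_mul(a: int, b: int, n: int, k: int) -> int:
    """Bit-serial Montgomery multiplication (sketch of Algorithm 8).

    Returns a * b * 2^(-k) mod n. Assumes n is odd and 0 <= a, b < n < 2^k.
    """
    if n % 2 == 0 or not (0 <= a < n and 0 <= b < n < (1 << k)):
        raise ValueError("requires odd n and 0 <= a, b < n < 2^k")
    p = 0
    for i in range(k):
        p += ((a >> i) & 1) * b  # line 3: P = P + A_i * B
        if p & 1:                # lines 4-6: add n to make P even
            p += n
        p >>= 1                  # line 7: exact division by 2
    if p >= n:                   # final correction, not part of Algorithm 8
        p -= n
    return p
```

To use Montgomery multiplication for an ordinary product, operands are first mapped to $x \cdot 2^k \bmod n$, and a final Montgomery multiplication by 1 maps the result back; this is the conversion overhead mentioned above.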

Coprocessor  Modular         Modular   Modular      Point     Point     Scalar
             Multiplication  Addition  Subtraction  Doubling  Addition  Multiplication
CP 1         100             -         -            -         -         -
CP 2         100             99        99           -         -         -
CP 3         147             146       146          899       801       -
CP 4         147             146       146          899       801       240000

TABLE 3.2: Execution times of coprocessors, in clock cycles.

3.3 FPGA Elliptic Curve Coprocessor


In [17] four different EC coprocessors were implemented and tested on an
FPGA, each one implementing different arithmetic operations: CP 1 imple-
mented modular multiplication (Chapter 2.1.1); CP 2 implemented modu-
lar multiplication, addition and subtraction (Chapter 2.1.1); CP 3 also imple-
mented point doubling and addition (Chapter 2.2.3); and CP 4 implemented
SM in addition to the arithmetic operations (Chapter 2.3).
The execution times of the implemented operations in each CP are listed in Table 3.2, in clock cycles.
The tests were performed using 256-bit values. The connected microcontroller had an 8-bit word width, and the coprocessors were connected to RAM and read the operands from it. The execution times include reading operands and writing results.

3.4 Pre-Study
In the pre-study [11], possible partitionings between hardware and software for an ECC accelerator were explored. Profiling results from a pure software implementation of ECC were analyzed to determine which parts of the implementation could benefit the most from hardware acceleration.
The results showed that roughly 18.8% of the execution time during testing was spent on managing the software implementation of big numbers: initialization, checking correct behavior, and handling number metadata. The conclusion was that as much as possible of an EC cryptosystem, in particular the scalar multiplication, should be performed by a coprocessor to reduce the overhead of dealing with big numbers in software.

Chapter 4

Methodology and Architecture Design

The main goal of this thesis is to implement an Elliptic Curve Cryptography Coprocessor (ECCo) whose primary purpose is to accelerate the scalar multiplication in EC cryptosystems, following the conclusion of the pre-study [11]. To perform the scalar multiplication, the fundamental mathematical operations needed are modular multiplication and modular addition (Chapter 2.1.1), and, when using affine coordinates, integer division (Chapter 2.4). These operations are sufficient to perform point doubling and point addition (Chapter 2.2.3), which in turn allow implementation of an entire scalar multiplication (SM).
The primary goal when designing the ECCo is therefore to implement the
modular arithmetic operations.
Designing a coprocessor is potentially a complex and lengthy process. To simplify the design process of the ECCo, reusable design patterns were actively used: communication between sub-modules in the ECCo was generalized with clearly defined protocols; test data for all arithmetic operations was generated with a single Python script, utilizing Python’s OOP features; and a common testbench setup was used for all modules. These design patterns are further explained in their respective methodology and implementation chapters.
This chapter discusses which choices were made during the design and testing of the ECCo, and why. Further, it highlights important aspects of the design process, specifically where and why reusable design patterns were used.

4.1 ECCo Design


The goal of the ECCo is to be able to perform scalar multiplication. Without restrictions imposed by a specific system, this allows for several different implementations:

1. It may be designed as an SM module that only performs the SM, similar to familiar division and multiplication modules. This module could be integrated in a processor, or connected to a bus, possibly using DMA to fetch operands.

2. It may be designed as a collection of modules, each implementing an atomic operation (e.g. modular addition or modular multiplication, see Chapters 3.1 - 3.2), similar to an FPU. This would be particularly suited for tight integration with a processor, and would provide a flexible design that could also be used for non-EC cryptosystems that rely on modular arithmetic, like RSA.

3. It may be designed as a combination of the previous solutions, providing both the atomic operations and the SM operation. This could provide both a flexible design and an optimized SM, and would also be very well suited for tight integration with a processor.

The ECCo design in this thesis interfaces with the ARM Cortex-M33 (Chapter 2.7) so that it can be used from software. The CM33 provides a coprocessor interface, which allows tight integration of coprocessors and issuing opcodes to the coprocessor from software. Because of this, Solutions 2 and 3 are good choices. Ideally, Solution 3 would be chosen, but due to time limitations Solution 2 is the choice for this thesis. This choice still allows the SM speedup to be estimated, with and without the coprocessor, by comparing the speed of the atomic operations in hardware and software. This minimal implementation also gives an indication of how the size of the coprocessor compares to that of the CM33 core itself.
Since the ECCo is controlled from software through the coprocessor interface, an instruction set has to be defined for it. The instruction set proposed in this thesis is presented in Chapter 5.1. It includes more than the atomic operations and data transfer; it also includes logical, comparison, and shift operations. The pre-study concluded that an entire SM should be performed in the coprocessor in order to maximize the benefit of the coprocessor. By including these flow-control and common operations, the ECCo will be able to perform an entire SM without data transfer between the ECCo and the CM33 during execution, even though it is controlled from software.

4.2 Choice of Algorithms


The two essential atomic operations are modular addition and modular multiplication, both of which can be implemented with several different algorithms (as described in Chapters 3.1 - 3.2). To reduce the time spent on implementation, the simplest algorithms were chosen. Optimization of the algorithms is left for future work.
The modular multiplication algorithm implemented is the modular multiplication interleaving algorithm (Algorithm 7), described in Chapter 3.2. This algorithm requires no overhead or added complexity from number conversion, but it is not the most efficient algorithm and is not designed for security (e.g. in a straightforward sequential implementation, the number of reduction steps, and thereby the execution time, depends on the operand values).

For modular addition, Algorithm 4 is the simplest of the presented algorithms, but it supports neither negative numbers (i.e. no subtraction) nor intermediate sums of 2n or more. To address these limitations, an improved, generic version of the algorithm was designed, described in Algorithm 9.

Algorithm 9 Generic Modular Addition Algorithm
INPUT: Addends A & B, modulo n
OUTPUT: Sum S
1: Compute $S' = A + B$
2: while $S' \ge n$ do
3: $S' = S' - n$
4: end while
5: while $S' < 0$ do
6: $S' = S' + n$
7: end while
8: $S = S'$

This algorithm can handle both positive and negative numbers, and intermediate sums of 2n or more. Notice that the while loops are mutually exclusive: after the intermediate sum $S' = A + B$ has been calculated, $S'$ is either reduced, increased, or left unchanged. Since the number of loop iterations depends on the operand values, the while loops cannot be synthesized as written. Details on the interpretation of this algorithm are presented in Chapter 5.
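A direct Python sketch of Algorithm 9 follows (the function name is hypothetical). Besides the sum, it returns the number of loop iterations, which corresponds to the number of correction steps a sequential implementation would spend after the initial addition; this makes the data-dependent latency of the algorithm visible.

```python
from typing import Tuple


def generic_mod_add(a: int, b: int, n: int) -> Tuple[int, int]:
    """Sketch of Algorithm 9 (generic modular addition).

    Returns (sum, steps), where steps counts the iterations of the two
    while loops. Assumes n > 0; a and b may be negative.
    """
    if n <= 0:
        raise ValueError("the modulo must be positive")
    s = a + b          # line 1: intermediate sum S'
    steps = 0
    while s >= n:      # lines 2-4: reduce
        s -= n
        steps += 1
    while s < 0:       # lines 5-7: increase
        s += n
        steps += 1
    return s, steps
```

If both addends satisfy 0 ≤ A, B < n, at most one correction step is needed, just as in Algorithm 4. Modular subtraction A − B is obtained by passing −B. In general the step count grows linearly with |A + B|/n, which is why large intermediate sums and small moduli are slow.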

4.3 Interpretation of Algorithms


The mathematical foundation of ECC requires several abstract concepts and algorithms to be "translated" into hardware, e.g. the modulo operator, multiplication over a finite field (see Chapters 2.1.1 and 3), and EC point addition (Chapters 2.2.3 and 3). There are often many ways of doing this, depending on the algorithm being implemented and the system requirements. A significant decision when designing the implementation is the choice between a sequential and a combinatorial design. A combinatorial design must complete the whole computation within a single clock period, so it is much more restricted by the clock frequency of the system and can make it harder to meet timing requirements. For this thesis the sequential approach is preferred, and state machines have been designed to implement the chosen algorithms. A sequential implementation can be described directly as a state machine, which makes it easier to reason about the behavior of the system.

4.4 Test Data


To verify the results from the implementations of the arithmetic operations, a set of known test data is required. In the pre-study [11], test data for the scalar multiplication and point arithmetic from reliable sources was used, and this test data is reused in this thesis. Test data for simpler operations (e.g. modular addition, division) is easy to generate using a Python script. A Python script also allows more test data to be generated for SM and point arithmetic, since a Python implementation of these operations was written for the pre-study. The details of this script are described in Chapter 5, and the full source code is listed in Appendix A.
Test data generation follows a recurring pattern, regardless of what data is generated: reading data from a file, and writing properly formatted data to a file. This pattern can be handled by Python’s OOP features (see Chapters 5.5.5 and 2.10).
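The recurring read, compute and write pattern can be captured with a small base class, as in the following sketch. The class names, the whitespace-separated hexadecimal file format, and the `solve` method are assumptions for illustration, not the script from Appendix A: file handling lives in the base class, and each operation only supplies its expected-result calculation.

```python
import io
from abc import ABC, abstractmethod
from typing import TextIO


class VectorGenerator(ABC):
    """Hypothetical base class owning the file handling shared by all operations."""

    def generate(self, source: TextIO, sink: TextIO) -> int:
        """Read hexadecimal operand lines from `source` and write each line,
        extended with the expected result, to `sink`. Returns the count."""
        count = 0
        for line in source:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            operands = [int(field, 16) for field in line.split()]
            result = self.solve(*operands)
            fields = [*operands, result]
            sink.write(" ".join(format(value, "x") for value in fields) + "\n")
            count += 1
        return count

    @abstractmethod
    def solve(self, *operands: int) -> int:
        """Return the expected result for one test vector."""


class ModularAdditionVectors(VectorGenerator):
    def solve(self, op1: int, op2: int, mod: int) -> int:
        return (op1 + op2) % mod


class ModularMultiplicationVectors(VectorGenerator):
    def solve(self, op1: int, op2: int, mod: int) -> int:
        return (op1 * op2) % mod


if __name__ == "__main__":
    src = io.StringIO("# op1 op2 mod\n3 5 7\nff 1 100\n")
    out = io.StringIO()
    ModularAdditionVectors().generate(src, out)
    print(out.getvalue(), end="")
```

With this split, adding a new operation only requires a new subclass with its own `solve` method.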

4.5 Verification
To verify correct behavior and to speed up the development process, the entire ECCo and each of its sub-modules are tested separately, each with its own testbench. For the arithmetic operations, this includes checking results against the test data described in Chapter 4.4.
Testbench design is a repetitive process, which can be simplified by following a design pattern. During the development of the ECCo the chosen pattern was:

• Each testbench consisted of a module for instantiating and connecting the design under test (DUT), an interface connected to the DUT, a package with module-specific parameters, and a test program.

• All signals in the DUT’s interface are connected to, and controlled by, the testbench, allowing independent testing of all sub-modules.

• The testbench uses drivers and dummy implementations of modules to control the DUT. These dummies and drivers can be reused between testbenches, and can utilize SystemVerilog’s OOP features.

4.6 Internal Interfaces


During the design of the ECCo, a recurring design question is how to communicate between sub-modules. The sub-modules of the system are primarily modules implementing the operations defined by the instruction set, all of which may share a common communication protocol. Because of this, all communication between sub-modules has been clearly defined using two interfaces: one for all communication with the register bank, and another for all communication with the ECCo controller module. See Chapter 5.3 for further details.

4.7 Area Measurement


To acquire the area results, the design was synthesized. The results are presented as relative values, comparing the synthesis of the CM33+ECCo with that of the CM33 alone.
The speed results were measured during simulation by counting the clock cycles used to execute benchmarking code for modular addition and modular multiplication, for both software and hardware implementations of those operations. Further details are given in Chapter 5.8.3.

Chapter 5

Implementation

This chapter describes implementation details about the work done for this
thesis: proposed instruction set for the ECCo; the implementation of the
ECCo and its integration with the CM33; testbench architecture and verifi-
cation of the ECCo and its sub-modules; test data generation using a Python
script; C implementation of the big numbers library, and the ECCo software
wrapper; benchmarking of modular arithmetic operations, using the ECCo
and a pure software implementation.
The logical, shift and comparison operations mentioned are not implemented in the ECCo in this thesis. They are nevertheless included in the proposed instruction set, and Chapter 5.1 discusses why they should be included in a future implementation of an elliptic curve coprocessor.

5.1 ECCo Instruction Set


The ECCo instruction set aims to allow software-controlled implementations of SM while reducing data transfer between the CM33 and the ECCo. The instruction set designed in this thesis is listed in Table 5.2.
The instructions map onto the coprocessor instructions of the ARMv8-M instruction set (Chapter 2.7) as follows: MCRR and MRRC are used for the Load and Store instructions, and CDP and CDP2 are used for all other instructions, where the opc1 and opc2 arguments are opcodes for the issued operation (see [29] for a description of the assembly instructions).
Conditional operations are not explicitly listed in the instruction set, because every operation has a conditional counterpart issued with the CDP2 instruction.
While further evaluation of the necessity of each instruction is required, the instruction set proposed in this thesis is based on the following reasoning:
• The arithmetic instructions are fundamental for the SM (as discussed in Chapter 4).
• The logical instructions allow functionality like masking and setting registers to zero.
• Shift instructions allow efficient division and multiplication by 2, as required in algorithms like Montgomery multiplication (Algorithm 8).

Operation                 Parameter 1   Parameter 2   Parameter 3
                          (register)    (register)    (register)
Modular multiplication    Multiplicand  Multiplier    Product
Modular addition          Addend        Addend        Sum
Integer division          Dividend      Divisor       Quotient
Negate (2’s complement)   Operand       Result
or                        Operand 1     Operand 2     Result
and                       Operand 1     Operand 2     Result
xor                       Operand 1     Operand 2     Result
not                       Operand       Result
Left shift                Operand       Shift size    Result
Logic right shift         Operand       Shift size    Result
Arithmetic right shift    Operand       Shift size    Result
Is zero                   Operand
Is equal                  Operand 1     Operand 2
Less than                 Operand 1     Operand 2
Greater than              Operand 1     Operand 2
Load                      Offset        Index
Store                     Offset        Index
Increment                 Operand       Result
Decrement                 Operand       Result
Invert comparison
Set signed bit            Index
Unset signed bit          Index

TABLE 5.2: Instruction set for elliptic curve coprocessor.



• Comparison and conditional instructions allow control flow.

• Increment and decrement are common operations. Since immediate values are not available for the coprocessor instructions, these instructions avoid the need to use a register to hold the increment/decrement value.

• Inverting the comparison result allows comparisons like greater than or equal to, by inverting Less than.

• Set/Unset signed bit are required because the signed bit is not accessible through the data transfer instructions (see Chapter 5.4 for details).

An implementation of this instruction set would therefore allow an entire scalar multiplication to be performed in the ECCo, without data transfer during execution, while still being controlled by the CM33.
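The following Python sketch shows one possible shape of such a software-controlled loop; the function name and the mapping of each step to an instruction in Table 5.2 are assumptions for illustration. The CM33 only counts a fixed number of iterations, while the decision whether to add is derived from the scalar's LSB inside the coprocessor, so no operand has to be read back. A stand-in group operation is passed as `add`: with modular addition it simply computes k·P mod n, which makes the control flow easy to check, whereas in an actual SM `add` would be EC point addition and doubling built from ECCo arithmetic instructions.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def scalar_multiply(k: int, point: T, identity: T,
                    add: Callable[[T, T], T], bits: int) -> T:
    """Right-to-left double-and-add over a fixed number of scalar bits.

    The loop count is fixed (kept by the CM33); whether to add in each
    iteration depends only on the scalar's current LSB.
    """
    if not 0 <= k < (1 << bits):
        raise ValueError("scalar must satisfy 0 <= k < 2^bits")
    result = identity              # accumulator register
    addend = point                 # holds 2^i * point in iteration i
    scalar = k                     # scalar register, consumed bit by bit
    for _ in range(bits):
        lsb = scalar & 1           # "and" with a register holding 1
        if lsb != 0:               # "Is zero" followed by "Invert comparison"
            result = add(result, addend)  # conditional (CDP2) addition
        addend = add(addend, addend)      # doubling, always executed
        scalar >>= 1               # "Logic right shift" by one
    return result
```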

5.2 ECCo Architecture


The architecture of the ECCo is based on Solution 2 in Chapter 4.1, and is illustrated in Figure 5.1.

FIGURE 5.1: Architecture of ECCo, connected to the CM33 processor through the coprocessor interface.

The ECCo is connected to the CM33 through the coprocessor interface. In-
ternally the sub-modules are connected through two interfaces, as discussed
in Chapter 4.6. These interfaces are described in Chapter 5.3.

5.3 Internal Interfaces


Two internal interfaces are used in the design: in_OpModule, which defines the protocol for issuing an operation to one of the operation modules (a sub-module implementing one or more of the operations in the instruction set), and in_Registers, which defines the protocol for reading from and writing to the register bank of the ECCo.
The in_OpModule interface uses a valid-ready protocol: when the sub-
module is ready to accept a new operation a ready signal is asserted. An
operation is issued by raising the valid signal, and it is accepted on the first
clock cycle where valid and ready are both asserted. As long as valid is asserted
all parameter values of the interface must be valid and stable. The interface
also defines an error signal, which is asserted whenever an operation fails.
The parameters of in_OpModule are:

op1Reg Register index of operand 1
op2Reg Register index of operand 2
resReg Register index of result
opcode Opcode for the requested operation

Figure 5.2 illustrates the protocol of the in_OpModule interface. At $t_3$ an operation is accepted. The controller issues another operation at $t_6$, and has to wait, while keeping the parameters valid, until the previous operation has completed. At $t_9$ the first operation has completed successfully, and the second operation is accepted. The second operation fails, as indicated by the error signal at $t_{11}$. When the following, third, operation is accepted at $t_{13}$, both the ready and error signals are deasserted. The SV interface implementation of in_OpModule is listed in Appendix B.
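The handshake rules above can be checked with a small cycle-level model, sketched here in Python (the function name and the trace representation are assumptions). It reports the cycles in which an operation is accepted and flags a parameter change while an issued operation is still waiting; it does not model the error signal. The example trace in the tests reproduces the first two acceptances described above, at $t_3$ and $t_9$.

```python
from typing import List, Optional, Sequence, Tuple

# (op1Reg, op2Reg, resReg, opcode), or None when valid is low
Params = Optional[Tuple[int, int, int, int]]


def accepted_cycles(valid: Sequence[bool], ready: Sequence[bool],
                    params: Sequence[Params]) -> List[int]:
    """Return the clock cycles in which in_OpModule accepts an operation.

    An operation is accepted in the first cycle where valid and ready are
    both high. While an issued operation waits (valid high, ready low), its
    parameters must stay stable; a change raises an AssertionError.
    """
    accepted: List[int] = []
    waiting: Params = None
    for cycle, (v, r, p) in enumerate(zip(valid, ready, params)):
        if not v:
            waiting = None  # nothing issued in this cycle
            continue
        if waiting is not None and p != waiting:
            raise AssertionError(f"parameters changed while waiting (cycle {cycle})")
        if r:
            accepted.append(cycle)  # handshake: valid and ready both high
            waiting = None
        else:
            waiting = p  # operation held until ready is asserted
    return accepted
```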

FIGURE 5.2: Illustration of in_OpModule communication protocol.

Because communication with all operation sub-modules is generalized in this way, a common state machine is implemented as the controller in all of them, as illustrated in Figure 5.3.

FIGURE 5.3: Illustration of FSM implementing the in_OpModule communication protocol.

In the state machine in Figure 5.3, StartT, ReadyT, and WaitT are the names of the possible transitions. The transitions are named because the outputs of the state machine are determined by both the state and the input. In IDLE the ready signal is asserted, and the value of error may be either 0 or 1. In WAIT both ready and error are always 0.
The in_Registers interface exposes all registers directly for reading. For writing, the signals enable, register, and data are used to indicate when to write, which register to write to, and the write data, respectively. The SV interface implementation of in_Registers is listed in Appendix B.

5.4 Register Bank


The register bank is a module containing 16 registers, which may be read from and written to. The choice of 16 registers is based on a limitation of the CM33, which allows no more than 4 bits for indexing a coprocessor register. However, this many registers may not be necessary to perform the SM. An evaluation of the necessary number of registers is left for future work, considering both the area usage of the register bank and the number of registers required by the SM implementation. All 16 registers are exposed for reading through the in_Registers interface, and writing follows the in_Registers protocol.
The registers are WORD_WIDTH + 1 bits wide, e.g. if the ECCo is instantiated with a word width of 256 bits, the registers are 257 bits wide. The reason is that parameter values from standards such as [37] and [38] require all WORD_WIDTH bits to represent positive values, so an extra bit is needed for the sign. Because of this, the signed bit of each register is manipulated through dedicated instructions, to avoid spending an additional 64-bit data transfer on a single bit.

Register Name     Register Index  Writable  Readable
CR0               0               X         X
CR1               1               X         X
...               ...             ...       ...
CR13              13              X         X
Modulo Register   14              X         X
Status Register   15                        X

TABLE 5.4: List of ECCo registers.

Table 5.4 lists all registers in the register bank. There are only two special-purpose registers: the modulo register and the status register. The modulo register stores the modulo used by the modular arithmetic operations. The status register is read-only (all writing to it is done inside the register bank) and contains information about the current status of the ECCo:

Bit 0 Comparison result bit.

Bits 1-15 Active bits. These are reserved for future use in an asynchronous design, for indicating which operation modules are currently working and which are idle.

Bits 16-30 Signed bits. The signed bits of registers 0-14, respectively.

Bit 31- Unused.
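The status register layout above can be decoded in software with simple shifts and masks, as in this Python sketch (the names `EccoStatus` and `decode_status` are hypothetical). Bits 31 and above are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EccoStatus:
    comparison: bool    # bit 0: result of the last comparison
    active: List[bool]  # bits 1-15: reserved "active" flags
    signed: List[bool]  # bits 16-30: signed bits of registers 0-14


def decode_status(word: int) -> EccoStatus:
    """Decode a status register value according to the layout above."""
    return EccoStatus(
        comparison=bool(word & 1),
        active=[bool((word >> bit) & 1) for bit in range(1, 16)],
        signed=[bool((word >> (16 + reg)) & 1) for reg in range(15)],
    )
```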

5.5 Arithmetic Module


The arithmetic sub-module is implemented as a controller that implements the in_OpModule protocol and wraps the modules implementing each individual arithmetic operation: negation, integer division, modular addition, and modular multiplication. Figure 5.4 shows the block diagram of the arithmetic module. The arithmetic controller implements the in_OpModule FSM, as illustrated in Figure 5.3.

FIGURE 5.4: Block diagram of arithmetic module.



5.5.1 Negation
The negation operation is a single-cycle operation that is straightforward to implement; it performs a 2’s complement negation of the operand and is calculated continuously:

assign res = ~(operand) + 1;
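A bit-level Python model of this assignment on WORD_WIDTH + 1 bits is sketched below (the function names are hypothetical). It also illustrates the register-width argument from Chapter 5.4: the 256-bit NIST P-256 prime occupies all 256 value bits, so only the extra 257th bit can distinguish it from its negation.

```python
WORD_WIDTH = 256
REG_BITS = WORD_WIDTH + 1  # one extra bit for the sign
MASK = (1 << REG_BITS) - 1

# NIST P-256 field prime: a positive value that needs all 256 bits.
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1


def negate(operand: int) -> int:
    """Bit-level model of `assign res = ~(operand) + 1;` on REG_BITS bits."""
    return (~operand + 1) & MASK


def to_signed(raw: int) -> int:
    """Interpret a REG_BITS-wide register value as 2's complement."""
    return raw - (1 << REG_BITS) if (raw >> WORD_WIDTH) & 1 else raw
```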

5.5.2 Integer Division


Integer division is a necessary operation when using affine coordinates, but its implementation is not very interesting with regard to the ECCo. Therefore, it was initially implemented using an open-source design from OpenCores [39]. However, this design did not function properly, so integer division was instead implemented using the SystemVerilog division operator, "/".
It is also a single-cycle operation, but requires divide-by-zero detection and handling of negative numbers: if the divisor and/or dividend is negative, its 2’s complement negation (i.e. its magnitude) is used in the division, and the sign of the result is determined by basic algebra rules (the result is negative exactly when one operand is negative), as shown in Listing 5.1.

// MSB of dividend (op1) and divisor (op2)
logic msbOp1, msbOp2;
// Internal signals
logic [WORD_WIDTH:0] intOp1;
logic [WORD_WIDTH:0] intOp2;
logic [WORD_WIDTH:0] intRes;

// The division is continuously calculated.
assign divideByZero = (op2 == 0);
assign intRes = intOp1 / intOp2;
assign msbOp1 = op1[WORD_WIDTH];
assign msbOp2 = op2[WORD_WIDTH];

always_comb begin
    intOp1 = op1;
    intOp2 = op2;
    if (msbOp1 && msbOp2) begin
        intOp1 = (~op1) + 1;
        intOp2 = (~op2) + 1;
    end
    else if (msbOp1)
        intOp1 = (~op1) + 1;
    else if (msbOp2)
        intOp2 = (~op2) + 1;
end

always_ff @(posedge ck)
    res <= (msbOp1 ^ msbOp2) ? (~intRes) + 1 : intRes;
LISTING 5.1: Division SV implementation.
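The behavior of Listing 5.1 can be cross-checked against truncating signed division with a small Python model operating on raw (WORD_WIDTH + 1)-bit register values (the function name is hypothetical, and raising an exception stands in for the divideByZero flag). As in any 2’s complement divider, dividing the most negative value by −1 overflows, so the tests stay within the symmetric range.

```python
def divide_model(op1: int, op2: int, word_width: int) -> int:
    """Model of Listing 5.1 on raw (word_width + 1)-bit register values.

    Negative operands are replaced by their 2's complement negation, the
    magnitudes are divided, and the quotient is negated if exactly one
    operand was negative, so the result truncates toward zero.
    """
    mask = (1 << (word_width + 1)) - 1
    if op2 == 0:
        raise ZeroDivisionError("divideByZero would be asserted")
    msb1 = (op1 >> word_width) & 1
    msb2 = (op2 >> word_width) & 1
    int_op1 = (~op1 + 1) & mask if msb1 else op1  # |dividend|
    int_op2 = (~op2 + 1) & mask if msb2 else op2  # |divisor|
    int_res = int_op1 // int_op2                  # unsigned division
    return (~int_res + 1) & mask if msb1 ^ msb2 else int_res
```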

5.5.3 Modular Addition


The modular addition is implemented using Algorithm 9, designed for this
thesis, as discussed in Chapter 4.2. This algorithm is interpreted as illustrated
by the FSM in Figure 5.5, and the datapath in Figure 5.6. The transitions in
the illustration are referred to by name.

FIGURE 5.5: FSM interpretation of Generic Modular Addition Algorithm.

DoneT Transition to IDLE when an addition has finished, asserting done for one cycle.
WaitT Transition within IDLE when not performing an operation.
ReduceT Transition to REDUCE when the intermediate sum is greater than or equal to the modulo, and needs to be reduced to 0 ≤ Sum < Modulo.
IncreaseT Transition to INCREASE when the intermediate sum is less than 0, and needs to be increased to 0 ≤ Sum < Modulo.

If initially 0 ≤ op1 + op2 < mod, the calculation takes only one cycle to complete. Otherwise, the op1 mux selects the intermediate result as operand 1, and the op2 mux selects either mod or −mod as operand 2, depending on whether the state is INCREASE or REDUCE, respectively. Since each REDUCE cycle subtracts the modulo only once, the addition could in the worst case take $2^{WORD\_WIDTH} - 1$ cycles to perform, when calculating $((2^{WORD\_WIDTH} - 1) + 0) \bmod 1$.

5.5.4 Modular Multiplication


The modular multiplication is implemented using Algorithm 7, as discussed in Chapter 4.2. This algorithm is interpreted as illustrated by the FSM in Figure 5.7 and the datapath in Figure 5.8. The transitions in the illustration are referred to by name.

DoneT Transition to IDLE when a multiplication has finished, asserting done for one cycle.

FIGURE 5.6: Block diagram of modular addition module.

WaitT Remain in IDLE when no operation is being performed.

AddT Transition to ADD when calculating the sum $2 \cdot P + A \cdot B_{k-1-i}$ (as described in Chapter 3.2).

ReduceT Transition to REDUCE when the intermediate sum is greater than or equal to the modulus and needs to be reduced to 0 ≤ Sum < mod.

ReduceDoneT Transition to REDUCE_DONE when the intermediate sum is greater than or equal to the modulus and needs to be reduced to 0 ≤ Sum < mod before the operation finishes.

The modular multiplication always takes at least WORD_WIDTH cycles, since it has to iterate through all bits of op2 except the sign bit. None of op1, op2 or mod is allowed to be negative. The partial product mux selects the current value of $A \cdot B_{k-1-i}$, and the op1 and op2 muxes select whether to calculate $2 \cdot P + A \cdot B_{k-1-i}$ or to reduce the intermediate result.
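The ADD and REDUCE steps above follow the pattern of MSB-first interleaved (shift-and-add) modular multiplication. The loop below sketches that pattern behaviorally; it is an illustration under the assumptions stated in the comments, not a transcription of Algorithm 7, and interleaved_mod_mul and word_width are hypothetical names.

```python
def interleaved_mod_mul(a: int, b: int, mod: int, word_width: int) -> int:
    """MSB-first interleaved modular multiplication (behavioral sketch).

    Assumes non-negative operands, a positive modulus and b < 2**word_width.
    """
    if mod <= 0 or a < 0 or b < 0:
        raise ValueError("operands must be non-negative and mod positive")
    if b >> word_width:
        raise ValueError("b does not fit in word_width bits")
    p = 0
    for i in range(word_width):                    # one iteration per bit of b
        bit = (b >> (word_width - 1 - i)) & 1       # B_{k-1-i}, scanning from the MSB
        p = 2 * p + (a if bit else 0)               # ADD: 2*P + A*B_{k-1-i}
        while p >= mod:                             # REDUCE until 0 <= P < mod
            p -= mod
    return p

print(interleaved_mod_mul(75, 77, 233, 8) == (75 * 77) % 233)  # True
```

The loop always runs word_width times, which mirrors why the hardware needs at least WORD_WIDTH cycles. When a < mod, P stays below mod, so 2·P + A < 3·mod and each iteration needs at most two subtractions.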

5.5.5 Test Data


Test data was generated using a Python script with the class structure illustrated in Figure 5.9. The expected results are computed with Python's built-in operators, as shown in Listing 5.2.

def modular_addition(op1: int, op2: int, mod: int) -> int:
    return (op1 + op2) % mod

def modular_multiplication(op1: int, op2: int, mod: int) -> int:
    return (op1 * op2) % mod

def integer_division(op1: int, op2: int) -> int:
    # Divide the magnitudes and apply the sign separately (truncation toward zero)
    if op1 < 0 and op2 < 0:
        res = abs(op1) // abs(op2)
    elif op1 < 0:
        res = -(abs(op1) // op2)
    elif op2 < 0:
        res = -(op1 // abs(op2))
    else:
        res = op1 // op2
    return res
Listing 5.2: Test data solution calculations.

Figure 5.7: FSM interpretation of Multiply and Divide Algorithm.

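Note that Python's integer division operator // performs floor division, rounding toward negative infinity, whereas the hardware divider truncates toward zero (for example, 3 // -4 evaluates to -1 rather than 0). Listing 5.2 therefore negates any negative operands and uses basic sign rules to determine the sign of the result, just as the hardware implementation does.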
The script source code is listed in Appendix A. Test data values used for
verification are listed in Appendix C.

5.5.6 Verification - Arithmetic Module


The arithmetic module was tested using a TB design as illustrated in Figure
5.10. The test program communicates with the arithmetic module through
an in_OpModule driver, and controls and verifies the register content during
testing through a dummy register bank, connected to the arithmetic module.
During testing the values listed in Appendix C were used to verify correct
results from arithmetic operations.

Figure 5.8: Block diagram of modular multiplication module.

5.6 Controller Module


The controller's primary purpose is to handle communication with the CM33 through the coprocessor interface; the FSM in Figure 5.11 shows the state machine implemented for this. This is a synchronous design: the controller waits for any multicycle operation to finish before signaling to the CM33 that it is ready to accept further instructions.
The outputs of the FSM are the coprocessor interface signals ready and error, and an internal valid signal, which is used in the in_OpModule interface. The output signals of the FSM are determined by both state and input, and are most easily described as the set of all possible transitions, referred to by name:

RyT - ready transition Transition to READY, with ready asserted and error
deasserted, waiting for an instruction to be issued.

ET - error transition Transition to READY, with both ready and error asserted. May be caused by a write error, a read error, a data processing error or an invalid instruction being issued.

WaT - wait transition Transition to WAIT when valid is asserted and a data
processing operation is issued.

WaWT - wait wait transition Transition to WAIT, from WAIT, while current
data processing operation is not yet finished.
Figure 5.9: Class diagram of the Python script generating test data.

WaRT - wait ready transition Transition to WAIT, from WAIT, when a data processing operation finished successfully and valid is asserted, requesting a new data processing operation immediately.

WaET - wait error transition Transition to WAIT, from WAIT, when a data processing operation finished with an error and valid is asserted, requesting a new data processing operation immediately.

ReT - read transition Transition to READ when the processor wants to read from a coprocessor register.

ReRT - read ready transition Transition to READ, from WAIT, when a data processing operation finished successfully and valid is asserted, requesting a data transfer operation (read) immediately.

ReET - read error transition Transition to READ, from WAIT, when a data processing operation finished with an error and valid is asserted, requesting a data transfer operation (read) immediately.

WrT - write transition Transition to WRITE when the processor wants to write to a coprocessor register.

WrRT - write ready transition Transition to WRITE, from WAIT, when a data processing operation finished successfully and valid is asserted, requesting a data transfer operation (write) immediately.

WrET - write error transition Transition to WRITE, from WAIT, when a data processing operation finished with an error and valid is asserted, requesting a data transfer operation (write) immediately.
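The transition list can be condensed into a next-state function. The Python sketch below is one plausible reading of the list, not the controller's actual logic (that is the FSM in Figure 5.11): the inputs valid, op_done, op_error and transfer_error, and the Request decoding of the issued instruction, are hypothetical names, and the handling of an invalid instruction arriving just as a WAIT operation completes is a guess.

```python
from enum import Enum, auto
from typing import Tuple

class State(Enum):
    READY = auto()
    WAIT = auto()
    READ = auto()
    WRITE = auto()

class Request(Enum):                  # hypothetical decoding of the issued instruction
    DATA_PROCESSING = auto()
    READ = auto()
    WRITE = auto()
    INVALID = auto()

# Target state and transition-name prefix for each valid request
_TARGET = {
    Request.DATA_PROCESSING: (State.WAIT, "Wa"),
    Request.READ: (State.READ, "Re"),
    Request.WRITE: (State.WRITE, "Wr"),
}

def next_state(state: State, valid: bool, request: Request, op_done: bool = False,
               op_error: bool = False, transfer_error: bool = False) -> Tuple[State, str]:
    """Return (next state, transition name) for one clock cycle."""
    if state is State.WAIT:
        if not op_done:
            return State.WAIT, "WaWT"                 # operation still running
        if not valid or request not in _TARGET:
            # Report completion; an invalid request is also reported as an error
            failed = op_error or (valid and request not in _TARGET)
            return State.READY, "ET" if failed else "RyT"
        target, prefix = _TARGET[request]
        return target, prefix + ("ET" if op_error else "RT")
    # READY, READ or WRITE: the controller can accept a new instruction
    if state in (State.READ, State.WRITE) and transfer_error:
        return State.READY, "ET"                      # read or write error
    if not valid:
        return State.READY, "RyT"
    if request not in _TARGET:
        return State.READY, "ET"                      # invalid instruction
    target, prefix = _TARGET[request]
    return target, prefix + "T"

print(next_state(State.WAIT, True, Request.READ, op_done=True))  # (<State.READ: 3>, 'ReRT')
```

Grouping the transitions this way makes the pattern visible: every name is a target prefix (Wa, Re, Wr) combined with how the previous operation ended (T from an idle state, RT after success, ET after an error).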

5.6.1 Verification - Controller Module


The testbench setup for the verification of the controller module is illustrated
in Figure 5.12.
Figure 5.10: Block diagram of Arithmetic Module TB.

Operation module dummies for the arithmetic, logical, comparison and shift modules are connected to the controller and controlled by the test program. A dummy register bank is also connected to the controller, and the controller is tested through a coprocessor interface driver.

5.7 Verification - ECCo


The testbench setup for verification of the entire ECCo is illustrated in Figure
5.13.
A coprocessor interface driver is used to communicate with the ECCo,
and the test values from Appendix C are used to check for correct behavior
of the implemented operations.

5.8 Software
For this thesis three software components were implemented: a wrapper for
the coprocessor interface instructions; a big number library for use with the
ECCo; and a benchmarking program.
Figure 5.11: FSM of ECCo controller module.

The big number library and ECCo wrapper were used to verify that communication with the ECCo over the coprocessor interface worked as expected. To verify correct behavior of the ECCo controller and the implemented operations, the test data from Appendix C was used. The source code of the test programs used for verification is listed in Appendix F.

5.8.1 ECCo Wrapper


The ECCo wrapper was implemented to simplify calling the ECCo from C through the coprocessor interface. The coprocessor instructions of the ARMv8-M instruction set have to be issued from assembly, using string literals to refer to coprocessor registers and opcodes. Therefore, a series of macros was created covering all the instructions in the proposed instruction set (Table 5.2). The code for the wrapper is listed in Appendix D.

5.8.2 Big Number library


Figure 5.12: Testbench setup for verification of the controller module.

When using the ECCo, some minor handling of big numbers in software is still required. For this purpose, a big number library was implemented for use with the ECCo. It provided the following functionality:

• Converting to and from number strings in hexadecimal format.
• Comparing two numbers.
• Loading a number into an ECCo register.
• Storing a number from an ECCo register.
• Some other convenience functionality.
The source code for the big number library is listed in Appendix E.
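Loading and storing a 256-bit number implies splitting it into processor-sized pieces. The Python sketch below shows one way to do this conversion. It assumes, hypothetically, that a register is transferred 32 bits at a time with the least significant word first, and it ignores the register's extra sign bit; the actual word order and transfer instructions are defined by the library in Appendix E, and hex_to_words and words_to_hex are illustrative names.

```python
from typing import List

def hex_to_words(hex_str: str, num_words: int = 8, word_bits: int = 32) -> List[int]:
    """Split a non-negative hex string into num_words words, least significant first."""
    value = int(hex_str, 16)                       # accepts an optional "0x" prefix
    if value < 0 or value >> (num_words * word_bits):
        raise ValueError("value does not fit in the register")
    mask = (1 << word_bits) - 1
    return [(value >> (i * word_bits)) & mask for i in range(num_words)]

def words_to_hex(words: List[int], word_bits: int = 32) -> str:
    """Reassemble words (least significant first) into a lowercase hex string."""
    value = 0
    for i, word in enumerate(words):
        value |= word << (i * word_bits)
    return f"0x{value:x}"

print(hex_to_words("0x100000002", num_words=2))  # [2, 1]
```

A round trip through both functions returns the original (lowercase) string, which makes the pair easy to test independently of the hardware.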

5.8.3 Benchmark Software


For benchmarking, the pure software ECC implementation ANSSI libecc (Chapter 2.9) was compared with the ECCo.

Figure 5.13: Testbench setup for verification of ECCo.

The benchmarked operations were modular multiplication and modular addition. As these are the fundamental operations of SM, their execution time gives an indication of the possible speedup. The benchmarks first set up the parameters once, initializing operand 1 (OP1), operand 2 (OP2) and the modulus (MOD) to large 256-bit values; the same values were used for the libecc and ECCo benchmarks. The operation OP1 = (OP1 + OP2) % MOD was then performed for the modular addition benchmark, and OP1 = (OP1 ∗ OP2) % MOD for the modular multiplication benchmark.
The benchmarks were run with 10 and 100 iterations, i.e. performing the operation 10 or 100 times and updating OP1 each time. The test values were large 256-bit values, similar to the values used during 256-bit SM. These benchmarks do not, however, include tests of edge cases such as MOD ≪ OP1 + OP2, in which case the ECCo will have a very long execution time, nor do they guarantee coverage of the cases MOD > OP1 + OP2 or MOD > OP1 ∗ OP2.
The source code for the benchmarking programs is listed in Appendix F.
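A reference model of the benchmark loop gives the expected final OP1 for either implementation. The sketch below is illustrative and not the benchmark code from Appendix F; benchmark_reference is a hypothetical name. It also shows why the parentheses matter: written literally, OP1 + OP2 % MOD parses as OP1 + (OP2 % MOD) in both C and Python, because % binds tighter than +.

```python
def benchmark_reference(op1: int, op2: int, mod: int, iterations: int,
                        multiply: bool) -> int:
    """Expected final OP1 after the benchmark loop (hypothetical reference model)."""
    for _ in range(iterations):
        if multiply:
            op1 = (op1 * op2) % mod     # OP1 = (OP1 * OP2) % MOD
        else:
            op1 = (op1 + op2) % mod     # OP1 = (OP1 + OP2) % MOD
    return op1

# Without parentheses, % is applied to OP2 alone
assert 5 + 4 % 7 == 9 and (5 + 4) % 7 == 2
```

Closed forms give an independent check of the result: after n iterations, the addition benchmark yields (OP1 + n·OP2) mod MOD and the multiplication benchmark yields OP1·OP2^n mod MOD.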

Chapter 6

Results

The simulation tests described in Chapter 5, which verify correct behavior of all sub-modules and correct results from the implemented arithmetic operations, all succeeded.
This chapter presents the benchmark results, comparing the execution time of the modular arithmetic software implementation in libecc with that of the ECCo implementation. Lastly, the area estimates from synthesis are presented.

6.1 Speed
The execution time of modular addition and modular multiplication is compared between benchmark code running the operations on the ECCo and code using the software implementation from libecc. Table 6.2 summarizes the benchmarking results. The execution time is measured in clock cycles. As a reference, a simulation run without any operation was performed in order to measure the setup time of the system. This empty run had an execution time of 36,790 cycles (this is included in the results presented in Table 6.2).
The results show that the ECCo performed 3.8 times faster for modular addition at 10 iterations, and 8 times faster at 100 iterations. For modular multiplication, the ECCo performed 7.8 times faster at 10 iterations and 27 times faster at 100 iterations.
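The speedups above are ratios of total cycle counts, which include the fixed setup cost. Subtracting the 10-iteration count from the 100-iteration count cancels that overhead, assuming it is identical in both runs, and leaves an estimate of the cost per iteration. The snippet below is a derived calculation on the Table 6.2 values, not a measurement reported in this thesis; the function names are illustrative.

```python
from typing import Dict, Tuple

# Cycle counts from Table 6.2 as (10 iterations, 100 iterations)
CYCLES: Dict[str, Dict[str, Tuple[int, int]]] = {
    "add": {"ecco": (42_818, 43_294), "libecc": (164_906, 347_966)},
    "mul": {"ecco": (46_840, 87_864), "libecc": (367_664, 2_375_744)},
}

def speedups(op: str) -> Tuple[float, float]:
    """Ratio of libecc to ECCo total cycle counts at 10 and 100 iterations."""
    ecco, sw = CYCLES[op]["ecco"], CYCLES[op]["libecc"]
    return sw[0] / ecco[0], sw[1] / ecco[1]

def marginal_cycles(op: str, impl: str) -> float:
    """Extra cycles per iteration between the 10- and 100-iteration runs.

    Assumes the fixed overhead (system setup and parameter loading) is the
    same in both runs, so it cancels in the difference.
    """
    c10, c100 = CYCLES[op][impl]
    return (c100 - c10) / (100 - 10)

for op in ("add", "mul"):
    s10, s100 = speedups(op)
    ecco, sw = marginal_cycles(op, "ecco"), marginal_cycles(op, "libecc")
    print(f"{op}: total-cycle speedup {s10:.2f}x (10 it.), {s100:.2f}x (100 it.); "
          f"per iteration {ecco:.1f} vs {sw:.1f} cycles ({sw / ecco:.0f}x)")
```

Because the fixed setup cost dominates the 10-iteration runs, the per-iteration ratio is a better estimate of the steady-state speedup than the ratio of total cycle counts, under the stated assumption.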
While the ECCo is significantly faster than the software implementation it was compared with, another notable result is how differently the two scale: from 10 to 100 iterations, the execution time of the ECCo increased by a factor of 1.01 (addition) and 1.8 (multiplication), while that of the software implementation increased by a factor of 2.1 (addition) and 6.5 (multiplication). This indicates the benefit of a coprocessor that allows a large number of operations to be performed without data transfers between processor and coprocessor.

Operation                          Exec. Time, 10 Iterations    Exec. Time, 100 Iterations
Modular Addition - ECCo            42,818                       43,294
Modular Addition - libecc          164,906                      347,966
Modular Multiplication - ECCo      46,840                       87,864
Modular Multiplication - libecc    367,664                      2,375,744

Table 6.2: Execution time of atomic operations, measured in clock cycles.

Measurement              Increase
Combinational Area       3.12x
Noncombinational Area    1.36x
Total Area               1.83x

Table 6.4: Area increase for design when adding ECCo.

Module          Sub-Module         ECCo Acc. Area    Comb. Area    Noncomb. Area
Arithmetic                         84.63%            5.80%         13.61%
                Multiplication*    1.97%             1.56%         4.81%
                Addition*          1.92%             1.53%         4.61%
                Negation           0.78%             0.25%         4.49%
                Division           73.18%            83.02%        4.50%
Controller                         4.65%             5.25%         0.42%
Register Bank                      10.72%            2.59%         67.31%

Table 6.6: Area distribution of ECCo modules. (*modular)

6.2 Area
The design of the CM33 with the ECCo was synthesizable and had no negative slack. It was synthesized without any optimization at a frequency of 128 MHz. The area results are presented as a comparison between synthesis estimates of the design with and without the ECCo included (Table 6.4), and as an area distribution between the sub-modules of the ECCo (Table 6.6).
The values shown in Table 6.4 are the ratios between the areas obtained when synthesizing CM33+ECCo and the CM33 alone. Clearly, the ECCo contains a great deal of combinational logic, increasing the combinational cell area by a factor of 3.12 (an increase of 212%). In total, the ECCo adds area equal to 83% of the existing design.
The values shown in Table 6.6 are the area distribution of the ECCo sub-modules, defined as follows:
ECCo Accumulative Area The percentage of the ECCo area occupied by this module, including its sub-modules. The percentages of the Arithmetic, Controller and Register Bank modules add up to 100%, as these are all the sub-modules of the ECCo. The percentages of Multiplication, Addition, Negation and Division are included in the Arithmetic percentage, but they do not sum to 84.63%, since the Arithmetic module contains some logic of its own.

Combinational Area The percentage of the ECCo's combinational cell area used by this module alone, not including any of its sub-modules. For example, the Arithmetic module uses 5.8% of the total combinational cell area of the ECCo, excluding its sub-modules, and the Division module uses 83.02% of it.

Noncombinational Area Defined in the same way as Combinational Area, for noncombinational cells.

Not surprisingly, the majority of the noncombinational area is occupied by the register bank. However, most of the ECCo area is occupied by the divider, which was synthesized from the SV division operator "/" without any optimization from the synthesizer.
The most essential modules, Modular Multiplication and Modular Addition, occupy only 1.97% and 1.92% of the ECCo area, respectively. Combined with the benchmark results, this indicates the advantages of using the ECCo: a significant speedup for only a small area increase, provided the divider can be implemented more efficiently. With a more efficient divider, the register bank would likely be the largest module; it is currently about 5x the size of each of the Modular Multiplication and Modular Addition modules, and about 2x the size of the controller.
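The percentages discussed in this section can be cross-checked with a few lines of arithmetic on Tables 6.4 and 6.6. The snippet below is such a check, not part of the thesis tooling; the case where the divider's area shrinks to zero is a hypothetical bound, not an estimate of an optimized divider.

```python
# Area figures from Table 6.4 (ratio) and Table 6.6 (percent of ECCo area, accumulated)
TOTAL_AREA_FACTOR = 1.83          # area of CM33+ECCo relative to the CM33 alone
ACC_AREA = {
    "Arithmetic": 84.63, "Controller": 4.65, "Register Bank": 10.72,
    "Multiplication": 1.97, "Addition": 1.92, "Negation": 0.78, "Division": 73.18,
}

# Fraction of the combined CM33+ECCo area that belongs to the ECCo
ecco_share = (TOTAL_AREA_FACTOR - 1) / TOTAL_AREA_FACTOR

# Logic owned by the Arithmetic module itself, outside its listed sub-modules
arith_own = ACC_AREA["Arithmetic"] - sum(
    ACC_AREA[m] for m in ("Multiplication", "Addition", "Negation", "Division"))

# Register bank area relative to other modules
reg = ACC_AREA["Register Bank"]
ratios = {m: reg / ACC_AREA[m] for m in ("Multiplication", "Addition", "Controller")}

# Hypothetical bound: the register bank's share if the divider area were zero
reg_share_without_divider = reg / (100 - ACC_AREA["Division"])

print(f"ECCo share of CM33+ECCo area: {ecco_share:.1%}")
print(f"Arithmetic module's own logic: {arith_own:.2f}% of ECCo area")
print({m: round(r, 2) for m, r in ratios.items()})
print(f"Register bank share without divider: {reg_share_without_divider:.1%}")
```

The first value corresponds to the 45% figure given in Chapter 8. The Arithmetic module's own logic (about 6.8% of the ECCo area) is smaller than the register bank, which is consistent with the expectation that the register bank becomes the largest module once the divider is optimized.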

Chapter 7

Future Work

The ECCo implementation in this thesis includes only a small subset of the operations and features necessary for the suggested design of a complete elliptic curve coprocessor. This chapter discusses possible changes and considerations for future work on the coprocessor proposed in this thesis.

7.1 Instruction Set Architecture


The instruction set proposed in Table 5.2 is intended for a design aimed at solution 2 in Chapter 4.1. The desired solution, however, is solution 3, which requires some additional, higher-level operations to be included in the instruction set, more specifically point arithmetic (Chapter 2.2.3) and/or scalar multiplication (Chapter 2.3).
Another desirable feature would be a way of generating random numbers of the coprocessor's word size, since random numbers are used in many cryptographic algorithms, such as ECDSA (Chapter 2.5).
The currently implemented arithmetic operations, modular addition and modular multiplication, are also the fundamental operations of common non-EC cryptosystems such as RSA [20] and Diffie-Hellman [4]. Adding instructions for these common algorithms could be useful, but would require the ability to work with numbers of up to 4096 bits to provide acceptable security.
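RSA and Diffie-Hellman both rest on modular exponentiation, and the textbook left-to-right square-and-multiply method shows how that reduces to repeated modular multiplications. The sketch below is illustrative only: mod_mul is a hypothetical stand-in for the ECCo's modular multiplication instruction, and this simple method is not constant time.

```python
def mod_mul(a: int, b: int, mod: int) -> int:
    """Stand-in for the coprocessor's modular multiplication (hypothetical)."""
    return (a * b) % mod

def mod_exp(base: int, exponent: int, mod: int) -> int:
    """Left-to-right square-and-multiply built only from mod_mul."""
    result = 1 % mod
    base %= mod
    for bit in bin(exponent)[2:]:             # exponent bits, most significant first
        result = mod_mul(result, result, mod)  # square for every bit
        if bit == "1":
            result = mod_mul(result, base, mod)  # multiply when the bit is set
    return result

print(mod_exp(4, 13, 497))  # 445
```

A k-bit exponent costs k squarings plus one multiplication per set bit, so 4096-bit RSA would need several thousand full-width modular multiplications per exponentiation, which is where a wide hardware multiplier pays off.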

7.2 Security
An issue that has not been addressed in this thesis, but must be considered in future work, is the security of the implementation against attacks such as side-channel attacks. One way of defending against side-channel attacks is to use constant-time algorithms, which should be considered for the finite-field arithmetic, the point operations and the scalar multiplication algorithm.
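The Montgomery ladder is a well-known scalar multiplication algorithm with the regular structure that constant-time designs aim for: every scalar bit triggers exactly one point addition and one point doubling, whatever its value. The Python sketch below illustrates this on a toy short Weierstrass curve, y² = x³ + 2x + 3 over F_97, with parameters chosen only for illustration, using affine coordinates and Python's pow(x, -1, p) for inversion (Python 3.8+). It is not the algorithm of Chapter 2.3, and the Python code itself is not constant time: big-integer arithmetic, branching on the bit and modular inversion all leak timing.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]    # None represents the point at infinity

# Toy curve y^2 = x^3 + A*x + B over F_P (illustrative parameters only)
P, A, B = 97, 2, 3

def point_add(p1: Point, p2: Point) -> Point:
    """Affine point addition and doubling from modular add, mul and inverse."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                               # Q + (-Q) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P    # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P           # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def montgomery_ladder(k: int, point: Point, bits: int) -> Point:
    """Compute k*point with one addition and one doubling per scalar bit."""
    r0, r1 = None, point                 # invariant: r1 - r0 == point
    for i in reversed(range(bits)):
        if (k >> i) & 1:
            r0, r1 = point_add(r0, r1), point_add(r1, r1)
        else:
            r0, r1 = point_add(r0, r0), point_add(r0, r1)
    return r0

print(montgomery_ladder(2, (3, 6), 8))   # (80, 10)
```

Because the loop always runs `bits` times with the same pair of operations, the sequence of coprocessor operations would not depend on the secret scalar if both point operations were executed by constant-time hardware.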

7.3 Algorithms
While the implemented algorithms for modular addition and modular multiplication are simple, and more complex and efficient methods are available (Chapters 3.1 and 3.2), the current implementation already provides a significant speedup over a pure software implementation. A future change of algorithms is necessary for further development, a decision in which a compromise between security and efficiency will surely be needed.
The integer division will, however, need a more area-efficient implementation. Reducing the area consumption of the divider module could potentially reduce the total area of the ECCo significantly.

Chapter 8

Conclusion

This thesis has explored how to design a coprocessor for accelerating elliptic curve cryptography, based on the results from the prestudy [11]. The coprocessor designed in this thesis, ECCo, was designed for use with the ARM CM33 processor. The CM33 provides a coprocessor interface for tight integration of coprocessors, which allows instructions to be issued to connected coprocessors from software.
This led to the ECCo being designed with an instruction set providing the atomic mathematical operations for ECC, with the possibility of adding scalar multiplication to the instruction set in future work.
As time did not allow the entire proposed instruction set to be implemented, only the atomic arithmetic operations were implemented. An ECCo design with a controller, register bank and arithmetic module was used to compare execution time with an ECC software implementation, and to estimate area usage by synthesis. The ECCo accounted for 45% of the area when synthesizing ECCo+CM33. The estimates showed that the ECCo area consumption was largely dominated by the divider (73.18% of the total ECCo area), which was implemented using the SystemVerilog division operator "/" with no optimization in synthesis. However, the atomic operations of ECC, Modular Multiplication and Modular Addition, occupied only 1.97% and 1.92% of the ECCo area, respectively. In the benchmarks, these operations also ran 3.8x to 27x faster on the ECCo than in the pure software implementation of ECC (libecc).
While the implemented algorithms for modular addition and modular multiplication are simple, and more complex and efficient methods are available (Chapters 3.1 and 3.2), the current implementation already provides a significant speedup over a pure software implementation. The proposed coprocessor provides a complete system in which efficiency can be achieved in several ways: reducing data transfers, optimizing the implementation of the mathematical operations, and flexibility and ease of use.

Appendix A

Test Data Python script

import argparse
import csv
import io
import re
import shutil
import sys
from abc import ABC, abstractmethod
from typing import Dict, Generator, List


# Exception class used to differentiate between known and unknown errors.
class DataError(Exception):
    pass


################################################################################
#                                                                              #
#                                  Baseclass                                   #
#                                                                              #
################################################################################

class DataABC(ABC):
    """DataABC is the baseclass for all calculations. It handles reading from
    and writing to csv data files, writing to C files, and number formatting
    (decimal, hex & binary).
    """

    def __init__(self, headers: List[str], file: io.IOBase, numBase: int) -> None:
        self.headers = headers
        # Per-instance list, so data is not shared between instances
        self.data: List[Dict[str, int]] = []
        rd = csv.reader(file)
        # First line of the file must be the headers
        fileHeaders = next(rd)
        if self.headers != fileHeaders:
            raise DataError(f'[!!] DataABC, __init__: Invalid headers! Want {self.headers} - got {fileHeaders}')

        # Read all data
        for j, cols in enumerate(rd):
            # Report and skip empty lines
            if not cols:
                print(f'[ ] DataABC, __init__: Reading {file}: Found empty line ({j + 2}). Ignoring...')
                continue
            # Represent the data as a dict, indexed by header names
            tmp = dict()
            for i, h in enumerate(self.headers):
                # Sanity checks to avoid decimal interpreted as hex etc.
                if not re.match(r'^-?\d+$', cols[i]) and numBase == 10:
                    raise DataError(f'DataABC, __init__: Reading {file}: Tried interpreting non-decimal number as decimal: "{cols[i]}"')
                elif not re.match(r'^-?0x[0-9a-fA-F]+$', cols[i]) and numBase == 16:
                    raise DataError(f'DataABC, __init__: Reading {file}: Tried interpreting non-hex number as hexadecimal: "{cols[i]}"')
                elif not re.match(r'^-?0b[01]+$', cols[i]) and numBase == 2:
                    raise DataError(f'DataABC, __init__: Reading {file}: Tried interpreting non-binary number as binary: "{cols[i]}"')
                tmp[h] = int(cols[i], numBase)
            self.data.append(tmp)

    @abstractmethod
    def calculate(self) -> None:
        pass

    @staticmethod
    def _formatNumber(num: int, numFormat: int) -> str:
        # Determine number format string
        if numFormat == 16:
            return f'0x{num:x}' if num >= 0 else f'-0x{abs(num):x}'
        elif numFormat == 2:
            return f'0b{num:b}' if num >= 0 else f'-0b{abs(num):b}'
        else:
            return f'{num}'

    def _formatDataCsv(self, numFormat: int) -> Generator[Dict[str, str], None, None]:
        # Iterate through data values, yield dictionaries with strings of formatted numbers
        for d in self.data:
            tmp = dict()
            for k, v in d.items():
                tmp[k] = self._formatNumber(v, numFormat)
            yield tmp

    def writeCsv(self, file: io.IOBase, numFormat: int) -> None:
        wr = csv.DictWriter(file, fieldnames=self.headers)
        # First write the header line
        wr.writeheader()
        # Write all data to the file
        for d in self._formatDataCsv(numFormat):
            wr.writerow(d)

    def _formatDataC(self, numFormat: int) -> Generator[List[str], None, None]:
        for d in self.data:
            tmp = list()
            for v in d.values():
                tmp.append(self._formatNumber(v, numFormat))
            yield tmp

    def writeC(self, file: io.IOBase, numFormat: int, fileName: str, arrayName: str) -> None:
        # Need to know the size of all the array's dimensions
        numEntries = len(self.data) + 1  # Zero terminated
        numHeaders = len(self.headers)
        numChars = 0
        # Iterate through all values and find the longest string
        for d in self.data:
            for v in d.values():
                l = len(self._formatNumber(v, numFormat))
                if l > numChars:
                    numChars = l
        numChars += 1  # One extra, for terminating zero

        # Print some general information comments
        print(f'// Created by {sys.argv[0]} with data from {fileName}\n// Number base: {numFormat}', file=file, end='\n\n')
        # Print some macros with meta data
        print(f'#define {arrayName.upper()}_NUM_ENTRIES {numEntries - 1}', file=file)
        print(f'#define {arrayName.upper()}_NUM_HEADERS {numHeaders}', file=file)
        print(f'#define {arrayName.upper()}_NUM_CHARS {numChars - 1}', file=file, end='\n\n')
        # Print a comment with the headers
        print(f'// [{",".join(self.headers)}]', file=file)
        # Write the actual data, one {"...", "...", ...} row per entry
        print(f'char {arrayName}[{numEntries}][{numHeaders}][{numChars}] = {{', file=file)
        for data in self._formatDataC(numFormat):
            print('{' + ', '.join(f'"{v}"' for v in data) + '},', file=file)
        # End with zero termination
        print('{0}\n};', file=file)


################################################################################
#                                                                              #
#                                   Addition                                   #
#                                                                              #
################################################################################

class ModAddData(DataABC):
    def __init__(self, file: io.IOBase, numBase: int):
        super().__init__(['modulo', 'operand1', 'operand2', 'result'], file, numBase)

    def calculate(self):
        # For each entry calculate (op1 + op2) % mod
        for i, d in enumerate(self.data):
            self.data[i]['result'] = (d['operand1'] + d['operand2']) % d['modulo']


################################################################################
#                                                                              #
#                                Multiplication                                #
#                                                                              #
################################################################################

class ModMulData(DataABC):
    def __init__(self, file: io.IOBase, numBase: int):
        super().__init__(['modulo', 'operand1', 'operand2', 'result'], file, numBase)

    def calculate(self):
        # For each entry calculate (op1 * op2) % mod
        for i, d in enumerate(self.data):
            self.data[i]['result'] = (d['operand1'] * d['operand2']) % d['modulo']


################################################################################
#                                                                              #
#                                   Division                                   #
#                                                                              #
################################################################################

class DivData(DataABC):
    def __init__(self, file: io.IOBase, numBase: int):
        super().__init__(['operand1', 'operand2', 'result'], file, numBase)

    def calculate(self):
        # For each entry calculate op1/op2, integer division
        for i, d in enumerate(self.data):
            op1 = d['operand1']
            op2 = d['operand2']
            # Python's // rounds toward negative infinity (e.g. 3 // -4 == -1),
            # whereas the hardware truncates toward zero. Divide the magnitudes
            # instead and use basic arithmetic rules to determine the sign.
            if op1 < 0 and op2 < 0:
                self.data[i]['result'] = abs(op1) // abs(op2)
            elif op1 < 0:
                self.data[i]['result'] = -(abs(op1) // op2)
            elif op2 < 0:
                self.data[i]['result'] = -(op1 // abs(op2))
            else:
                self.data[i]['result'] = op1 // op2


################################################################################
#                                                                              #
#                                  Main code                                   #
#                                                                              #
################################################################################

if __name__ == "__main__":
    # Setup argparse
    par = argparse.ArgumentParser()
    par.add_argument('FILE', type=str, help='data file in hexadecimal, binary or decimal format.')
    par.add_argument('-o', metavar="FILE", type=str, help='optional output file')
    par.add_argument('-c', action='store_true', help='output the data as C-array instead of CSV')
    par.add_argument('-b', action='store_true', help='create a backup file')
    # Use a mutually exclusive group for selecting input number format
    formatGroup = par.add_mutually_exclusive_group(required=True)
    formatGroup.add_argument('--dec', action='store_true', help='input data is in decimal format.')
    formatGroup.add_argument('--hex', action='store_true', help='input data is in hexadecimal format.')
    formatGroup.add_argument('--bin', action='store_true', help='input data is in binary format.')
    # Use a mutually exclusive group for selecting output number format
    formatGroup = par.add_mutually_exclusive_group(required=False)
    formatGroup.add_argument('--outDec', action='store_true', help='output data is in decimal format.')
    formatGroup.add_argument('--outHex', action='store_true', help='output data is in hexadecimal format.')
    formatGroup.add_argument('--outBin', action='store_true', help='output data is in binary format.')
    # Use a mutually exclusive group for selecting operation
    operationGroup = par.add_mutually_exclusive_group(required=True)
    operationGroup.add_argument('--add', action='store_true', help='calculate data for modular addition.')
    operationGroup.add_argument('--mul', action='store_true', help='calculate data for modular multiplication.')
    operationGroup.add_argument('--div', action='store_true', help='calculate data for integer division.')

    args = vars(par.parse_args())
    dataFile = args['FILE']
    bkupFile = f'{dataFile}.backup'
    outFile = args['o'] if args['o'] else dataFile
    csvOut = not args['c']

    # Select data operation
    if args['add']:
        dataClass = ModAddData
        cArrayName = 'dataAdd'
    elif args['mul']:
        dataClass = ModMulData
        cArrayName = 'dataMul'
    elif args['div']:
        dataClass = DivData
        cArrayName = 'dataDiv'

    # Select input number base
    if args['dec']:
        inBase = 10
    elif args['hex']:
        inBase = 16
    elif args['bin']:
        inBase = 2
    # Select output number base
    if args['outDec']:
        outBase = 10
    elif args['outHex']:
        outBase = 16
    elif args['outBin']:
        outBase = 2
    else:
        outBase = inBase
    cArrayName = f'{cArrayName}{outBase}'

    # Perform calculation
    try:
        # Read and close the input before (possibly) overwriting the same file
        with open(dataFile, 'r', newline='') as fin:
            data = dataClass(fin, inBase)
        data.calculate()
        if args['b']:
            shutil.copy(dataFile, bkupFile)
        with open(outFile, 'w', newline='') as fout:
            if csvOut:
                data.writeCsv(fout, outBase)
            else:
                data.writeC(fout, outBase, dataFile, cArrayName)
    except DataError as e:
        print(e, file=sys.stderr)
        sys.exit(1)

Listing A.1: Python script for generating test data



Appendix B

Internal Interfaces SV Code

interface in_Registers;
    logic [NUM_REGS-1:0][WORD_WIDTH:0] registers;
    logic [WORD_WIDTH:0] wData;
    logic [3:0] wReg;
    logic wEnable;

    modport slave (
        output registers,
        input  wData,
        input  wReg,
        input  wEnable
    );
    modport master (
        input  registers,
        output wData,
        output wReg,
        output wEnable
    );
endinterface

interface in_OpModule;
    logic ready;
    logic error;
    logic valid;
    logic [3:0] opcode;
    logic [3:0] op1Reg;
    logic [3:0] op2Reg;
    logic [3:0] resReg;

    modport slave (
        output ready,
        output error,
        input  valid,
        input  opcode,
        input  op1Reg,
        input  op2Reg,
        input  resReg
    );
    modport master (
        input  ready,
        input  error,
        output valid,
        output opcode,
        output op1Reg,
        output op2Reg,
        output resReg
    );
endinterface

Listing B.1: SystemVerilog code for the internal interfaces of ECCo.
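A plausible reading of in_Registers is a single synchronous write port: the master drives wData, wReg and wEnable, while the slave exposes every register in parallel through registers. The C sketch below models that reading with one function call per rising clock edge. It is an assumption rather than the thesis RTL: reg_file_t and reg_file_clock are hypothetical names, and the registers are shortened to 8 bits instead of WORD_WIDTH+1 bits to keep the model small. The 4-bit wReg can address at most 16 registers, which matches ECC_REG_IDX_MAX in the C wrapper of Appendix D.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 16   /* assumption: a 4-bit wReg addresses at most 16 registers */

/* Hypothetical software model of the in_Registers write port. Register
 * contents are shortened to 8 bits; the RTL uses WORD_WIDTH+1 bits. */
typedef struct {
    uint8_t registers[NUM_REGS];   /* output of the slave, read by the master */
} reg_file_t;

/* One rising clock edge: wData is written to register wReg only while
 * wEnable is asserted; otherwise every register keeps its value. */
void reg_file_clock(reg_file_t *rf, uint8_t wData, unsigned wReg, bool wEnable)
{
    assert(wReg < NUM_REGS);
    if (wEnable)
        rf->registers[wReg] = wData;
}
```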

Appendix C

Test Data

modulo,operand1,operand2,result
7,15,1,2
11,3,2,5
11,3,-4,10
233,75,77,152
233,567,895,64
233,567,-895,138
28657,16578,19504,7425
514229,546500,357980,390251
99194853094755497,98275954794755497,12457956214,98275967252711711
99194853094755497,98275954794755497,-12457956214,98275942336799283
92567853094755497,98275954794755497,92657924597654697,5798173202899200
92567853094755497,-98275954794755497,-92657924597654697,86769679891856297
75356465794755497,65245765798756497,70253759756423697,60143059760424697
74225698149877013133163669918490695756676765155849109751738796007550114900164,55228977
55228977394393414412853003502097247104908965897402951232160234933662925082798,45228977
74225698149877013133163669918490695756676765155849109751738796007550114900164,55228977
74225698149877013133163669918490695756676765155849109751738796007550114900164,65228977
74225698149877013133163669918490695756676765155849109751738796007550114900164,35289773
74225698149877013133163669918490695756676765155849109751738796007550114900164,95289773
74225698149877013133163669918490695756676765155849109751738796007550114900164,85289773
74225698149877013133163669918490695756676765155849109751738796007550114900164,45228977

Listing C.1: Modular addition test data.
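The expected results in Listing C.1 always lie in [0, modulo), even for negative operands; for example, 3 + (-4) reduced modulo 11 gives 10. C's % operator truncates toward zero and can return a negative remainder, so a C cross-check has to shift such remainders up by the modulus. The sketch below does that for the rows that fit in 64 bits; mod_add and mod_reduce are hypothetical helpers, and the bound m < 2^62 is an assumption that keeps the sum of two reduced operands from overflowing.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce x into [0, m). C's % truncates toward zero, so a negative
 * remainder is shifted up by m. */
static int64_t mod_reduce(int64_t x, int64_t m)
{
    int64_t r = x % m;
    return r < 0 ? r + m : r;
}

/* Hypothetical reference for the modular addition rows: (a + b) mod m.
 * Both operands are reduced first, so the sum stays below 2m, which
 * cannot overflow while m < 2^62. */
int64_t mod_add(int64_t a, int64_t b, int64_t m)
{
    assert(m > 0 && m < (INT64_C(1) << 62));
    int64_t s = mod_reduce(a, m) + mod_reduce(b, m);
    return s >= m ? s - m : s;
}
```

All rows up to the 17-digit ones satisfy the bound; the 256-bit rows need a big-number type such as the ecc_word_t of Appendix E.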

modulo,operand1,operand2,result
7,15,1,1
11,3,2,6
233,75,77,183
233,567,895,224
28657,16578,19504,381
514229,546500,357980,218095
99194853094755497,98275954794755497,12457956214,31017271154744113
92567853094755497,98275954794755497,92657924597654697,48036520782282743
75356465794755497,65245765798756497,70253759756423697,65782237743603078
74225698149877013133163669918490695756676765155849109751738796007550114900164,5522
55228977394393414412853003502097247104908965897402951232160234933662925082798,4522
74225698149877013133163669918490695756676765155849109751738796007550114900164,5522
74225698149877013133163669918490695756676765155849109751738796007550114900164,6522
74225698149877013133163669918490695756676765155849109751738796007550114900164,3528
74225698149877013133163669918490695756676765155849109751738796007550114900164,9528
74225698149877013133163669918490695756676765155849109751738796007550114900164,8528
74225698149877013133163669918490695756676765155849109751738796007550114900164,4522

Listing C.2: Modular multiplication test data.
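A portable way to check the smaller multiplication rows without 128-bit arithmetic is left-to-right double-and-add modular multiplication: the bits of one operand are scanned from the most significant end, and the running result is doubled, optionally incremented by the other operand, and reduced after every step. This is a software reference only, not the algorithm implemented in the ECCo hardware; mod_mul is a hypothetical name, and the bound m < 2^62 is an assumption that keeps every intermediate value below 2m.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: (a * b) mod m by left-to-right double-and-add.
 * Every intermediate value is below 2m, so uint64_t never overflows for
 * m < 2^62. Operands are non-negative, as in Listing C.2. */
uint64_t mod_mul(uint64_t a, uint64_t b, uint64_t m)
{
    assert(m > 0 && m < (UINT64_C(1) << 62));
    a %= m;
    b %= m;
    uint64_t r = 0;
    for (int i = 63; i >= 0; i--) {
        r <<= 1;                     /* r = 2r mod m */
        if (r >= m)
            r -= m;
        if ((b >> i) & 1) {          /* r = r + a mod m for each set bit of b */
            r += a;
            if (r >= m)
                r -= m;
        }
    }
    return r;
}
```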

operand1,operand2,result
5,1,5
3,2,1
3,-4,0
75,77,0
567,895,0
567,-895,0
16578,19504,0
546500,357980,1
98275954794755497,12457956214,7888609
98275954794755497,-12457956214,-7888609
98275954794755497,92657924597654697,1
98275954794755497,97,1013154173141809
98275954794755497,-97,-1013154173141809
65245765798756497,70256423697,928680
55228977394654679572853003502097247104908965897402951232160234933662925082798,4128
65228977394654679572853003502097247104908965897402951232160234933662925082798,4128
3528977394654679572853003502097247104908965897402951232160234933662925082798,41285
9528977394654679572853003502097247104908965897402951232160234933662925082798,91285
8528977394654679572853003502097247104908965897402951232160234933662925082798,91285
45228977394393414412853003502097247104908965897402951232160234933662925082798,1329

Listing C.3: Integer division test data.
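The expected quotients in Listing C.3 are truncated toward zero: 3 / -4 gives 0 and the row with the negated 11-digit divisor gives -7888609, whereas floor division (Python's //) would give -1 and -7888610. C99 and later define integer / in the same truncating way, so the rows that fit in 64 bits can be checked directly. In the sketch below, trunc_div and floor_div are hypothetical helpers; floor_div is included only to show where the two conventions differ.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating division as C99 defines '/': the quotient is rounded toward
 * zero. The expected results in Listing C.3 follow this convention. */
int64_t trunc_div(int64_t a, int64_t b)
{
    assert(b != 0 && !(a == INT64_MIN && b == -1));
    return a / b;
}

/* Floor division (Python's //), shown only for contrast. It differs from
 * trunc_div when the signs differ and the division is inexact. */
int64_t floor_div(int64_t a, int64_t b)
{
    assert(b != 0 && !(a == INT64_MIN && b == -1));
    int64_t q = a / b;
    if (a % b != 0 && ((a < 0) != (b < 0)))
        q -= 1;
    return q;
}
```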



Appendix D

ECCo C Wrapper

#ifndef ECC_H
#define ECC_H

/*************************************************************
 *                                                           *
 *                  Internal ecc.h macros                    *
 *                                                           *
 *************************************************************/

// Coprocessor number of the ECCo
#define __ECC_COPROC "p0"

/*************
 * Opcodes   *
 *************/

// Arithmetic
#define __ECC_OPC1_MUL "0x0"
#define __ECC_OPC1_ADD "0x1"
#define __ECC_OPC1_DIV "0x2"
#define __ECC_OPC1_NEG "0x3"
// Logical
#define __ECC_OPC1_LOG "0xd"
#define __ECC_OPC2_OR  "0x0"
#define __ECC_OPC2_AND "0x1"
#define __ECC_OPC2_XOR "0x2"
#define __ECC_OPC2_NOT "0x3"
// Shift
#define __ECC_OPC1_SFT "0xe"
#define __ECC_OPC2_LSL "0x0"
#define __ECC_OPC2_LSR "0x1"
#define __ECC_OPC2_ASR "0x2"
// Comparison
#define __ECC_OPC1_CMP "0xf"
#define __ECC_OPC2_ZR  "0x0"
#define __ECC_OPC2_NZR "0x1"
#define __ECC_OPC2_EQ  "0x2"
#define __ECC_OPC2_NEQ "0x3"
#define __ECC_OPC2_LT  "0x4"
#define __ECC_OPC2_GT  "0x5"
// Miscellaneous
#define __ECC_OPC1_INC "0xa"
#define __ECC_OPC1_DEC "0xb"
#define __ECC_OPC1_SSB "0xc"
#define __ECC_OPC2_SSB "0x0"
#define __ECC_OPC1_USB "0xc"
#define __ECC_OPC2_USB "0x1"

/*************************************************************
 *                                                           *
 *                  Exported ecc.h macros                    *
 *                                                           *
 *************************************************************/

#ifndef NULL
#define NULL ((void *)0)
#endif

/********************************
 * Coprocessor interface meta   *
 ********************************/

#define ECC_OP1_WIDTH 4
#define ECC_OP1_MAX 15
#define ECC_OP2_WIDTH 3
#define ECC_OP2_MAX 7
#define ECC_REG_IDX_WIDTH 4
#define ECC_REG_IDX_MAX 15
#define ECC_WORD_WIDTH 256
#define ECC_WORD_WIDTH_BYTE (ECC_WORD_WIDTH/8)
#define ECC_MODULO_REG "14"
#define ECC_STATUS_REG "15"

/***************************
 * Arithmetic operations   *
 ***************************/

// All arguments are coprocessor register indexes, which must be integers in double quotes.
#define ECC_MUL(op1Reg, op2Reg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_MUL ", cr" op2Reg ", cr" op1Reg ", cr" resReg ", #0")
#define ECC_ADD(op1Reg, op2Reg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_ADD ", cr" op2Reg ", cr" op1Reg ", cr" resReg ", #0")
#define ECC_DIV(op1Reg, op2Reg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_DIV ", cr" op2Reg ", cr" op1Reg ", cr" resReg ", #0")
#define ECC_NEG(opReg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_NEG ", cr0, cr" opReg ", cr" resReg ", #0")


/************************
 * Logical operations   *
 ************************/

// All arguments are coprocessor register indexes, which must be integers in double quotes.
#define ECC_OR(op1Reg, op2Reg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_LOG ", cr" op2Reg ", cr" op1Reg ", cr" resReg ", #" __ECC_OPC2_OR)
#define ECC_AND(op1Reg, op2Reg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_LOG ", cr" op2Reg ", cr" op1Reg ", cr" resReg ", #" __ECC_OPC2_AND)
#define ECC_XOR(op1Reg, op2Reg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_LOG ", cr" op2Reg ", cr" op1Reg ", cr" resReg ", #" __ECC_OPC2_XOR)
#define ECC_NOT(opReg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_LOG ", cr0, cr" opReg ", cr" resReg ", #" __ECC_OPC2_NOT)


/**********************
 * Shift operations   *
 **********************/

// All arguments are coprocessor register indexes, which must be integers in double quotes.
#define ECC_LSL(op1Reg, op2Reg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_SFT ", cr" op2Reg ", cr" op1Reg ", cr" resReg ", #" __ECC_OPC2_LSL)
#define ECC_LSR(op1Reg, op2Reg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_SFT ", cr" op2Reg ", cr" op1Reg ", cr" resReg ", #" __ECC_OPC2_LSR)
#define ECC_ASR(op1Reg, op2Reg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_SFT ", cr" op2Reg ", cr" op1Reg ", cr" resReg ", #" __ECC_OPC2_ASR)


/***************************
 * Comparison operations   *
 ***************************/

// All arguments are coprocessor register indexes, which must be integers in double quotes.
#define ECC_ZR(reg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_CMP ", cr0, cr" reg ", cr0, #" __ECC_OPC2_ZR)
#define ECC_NZR(reg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_CMP ", cr0, cr" reg ", cr0, #" __ECC_OPC2_NZR)
#define ECC_EQ(op1Reg, op2Reg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_CMP ", cr" op2Reg ", cr" op1Reg ", cr0, #" __ECC_OPC2_EQ)
#define ECC_NEQ(op1Reg, op2Reg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_CMP ", cr" op2Reg ", cr" op1Reg ", cr0, #" __ECC_OPC2_NEQ)
#define ECC_LT(op1Reg, op2Reg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_CMP ", cr" op2Reg ", cr" op1Reg ", cr0, #" __ECC_OPC2_LT)
#define ECC_GT(op1Reg, op2Reg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_CMP ", cr" op2Reg ", cr" op1Reg ", cr0, #" __ECC_OPC2_GT)


/******************************
 * Miscellaneous operations   *
 ******************************/

// All arguments are coprocessor register indexes, which must be integers in double quotes.
#define ECC_INC(opReg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_INC ", cr0, cr" opReg ", cr" resReg ", #0")
#define ECC_DEC(opReg, resReg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_DEC ", cr0, cr" opReg ", cr" resReg ", #0")
#define ECC_SSB(reg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_SSB ", cr0, cr" reg ", cr0, #" __ECC_OPC2_SSB)
#define ECC_USB(reg) asm volatile("cdp " __ECC_COPROC ", #" __ECC_OPC1_USB ", cr0, cr" reg ", cr0, #" __ECC_OPC2_USB)


/**************************
 * Data transfer macros   *
 **************************/

/* Load coprocessor register macros. The offset is in hexadecimal. 'reg' is a
   coprocessor register index and must be a decimal integer in double quotes.
   'Rt' and 'Rt2' are 32-bit input variables. MCRR only accepts core
   registers, so the operands use the "r" constraint. */
#define ECC_LOAD_0(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0x0, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#if ECC_WORD_WIDTH > 64
#define ECC_LOAD_1(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0x1, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_1(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 128
#define ECC_LOAD_2(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0x2, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_2(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 192
#define ECC_LOAD_3(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0x3, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_3(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 256
#define ECC_LOAD_4(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0x4, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_4(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 320
#define ECC_LOAD_5(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0x5, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_5(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 384
#define ECC_LOAD_6(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0x6, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_6(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 448
#define ECC_LOAD_7(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0x7, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_7(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 512
#define ECC_LOAD_8(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0x8, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_8(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 576
#define ECC_LOAD_9(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0x9, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_9(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 640
#define ECC_LOAD_10(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0xa, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_10(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 704
#define ECC_LOAD_11(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0xb, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_11(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 768
#define ECC_LOAD_12(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0xc, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_12(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 832
#define ECC_LOAD_13(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0xd, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_13(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 896
#define ECC_LOAD_14(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0xe, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_14(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 960
#define ECC_LOAD_15(Rt, Rt2, reg) asm volatile("mcrr " __ECC_COPROC ", #0xf, %0, %1, cr" reg :: "r"(Rt), "r"(Rt2))
#else
#define ECC_LOAD_15(Rt, Rt2, reg)
#endif

/* Store coprocessor register macros. The offset is in hexadecimal. 'reg' is a
   coprocessor register index and must be a decimal integer in double quotes.
   'Rt' and 'Rt2' are 32-bit output variables. MRRC only writes core
   registers, so the operands use the "=r" constraint. */
#define ECC_STORE_0(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0x0, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#if ECC_WORD_WIDTH > 64
#define ECC_STORE_1(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0x1, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_1(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 128
#define ECC_STORE_2(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0x2, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_2(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 192
#define ECC_STORE_3(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0x3, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_3(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 256
#define ECC_STORE_4(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0x4, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_4(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 320
#define ECC_STORE_5(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0x5, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_5(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 384
#define ECC_STORE_6(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0x6, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_6(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 448
#define ECC_STORE_7(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0x7, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_7(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 512
#define ECC_STORE_8(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0x8, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_8(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 576
#define ECC_STORE_9(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0x9, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_9(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 640
#define ECC_STORE_10(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0xa, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_10(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 704
#define ECC_STORE_11(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0xb, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_11(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 768
#define ECC_STORE_12(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0xc, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_12(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 832
#define ECC_STORE_13(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0xd, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_13(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 896
#define ECC_STORE_14(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0xe, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_14(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 960
#define ECC_STORE_15(Rt, Rt2, reg) asm volatile("mrrc " __ECC_COPROC ", #0xf, %0, %1, cr" reg : "=r"(Rt), "=r"(Rt2))
#else
#define ECC_STORE_15(Rt, Rt2, reg)
#endif

#endif // ECC_H

Listing D.1: ECCo C wrapper source.
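The width constants in the meta section line up with the fields of the Arm CDP instruction: a 4-bit opc1 (ECC_OP1_WIDTH), a 3-bit opc2 (ECC_OP2_WIDTH) and 4-bit CRd, CRn and CRm register fields (ECC_REG_IDX_WIDTH). The assembler syntax is CDP coproc, #opc1, CRd, CRn, CRm, #opc2, so the wrapper places op2Reg in CRd, op1Reg in CRn and resReg in CRm; for example, ECC_MUL("1", "2", "3") expands to cdp p0, #0x0, cr2, cr1, cr3, #0. The sketch below packs these fields into the A32 (ARM state) encoding to make the layout concrete. It assumes A32; the Thumb encoding is emitted as two 16-bit halfwords and is not covered here. cdp_encode is a hypothetical helper, not part of the wrapper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the A32 encoding of
 *   CDP<c> p<coproc>, #opc1, CR<crd>, CR<crn>, CR<crm>, #opc2
 * Field layout, bit 31 down to bit 0:
 *   cond[31:28] 1110[27:24] opc1[23:20] CRn[19:16] CRd[15:12]
 *   coproc[11:8] opc2[7:5] 0[4] CRm[3:0]                          */
uint32_t cdp_encode(unsigned cond, unsigned coproc, unsigned opc1,
                    unsigned crd, unsigned crn, unsigned crm, unsigned opc2)
{
    assert(cond <= 0xf && coproc <= 0xf);
    assert(opc1 <= 15 && opc2 <= 7);               /* ECC_OP1_MAX, ECC_OP2_MAX */
    assert(crd <= 15 && crn <= 15 && crm <= 15);   /* ECC_REG_IDX_MAX */
    return ((uint32_t)cond << 28) | (UINT32_C(0xE) << 24) |
           ((uint32_t)opc1 << 20) | ((uint32_t)crn << 16) |
           ((uint32_t)crd << 12) | ((uint32_t)coproc << 8) |
           ((uint32_t)opc2 << 5) | (uint32_t)crm;
}
```

With cond = 0xE (always), the ECC_MUL example above encodes to 0xEE012003. How the ECCo interprets each field is defined by its RTL, which this sketch does not model.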



Appendix E

ECCo Big Number library

#ifndef ECC_WORD_H
#define ECC_WORD_H

#include <stdbool.h>

#include "ecc.h"

/* Length of the array in the word struct. Defined here instead of in ecc.h
   since it depends on the array type. */
#define EW_LENGTH (ECC_WORD_WIDTH_BYTE/sizeof(int))

/* +4 to fit the terminating '\0', the leading '0b' and an optional '-' sign. */
#define EW_STR_LENGTH (ECC_WORD_WIDTH+4)

/* ecc_word is the data type for working with big numbers of the same width
   as the ECC coprocessor's word size. */
typedef struct {
    int word[EW_LENGTH];
    bool is_zero;
    bool is_negative;
} ecc_word_t;

/* String type big enough to represent any number in
   binary, decimal or hexadecimal format. */
typedef char ew_str_t[EW_STR_LENGTH];

/* Initializes an ecc_word. Returns a pointer to the given word. */
ecc_word_t *ew_init(ecc_word_t *);

/* Creates a new copy of an ecc_word. Returns a pointer to dst. */
ecc_word_t *ew_copy(const ecc_word_t *restrict src, ecc_word_t *restrict dst);


/*************************************************************
 *                                                           *
 *                     Content handlers                      *
 *                                                           *
 *************************************************************/

/* Sets the content of an ecc_word to 0. Returns a pointer to the given word. */
ecc_word_t *ew_zero(ecc_word_t *);

/* Sets the value to an integer value. */
ecc_word_t *ew_set_int(ecc_word_t *, int);

/* Sets the value of a word to a number represented by a string in hexadecimal
   (0x prefix) format. Returns a pointer to the word, or NULL on failure. */
ecc_word_t *ew_set_str(ecc_word_t *, const char[]);

/* Sets part of the content of a word, based on the given offset. */
ecc_word_t *ew_set_offs(ecc_word_t *w, int offs, int r1, int r2);

/* Returns a pointer to the hexadecimal formatted string of the number. */
char *ew_to_str(const ecc_word_t *, char[], int);


/*************************************************************
 *                                                           *
 *                        Comparison                         *
 *                                                           *
 *************************************************************/

/* Checks if two words are equal. */
bool ew_eq(const ecc_word_t *, const ecc_word_t *);


/*************************************************************
 *                                                           *
 *                 Coprocessor interaction                   *
 *                                                           *
 *************************************************************/

/* Loads the given word into a coprocessor register. */
void ew_load_cr0(const ecc_word_t *);
void ew_load_cr1(const ecc_word_t *);
void ew_load_cr2(const ecc_word_t *);
void ew_load_cr3(const ecc_word_t *);
void ew_load_cr4(const ecc_word_t *);
void ew_load_cr5(const ecc_word_t *);
void ew_load_cr6(const ecc_word_t *);
void ew_load_cr7(const ecc_word_t *);
void ew_load_cr8(const ecc_word_t *);
void ew_load_cr9(const ecc_word_t *);
void ew_load_cr10(const ecc_word_t *);
void ew_load_cr11(const ecc_word_t *);
void ew_load_cr12(const ecc_word_t *);
void ew_load_cr13(const ecc_word_t *);
void ew_load_cr14(const ecc_word_t *);
/* CP register 15 is the status register and is not writable. */

/* Stores the value of a coprocessor register in the given word. The register
   index is given by the function name. */
void ew_store_cr0(ecc_word_t *);
void ew_store_cr1(ecc_word_t *);
void ew_store_cr2(ecc_word_t *);
void ew_store_cr3(ecc_word_t *);
void ew_store_cr4(ecc_word_t *);
void ew_store_cr5(ecc_word_t *);
void ew_store_cr6(ecc_word_t *);
void ew_store_cr7(ecc_word_t *);
void ew_store_cr8(ecc_word_t *);
void ew_store_cr9(ecc_word_t *);
void ew_store_cr10(ecc_word_t *);
void ew_store_cr11(ecc_word_t *);
void ew_store_cr12(ecc_word_t *);
void ew_store_cr13(ecc_word_t *);
void ew_store_cr14(ecc_word_t *);
void ew_store_cr15(ecc_word_t *);

/* Convenience macros */
#define EW_LOAD_MOD(WORD) ew_load_cr14(WORD)
#define EW_STORE_MOD(WORD) ew_store_cr14(WORD)
#define EW_STORE_STATUS(WORD) ew_store_cr15(WORD)


/**************************
 * Offset select macros   *
 **************************/

#define EW_GET_0(Rt, Rt2, W) Rt = (W)->word[0]; Rt2 = (W)->word[1]
#if ECC_WORD_WIDTH > 64
#define EW_GET_1(Rt, Rt2, W) Rt = (W)->word[2]; Rt2 = (W)->word[3]
#else
#define EW_GET_1(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 128
#define EW_GET_2(Rt, Rt2, W) Rt = (W)->word[4]; Rt2 = (W)->word[5]
#else
#define EW_GET_2(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 192
#define EW_GET_3(Rt, Rt2, W) Rt = (W)->word[6]; Rt2 = (W)->word[7]
#else
#define EW_GET_3(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 256
#define EW_GET_4(Rt, Rt2, W) Rt = (W)->word[8]; Rt2 = (W)->word[9]
#else
#define EW_GET_4(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 320
#define EW_GET_5(Rt, Rt2, W) Rt = (W)->word[10]; Rt2 = (W)->word[11]
#else
#define EW_GET_5(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 384
#define EW_GET_6(Rt, Rt2, W) Rt = (W)->word[12]; Rt2 = (W)->word[13]
#else
#define EW_GET_6(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 448
#define EW_GET_7(Rt, Rt2, W) Rt = (W)->word[14]; Rt2 = (W)->word[15]
#else
#define EW_GET_7(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 512
#define EW_GET_8(Rt, Rt2, W) Rt = (W)->word[16]; Rt2 = (W)->word[17]
#else
#define EW_GET_8(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 576
#define EW_GET_9(Rt, Rt2, W) Rt = (W)->word[18]; Rt2 = (W)->word[19]
#else
#define EW_GET_9(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 640
#define EW_GET_10(Rt, Rt2, W) Rt = (W)->word[20]; Rt2 = (W)->word[21]
#else
#define EW_GET_10(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 704
#define EW_GET_11(Rt, Rt2, W) Rt = (W)->word[22]; Rt2 = (W)->word[23]
#else
#define EW_GET_11(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 768
#define EW_GET_12(Rt, Rt2, W) Rt = (W)->word[24]; Rt2 = (W)->word[25]
#else
#define EW_GET_12(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 832
#define EW_GET_13(Rt, Rt2, W) Rt = (W)->word[26]; Rt2 = (W)->word[27]
#else
#define EW_GET_13(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 896
#define EW_GET_14(Rt, Rt2, W) Rt = (W)->word[28]; Rt2 = (W)->word[29]
#else
#define EW_GET_14(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 960
#define EW_GET_15(Rt, Rt2, W) Rt = (W)->word[30]; Rt2 = (W)->word[31]
#else
#define EW_GET_15(Rt, Rt2, W)
#endif
196
197 # d e f i n e EW_SET_0 ( Rt , Rt2 , W) e w _ s e t _ o f f s (W, 0 , Rt , Rt2 )
198 # i f ECC_WORD_WIDTH > 64
199 # d e f i n e EW_SET_1 ( Rt , Rt2 , W) e w _ s e t _ o f f s (W, 1 , Rt , Rt2 )
200 # else
201 # d e f i n e EW_SET_1 ( Rt , Rt2 , W)
#endif
#if ECC_WORD_WIDTH > 128
#define EW_SET_2(Rt, Rt2, W) ew_set_offs(W, 2, Rt, Rt2)
#else
#define EW_SET_2(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 192
#define EW_SET_3(Rt, Rt2, W) ew_set_offs(W, 3, Rt, Rt2)
#else
#define EW_SET_3(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 256
#define EW_SET_4(Rt, Rt2, W) ew_set_offs(W, 4, Rt, Rt2)
#else
#define EW_SET_4(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 320
#define EW_SET_5(Rt, Rt2, W) ew_set_offs(W, 5, Rt, Rt2)
#else
#define EW_SET_5(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 384
#define EW_SET_6(Rt, Rt2, W) ew_set_offs(W, 6, Rt, Rt2)
#else
#define EW_SET_6(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 448
#define EW_SET_7(Rt, Rt2, W) ew_set_offs(W, 7, Rt, Rt2)
#else
#define EW_SET_7(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 512
#define EW_SET_8(Rt, Rt2, W) ew_set_offs(W, 8, Rt, Rt2)
#else
#define EW_SET_8(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 576
#define EW_SET_9(Rt, Rt2, W) ew_set_offs(W, 9, Rt, Rt2)
#else
#define EW_SET_9(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 640
#define EW_SET_10(Rt, Rt2, W) ew_set_offs(W, 10, Rt, Rt2)
#else
#define EW_SET_10(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 704
#define EW_SET_11(Rt, Rt2, W) ew_set_offs(W, 11, Rt, Rt2)
#else
#define EW_SET_11(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 768
#define EW_SET_12(Rt, Rt2, W) ew_set_offs(W, 12, Rt, Rt2)
#else
#define EW_SET_12(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 832
#define EW_SET_13(Rt, Rt2, W) ew_set_offs(W, 13, Rt, Rt2)
#else
#define EW_SET_13(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 896
#define EW_SET_14(Rt, Rt2, W) ew_set_offs(W, 14, Rt, Rt2)
#else
#define EW_SET_14(Rt, Rt2, W)
#endif
#if ECC_WORD_WIDTH > 960
#define EW_SET_15(Rt, Rt2, W) ew_set_offs(W, 15, Rt, Rt2)
#else
#define EW_SET_15(Rt, Rt2, W)
#endif

#endif // ECC_WORD_H

Listing E.1: Header file for big number implementation of an ECCo word.
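The preprocessor chain in Listing E.1 enables `EW_SET_n`, and with it a call to `ew_set_offs`, only when `ECC_WORD_WIDTH` exceeds 64·n. Every other `EW_SET_n` expands to nothing, so the store routines discard the halves read for offsets beyond the configured word width. The host-side sketch below counts the active offsets for a given width. It assumes that `EW_SET_0` and `EW_SET_1`, which are defined before this excerpt, follow the same 64·n pattern, and `active_offsets` is a hypothetical helper, not part of the ECCo library.

```c
/* Hypothetical helper (not part of ECCo): counts the offsets n, 0 <= n <= 15,
   whose guard "ECC_WORD_WIDTH > 64 * n" holds, i.e. ceil(width / 64) capped
   at 16. Assumes EW_SET_0 and EW_SET_1 follow the same pattern as EW_SET_2..15.
   ew_set_offs() stores offset n in word[2n] (r1) and word[2n + 1] (r2). */
int active_offsets(int width)
{
    int n = 0;
    while (n < 16 && width > 64 * n)
        n++;
    return n;
}
```

For example, a 256-bit word gives `active_offsets(256) == 4`, i.e. offsets 0 to 3, while a 521-bit word needs offsets 0 to 8.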

#include "ecc_word.h"

#include <ee_printf.h>
#include <stdbool.h>

#include "ecc.h"

ecc_word_t *
ew_init(ecc_word_t *w)
{
    for (int i = 0; i < EW_LENGTH; i++)
        w->word[i] = 0;
    w->is_zero = true;
    w->is_negative = false;
    return w;
}

ecc_word_t *
ew_copy(const ecc_word_t *restrict src, ecc_word_t *restrict dst)
{
    if (!src->is_zero)
        for (int i = 0; i < EW_LENGTH; i++)
            dst->word[i] = src->word[i];
    else
        for (int i = 0; i < EW_LENGTH; i++)
            dst->word[i] = 0;

    dst->is_zero = src->is_zero;
    dst->is_negative = src->is_negative;
    return dst;
}


/************************************************************
 *                                                          *
 *                    Content handlers                      *
 *                                                          *
 ************************************************************/

ecc_word_t *
ew_zero(ecc_word_t *w)
{
    if (!w->is_zero) {
        for (int i = 0; i < EW_LENGTH; i++)
            w->word[i] = 0;
        w->is_zero = true;
    }
    return w;
}

ecc_word_t *
ew_set_int(ecc_word_t *w, int val)
{
    /* Clear all elements and the sign, since w may be uninitialized */
    ew_init(w);
    w->word[0] = val;
    w->is_zero = (val == 0);
    return w;
}

ecc_word_t *
ew_set_str(ecc_word_t *w, const char str[])
{
    int shift;
    unsigned tmp;   /* unsigned, so that shifting a digit into bit 31 is well defined */
    int *num = w->word;
    const char *c;

    /* Move c to the terminating NUL; digits are consumed from the end */
    for (c = str; *c != '\0'; c++)
        ;

    /* Check sign */
    if (*str == '-') {
        w->is_negative = true;
        str++;
    }
    else
        w->is_negative = false;

    /* Sanity checks: the string must start with "0x" or "-0x" */
    if (*str++ != '0') {
        MSG(("ew_set_str: badly formatted string, must start with '0x' or '-0x'\n"));
        return NULL;
    }
    if (*str != 'x') {
        MSG(("ew_set_str: badly formatted string, must start with '0x' or '-0x'\n"));
        return NULL;
    }

    /* Clear the word unconditionally: is_zero cannot be trusted if the
       caller passes an uninitialized word */
    for (int i = 0; i < EW_LENGTH; i++)
        w->word[i] = 0;
    w->is_zero = true;

    /* Fill one 32-bit element (8 hex digits) per iteration, least significant
       element first; c walks backwards until it reaches the 'x' */
    do {
        tmp = 0;
        for (shift = 0; shift < 32 && --c != str; shift += 4) {
            switch (*c) {
            case 'f': case 'F':
                tmp ^= 0xfU << shift;
                break;
            case 'e': case 'E':
                tmp ^= 0xeU << shift;
                break;
            case 'd': case 'D':
                tmp ^= 0xdU << shift;
                break;
            case 'c': case 'C':
                tmp ^= 0xcU << shift;
                break;
            case 'b': case 'B':
                tmp ^= 0xbU << shift;
                break;
            case 'a': case 'A':
                tmp ^= 0xaU << shift;
                break;
            default:
                if (*c < '0' || *c > '9') {
                    MSG(("ew_set_str: invalid character in string: %c\n", *c));
                    return NULL;
                }
                tmp ^= (unsigned)(*c - '0') << shift;
            }
        }
        if (tmp && w->is_zero)
            w->is_zero = false;
        *num = (int)tmp;
    } while (c != str && ++num != w->word + EW_LENGTH);

    return w;
}

ecc_word_t *
ew_set_offs(ecc_word_t *w, int offs, int r1, int r2)
{
    /* Offset offs holds two 32-bit elements: word[2*offs] = r1, word[2*offs + 1] = r2 */
    if (w->is_zero && (r1 || r2))
        w->is_zero = false;
    offs *= 2;
    w->word[offs] = r1;
    w->word[offs + 1] = r2;
    return w;
}

char *
ew_to_str(const ecc_word_t *w, char s[], int sz)
{
    int i = 0, shift;
    unsigned char tmp;

    if (sz < 4) {
        MSG(("ew_to_str: too small string: sz = %d\n", sz));
        return NULL;
    }
    if (w->is_negative)
        s[i++] = '-';
    s[i++] = '0';
    s[i++] = 'x';

    /* Print the elements from the most significant one down, one hex digit
       per 4 bits; an index is used so that no pointer before w->word is formed */
    for (int k = EW_LENGTH - 1; k >= 0 && i < sz; k--)
        for (shift = 28; shift >= 0 && i < sz; shift -= 4, i++)
            switch ((tmp = ((unsigned)w->word[k] >> shift) & 0xf)) {
            case 0xf:
                s[i] = 'f';
                break;
            case 0xe:
                s[i] = 'e';
                break;
            case 0xd:
                s[i] = 'd';
                break;
            case 0xc:
                s[i] = 'c';
                break;
            case 0xb:
                s[i] = 'b';
                break;
            case 0xa:
                s[i] = 'a';
                break;
            default:
                s[i] = (tmp > 9) ? 'X' : tmp + '0';
            }

    if (i < sz)
        s[i] = '\0';
    else {
        MSG(("ew_to_str: too small string: sz = %d\n", sz));
        return NULL;
    }
    return s;
}


/************************************************************
 *                                                          *
 *                       Comparison                         *
 *                                                          *
 ************************************************************/

bool
ew_eq(const ecc_word_t *lhs, const ecc_word_t *rhs)
{
    const int *lw = lhs->word + EW_LENGTH;
    const int *rw = rhs->word + EW_LENGTH;

    if (lhs->is_zero && rhs->is_zero)
        return true;
    /* Non-zero values with different signs are never equal */
    if (lhs->is_negative != rhs->is_negative)
        return false;
    /* Compare the elements from the most significant one down */
    while (*--lw == *--rw)
        if (lw == lhs->word)
            return true;
    return false;
}

/************************************************************
 *                                                          *
 *                    Coprocessor load                      *
 *                                                          *
 ************************************************************/

#define _EW_LOAD_CR(N) void ew_load_cr##N(const ecc_word_t *w) { \
    volatile register int r1, r2; \
    /* Offset 0 */ \
    EW_GET_0(r1, r2, w); \
    ECC_LOAD_0(r1, r2, #N); \
    /* Offset 1 */ \
    EW_GET_1(r1, r2, w); \
    ECC_LOAD_1(r1, r2, #N); \
    /* Offset 2 */ \
    EW_GET_2(r1, r2, w); \
    ECC_LOAD_2(r1, r2, #N); \
    /* Offset 3 */ \
    EW_GET_3(r1, r2, w); \
    ECC_LOAD_3(r1, r2, #N); \
    /* Offset 4 */ \
    EW_GET_4(r1, r2, w); \
    ECC_LOAD_4(r1, r2, #N); \
    /* Offset 5 */ \
    EW_GET_5(r1, r2, w); \
    ECC_LOAD_5(r1, r2, #N); \
    /* Offset 6 */ \
    EW_GET_6(r1, r2, w); \
    ECC_LOAD_6(r1, r2, #N); \
    /* Offset 7 */ \
    EW_GET_7(r1, r2, w); \
    ECC_LOAD_7(r1, r2, #N); \
    /* Offset 8 */ \
    EW_GET_8(r1, r2, w); \
    ECC_LOAD_8(r1, r2, #N); \
    /* Offset 9 */ \
    EW_GET_9(r1, r2, w); \
    ECC_LOAD_9(r1, r2, #N); \
    /* Offset a */ \
    EW_GET_10(r1, r2, w); \
    ECC_LOAD_10(r1, r2, #N); \
    /* Offset b */ \
    EW_GET_11(r1, r2, w); \
    ECC_LOAD_11(r1, r2, #N); \
    /* Offset c */ \
    EW_GET_12(r1, r2, w); \
    ECC_LOAD_12(r1, r2, #N); \
    /* Offset d */ \
    EW_GET_13(r1, r2, w); \
    ECC_LOAD_13(r1, r2, #N); \
    /* Offset e */ \
    EW_GET_14(r1, r2, w); \
    ECC_LOAD_14(r1, r2, #N); \
    /* Offset f */ \
    EW_GET_15(r1, r2, w); \
    ECC_LOAD_15(r1, r2, #N); \
    \
    if (w->is_negative)   /* Set sign bit if negative */ \
        ECC_NEG(#N, #N); \
    else                  /* Else make sure it is unset */ \
        ECC_USB(#N); \
}

_EW_LOAD_CR(0)
_EW_LOAD_CR(1)
_EW_LOAD_CR(2)
_EW_LOAD_CR(3)
_EW_LOAD_CR(4)
_EW_LOAD_CR(5)
_EW_LOAD_CR(6)
_EW_LOAD_CR(7)
_EW_LOAD_CR(8)
_EW_LOAD_CR(9)
_EW_LOAD_CR(10)
_EW_LOAD_CR(11)
_EW_LOAD_CR(12)
_EW_LOAD_CR(13)
_EW_LOAD_CR(14)


/************************************************************
 *                                                          *
 *                    Coprocessor store                     *
 *                                                          *
 ************************************************************/

#define _EW_STORE_CR(N) void ew_store_cr##N(ecc_word_t *w) { \
    register int r1, r2; \
    unsigned mask; \
    \
    /* Check sign: bit 0x10 + N of the status register */ \
    ECC_STORE_0(r1, r2, ECC_STATUS_REG); \
    mask = 1u << (0x10 + N); \
    if (r1 & mask) { \
        w->is_negative = true; \
        ECC_NEG(#N, #N); \
    } \
    else \
        w->is_negative = false; \
    \
    w->is_zero = true; \
    /* Offset 0 */ \
    ECC_STORE_0(r1, r2, #N); \
    EW_SET_0(r1, r2, w); \
    /* Offset 1 */ \
    ECC_STORE_1(r1, r2, #N); \
    EW_SET_1(r1, r2, w); \
    /* Offset 2 */ \
    ECC_STORE_2(r1, r2, #N); \
    EW_SET_2(r1, r2, w); \
    /* Offset 3 */ \
    ECC_STORE_3(r1, r2, #N); \
    EW_SET_3(r1, r2, w); \
    /* Offset 4 */ \
    ECC_STORE_4(r1, r2, #N); \
    EW_SET_4(r1, r2, w); \
    /* Offset 5 */ \
    ECC_STORE_5(r1, r2, #N); \
    EW_SET_5(r1, r2, w); \
    /* Offset 6 */ \
    ECC_STORE_6(r1, r2, #N); \
    EW_SET_6(r1, r2, w); \
    /* Offset 7 */ \
    ECC_STORE_7(r1, r2, #N); \
    EW_SET_7(r1, r2, w); \
    /* Offset 8 */ \
    ECC_STORE_8(r1, r2, #N); \
    EW_SET_8(r1, r2, w); \
    /* Offset 9 */ \
    ECC_STORE_9(r1, r2, #N); \
    EW_SET_9(r1, r2, w); \
    /* Offset 10 */ \
    ECC_STORE_10(r1, r2, #N); \
    EW_SET_10(r1, r2, w); \
    /* Offset 11 */ \
    ECC_STORE_11(r1, r2, #N); \
    EW_SET_11(r1, r2, w); \
    /* Offset 12 */ \
    ECC_STORE_12(r1, r2, #N); \
    EW_SET_12(r1, r2, w); \
    /* Offset 13 */ \
    ECC_STORE_13(r1, r2, #N); \
    EW_SET_13(r1, r2, w); \
    /* Offset 14 */ \
    ECC_STORE_14(r1, r2, #N); \
    EW_SET_14(r1, r2, w); \
    /* Offset 15 */ \
    ECC_STORE_15(r1, r2, #N); \
    EW_SET_15(r1, r2, w); \
    \
    if (w->is_negative) \
        ECC_NEG(#N, #N); \
}

_EW_STORE_CR(0)
_EW_STORE_CR(1)
_EW_STORE_CR(2)
_EW_STORE_CR(3)
_EW_STORE_CR(4)
_EW_STORE_CR(5)
_EW_STORE_CR(6)
_EW_STORE_CR(7)
_EW_STORE_CR(8)
_EW_STORE_CR(9)
_EW_STORE_CR(10)
_EW_STORE_CR(11)
_EW_STORE_CR(12)
_EW_STORE_CR(13)
_EW_STORE_CR(14)

/* Store word from CP register 15. Does not care about the sign since it is
   the status register */
void
ew_store_cr15(ecc_word_t *w)
{
    register int r1, r2;
    w->is_zero = true;
    w->is_negative = false;
    /* Offset 0 */
    ECC_STORE_0(r1, r2, "15");
    EW_SET_0(r1, r2, w);
    /* Offset 1 */
    ECC_STORE_1(r1, r2, "15");
    EW_SET_1(r1, r2, w);
    /* Offset 2 */
    ECC_STORE_2(r1, r2, "15");
    EW_SET_2(r1, r2, w);
    /* Offset 3 */
    ECC_STORE_3(r1, r2, "15");
    EW_SET_3(r1, r2, w);
    /* Offset 4 */
    ECC_STORE_4(r1, r2, "15");
    EW_SET_4(r1, r2, w);
    /* Offset 5 */
    ECC_STORE_5(r1, r2, "15");
    EW_SET_5(r1, r2, w);
    /* Offset 6 */
    ECC_STORE_6(r1, r2, "15");
    EW_SET_6(r1, r2, w);
    /* Offset 7 */
    ECC_STORE_7(r1, r2, "15");
    EW_SET_7(r1, r2, w);
    /* Offset 8 */
    ECC_STORE_8(r1, r2, "15");
    EW_SET_8(r1, r2, w);
    /* Offset 9 */
    ECC_STORE_9(r1, r2, "15");
    EW_SET_9(r1, r2, w);
    /* Offset 10 */
    ECC_STORE_10(r1, r2, "15");
    EW_SET_10(r1, r2, w);
    /* Offset 11 */
    ECC_STORE_11(r1, r2, "15");
    EW_SET_11(r1, r2, w);
    /* Offset 12 */
    ECC_STORE_12(r1, r2, "15");
    EW_SET_12(r1, r2, w);
    /* Offset 13 */
    ECC_STORE_13(r1, r2, "15");
    EW_SET_13(r1, r2, w);
    /* Offset 14 */
    ECC_STORE_14(r1, r2, "15");
    EW_SET_14(r1, r2, w);
    /* Offset 15 */
    ECC_STORE_15(r1, r2, "15");
    EW_SET_15(r1, r2, w);
}

Listing E.2: Source file for big number implementation of an ECCo word.
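In Listing E.2, `ew_set_str` consumes the hexadecimal digits from the end of the string, eight per 32-bit element, so `word[0]` holds the least significant 32 bits, and `ew_to_str` prints from the most significant element down. A host-side model of the same layout can be handy for preparing or checking test vectors off-target. The following is a minimal sketch, assuming 256-bit words (`LIMBS` = 8) and using `uint32_t` instead of the listing's `int` so that shifts into bit 31 are well defined; `LIMBS`, `HEX_STR_LEN`, `hex_to_limbs` and `limbs_to_hex` are hypothetical names, not part of the ECCo library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LIMBS 8                              /* hypothetical: 8 x 32 bits = 256-bit word */
#define HEX_STR_LEN (1 + 2 + 8 * LIMBS + 1)  /* '-', "0x", digits, NUL */

/* Parse "0x..." or "-0x..." into little-endian 32-bit limbs (limb[0] gets the
   eight least significant digits). Returns false on a malformed string or if
   a non-zero digit does not fit into LIMBS limbs. */
bool hex_to_limbs(const char *s, uint32_t limb[LIMBS], bool *negative)
{
    *negative = (*s == '-');
    if (*negative)
        s++;
    if (s[0] != '0' || s[1] != 'x')
        return false;
    s += 2;

    for (int i = 0; i < LIMBS; i++)
        limb[i] = 0;

    size_t len = strlen(s);
    for (size_t k = 0; k < len; k++) {
        char c = s[len - 1 - k];             /* k-th digit counted from the right */
        uint32_t v;
        if (c >= '0' && c <= '9')
            v = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            v = (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            v = (uint32_t)(c - 'A' + 10);
        else
            return false;
        if (k >= 8 * (size_t)LIMBS) {        /* digit beyond the word width */
            if (v != 0)
                return false;
            continue;
        }
        limb[k / 8] |= v << (4 * (k % 8));
    }
    return true;
}

/* Format limbs as "0x" plus 8 * LIMBS lower-case digits, most significant
   limb first and zero-padded. out must hold HEX_STR_LEN characters. */
void limbs_to_hex(const uint32_t limb[LIMBS], bool negative, char out[HEX_STR_LEN])
{
    static const char digits[] = "0123456789abcdef";
    char *p = out;
    if (negative)
        *p++ = '-';
    *p++ = '0';
    *p++ = 'x';
    for (int i = LIMBS - 1; i >= 0; i--)
        for (int shift = 28; shift >= 0; shift -= 4)
            *p++ = digits[(limb[i] >> shift) & 0xfu];
    *p = '\0';
}
```

Round-tripping a value through both functions yields the string zero-padded to the full word width, which is also how `ew_to_str` prints it.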
Appendix F

Benchmark & Test program


/************************************************************
 *                                                          *
 *                     Control macros                       *
 *                                                          *
 ************************************************************/

// #define ONLY_HELLOW                  /* Only run a simple hello world */

/* Testing control macros */
// #define TEST_ARI                     /* Test arithmetic module */
// #define TEST_ARI_NOADD               /* Skip addition during arithmetic testing */
// #define TEST_ARI_NOMUL               /* Skip multiplication during arithmetic testing */
// #define TEST_ARI_NODIV               /* Skip division during arithmetic testing */
// #define TEST_ARI_NONEG               /* Skip negation during arithmetic testing */
// #define TEST_REGS                    /* Test register bank reading/writing */

/* Benchmarking control macros */
#define BENCHMARK                       /* Disable anything but the benchmarking code */
// #define BENCHMARK_ECC_ADDITION       /* Perform additions with ECCo with minimal extra code */
// #define BENCHMARK_ANSSI_ADDITION     /* Perform additions with ANSSI libecc with minimal extra code */
// #define BENCHMARK_ECC_MULTIPLICATION /* Perform multiplications with ECCo with minimal extra code */
#define BENCHMARK_ANSSI_MULTIPLICATION  /* Perform multiplications with ANSSI libecc with minimal extra code */
// #define BENCHMARK_ITERATIONS 1       /* Number of iterations during benchmarking */
// #define BENCHMARK_ITERATIONS 10      /* Number of iterations during benchmarking */
#define BENCHMARK_ITERATIONS 100        /* Number of iterations during benchmarking */

/* ANSSI libecc control macros */
#define ANSSI_LIBECC

/* Sanity checks of macros: at most one BENCHMARK_ operation may be selected */
#if (defined(BENCHMARK_ECC_ADDITION) + defined(BENCHMARK_ANSSI_ADDITION) + \
     defined(BENCHMARK_ECC_MULTIPLICATION) + defined(BENCHMARK_ANSSI_MULTIPLICATION)) > 1
#error "Only one BENCHMARK_ macro can be defined at a time"
#endif

#if (defined(BENCHMARK_ANSSI_ADDITION) || defined(BENCHMARK_ANSSI_MULTIPLICATION)) && !defined(ANSSI_LIBECC)
#error "ANSSI_LIBECC must be defined for ANSSI benchmarks"
#endif


/************************************************************
 *                                                          *
 *                        Includes                          *
 *                                                          *
 ************************************************************/

/* ARM CM33 */
#include <arm_cmse.h>
#include <cm4ss.h>
#include <ee_printf.h>
#include <cm33/secure/trustzone_util.h>

/* stdlib */
#include <stdbool.h>
#include <string.h>

/* Coprocessor */
#include "ecc.h"
#include "ecc_word.h"
#include "division_data.h"
#include "modular_addition_data.h"
#include "modular_multiplication_data.h"

/* ANSSI libecc */
#ifdef ANSSI_LIBECC
#include "libarith.h"
#endif


/************************************************************
 *                                                          *
 *                     Globals/Macros                       *
 *                                                          *
 ************************************************************/

/* TZ_START_NS: Start address of non-secure application */
#ifndef TZ_START_NS
#define TZ_START_NS (0x80000U)
#endif

/* Coprocessor Access Control Register; volatile since it is memory-mapped */
#define CPACR_ADDR ((volatile unsigned *) 0xE000ED88U)


/************************************************************
 *                                                          *
 *                       Test setup                         *
 *                                                          *
 ************************************************************/

/* Arithmetic test functions */
bool test_ari_multiplication(char (*)[DATAMUL16_NUM_HEADERS][DATAMUL16_NUM_CHARS+1]);
bool test_ari_addition(char (*)[DATAADD16_NUM_HEADERS][DATAADD16_NUM_CHARS+1]);
bool test_ari_division(char (*)[DATADIV16_NUM_HEADERS][DATADIV16_NUM_CHARS+1]);

/* ANSSI libecc helpers */
#ifdef ANSSI_LIBECC
static void nn_import_from_hexbuf(nn_t out_nn, const char *hbuf, u32 hbuflen);
#endif

/* Benchmark value strings */
char add_op1_str[] = "0x63feb1ab67e6b315a2dea87e6547ba17e0daa6009366d19f14dbb427faee50ae";
unsigned char add_op1_buf[] = { 0x63, 0xfe, 0xb1, 0xab, 0x67, 0xe6, 0xb3, 0x15,
                                0xa2, 0xde, 0xa8, 0x7e, 0x65, 0x47, 0xba, 0x17,
                                0xe0, 0xda, 0xa6, 0x00, 0x93, 0x66, 0xd1, 0x9f,
                                0x14, 0xdb, 0xb4, 0x27, 0xfa, 0xee, 0x50, 0xae };
char add_op2_str[] = "0x2f08337b7ae05e16b4fada1ebbb4c7bb56009e5c141dc5b487db427faee50ae0";
unsigned char add_op2_buf[] = { 0x2f, 0x08, 0x33, 0x7b, 0x7a, 0xe0, 0x5e, 0x16,
                                0xb4, 0xfa, 0xda, 0x1e, 0xbb, 0xb4, 0xc7, 0xbb,
                                0x56, 0x00, 0x9e, 0x5c, 0x14, 0x1d, 0xc5, 0xb4,
                                0x87, 0xdb, 0x42, 0x7f, 0xae, 0xe5, 0x0a, 0xe0 };
char add_mod_str[] = "0xa41a41a12a799548211c410c65d8133afde34d28bdd542e4b680cf2899c8a8c4";
unsigned char add_mod_buf[] = { 0xa4, 0x1a, 0x41, 0xa1, 0x2a, 0x79, 0x95, 0x48,
                                0x21, 0x1c, 0x41, 0x0c, 0x65, 0xd8, 0x13, 0x3a,
                                0xfd, 0xe3, 0x4d, 0x28, 0xbd, 0xd5, 0x42, 0xe4,
                                0xb6, 0x80, 0xcf, 0x28, 0x99, 0xc8, 0xa8, 0xc4 };
char mul_op1_str[] = "0x63feb1ab67e6b315a2dea87e6547ba17e0daa6009366d19f14dbb427faee50ae";
unsigned char mul_op1_buf[] = { 0x63, 0xfe, 0xb1, 0xab, 0x67, 0xe6, 0xb3, 0x15,
                                0xa2, 0xde, 0xa8, 0x7e, 0x65, 0x47, 0xba, 0x17,
                                0xe0, 0xda, 0xa6, 0x00, 0x93, 0x66, 0xd1, 0x9f,
                                0x14, 0xdb, 0xb4, 0x27, 0xfa, 0xee, 0x50, 0xae };
char mul_op2_str[] = "0x02f08337b7ae05e16b4fada1ebbb4c7bb56009e5c141dc5b487db427faee50ae";
unsigned char mul_op2_buf[] = { 0x02, 0xf0, 0x83, 0x37, 0xb7, 0xae, 0x05, 0xe1,
                                0x6b, 0x4f, 0xad, 0xa1, 0xeb, 0xbb, 0x4c, 0x7b,
                                0xb5, 0x60, 0x09, 0xe5, 0xc1, 0x41, 0xdc, 0x5b,
                                0x48, 0x7d, 0xb4, 0x27, 0xfa, 0xee, 0x50, 0xae };
char mul_mod_str[] = "0xa41a41a12a799548211c410c65d8133afde34d28bdd542e4b680cf2899c8a8c4";
unsigned char mul_mod_buf[] = { 0xa4, 0x1a, 0x41, 0xa1, 0x2a, 0x79, 0x95, 0x48,
                                0x21, 0x1c, 0x41, 0x0c, 0x65, 0xd8, 0x13, 0x3a,
                                0xfd, 0xe3, 0x4d, 0x28, 0xbd, 0xd5, 0x42, 0xe4,
                                0xb6, 0x80, 0xcf, 0x28, 0x99, 0xc8, 0xa8, 0xc4 };

#define BM_STR_LEN 67
#define BM_BUF_LEN 32
#define BM_NN_LEN ((BM_STR_LEN / 2) / WORD_BYTES)


/************************************************************
 *                                                          *
 *                      Secure main                         *
 *                                                          *
 ************************************************************/

int
main(void)
{
#ifndef BENCHMARK
    MSG(("C-code: Secure firmware booting\n"));
    MSG((">>>>>>>> Running ECC firmware test.\n"));
#endif

    /* Enable coprocessor (set the access bit rather than toggling it) */
    *CPACR_ADDR |= 0x01;

#ifdef ONLY_HELLOW

    MSG(("HELLO EC WORLD!\n"));

#else

    /**************************
     * Test arithmetic module *
     **************************/

#ifdef TEST_ARI
    /* Modular addition */
#ifndef TEST_ARI_NOADD
    MSG((">>>> Testing addition\n"));
    if (test_ari_addition(dataAdd16))
        MSG(("Success!\n"));
#endif
    /* Modular multiplication */
#ifndef TEST_ARI_NOMUL
    MSG((">>>> Testing multiplication\n"));
    if (test_ari_multiplication(dataMul16))
        MSG(("Success!\n"));
#endif
    /* Division */
#ifndef TEST_ARI_NODIV
    MSG((">>>> Testing division\n"));
    if (test_ari_division(dataDiv16))
        MSG(("Success!\n"));
#endif
#endif

    /************************************
     * Benchmark modular addition w/CP  *
     ************************************/

#ifdef BENCHMARK_ECC_ADDITION
    ecc_word_t op1, op2, mod;
    /* Set parameter values */
    ew_set_str(&op1, add_op1_str);
    ew_set_str(&op2, add_op2_str);
    ew_set_str(&mod, add_mod_str);
    /* Load parameters to CP */
    ew_load_cr0(&op1);
    ew_load_cr1(&op2);
    EW_LOAD_MOD(&mod);
    /* Perform N additions */
    for (int i = 0; i < BENCHMARK_ITERATIONS; ++i)
        ECC_ADD("0", "1", "0");
#endif

    /*******************************************
     * Benchmark modular addition in software  *
     *******************************************/

#ifdef BENCHMARK_ANSSI_ADDITION
    nn nn_op1, nn_op2, nn_mod;
    fp fp_op1, fp_op2;
    fp_ctx ctx;  /* Finite field context - size of field etc. */
    /* Initialize and set parameter values */
    nn_init_from_buf(&nn_op1, add_op1_buf, BM_BUF_LEN);
    nn_init_from_buf(&nn_op2, add_op2_buf, BM_BUF_LEN);
    nn_init_from_buf(&nn_mod, add_mod_buf, BM_BUF_LEN);
    fp_ctx_init_from_p(&ctx, &nn_mod);
    fp_init(&fp_op1, &ctx);
    fp_init(&fp_op2, &ctx);
    fp_op1.fp_val = nn_op1;
    fp_op2.fp_val = nn_op2;
    /* Perform N additions */
    for (int i = 0; i < BENCHMARK_ITERATIONS; ++i)
        fp_add(&fp_op1, &fp_op1, &fp_op2);
#endif

    /******************************************
     * Benchmark modular multiplication w/CP  *
     ******************************************/

#ifdef BENCHMARK_ECC_MULTIPLICATION
    ecc_word_t op1, op2, mod;
    /* Set parameter values */
    ew_set_str(&op1, mul_op1_str);
    ew_set_str(&op2, mul_op2_str);
    ew_set_str(&mod, mul_mod_str);
    /* Load parameters to CP */
    ew_load_cr0(&op1);
    ew_load_cr1(&op2);
    EW_LOAD_MOD(&mod);
    /* Perform N multiplications */
    for (int i = 0; i < BENCHMARK_ITERATIONS; ++i)
        ECC_MUL("0", "1", "0");
#endif

    /*************************************************
     * Benchmark modular multiplication in software  *
     *************************************************/

#ifdef BENCHMARK_ANSSI_MULTIPLICATION
    nn nn_op1, nn_op2, nn_mod;
    fp fp_op1, fp_op2;
    fp_ctx ctx;  /* Finite field context - size of field etc. */
    /* Initialize and set parameter values */
    nn_init_from_buf(&nn_op1, mul_op1_buf, BM_BUF_LEN);
    nn_init_from_buf(&nn_op2, mul_op2_buf, BM_BUF_LEN);
    nn_init_from_buf(&nn_mod, mul_mod_buf, BM_BUF_LEN);
    fp_ctx_init_from_p(&ctx, &nn_mod);
    fp_init(&fp_op1, &ctx);
    fp_init(&fp_op2, &ctx);
    fp_op1.fp_val = nn_op1;
    fp_op2.fp_val = nn_op2;
    /* Perform N multiplications */
    for (int i = 0; i < BENCHMARK_ITERATIONS; ++i)
        fp_mul(&fp_op1, &fp_op1, &fp_op2);
#endif

#endif

#ifndef BENCHMARK
    MSG((">>>>>>>> Finished ECC firmware test.\n\n"));
#endif

    finish_test(TEST_PASS);
    return 0; // This line will never execute as boot_nonsec_program never returns
}


/************************************************************
 *                                                          *
 *                     Test functions                       *
 *                                                          *
 ************************************************************/

/*********************
 * Arithmetic module *
 *********************/

/* Modular addition */
bool
test_ari_addition(char (*data)[DATAADD16_NUM_HEADERS][DATAADD16_NUM_CHARS+1])
{
    int i = 0;
    char (*entry)[DATAADD16_NUM_CHARS+1];
    ew_str_t mod_s, op1_s, op2_s, sol_s, res_s;
    ecc_word_t mod, op1, op2, sol, res;

    while (i++ < DATAADD16_NUM_ENTRIES) {
        entry = *data++;
        /* Set parameter values from data strings */
        if (!ew_set_str(&mod, entry[0])) goto error;
        if (!ew_set_str(&op1, entry[1])) goto error;
        if (!ew_set_str(&op2, entry[2])) goto error;
        if (!ew_set_str(&sol, entry[3])) goto error;
        /* Load parameters into CP registers */
        ew_load_cr0(&op1);
        ew_load_cr1(&op2);
        EW_LOAD_MOD(&mod);
        /* Perform addition */
        ECC_ADD("0", "1", "2");
        /* Verify result */
        ew_store_cr2(&res);
        if (!ew_eq(&res, &sol))
            goto wrong;
        MSG(("Test entry %d passed.\n", i));
    }
    return true;

wrong:
    ew_to_str(&mod, mod_s, EW_STR_LENGTH);
    ew_to_str(&op1, op1_s, EW_STR_LENGTH);
    ew_to_str(&op2, op2_s, EW_STR_LENGTH);
    ew_to_str(&res, res_s, EW_STR_LENGTH);
    ew_to_str(&sol, sol_s, EW_STR_LENGTH);
    /* Print the expected solution after "=" and the CP result after "got" */
    MSG(("%s\n"
         "+ %s\n"
         "(mod %s)\n"
         "= %s\n"
         "got %s\n",
         op1_s, op2_s, mod_s, sol_s, res_s));
error:
    MSG(("Failed...\n"));
    return false;
}

/* Modular multiplication */
bool
test_ari_multiplication(char (*data)[DATAMUL16_NUM_HEADERS][DATAMUL16_NUM_CHARS+1])
{
    int i = 0;
    char (*entry)[DATAMUL16_NUM_CHARS+1];
    ew_str_t mod_s, op1_s, op2_s, sol_s, res_s;
    ecc_word_t mod, op1, op2, sol, res;

    while (i++ < DATAMUL16_NUM_ENTRIES) {
        entry = *data++;
        /* Set parameter values from data strings */
        if (!ew_set_str(&mod, entry[0])) goto error;
        if (!ew_set_str(&op1, entry[1])) goto error;
        if (!ew_set_str(&op2, entry[2])) goto error;
        if (!ew_set_str(&sol, entry[3])) goto error;
        /* Load parameters into CP registers */
        ew_load_cr0(&op1);
        ew_load_cr1(&op2);
        EW_LOAD_MOD(&mod);
        /* Perform multiplication */
        ECC_MUL("0", "1", "2");
        /* Verify result */
        ew_store_cr2(&res);
        if (!ew_eq(&res, &sol))
            goto wrong;
        MSG(("Test entry %d passed.\n", i));
    }
    return true;

wrong:
    ew_to_str(&mod, mod_s, EW_STR_LENGTH);
    ew_to_str(&op1, op1_s, EW_STR_LENGTH);
    ew_to_str(&op2, op2_s, EW_STR_LENGTH);
    ew_to_str(&res, res_s, EW_STR_LENGTH);
    ew_to_str(&sol, sol_s, EW_STR_LENGTH);
    /* Print the expected solution after "=" and the CP result after "got" */
    MSG(("%s\n"
         "* %s\n"
         "(mod %s)\n"
         "= %s\n"
         "got %s\n",
         op1_s, op2_s, mod_s, sol_s, res_s));
error:
    MSG(("Failed...\n"));
    return false;
}

/* Division */
bool
test_ari_division(char (*data)[DATADIV16_NUM_HEADERS][DATADIV16_NUM_CHARS+1])
{
    int i = 0;
    char (*entry)[DATADIV16_NUM_CHARS+1];
    ew_str_t op1_s, op2_s, sol_s, res_s;
    ecc_word_t op1, op2, sol, res;

    while (i++ < DATADIV16_NUM_ENTRIES) {
        entry = *data++;
        /* Set parameter values from data strings */
        if (!ew_set_str(&op1, entry[0])) goto error;
        if (!ew_set_str(&op2, entry[1])) goto error;
        if (!ew_set_str(&sol, entry[2])) goto error;
        /* Load parameters into CP registers */
        ew_load_cr0(&op1);
        ew_load_cr1(&op2);
        /* Perform division */
        ECC_DIV("0", "1", "2");
        /* Verify result */
        ew_store_cr2(&res);
        if (!ew_eq(&res, &sol))
            goto wrong;
        MSG(("Test entry %d passed.\n", i));
    }
    return true;

wrong:
    ew_to_str(&op1, op1_s, EW_STR_LENGTH);
    ew_to_str(&op2, op2_s, EW_STR_LENGTH);
    ew_to_str(&res, res_s, EW_STR_LENGTH);
    ew_to_str(&sol, sol_s, EW_STR_LENGTH);
    /* Print the expected solution after "=" and the CP result after "got" */
    MSG(("%s\n"
         "/ %s\n"
         "= %s\n"
         "got %s\n",
         op1_s, op2_s, sol_s, res_s));
error:
    MSG(("Failed...\n"));
    return false;
}

Listing F.1: C main of test and benchmark program.
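Each test entry in Listing F.1 is checked against a precomputed expected value (sol). The code below is a minimal sketch of a software reference model for cross-checking such expected values. It handles operands that fit in 32 bits, while the coprocessor's ecc_word_t is wider. It assumes, without confirmation from the listing, that ECC_MUL computes (op1 · op2) mod m and that ECC_DIV computes op1 · op2⁻¹ mod m, with the inverse found by the extended Euclidean algorithm [14]. The function names ref_mul_mod, ref_inv_mod and ref_div_mod are hypothetical and do not belong to the test program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical software reference model for cross-checking test vectors.
 * Assumes 32-bit operands (the coprocessor's ecc_word_t is wider) and
 * assumes ECC_MUL/ECC_DIV compute modular multiplication/division.
 */

/* Returns (a * b) mod m; the 64-bit product of two 32-bit values is exact. */
static uint32_t ref_mul_mod(uint32_t a, uint32_t b, uint32_t m)
{
    assert(m != 0);
    return (uint32_t)(((uint64_t)a * b) % m);
}

/*
 * Computes a^-1 mod m with the extended Euclidean algorithm.
 * Returns false if gcd(a, m) != 1, i.e. no inverse exists.
 */
static bool ref_inv_mod(uint32_t a, uint32_t m, uint32_t *inv)
{
    assert(m > 1);
    int64_t t = 0, new_t = 1;      /* Bezout coefficients of a */
    int64_t r = m, new_r = a % m;  /* Remainder sequence */

    while (new_r != 0) {
        int64_t q = r / new_r;
        int64_t tmp;

        tmp = t - q * new_t;
        t = new_t;
        new_t = tmp;

        tmp = r - q * new_r;
        r = new_r;
        new_r = tmp;
    }
    if (r != 1)
        return false;              /* gcd(a, m) = r > 1: not invertible */
    if (t < 0)
        t += m;                    /* Normalize into [0, m) */
    *inv = (uint32_t)t;
    return true;
}

/* Computes (a / b) mod m = a * b^-1 mod m; false if b is not invertible. */
static bool ref_div_mod(uint32_t a, uint32_t b, uint32_t m, uint32_t *res)
{
    uint32_t inv;

    if (!ref_inv_mod(b, m, &inv))
        return false;
    *res = ref_mul_mod(a, inv, m);
    return true;
}
```

For example, with m = 97, ref_div_mod(5, 3, 97, &r) sets r = 34, since 3 · 34 = 102 ≡ 5 (mod 97). Division is expressed as multiplication by the inverse so that both reference functions share the same reduction step.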



References

[1] N. Koblitz, “Elliptic curve cryptosystems”, Math. Comp., vol. 48, pp. 203–209, 1987, ISSN: 0025-5718. DOI: 10.1090/S0025-5718-1987-0866109-5.
[2] V. S. Miller, “Use of elliptic curves in cryptography”, in Advances in Cryptology — CRYPTO ’85 Proceedings, H. C. Williams, Ed., Berlin, Heidelberg: Springer Berlin Heidelberg, 1986, pp. 417–426, ISBN: 978-3-540-39799-1.
[3] A. J. Menezes, S. A. Vanstone, and P. C. van Oorschot, Handbook of Applied Cryptography, 1st ed. Boca Raton, FL, USA: CRC Press, Inc., 1996, ISBN: 0849385237.
[4] W. Diffie and M. Hellman, “New directions in cryptography”, IEEE Transactions on Information Theory, vol. 22, no. 6, pp. 644–654, Nov. 1976, ISSN: 0018-9448. DOI: 10.1109/TIT.1976.1055638.
[5] Y. Kumar, R. Munjal, and H. Sharma, “Comparison of symmetric and asymmetric cryptography with existing vulnerabilities and countermeasures”, International Journal of Computer Science and Management Studies, vol. 11, no. 03, 2011.
[6] R. Tripathi and S. Agrawal, “Comparative study of symmetric and asymmetric cryptography techniques”, International Journal of Advance Foundation and Research in Computer (IJAFRC), vol. 1, no. 6, pp. 68–76, 2014.
[7] E. Rescorla. (2018). The Transport Layer Security (TLS) protocol version 1.3, [Online]. Available: https://tools.ietf.org/html/rfc8446 (visited on 11/09/2018).
[8] IEEE. (2017). Why we need low-power, low-latency devices, [Online]. Available: https://innovationatwork.ieee.org/why-we-need-low-power-low-latency-devices/ (visited on 06/26/2019).
[9] M. Guerra. (2017). The power of IoT devices, [Online]. Available: https://www.electronicdesign.com/power/power-iot-devices (visited on 06/26/2019).
[10] N. Shields. (2017). Here’s how 5G will revolutionize the Internet of Things, [Online]. Available: https://www.businessinsider.com/how-5g-will-revolutionize-the-internet-of-things-2017-6?r=US&IR=T (visited on 06/26/2019).
[11] M. Hirth, Hardware acceleration of asymmetric elliptic curve cryptography, 2018.
[12] P. B. Bhattacharya, S. K. Jain, and S. Nagpaul, Basic abstract algebra, 2nd ed. Cambridge University Press, 1994, ISBN: 0521460816.
[13] B. Lynn. (n.d.). Modular arithmetic, [Online]. Available: https://crypto.stanford.edu/pbc/notes/numbertheory/arith.html (visited on 11/14/2018).
[14] Wikipedia. (2018). Extended Euclidean algorithm, [Online]. Available: https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm (visited on 11/14/2018).
[15] ——, (2018). Euclidean algorithm, [Online]. Available: https://en.wikipedia.org/wiki/Euclidean_algorithm (visited on 11/14/2018).
[16] Standards for Efficient Cryptography Group. (2009). SEC 1: Elliptic curve cryptography, [Online]. Available: http://www.secg.org/sec1-v2.pdf (visited on 12/19/2018).
[17] J. Balasch, B. Gierlichs, K. Järvinen, and I. Verbauwhede, “Hardware/software co-design flavors of elliptic curve scalar multiplication”, in 2014 IEEE International Symposium on Electromagnetic Compatibility (EMC), Aug. 2014, pp. 758–763. DOI: 10.1109/ISEMC.2014.6899070.
[18] H. Cohen, A. Miyaji, and T. Ono, “Efficient elliptic curve exponentiation using mixed coordinates”, in Advances in Cryptology — ASIACRYPT’98, K. Ohta and D. Pei, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 1998, pp. 51–65, ISBN: 978-3-540-49649-6.
[19] D. Hankerson, A. J. Menezes, and S. Vanstone, Guide to Elliptic Curve Cryptography. Berlin, Heidelberg: Springer-Verlag, 2003, ISBN: 038795273X.
[20] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems”, Commun. ACM, vol. 21, no. 2, pp. 120–126, Feb. 1978, ISSN: 0001-0782. DOI: 10.1145/359340.359342. [Online]. Available: http://doi.acm.org/10.1145/359340.359342.
[21] A. P. Fournaris, I. Zafeirakis, C. Koulamas, N. Sklavos, and O. Koufopavlou, “Designing efficient elliptic curve Diffie-Hellman accelerators for embedded systems”, in 2015 IEEE International Symposium on Circuits and Systems (ISCAS), May 2015, pp. 2025–2028. DOI: 10.1109/ISCAS.2015.7169074.
[22] Mentor. (2019). Questa® Advanced Simulator, [Online]. Available: https://www.mentor.com/products/fv/questa/ (visited on 06/19/2019).
[23] ——, (2019). Mentor, [Online]. Available: https://www.mentor.com/ (visited on 06/19/2019).
[24] “IEEE standard VHDL language reference manual”, IEEE Std 1076-2008 (Revision of IEEE Std 1076-2002), pp. c1–626, Jan. 2009. DOI: 10.1109/IEEESTD.2009.4772740.
[25] “IEEE standard for Verilog hardware description language”, IEEE Std 1364-2005 (Revision of IEEE Std 1364-2001), pp. 1–590, Apr. 2006. DOI: 10.1109/IEEESTD.2006.99495.
[26] “IEEE standard for SystemVerilog–unified hardware design, specification, and verification language”, IEEE Std 1800-2017 (Revision of IEEE Std 1800-2012), pp. 1–1315, Feb. 2018. DOI: 10.1109/IEEESTD.2018.8299595.
[27] ARM. (2019). Cortex-M33, [Online]. Available: https://developer.arm.com/ip-products/processors/cortex-m/cortex-m33 (visited on 06/19/2019).
[28] ——, (2019). Arm, [Online]. Available: https://www.arm.com/ (visited on 06/19/2019).
[29] ——, (2016). Armv8-M architecture reference manual, [Online]. Available: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0553a.d/index.html (visited on 06/26/2019).
[30] Wikipedia. (2019). Hardware acceleration, [Online]. Available: https://en.wikipedia.org/wiki/Hardware_acceleration (visited on 06/19/2019).
[31] R. Benadjila, A. Ebalard, and J.-P. Flori. (2017). Libecc project, [Online]. Available: https://github.com/ANSSI-FR/libecc (visited on 10/11/2018).
[32] Python Software Foundation. (2018). Python, [Online]. Available: https://www.python.org/ (visited on 11/21/2018).
[33] Python Docs. (2018). Python data model, [Online]. Available: https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy (visited on 11/20/2018).
[34] C. Koc, RSA hardware implementation, RSA Laboratories, RSA Data Security, Inc., Aug. 1995.
[35] J. K. Omura, “A public key cell design for smart card chips”, ISITA’90, pp. 983–985, 1990.
[36] P. L. Montgomery, “Modular multiplication without trial division”, Math. Comp., vol. 44, pp. 519–521, 1985. DOI: 10.1090/S0025-5718-1985-0777282-X.
[37] National Institute of Standards and Technology. (2013). Digital signature standard (DSS), [Online]. Available: https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf (visited on 09/09/2018).
[38] Standards for Efficient Cryptography Group. (2010). SEC 2: Recommended elliptic curve domain parameters, [Online]. Available: http://www.secg.org/sec2-v2.pdf (visited on 09/09/2018).
[39] OpenCores. (2019). OpenCores, [Online]. Available: https://opencores.org/ (visited on 07/01/2019).
