Hardware Acceleration of ECC
NTNU
Abstract
Faculty Name
IE
Master Thesis
With the great number of mobile, battery-powered devices and IoT devices
being developed, there is a need for efficient, low-energy cryptography.
Elliptic curve cryptography (ECC) provides high security with small key
sizes, and seems very well suited for use in embedded, low-power systems.
The mathematics of ECC is based on set theory, performing operations on
elliptic curves, usually over finite prime fields or binary fields. The
security of these mathematical operations is based on the Elliptic Curve
Discrete Logarithm Problem.
This thesis has explored how to design a coprocessor for accelerating el-
liptic curve cryptography, based on the results from a pre-study. The copro-
cessor designed in the thesis, ECCo, was designed for use with the ARM
CM33 processor. The CM33 provides a coprocessor interface for tight inte-
gration of coprocessors, which allows instructions to be issued to connected
coprocessors from software. This motivated the design of an instruction set
for the coprocessor.
For the design in this thesis the operations of modular addition, modular
multiplication and integer division were implemented. The design used for
testing consisted of a controller, register bank and arithmetic module.
A pure software implementation of elliptic curve cryptography, libecc, was
compared to the ECCo. Results showed that the hardware-accelerated design
performed 3.8 to 27 times faster than the pure software implementation.
Area estimates of the design were acquired through synthesis, using
Questasim. The ECCo accounted for 45% of the area when synthesizing
ECCo+CM33. The estimates showed that the ECCo area was largely dominated
by the divider (73.18% of the total ECCo area), which was implemented
using the SystemVerilog division operator, "/", with no optimization in
synthesis. However, the atomic operations of ECC, modular multiplication
and modular addition, only occupied 1.97% and 1.92%, respectively.
Preface
This thesis is a continuation of an autumn project which explored how a
hardware accelerator for elliptic curve cryptography should be implemented
in order to address the shortcomings of elliptic curve cryptography in
software. Part of the theory is reused from the project. The project will
from now on be referred to as the pre-study.
Contents
Abstract
Preface
1 Introduction
1.1 Asymmetric Cryptography
1.2 Objective and Approach
1.3 Main Contributions
1.4 Structure
2 Background
2.1 Set theory
2.1.1 Finite Field Arithmetic
2.2 Elliptic Curves
2.2.1 EC over $F_p$
2.2.2 EC over $F_{2^k}$
2.2.3 Point Arithmetics
2.3 Scalar Multiplication
2.4 Coordinate Systems
2.5 ECC Algorithms
2.6 Tools
2.7 ARM Cortex M33
2.8 Hardware Acceleration
2.9 libecc
2.10 Python
3 Previous Work
3.1 Modular Addition Implementation
3.2 Modular Multiplication Implementation
3.3 FPGA Elliptic Curve Coprocessor
3.4 Pre-Study
5 Implementation
5.1 ECCo Instruction Set
5.2 ECCo Architecture
5.3 Internal Interfaces
5.4 Register Bank
5.5 Arithmetic Module
5.5.1 Negation
5.5.2 Integer Division
5.5.3 Modular Addition
5.5.4 Modular Multiplication
5.5.5 Test Data
5.5.6 Verification - Arithmetic Module
5.6 Controller Module
5.6.1 Verification - Controller Module
5.7 Verification - ECCo
5.8 Software
5.8.1 ECCo Wrapper
5.8.2 Big Number library
5.8.3 Benchmark Software
6 Results
6.1 Speed
6.2 Area
7 Future Work
7.1 Instruction Set Architecture
7.2 Security
7.3 Algorithms
8 Conclusion
C Test Data
D ECCo C Wrapper
List of Abbreviations
Chapter 1
Introduction
Today, many mobile and embedded devices are being used daily, and the
number of such devices is ever increasing. Embedded devices are used in
many applications where security is a concern, be it for a company or
personal privacy: in hospitals, smart cards (banking, SIM, access control),
mobile phones, Wi-Fi routers, etc. Many of these are battery-powered
devices, which in addition to security require low-power solutions. This
motivates the exploration of low-power implementations of cryptographic
algorithms. A field of cryptography which seems suited for low-power
applications is Elliptic Curve Cryptography (ECC), which was introduced in
the 1980s by Neal Koblitz [1] and Victor Miller [2]. It has gained
popularity for desktop and server use, and many of the algorithms in the
Transport Layer Security protocol version 1.3 (TLS 1.3) are elliptic curve
(EC) algorithms.
In this thesis a coprocessor for the ARM Cortex-M33 (CM33), intended for
accelerating Elliptic Curve Cryptography (ECC), is designed and tested.
The work is a continuation of the autumn project on hardware acceleration
of ECC, which concluded that the optimal use of a hardware accelerator is
to perform the entire operation of scalar multiplication (SM) in hardware.
The implementation in this thesis aims at accelerating the entire SM in
hardware, and at taking advantage of the features that the coprocessor
interface of the CM33 provides.
In this thesis cryptosystem is used in the same way as defined in [3]: “A
cryptosystem is a general term referring to a set of cryptographic primitives
used to provide information security services. Most often the term is used in
conjunction with primitives providing confidentiality, i.e., encryption.”
Also, the term big numbers is used to refer to numbers with a bit length
longer than a processor's word length.
been corrupted, or the private key may be used to decrypt a message which
has been encrypted using the public key.
The security of public key cryptography systems relies on the private key
being infeasible for an attacker to compute, but not impossible given infinite
time and resources. That is, public key cryptosystems are computationally
secure and it is infeasible for an attacker to compute the private key if it
requires $\approx 10^{100}$ instructions [4].
Another very common type of cryptosystem is symmetric cryptography, which
uses a single shared key. These systems usually require smaller key sizes
and have lower power consumption compared to public key systems [5][6].
Because of this, symmetric cryptosystems are preferred when encrypting
large amounts of data, but since they require the shared key to be
distributed over a secure channel it is usually not sufficient to rely
solely on symmetric key cryptography. As a possible solution to this, a
public key cryptosystem was introduced in 1976 by Whitfield Diffie and
Martin E. Hellman [4], which enables two parties to securely share a key
over an insecure channel, thus allowing secure communication through a
combination of asymmetric and symmetric cryptosystems.
This combination of symmetric and asymmetric cryptosystems is now
standard, and the TLS 1.3 [7] standard describes a set of cryptosystems to
use for secure communication over insecure channels. A number of these
systems are public key systems, and the increasing demand for high
security without reducing the efficiency of low-power devices such as IoT
[8][9] and mobile devices [10] is a good incentive to explore the
possibilities of accelerating public key cryptosystems.
Furthermore, TLS defines a number of elliptic curve (EC) cryptosystems
to use. EC cryptosystems are systems that use mathematics based on elliptic
curves and have traits that make them suited for use in resource-limited
environments, such as IoT devices. ECC algorithms are often considered
safer than their non-EC counterparts [1], and this safety is provided with
smaller key sizes. The benefit of smaller key sizes is that less storage is
required for the variables of the algorithm and less data needs to be
transferred between devices. An efficient implementation of ECC algorithms
could potentially benefit IoT devices by reducing power consumption while
still maintaining high security.
should be tested separately during the development process, using test data
generated by software scripts, providing reliable test data.
1.4 Structure
Chapter 2 presents mathematical and other related background information
necessary for the rest of the thesis. In Chapter 3 previous work relevant
for this thesis is presented. Chapter 4 details the methodology and design
choices of the coprocessor. Chapter 5 describes the implementation details
of the design, and Chapter 6 presents the results of the thesis. Finally, Chap-
ter 7 discusses thoughts on future work on the coprocessor, and Chapter 8
concludes the report.
Chapter 2
Background
This thesis is mainly concerned with elliptic curve cryptography, that is,
cryptosystems that use mathematical operations on elliptic curves over
finite fields. In order to give the reader a better understanding of these
subjects, this chapter gives a brief introduction to the mathematical field
of set theory, focusing on finite fields, and explains the fundamentals of
elliptic curves and the related arithmetic operations on elliptic curves.
Further, this chapter describes algorithms for implementing modular
arithmetic and elliptic curve operations in hardware, which are used later
in the implementation of the coprocessor. Lastly, this chapter briefly
describes the tools used.
\[ F_7 = \{0, 1, 2, 3, 4, 5, 6\} \tag{2.1} \]
If there exists a positive integer $n$ such that $n \cdot a = 0$ for all
$a \in F$, then the smallest such number is called the characteristic of
$F$. If no such number exists, the characteristic of $F$ is said to be
zero [12, p.170]. In our example of $F_7$ the characteristic is 7, since
$7 \cdot a \equiv 0 \pmod{7}$ for all $a \in F_7$. The characteristic of
any finite field $GF(p^k)$ is $p$ [12, p.311]. The size of a field, $q$,
is also called the order of the field.
Of particular interest when working with elliptic curves are finite fields
where $q = p^1$, prime fields, and finite fields where $q = 2^k$, binary
fields.
\[ 4 + 6 = 3 \tag{2.2} \]
\[ 1 - 5 = 3 \tag{2.3} \]
\[ 2 \cdot 5 = 3 \tag{2.4} \]
\[ 5 \cdot 4^{-1} = 3 \tag{2.5} \]
Equations 2.2, 2.4 and 2.5 evaluate to 3 since 10 ≡ 3 (mod 7), and
Equation 2.3 evaluates to 3 since −4 ≡ 3 (mod 7). Equation 2.5 is an
example of modular division, which is the most complicated operation of the
four. In order to perform modular division one needs to find the modular
inverse of the divisor, which is why modular division is often written as
in Equation 2.5, avoiding the division operator, "/", to prevent confusion
with integer division [13].
To find the modular inverse of a field element the Extended Euclidean
Algorithm is used [14]. It is an extension to the Euclidean Algorithm which
is an algorithm for finding the greatest common divisor of two numbers, a
and b [15]. The extended algorithm can further be used to find two numbers,
x and y, such that:
ax + by = gcd( a, b) (2.6)
For the level of detail needed in this report we can simply say that
a and b have to be co-prime (gcd(a, b) = 1) and assign b = q, the field
size. Reducing Equation 2.6 modulo q then removes the term by, which leads
to Equation 2.7.
ax ≡ 1 (mod q) (2.7)
This allows us to find the inverse x of element a by solving for x (x ∈ Fq ).
In Equation 2.5 a = 4 and q = 7, and so, we can find the inverse of 4 by
solving for x in Equation 2.7:
4x ≡ 1 (mod 7)
⇓
x=2
Equation 2.5 can then be explained by replacing $4^{-1}$ with the modular
inverse of 4:
\[ 5 \cdot 2 \equiv 3 \pmod{7} \]
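The inversion step above can be sketched in Python as follows. This is a minimal illustration of the Extended Euclidean Algorithm, not code from the thesis; the function names `extended_gcd`, `mod_inverse` and `mod_div` are chosen here for the example.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        quotient, remainder = divmod(a, b)
        a, b = b, remainder
        # Keep the Bezout coefficients in step with the remainders.
        x0, x1 = x1, x0 - quotient * x1
        y0, y1 = y1, y0 - quotient * y1
    return a, x0, y0


def mod_inverse(a, q):
    """Return x with a*x = 1 (mod q); requires gcd(a, q) == 1."""
    g, x, _ = extended_gcd(a % q, q)
    if g != 1:
        raise ValueError("a has no inverse modulo q")
    return x % q


def mod_div(numerator, denominator, q):
    """Modular division: multiply by the inverse of the denominator."""
    return (numerator * mod_inverse(denominator, q)) % q


print(mod_inverse(4, 7))  # inverse of 4 in F_7
print(mod_div(5, 4, 7))   # Equation 2.5
```

With q = 7 this reproduces the worked example: the inverse of 4 is 2, and 5 · 4⁻¹ evaluates to 3.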
2.2.1 EC over $F_p$
“Let $F_p$ be a prime finite field so that $p$ is an odd prime number, and
let $a, b \in F_p$ satisfy $4a^3 + 27b^2 \not\equiv 0 \pmod{p}$. Then an
elliptic curve $E(F_p)$ over $F_p$ defined by the parameters $a, b \in F_p$
consists of the set of solutions or points $P = (x, y)$ for $x, y \in F_p$
to the equation:
\[ y^2 \equiv x^3 + ax + b \pmod{p} \tag{2.8} \]
together with an extra point $O$ called the point at infinity.” [16]
infinite field. However, in cryptography finite fields are used, in which
case there only exist discrete solutions to the elliptic curve, and for all
of the solutions the x and y values must be in $F_p$.
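To make the notion of discrete solutions concrete, the following sketch enumerates all affine points of a curve over a small prime field by brute force. The parameters a = 2, b = 3 and p = 7 are chosen arbitrarily for illustration and are not necessarily the curve plotted in the thesis; the function name `curve_points` is likewise an assumption.

```python
def curve_points(a, b, p):
    """List all affine points (x, y) with y^2 = x^3 + a*x + b (mod p)."""
    if (4 * a**3 + 27 * b**2) % p == 0:
        raise ValueError("singular curve: 4a^3 + 27b^2 = 0 (mod p)")
    return [(x, y)
            for x in range(p)
            for y in range(p)
            if (y * y - (x**3 + a * x + b)) % p == 0]


points = curve_points(2, 3, 7)
print(points)           # affine solutions only
print(len(points) + 1)  # group order, counting the point at infinity
```

Brute force is only practical for toy field sizes; it is shown here because it follows the definition directly.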
The discrete solutions to the elliptic curve (Equation 2.8) are plotted in
Figure 2.1, and it is apparent that only the solutions (0, 1) and (1, 0)
lie on the curve itself. This is because the x and/or y values of the other
solutions produce a left-hand or right-hand side value in Equation 2.8
that is ≥ 7, and which is therefore reduced modulo 7.
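[Figure 2.1: discrete solutions of the elliptic curve in Equation 2.8,
plotted with axes x and y. Plotted points: (0,1), (0,6), (1,0), (1,6),
(2,2), (2,3), (3,2), (4,1), (4,2), (5,1), (6,4).]
[Figure: geometric illustration of point addition, P3 = P1 + P2, and point
doubling, P3 = 2P1, with axes x and y.]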
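\[ x_3 \equiv -x_1 - x_2 + \alpha^2 \pmod{p} \tag{2.10} \]
\[ y_3 \equiv -y_1 + \alpha (x_1 - x_3) \pmod{p} \tag{2.11} \]
where
\[ \alpha = \begin{cases}
\dfrac{y_2 - y_1}{x_2 - x_1} & \text{if } P_1 \neq P_2 \\[2ex]
\dfrac{3x_1^2 + a}{2y_1} & \text{if } P_1 = P_2
\end{cases} \tag{2.12} \]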
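In the case of elliptic curves over $F_{2^m}$, all arithmetic is carried
out in $F_{2^m}$ (there is no prime modulus $p$ in this case). When
$P_1 \neq P_2$:
\[ x_3 = \alpha^2 + \alpha + x_1 + x_2 + a \tag{2.13} \]
\[ y_3 = \alpha (x_1 + x_3) + x_3 + y_1 \tag{2.14} \]
\[ \alpha = \frac{y_1 + y_2}{x_1 + x_2} \tag{2.15} \]
and when $P_1 = P_2$:
\[ x_3 = \alpha^2 + \alpha + a \tag{2.16} \]
\[ y_3 = x_1^2 + (\alpha + 1) x_3 \tag{2.17} \]
\[ \alpha = x_1 + \frac{y_1}{x_1} \tag{2.18} \]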
Note that all of these operations require modular inversion for the divi-
sion in the calculation of α, which is an expensive operation.
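The prime-field formulas (Equations 2.10 to 2.12) can be sketched in Python as below. This is an illustrative model rather than the thesis implementation: the point at infinity is represented by `None` (an assumption), the function name `point_add` is hypothetical, and the modular inverse uses Python's built-in `pow(x, -1, p)`, available from Python 3.8.

```python
INFINITY = None  # assumed representation of the point at infinity O


def point_add(P1, P2, a, p):
    """Add two points on y^2 = x^3 + a*x + b over F_p (affine coordinates)."""
    if P1 is INFINITY:
        return P2
    if P2 is INFINITY:
        return P1
    x1, y1 = P1
    x2, y2 = P2
    if x1 == x2 and (y1 + y2) % p == 0:
        # P2 = -P1; this also covers doubling a point with y = 0.
        return INFINITY
    if P1 == P2:
        # Point doubling: alpha = (3*x1^2 + a) / (2*y1)
        alpha = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    else:
        # Point addition: alpha = (y2 - y1) / (x2 - x1)
        alpha = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (alpha * alpha - x1 - x2) % p
    y3 = (alpha * (x1 - x3) - y1) % p
    return (x3, y3)


# Example on the toy curve y^2 = x^3 + 2x + 3 over F_7 (illustrative values)
P = (2, 1)
print(point_add(P, P, 2, 7))       # point doubling
print(point_add(P, (3, 6), 2, 7))  # point addition
```

Each call performs exactly one modular inversion, which is the expensive step noted above.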
In this algorithm P is the base point on the curve, which is multiplied by
the scalar k, and Q is the resulting point on the curve; t is the bit
length of k. Algorithm 1 iterates through all the bits in k, starting from
the left (most significant bit). First R0 is set to the point at infinity,
and R1 to the base point P. In each iteration it performs point doubling of
R0 (doubling the point at infinity returns the point at infinity), and if
the current bit i is 1 then the point addition of R0 and R1 is stored in R0
(adding the point at infinity and a point P returns the point P).
This algorithm performs t point doublings and, in the worst case, t point
additions.
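The left-to-right procedure described above can be sketched as follows. Algorithm 1 itself is not reproduced in this section, so this is a reading of the prose description rather than a transcription; `scalar_mult` and the compact `point_add` helper are hypothetical names, and the point at infinity is assumed to be represented by `None`.

```python
def point_add(P1, P2, a, p):
    """Affine point addition/doubling over F_p; None is the point at infinity."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        alpha = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    else:
        alpha = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (alpha * alpha - x1 - x2) % p
    return (x3, (alpha * (x1 - x3) - y1) % p)


def scalar_mult(k, P, a, p):
    """Compute Q = k*P by scanning the bits of k from the most significant."""
    R0 = None  # point at infinity
    R1 = P     # base point
    for i in reversed(range(k.bit_length())):
        R0 = point_add(R0, R0, a, p)      # one doubling per bit
        if (k >> i) & 1:
            R0 = point_add(R0, R1, a, p)  # one addition per set bit
    return R0


# Toy curve y^2 = x^3 + 2x + 3 over F_7 with base point (2, 1) (illustrative)
print(scalar_mult(3, (2, 1), 2, 7))
print(scalar_mult(6, (2, 1), 2, 7))  # a multiple of the point's order
```

Note that the number of point additions depends on the bits of k, which is one reason the choice of algorithm matters for side-channel security.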
q is the field order (the number of elements in the field; see Chapter 2.1).
When Bob then receives the message and the signature from Alice he can
use Algorithm 3 to verify that the message has not been corrupted during
sending, and be sure that it is the exact same message as Alice sent. The proof
of the verification is out of scope for this thesis, but note that the verification
requires two scalar multiplications.
Relating to the TLS 1.3 [7] standard: ECDH [4] [21] is often used to pass
a symmetric key between Alice and Bob, along with an ECDSA-signature
which verifies that the symmetric key has not been corrupted during trans-
mission.
2.6 Tools
For simulation and synthesis the tool Questasim [22] is used. Questasim is
developed by Mentor [23]. It is a high-performance tool supporting
simulation, debugging and functional coverage using hardware description
languages such as VHDL [24], Verilog [25], and SystemVerilog [26],
including SystemVerilog's object-oriented features and SystemVerilog
Assertions (SVA).
2.9 libecc
libecc [31] is a library implementing EC mathematics hierarchically, as
illustrated in Figure 2.4. The library provides separate modules for
natural number arithmetic, field arithmetic (Chapter 2.1), elliptic curve
+-------------------------+
|EC*DSA signature         |
|algorithms               | <------------------+
|(ISO 14888-3)            |                    |
+-----------+-------------+                    |
            ^                                  |
            |                                  |
+-----------+-------------+         +----------+------------+
|Curves (SECP, Brainpool, |         |          Hash         |
|FRP, ...)                |         |        functions      |
|                         |         |                       |
+-----------+-------------+         +-----------------------+
            ^                          @@@@@@@@@@@@@@@@@@@@@@@@@@@@
            |                          @{Useful auxiliary modules}@
+-----------+-------------+            @+------------------------+@
| Elliptic curves         |            @| Utils                  |@
| core (scalar mul, ...)  |            @+------------------------+@
+-----------+-------------+            @| Sig Self tests         |@
            ^                          @| Arith Self tests       |@
            |                          @| User Examples          |@
            |                          @+------------------------+@
            |                          @| External deps          |@
+-----------+-------------+            @+------------------------+@
| Fp finite fields        |            @| LibECC conf files      |@
| arithmetic              |            @+------------------------+@
+-----------+-------------+            @| Scripts                |@
            ^                          @+------------------------+@
            |                          @@@@@@@@@@@@@@@@@@@@@@@@@@@@
+-----------+-------------+         +------------------------+
| NN natural              | <-------+ Machine related        |
| numbers arithmetic      |         | (words, ...)           |
+-------------------------+         +------------------------+
2.10 Python
Python [32] is an interpreted, general-purpose programming language with
dynamic type checking. Python has several features which make it flexible
and easy to use; for example, Python integers have unlimited range [33],
which makes handling of big numbers trivial. Internally Python represents
big numbers as an array of fixed-size integers, but this is hidden from the
programmer. Python also supports object-oriented programming.
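As an illustration of why unlimited-range integers are convenient for generating big-number test data, the sketch below splits a 256-bit value into fixed-size words and reassembles it. The 32-bit word size, the little-endian word order and the function names `to_words` and `from_words` are assumptions made for this example.

```python
WORD_BITS = 32  # assumed processor word length


def to_words(value, num_words):
    """Split a non-negative integer into little-endian WORD_BITS-bit words."""
    mask = (1 << WORD_BITS) - 1
    return [(value >> (WORD_BITS * i)) & mask for i in range(num_words)]


def from_words(words):
    """Reassemble an integer from little-endian words."""
    return sum(word << (WORD_BITS * i) for i, word in enumerate(words))


value = 2**255 - 19          # an arbitrary 255-bit number
words = to_words(value, 8)   # eight 32-bit words hold 256 bits
print([hex(w) for w in words])
print(from_words(words) == value)
```

A big number library in C has to manage such word arrays explicitly, whereas in Python the arithmetic on `value` itself needs no extra code.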
Chapter 3
Previous Work
The operations on lines 1 and 2 are normal addition and subtraction, and
the subtraction will require the 2’s complement of n to either be calculated
during operation or precomputed and be an input to the HW module. Algo-
rithm 4 is restricted to positive numbers smaller than n.
Another method was proposed in [35]. Let $n < 2^k$ and $m = 2^k - n$, where
k may be the word size of the system. It is assumed that $A, B < 2^k$.
Modular addition can then be computed as in Algorithm 5.
\[ \begin{aligned}
S &= S_0 + m \\
  &= (A + B - 2^k) + (2^k - n) \\
  &= A + B - n
\end{aligned} \]
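Algorithm 5 is not reproduced in this section, so the following is only a sketch that is consistent with the derivation above: a carry out of the k-bit adder corresponds to subtracting $2^k$, and adding the precomputed constant $m = 2^k - n$ then yields $A + B - n$. The sketch assumes the stricter condition $0 \le A, B < n$, and the function name `mod_add_precomputed` is hypothetical.

```python
def mod_add_precomputed(a, b, n, k):
    """Modular addition using m = 2^k - n, assuming 0 <= a, b < n < 2^k."""
    mask = (1 << k) - 1
    m = (1 << k) - n            # precomputed correction constant
    s = a + b
    if s >> k:                  # carry out of the k-bit adder
        # Dropping the carry subtracts 2^k; adding m gives a + b - n.
        return (s & mask) + m
    t = s + m
    if t >> k:                  # carry here means s >= n
        return t & mask         # equals s - n
    return s                    # s was already reduced


print(mod_add_precomputed(12, 12, 13, 4))
print(mod_add_precomputed(3, 4, 13, 4))
```

The appeal of this formulation for hardware is that the comparison with n is replaced by inspecting a carry bit.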
\[ 2P + A \cdot B_j \le 2(n - 1) + (n - 1) = 3n - 3 \]
Thus, at most two subtractions are needed to reduce P to $0 \le P < n$,
which means the modulo operation in line 4 may be implemented as
conditional subtractions.
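The bound above suggests the following bit-serial (interleaved) multiplication, sketched here in Python. The algorithm listing itself is not part of this section, so this is the standard shift-and-add formulation under the assumption $0 \le A, B < n < 2^k$; the function name `mod_mul_interleaved` is hypothetical.

```python
def mod_mul_interleaved(a, b, n, k):
    """Compute a*b mod n for 0 <= a, b < n < 2^k, one bit of b per step."""
    p = 0
    for j in reversed(range(k)):        # most significant bit of b first
        p = 2 * p + a * ((b >> j) & 1)  # now p <= 3n - 3
        if p >= n:                      # first conditional subtraction
            p -= n
        if p >= n:                      # second conditional subtraction
            p -= n
    return p


print(mod_mul_interleaved(11, 12, 13, 4))
```

Because the partial product is reduced in every step, the intermediate value never grows beyond a few bits more than n, which keeps the datapath narrow.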
Another efficient modular multiplication algorithm was proposed by Peter
Montgomery in [36]. The result from the Montgomery algorithm is
\[ P = A \cdot B \cdot r^{-1} \pmod{n} \]
where $A, B < n$ and $\gcd(n, r) = 1$. This adds overhead by requiring
conversion of the result. The number of bits in A or B is less than k, and
we take $r = 2^k$ [34]. The multiplication is shown in Algorithm 8.
Here, the division on line 7 is just a right shift, and the operations on line
3 and 5 can be combined: the LSB of P can be calculated before computing
the sum on line 3.
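Algorithm 8 is not reproduced in this section; the sketch below is the standard radix-2 Montgomery multiplication, which matches the description above (a conditional addition of n that makes P even, followed by a right shift). It assumes n is odd and $0 \le A, B < n < 2^k$, and the function name `montgomery_multiply` is hypothetical.

```python
def montgomery_multiply(a, b, n, k):
    """Return a*b*r^-1 mod n with r = 2^k, for odd n and 0 <= a, b < n < 2^k."""
    p = 0
    for i in range(k):            # least significant bit of a first
        p += ((a >> i) & 1) * b
        if p & 1:                 # LSB set: adding n makes p even
            p += n
        p >>= 1                   # exact division by 2
    if p >= n:                    # single final correction
        p -= n
    return p


n, k = 13, 4
r = 1 << k
a_m = (5 * r) % n                 # operands converted to Montgomery form
b_m = (7 * r) % n
c_m = montgomery_multiply(a_m, b_m, n, k)   # Montgomery form of 5*7 mod 13
print(montgomery_multiply(c_m, 1, n, k))    # multiply by 1 to convert back
```

The usage lines show the conversion overhead mentioned above: operands are first multiplied by r, and the final result is converted back by a Montgomery multiplication with 1.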
3.4 Pre-Study
In the pre-study [11], possible partitionings between hardware and software
for an ECC accelerator were explored. Profiling results from a pure
software implementation of ECC were analyzed to determine which parts of
the software implementation could benefit the most from hardware
acceleration.
The results showed that roughly 18.8% of execution time during testing
was spent on managing the software implementation of big numbers:
initialization, checking correct behavior, and handling number metadata.
The conclusion was that as much as possible of an EC cryptosystem, in
particular the scalar multiplication, should be performed by a coprocessor
to reduce the overhead of dealing with big numbers in software.
Chapter 4
The main goal of this thesis is to implement an Elliptic Curve Cryptography
Coprocessor (ECCo) whose primary purpose is to accelerate the scalar
multiplication in EC cryptosystems, following the conclusion of the
pre-study [11]. To perform the scalar multiplication, the fundamental
mathematical operations needed are modular multiplication and modular
addition (Chapter 2.1.1), and integer division, when using affine
coordinates (Chapter 2.4). These operations are enough to perform point
doubling and point addition (Chapter 2.2.3), which allows implementation of
an entire scalar multiplication (SM). The primary goal when designing the
ECCo is therefore to implement the modular arithmetic operations.
The design of a coprocessor is potentially a complex and lengthy process.
To simplify the design process of the ECCo, reusable design patterns were
actively used: communication between sub-modules in the ECCo was
generalized with clearly defined protocols; test data for all arithmetic
operations was generated with a single Python script, utilizing Python's
OOP features; and a common testbench setup was used for all modules. These
design patterns are further explained in their respective methodology and
implementation chapters.
This chapter discusses which choices were made during the design and
testing of the ECCo, and why these choices were made. Further, it highlights
important aspects of the design process, specifically where and why reusable
design patterns were used.
The ECCo design in this thesis will interface with the ARM Cortex M33
(Chapter 2.7) for use from software. The CM33 provides a coprocessor
interface which allows for tight integration of coprocessors and issuing
opcodes to the coprocessor from software. Because of this, Solutions 2 and
3 are good choices. Ideally, Solution 3 would be chosen, but due to time
limitations Solution 2 is the choice for this thesis. This allows for
estimates of SM speedup with and without the coprocessor by comparing the
speed of atomic operations in hardware and software. This minimal
implementation will also give an indication of how the size of the
coprocessor compares to that of the CM33 core itself.
Since the ECCo will be controlled from software through the coprocessor
interface, an instruction set has to be defined for the ECCo. The
instruction set proposed in this thesis is presented in Chapter 5.1. The
proposed instruction set includes more than the atomic operations and data
transfer; it also includes logical, comparison, and shift operations. The
pre-study concluded that an entire SM should be performed in the
coprocessor in order to maximize the benefit of the coprocessor. By
including these flow-control and common operations, the ECCo will be able
to perform an entire SM without data transfer between the ECCo and CM33
during execution, even though it is being controlled from SW.
This algorithm can handle both positive and negative numbers, and
intermediate sums larger than 2n. Notice that the while loops are mutually
exclusive: after the intermediate sum, $S_0 = A + B$, has been calculated,
$S_0$ will either be reduced or increased. Clearly, the while loops are not
synthesizable. Details on the interpretation of this algorithm are
presented in Chapter 5.
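The algorithm listing is not part of this section, so the following is a sketch reconstructed from the description: an intermediate sum followed by two mutually exclusive correction loops. The function name `mod_add` is chosen for this example.

```python
def mod_add(a, b, n):
    """Return (a + b) mod n for signed a, b and arbitrarily large sums."""
    s = a + b          # intermediate sum S_0
    while s >= n:      # reduce: only runs when the sum is too large
        s -= n
    while s < 0:       # increase: only runs when the sum is negative
        s += n
    return s


print(mod_add(5, 6, 7))
print(mod_add(-10, 3, 7))
print(mod_add(100, 50, 7))
```

If the first loop runs, it ends with a value in the range 0 to n − 1, so the second loop is skipped; if the sum starts out negative, the first loop never runs. In hardware each loop iteration corresponds to one pass through the adder, which is why the loops must be turned into states of a state machine.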
used. This test data will be reused in this thesis. Test data for simpler opera-
tions (i.e. modular addition, division, etc.) is easy to generate using a Python
script. Using a Python script will also allow generating more test data for SM
and point arithmetic, since a Python implementation of these operations was
written for the pre-study. The details of this script are described in Chapter
5, and full source code is listed in Appendix A.
Generation of test data follows a repeating pattern, regardless of what
data is being generated: reading data from file, and writing properly
formatted data to file. This can be handled by Python's OOP features (see
Chapter 5.5.5 and 2.10).
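One way this repeating pattern could be captured with inheritance is sketched below. It is not the script from Appendix A: the class names, the hexadecimal line format and the 64-digit width (256-bit operands) are all assumptions made for illustration.

```python
import io


class DataGenerator:
    """Base class: shared reading and writing; subclasses define the operation."""
    width = 64  # hex digits per value, i.e. 256-bit operands (assumed format)

    def read(self, stream):
        """Parse one row of hexadecimal values per non-empty line."""
        return [[int(tok, 16) for tok in line.split()]
                for line in stream if line.strip()]

    def write(self, stream, rows):
        """Write operands followed by the expected result, one row per line."""
        for operands in rows:
            values = list(operands) + [self.expected(*operands)]
            stream.write(" ".join(f"{v:0{self.width}x}" for v in values) + "\n")

    def expected(self, *operands):
        raise NotImplementedError


class ModAddData(DataGenerator):
    def expected(self, a, b, n):
        return (a + b) % n


class ModMulData(DataGenerator):
    def expected(self, a, b, n):
        return (a * b) % n


out = io.StringIO()  # stands in for a file opened for writing
ModAddData().write(out, [(5, 6, 7)])
ModMulData().write(out, [(5, 6, 7)])
print(out.getvalue())
```

Only the `expected` method differs between operations, so adding test data for a new operation requires a single short subclass.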
4.5 Verification
In order to both verify correct behavior and to speed up the development
process, the entire ECCo and each sub-module are separately tested with a
testbench verifying correct behavior. In the case of the arithmetic operations
this includes checking results with test data, previously mentioned in Chap-
ter 4.4.
Designing testbenches is a repetitive process, which can be simplified
by following a design pattern. During the development of ECCo the chosen
pattern was:
• All signals in the DUT's interface are connected to, and controlled by,
the testbench, allowing independent testing of all sub-modules.
Chapter 5
Implementation
This chapter describes implementation details about the work done for this
thesis: proposed instruction set for the ECCo; the implementation of the
ECCo and its integration with the CM33; testbench architecture and verifi-
cation of the ECCo and its sub-modules; test data generation using a Python
script; C implementation of the big numbers library, and the ECCo software
wrapper; benchmarking of modular arithmetic operations, using the ECCo
and a pure software implementation.
The logical, shift and comparison operations mentioned are not implemented
in the ECCo for this thesis. The proposed instruction set nevertheless
includes these instructions, and the reasons why they should be included in
a future implementation of an elliptic curve coprocessor are discussed.
Is zero Operand
Invert comparison
Set signed bit Index
• Set/Unset are required because the signed bit is not accessible through
the data transfer instructions (see Chapter 5.4 for details).
The ECCo is connected to the CM33 through the coprocessor interface. In-
ternally the sub-modules are connected through two interfaces, as discussed
in Chapter 4.6. These interfaces are described in Chapter 5.3.
In the state machine in Figure 5.3, StartT, ReadyT, and WaitT are names
of possible transitions. This is because the output of the state machine is
determined by both state and input. In IDLE the ready signal is asserted,
and the value of error may be either 0 or 1. In WAIT both ready and error
are always 0.
The in_Registers interface exposes all the registers directly, for reading. To
write, the signals enable, register, and data are used, indicating when to enable
writing, which register to write to, and the write data, respectively. The SV
interface implementation of in_Registers is listed in Appendix B.
Register           Index   Read   Write
CR0                0       X      X
CR1                1       X      X
...                ...     ...    ...
CR13               13      X      X
Modulo Register    14      X      X
Status Register    15      X
Table 5.4 lists all registers in the register bank. There are only two
non-general registers: the modulo register and the status register. The
modulo register is used for storing the modulo during modular arithmetic
operations. The status register is read-only (all writing to it is done
inside the register bank) and contains information about the current status
of the ECCo:
Bit 1-15 Active bits. These are reserved for future use in an asynchronous
design, for indicating which operation modules are currently working
and which are idle.
Bit 16-30 Signed bits. The signed bits of register 0-14, respectively.
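As an illustration of the bit layout listed above, the following sketch decodes a status register value in software. It assumes that bit 0 is the least significant bit and that bit 16 + i holds the signed bit of register i; the function names `active_bits` and `signed_bit` are hypothetical and not part of the ECCo wrapper.

```python
def active_bits(status):
    """Return bits 1-15 of the status register as a 15-bit field."""
    return (status >> 1) & 0x7FFF


def signed_bit(status, register_index):
    """Return the signed bit of register 0-14 (assumed stored in bits 16-30)."""
    if not 0 <= register_index <= 14:
        raise ValueError("register index must be in 0..14")
    return (status >> (16 + register_index)) & 1


status = (1 << 16) | (1 << 30)  # example value: registers 0 and 14 negative
print(signed_bit(status, 0), signed_bit(status, 1), signed_bit(status, 14))
print(active_bits(status))
```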
5.5.1 Negation
The negation operation is a single-cycle operation which is straightforward
to implement, and performs a 2's complement negation of the operand. It is
continually calculated:
    assign res = ~(operand) + 1;
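A software model of this operation, useful when generating reference values, has to truncate the result to the register width explicitly, since Python integers are unbounded. The sketch below assumes a parameterized width; the function name `negate` is chosen for the example.

```python
def negate(operand, width):
    """Two's complement negation of a width-bit value: ~operand + 1."""
    mask = (1 << width) - 1
    return (~operand + 1) & mask


print(hex(negate(0x01, 8)))
print(hex(negate(0x80, 8)))  # the most negative value negates to itself
```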
DoneT Transition to IDLE when an addition has finished, asserting done for
one cycle.
WaitT Self-transition in IDLE when not performing an operation.
ReduceT Transition to REDUCE when the intermediate sum is greater than
the modulo, and needs to be reduced to 0 ≤ Sum < Modulo.
IncreaseT Transition to INCREASE when the intermediate sum is less than
0, and needs to be increased to 0 ≤ Sum < Modulo.
If initially op1 + op2 < mod, then the calculation only takes one cycle to
complete. Otherwise, op1 mux selects the intermediate result as operand 1
and op2 mux selects either mod or −mod as operand 2, depending on whether
the state is INCREASE or REDUCE, respectively. In the worst case the
addition could take $2^{\text{WORD\_WIDTH}} - 1$ cycles to perform,
calculating $((2^{\text{WORD\_WIDTH}} - 1) + 0) \,\%\, 1$.
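The data-dependent latency can be estimated in closed form by counting how many times mod has to be added or subtracted. The sketch below counts correction steps only; how steps map to clock cycles in the ECCo is not modeled, and the function name `correction_steps` is hypothetical.

```python
def correction_steps(op1, op2, mod):
    """Number of REDUCE or INCREASE steps needed to bring op1 + op2 into [0, mod)."""
    total = op1 + op2
    # Floor division gives the (signed) number of multiples of mod to remove.
    return abs(total // mod)


WORD_WIDTH = 8  # small width chosen for illustration
print(correction_steps(3, 2, 7))                       # already in range
print(correction_steps(6, 6, 7))                       # typical case, operands < mod
print(correction_steps(2**WORD_WIDTH - 1, 0, 1))       # worst case from the text
```

When both operands are already smaller than the modulo, as is the case during a scalar multiplication, at most one correction step is needed.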
Notice that Python's integer division operator, //, rounds toward negative
infinity, so it does not give the truncated quotient expected for negative
operands. Instead any negative numbers are negated, and basic algebra
rules are used to determine the sign of the result, just as it is
implemented in hardware.
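The sign handling described above can be sketched as follows. This is an illustration of the approach, not the script from Appendix A, and the function name `signed_div` is chosen for the example.

```python
def signed_div(dividend, divisor):
    """Integer division on magnitudes, with the sign determined separately."""
    negative = (dividend < 0) != (divisor < 0)  # signs differ: negative result
    quotient = abs(dividend) // abs(divisor)
    return -quotient if negative else quotient


print(-7 // 2)            # Python floors toward negative infinity
print(signed_div(-7, 2))  # magnitude division with the sign applied afterwards
```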
The script source code is listed in Appendix A. Test data values used for
verification are listed in Appendix C.
RyT - ready transition Transition to READY, with ready asserted and error
deasserted, waiting for an instruction to be issued.
ET - error transition Transition to READY, with both ready and error asserted.
May be caused by a write error, read error, data processing error or an
invalid instruction being issued.
WaT - wait transition Transition to WAIT when valid is asserted and a data
processing operation is issued.
WaWT - wait wait transition Transition to WAIT, from WAIT, while current
data processing operation is not yet finished.
WaRT - wait ready transition Transition to WAIT, from WAIT, when a data
processing operation finished successfully and valid is asserted, request-
ing a new data processing operation immediately.
WaET - wait error transition Transition to WAIT, from WAIT, when a data
processing operation finished with error and valid is asserted, request-
ing a new data processing operation immediately.
ReT - read transition Transition to READ, when the processor wants to read
from a coprocessor register.
ReRT - read ready transition Transition to READ, from WAIT, when a data
processing operation finished successfully and valid is asserted, request-
ing a data transfer operation (read) immediately.
ReET - read error transition Transition to READ, from WAIT, when a data
processing operation finished with error and valid is asserted, request-
ing a data transfer operation (read) immediately.
WrRT - write ready transition Transition to WRITE, from WAIT, when a data
processing operation finished successfully and valid is asserted, request-
ing a data transfer operation (write) immediately.
WrET - write error transition Transition to WRITE, from WAIT, when a data
processing operation finished with error and valid is asserted, request-
ing a data transfer operation (write) immediately.
5.8 Software
For this thesis three software components were implemented: a wrapper for
the coprocessor interface instructions; a big number library for use with the
ECCo; and a benchmarking program.
The big number library and ECCo wrapper were used to verify that
communication with the ECCo using the coprocessor interface was working as
expected. To verify correct behavior of the ECCo controller and the
implemented operations, the test data from Appendix C was used. The source
code of the test programs used for verification is listed in Appendix F.
were the modular multiplication and modular addition. As these are the
fundamental operations of SM, their execution time gives an indication of
the possible speedup. The benchmarking was performed by setting up the
parameters once, initializing operand 1 (OP1), operand 2 (OP2), and the
modulus (MOD) to large 256-bit values. The same values were used for the
libecc and ECCo benchmarks. Then the operation OP1 = (OP1 + OP2) % MOD
was performed for the modular addition benchmark, and
OP1 = (OP1 ∗ OP2) % MOD for the modular multiplication benchmark.
The benchmarks were performed doing runs of 10 and 100 iterations, i.e.
performing the operation 10 or 100 times, updating the OP1 value each time.
The test values were large 256-bit values, making them similar to values used
during 256-bit SM. These benchmarks do not, however, include tests of
edge cases, such as when MOD << OP1 + OP2, in which case the ECCo will
have a very long execution time, nor do they guarantee coverage of the case
when MOD > OP1 + OP2 or MOD > OP1 ∗ OP2.
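The benchmark procedure described above can be modelled in a few lines of Python. This is a sketch of the iteration scheme only, not the C benchmark code from Appendix F: the function name `run_benchmark` and the string arguments are assumptions, and the operand values used in the thesis are not reproduced here.

```python
def run_benchmark(op: str, op1: int, op2: int, mod: int, iterations: int) -> int:
    """Repeat OP1 = (OP1 op OP2) % MOD, updating OP1 on every iteration.

    `op` is "add" for modular addition or "mul" for modular multiplication.
    OP2 and MOD are set up once and stay fixed, as in the description above.
    """
    for _ in range(iterations):
        if op == "add":
            op1 = (op1 + op2) % mod
        elif op == "mul":
            op1 = (op1 * op2) % mod
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return op1


# Placeholder 256-bit operands (not the values used in the thesis).
MOD = 2**255 - 19
OP1 = 2**254 + 12345
OP2 = 2**253 + 67890
result_10 = run_benchmark("add", OP1, OP2, MOD, 10)
result_100 = run_benchmark("mul", OP1, OP2, MOD, 100)
```

Because OP1 is overwritten each time, only the setup touches all three operands; every later iteration reuses values that are already in place, which is the property the benchmark relies on.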
The source code for the benchmarking programs is listed in Appendix F.
Chapter 6
Results
6.1 Speed
The execution time of modular addition and modular multiplication is com-
pared between benchmark code running the operations on ECCo and using
the software implementation from libecc. Table 6.2 summarizes the bench-
marking results. The execution time is measured in clock cycles. As a ref-
erence, a simulation run without any operation was performed in order to
measure the setup time of the system. This empty run had an execution time
of 36,790 cycles (this is included in the results presented in Table 6.2).
The results show that the ECCo performed 3.8 times faster for modular
addition at 10 iterations, and 8 times faster at 100 iterations. As for the mod-
ular multiplication the ECCo performed 7.8 times faster at 10 and 27 times
faster at 100 iterations.
While the ECCo is significantly faster than the compared software
implementation, another notable result is how the ECCo and the software
implementation scale differently: from 10 to 100 iterations the ECCo had an
increase in execution time of 1.01x (addition) and 1.8x (multiplication),
while the software implementation had an increase of 2.1x (addition) and
6.5x (multiplication). This gives an indication of the benefit of having a
coprocessor which allows a large number of operations to be performed
without the need for data transfer between processor and coprocessor.
6.2 Area
The design of the CM33 with the ECCo was synthesizable, and did not have
any negative slack. It was synthesized without any optimization, at a
frequency of 128 MHz. The area results are presented as a comparison between
synthesis estimates of the design with and without the ECCo included
(Table 6.4), and an area distribution between the sub-modules of the ECCo
(Table 6.6).
The values shown in Table 6.4 are the percentage increases in area from
synthesizing the CM33 alone to synthesizing CM33+ECCo. Clearly, the ECCo
contains a great deal of combinatorial logic, increasing the combinatorial
cell area by 312%. In total, the ECCo's area equals 83% of the existing design.
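The 83% figure is relative to the existing design, while the conclusion quotes the ECCo's share of the combined CM33+ECCo design. The two are related by a one-line conversion, sketched here with a hypothetical helper name:

```python
def share_of_combined(relative_increase: float) -> float:
    """Convert 'added area relative to the existing design' into
    'added area as a fraction of the combined design'."""
    return relative_increase / (1.0 + relative_increase)


# ECCo area equals 83% of the existing (CM33-only) design.
ecco_share = share_of_combined(0.83)   # 0.83 / 1.83
```

With a relative increase of 0.83 the share is 0.83 / 1.83, i.e. roughly 45% of the combined design.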
The values shown in Table 6.6 are the area distribution of the ECCo sub-
modules.
ECCo Accumulative Area The area percentage of the ECCo occupied by this
module, including its sub-modules. The percentages of the Arithmetic,
Controller, and Register Bank modules add up to 100%, since these are all
the sub-modules of the ECCo. The percentages of Multiplication, Addition,
Negation, and Division are included in the Arithmetic percentage, but they
do not sum to 84.63% since the Arithmetic module contains some logic
of its own.
Combinatorial Area The area percentage of combinatorial cells for this
module only, not including any of its sub-modules. E.g. the Arithmetic
module uses 5.8% of the total area of combinatorial cells in the ECCo,
excluding its sub-modules, and the Division module uses 83.02% of the
total combinatorial area of the ECCo.
Chapter 7
Future Work
The ECCo implementation in this thesis has only included a small subset
of necessary operations and features for the suggested design of a complete
elliptic curve coprocessor. This chapter discusses possible changes and con-
siderations for future work on the coprocessor proposed in this thesis.
7.2 Security
An issue which has not been addressed in this thesis, but which must be
considered for future work, is security of the implementation against attacks
such as side-channel attacks. A way of trying to defend against side-channel
attacks is by using constant time algorithms for calculations, which should
be considered both for the finite-field arithmetic, point operations and the
scalar multiplication algorithm.
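As an illustration of what a constant-time formulation looks like at the finite-field level, the following sketch computes a modular addition without a data-dependent branch: both candidate results are always computed and one is selected with a mask. This is not the algorithm implemented in the ECCo, the name `mod_add_branch_free` is hypothetical, and Python integers do not themselves run in constant time; the sketch only shows the structure that a hardware or C implementation would follow.

```python
WIDTH = 256  # operand width in bits, matching the 256-bit words used in the thesis


def mod_add_branch_free(a: int, b: int, m: int) -> int:
    """Return (a + b) mod m for 0 <= a, b < m < 2**WIDTH.

    Both the unreduced sum and the reduced sum are computed every time;
    a mask derived from the sign of the subtraction selects between them.
    """
    s = a + b          # unreduced sum, at most WIDTH + 1 bits
    d = s - m          # reduced candidate, negative if s < m
    # Arithmetic right shift: -1 (all ones) when d is negative, 0 otherwise.
    mask = d >> (WIDTH + 1)
    # mask == -1 selects s, mask == 0 selects d.
    return (s & mask) | (d & ~mask)
```

The same select-by-mask idea extends to point operations and to scalar multiplication, where both possible updates are computed in every iteration regardless of the key bit.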
7.3 Algorithms
While the implemented algorithms for modular addition and modular
multiplication are simple, and more complex and efficient methods are
available (Chapters 3.1 and 3.2), the current implementation already provides
significant speedup over a pure software implementation. A future change in
the choice of algorithms is necessary for further development, a decision
that will surely require a compromise between security and efficiency.
The integer division will, however, need a more area efficient implemen-
tation. Reducing the area consumption of the divisor module could, poten-
tially, significantly reduce the total area of the ECCo.
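One common way to trade execution time for area is a sequential shift-and-subtract (restoring) divider, which reuses a single subtractor over one step per bit instead of building the whole division combinationally. The sketch below is a behavioural model of that technique for unsigned operands; it is an assumption about a possible direction, not a design from the thesis, and the function name is hypothetical.

```python
def restoring_divide(dividend: int, divisor: int, width: int = 256):
    """Unsigned restoring division, one quotient bit per iteration.

    Returns (quotient, remainder) for 0 <= dividend < 2**width and divisor > 0.
    Each loop iteration corresponds to one step of a sequential divider.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for i in range(width - 1, -1, -1):
        # Shift in the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction: keep the result only if it does not go negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder
```

For 256-bit operands this takes 256 steps, so the area saving comes at the cost of a longer, but fixed, latency per division.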
Chapter 8
Conclusion
This thesis has explored how to design a coprocessor for accelerating elliptic
curve cryptography, based on the results from the prestudy [11]. The co-
processor designed in the thesis, ECCo, was designed for use with the ARM
CM33 processor. The CM33 provides a coprocessor interface for tight integra-
tion of coprocessors, which allows the instructions to be issued to connected
coprocessors from software.
This led to the ECCo being designed with an instruction set providing
the atomic mathematical operations for ECC, with the possibility of adding
implementations of scalar multiplication to the instruction set in future work.
As time did not allow for the entire proposed instruction set to be
implemented, only the atomic arithmetic operations were implemented, and an
ECCo design with a controller, register bank and arithmetic module was
used to compare execution time with an ECC software implementation, and
to estimate area usage by synthesis. The ECCo accounted for 45% of the
area when synthesizing ECCo+CM33. The estimates showed that the ECCo
area consumption was largely dominated by the divisor (73.18% of the total
ECCo area), which was implemented using the SystemVerilog division
operator, "/", and no optimization in synthesis. However, the atomic operations
of ECC, Modular Multiplication and Modular Addition, only occupied 1.97%
and 1.92%, respectively. These modules also performed 3.8x - 27x faster than
a pure software implementation of ECC.
While the implemented algorithms for modular addition and modular
multiplication are simple, and more complex and efficient methods are
available (Chapters 3.1 and 3.2), the current implementation already provides
significant speedup over a pure software implementation. It also provides a
basis for a complete system in which efficiency can be achieved through
several means: reducing data transfers, optimizing the implementation of
mathematical operations, and offering flexibility and ease of use.
Appendix A
Test Data Python script
import argparse
import csv
import io
import os
import re
import shutil
import sys
from abc import ABC, abstractmethod
from typing import Dict, Generator, List


# Exception class used to differentiate between known and unknown errors.
class DataError(Exception):
    pass


################################################################################
#                                                                              #
#                                  Baseclass                                   #
#                                                                              #
################################################################################

class DataABC(ABC):
    """DataABC is the baseclass for all calculations. It handles reading from
    and writing to csv data files, writing to C files, and number formatting
    (decimal, hex & binary).
    """
    headers = []
    data = []

    def __init__(self, headers, file: io.IOBase, numBase: int) -> None:
        self.headers = headers
        # Per-instance list, so that instances do not share the class-level one
        self.data = []
        rd = csv.reader(file)
        # First line of the file must be the headers
        fileHeaders = next(rd)
        if self.headers != fileHeaders:
            raise DataError(f'[!!] DataABC, __init__: Invalid headers! Want {self.headers} - got {fileHeaders}')

        # Read all data
        for j, cols in enumerate(rd):
            # Report and skip empty lines
            if not cols:
                print(f'[ ] DataABC, __init__: Reading {file}: Found empty line ({j+2}). Ignoring...')
                continue
            # Represent the data as a dict, indexed by header names
            tmp = dict()
            for i, h in enumerate(self.headers):
                # Sanity checks to avoid decimal interpreted as hex etc.
                if not re.match(r'^-?\d+$', cols[i]) and numBase == 10:
                    raise DataError(f'DataABC, __init__: Reading {file}: Tried interpreting non-decimal number as decimal: "{cols[i]}"')
                elif not re.match(r'^-?0x[0-9a-fA-F]+$', cols[i]) and numBase == 16:
                    raise DataError(f'DataABC, __init__: Reading {file}: Tried interpreting non-hex number as hexadecimal: "{cols[i]}"')
                elif not re.match(r'^-?0b[01]+$', cols[i]) and numBase == 2:
                    raise DataError(f'DataABC, __init__: Reading {file}: Tried interpreting non-binary number as binary: "{cols[i]}"')
                tmp[h] = int(cols[i], numBase)
            self.data.append(tmp)

    # calculate() is an instance method, so abstractmethod is used here
    @abstractmethod
    def calculate(self) -> None:
        pass

    @staticmethod
    def _formatNumber(num: int, numFormat: int) -> str:
        # Determine number format string
        if numFormat == 16:
            return f'0x{num:x}' if num >= 0 else f'-0x{abs(num):x}'
        elif numFormat == 2:
            return f'0b{num:b}' if num >= 0 else f'-0b{abs(num):b}'
        else:
            return f'{num}'


    def _formatDataCsv(self, numFormat: int) -> Generator[Dict[str, str], None, None]:
        # Iterate through data values, yield dictionaries with strings of formatted numbers
        for d in self.data:
            tmp = dict()
            for k, v in d.items():
                tmp[k] = self._formatNumber(v, numFormat)
            yield tmp

    def writeCsv(self, file: io.IOBase, numFormat: int) -> None:
        wr = csv.DictWriter(file, fieldnames=self.headers)
        # First write the header line
        wr.writeheader()
        # Write all data to the file
        for d in self._formatDataCsv(numFormat):
            wr.writerow(d)

    def _formatDataC(self, numFormat: int) -> Generator[List[str], None, None]:
        for d in self.data:
            tmp = list()
            for v in d.values():
                tmp.append(self._formatNumber(v, numFormat))
            yield tmp

    def writeC(self, file: io.IOBase, numFormat: int, fileName: str, arrayName: str) -> None:
        # Need to know size of all the arrays dimensions
        numEntries = len(self.data) + 1  # Zero terminated
        numHeaders = len(self.headers)
        numChars = 0
        # Iterate through all values and find the longest string
        for d in self.data:
            for v in d.values():
                l = len(self._formatNumber(v, numFormat))
                if l > numChars:
                    numChars = l
        numChars += 1  # One extra, for terminating zero

        # Print some general information comments
        print(f'// Created by {sys.argv[0]} with data from {fileName}\n// Number base: {numFormat}',
              file=file, end='\n\n')
        # Print some macros with meta data
        print(f'#define {arrayName.upper()}_NUM_ENTRIES {numEntries - 1}', file=file)
        print(f'#define {arrayName.upper()}_NUM_HEADERS {numHeaders}', file=file)
        print(f'#define {arrayName.upper()}_NUM_CHARS {numChars - 1}', file=file, end='\n\n')
        # Print a comment with the headers
        print(f'// [{", ".join(self.headers)}]', file=file)
        # Write the actual data
        print(f'char {arrayName}[{numEntries}][{numHeaders}][{numChars}] = {{', file=file)
        for data in self._formatDataC(numFormat):
            print(f"""    {{"{'", "'.join(data)}"}},""", file=file)
        # End with zero termination
        print('    {0}\n};', file=file)



################################################################################
#                                                                              #
#                                  Addition                                    #
#                                                                              #
################################################################################

class ModAddData(DataABC):
    def __init__(self, file: io.IOBase, numBase: int):
        super().__init__(['modulo', 'operand1', 'operand2', 'result'], file, numBase)

    def calculate(self):
        # For each entry calculate op1+op2 % mod
        for i, d in enumerate(self.data):
            self.data[i]['result'] = (d['operand1'] + d['operand2']) % d['modulo']


################################################################################
#                                                                              #
#                               Multiplication                                 #
#                                                                              #
################################################################################

class ModMulData(DataABC):
    def __init__(self, file: io.IOBase, numBase: int):
        super().__init__(['modulo', 'operand1', 'operand2', 'result'], file, numBase)

    def calculate(self):
        # For each entry calculate op1*op2 % mod
        for i, d in enumerate(self.data):
            self.data[i]['result'] = (d['operand1'] * d['operand2']) % d['modulo']


################################################################################
#                                                                              #
#                                  Division                                    #
#                                                                              #
################################################################################

class DivData(DataABC):
[...]
        data.calculate()
        if args['b']:
            shutil.copy(dataFile, bkupFile)
        with open(outFile, 'w', newline='') as fout:
            if csvOut:
                data.writeCsv(fout, outBase)
            else:
                data.writeC(fout, outBase, outFile, cArrayName)
    except DataError as e:
        print(e, file=sys.stderr)
Appendix B
Internal Interfaces SV Code
interface in_Registers;
    logic [NUM_REGS-1:0][WORD_WIDTH:0] registers;
    logic [WORD_WIDTH:0]               wData;
    logic [3:0]                        wReg;
    logic                              wEnable;

    modport slave (
        output registers,
        input  wData,
        input  wReg,
        input  wEnable
    );
    modport master (
        input  registers,
        output wData,
        output wReg,
        output wEnable
    );
endinterface

interface in_OpModule;
    logic       ready;
    logic       error;
    logic       valid;
    logic [3:0] opcode;
    logic [3:0] op1Reg;
    logic [3:0] op2Reg;
    logic [3:0] resReg;

    modport slave (
        output ready,
        output error,
        input  valid,
        input  opcode,
        input  op1Reg,
        input  op2Reg,
        input  resReg
    );
    modport master (
        input  ready,
        input  error,
        output valid,
        output opcode,
        output op1Reg,
        output op2Reg,
        output resReg
    );
endinterface
Listing B.1: SystemVerilog code for the internal interfaces of ECCo.
Appendix C
Test Data
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,55228977
55228977394393414412853003502097247104908965897402951232160234933662925082798 ,45228977
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,55228977
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,65228977
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,35289773
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,95289773
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,85289773
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,45228977
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,5522
55228977394393414412853003502097247104908965897402951232160234933662925082798 ,4522
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,5522
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,6522
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,3528
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,9528
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,8528
74225698149877013133163669918490695756676765155849109751738796007550114900164 ,4522
operand1,operand2,result
5,1,5
3,2,1
3,-4,0
75,77,0
567,895,0
567,-895,0
16578,19504,0
546500,357980,1
98275954794755497,12457956214,7888609
98275954794755497,-12457956214,-7888609
98275954794755497,92657924597654697,1
98275954794755497,97,1013154173141809
98275954794755497,-97,-1013154173141809
65245765798756497,70256423697,928680
55228977394654679572853003502097247104908965897402951232160234933662925082798 ,4128
65228977394654679572853003502097247104908965897402951232160234933662925082798 ,4128
3528977394654679572853003502097247104908965897402951232160234933662925082798 ,41285
9528977394654679572853003502097247104908965897402951232160234933662925082798 ,91285
8528977394654679572853003502097247104908965897402951232160234933662925082798 ,91285
45228977394393414412853003502097247104908965897402951232160234933662925082798 ,1329
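The signed rows of the division test data above (operand1, operand2, result), such as 3, -4 giving 0, are consistent with a quotient truncated toward zero. Python's `//` operator instead rounds toward negative infinity, so it would give -1 for that row. How the test script computes the division result is not shown in this extract; the helper below is a hypothetical sketch that reproduces the listed rows.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division with the quotient truncated toward zero."""
    q = abs(a) // abs(b)
    # The quotient is negative exactly when the operand signs differ.
    return q if (a < 0) == (b < 0) else -q


# A selection of (operand1, operand2, result) rows from the division test data.
ROWS = [
    (5, 1, 5),
    (3, 2, 1),
    (3, -4, 0),
    (567, -895, 0),
    (98275954794755497, 12457956214, 7888609),
    (98275954794755497, -12457956214, -7888609),
    (98275954794755497, 97, 1013154173141809),
    (98275954794755497, -97, -1013154173141809),
]

# Rows for which truncating division disagrees with the listed result.
mismatches = [(a, b, r) for a, b, r in ROWS if trunc_div(a, b) != r]
```

For the rows selected here `mismatches` is empty, whereas floor division differs on every row with a negative operand.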
Appendix D
ECCo C Wrapper
#ifndef ECC_H
#define ECC_H

/*************************************************************
 *                                                           *
 *                  Internal ecc.h macros                    *
 *                                                           *
 *************************************************************/

// Coprocessor number of the ECCo
#define __ECC_COPROC "p0"

/*************
 *  Opcodes  *
 *************/

// Arithmetic
#define __ECC_OPC1_MUL "0x0"
#define __ECC_OPC1_ADD "0x1"
#define __ECC_OPC1_DIV "0x2"
#define __ECC_OPC1_NEG "0x3"
// Logical
#define __ECC_OPC1_LOG "0xd"
#define __ECC_OPC2_OR  "0x0"
#define __ECC_OPC2_AND "0x1"
#define __ECC_OPC2_XOR "0x2"
#define __ECC_OPC2_NOT "0x3"
// Shift
#define __ECC_OPC1_SFT "0xe"
#define __ECC_OPC2_LSL "0x0"
#define __ECC_OPC2_LSR "0x1"
#define __ECC_OPC2_ASR "0x2"
// Comparison
#define __ECC_OPC1_CMP "0xf"
#define __ECC_OPC2_ZR  "0x0"
#define __ECC_OPC2_NZR "0x1"
#define __ECC_OPC2_EQ  "0x2"
#define __ECC_OPC2_NEQ "0x3"
#define __ECC_OPC2_LT  "0x4"
#define __ECC_OPC2_GT  "0x5"
// Miscellaneous
#define __ECC_OPC1_INC "0xa"
#define __ECC_OPC1_DEC "0xb"
#define __ECC_OPC1_SSB "0xc"
#define __ECC_OPC2_SSB "0x0"
#define __ECC_OPC1_USB "0xc"
#define __ECC_OPC2_USB "0x1"


/*************************************************************
 *                                                           *
 *                  Exported ecc.h macros                    *
 *                                                           *
 *************************************************************/

#ifndef NULL
#define NULL ((void *) 0)
#endif

/********************************
 *  Coprocessor interface meta  *
 ********************************/

#define ECC_OP1_WIDTH       4
#define ECC_OP1_MAX         15
#define ECC_OP2_WIDTH       3
#define ECC_OP2_MAX         7
#define ECC_REG_IDX_WIDTH   4
#define ECC_REG_IDX_MAX     15
#define ECC_WORD_WIDTH      256
#define ECC_WORD_WIDTH_BYTE (ECC_WORD_WIDTH/8)
#define ECC_MODULO_REG      "14"
#define ECC_STATUS_REG      "15"

/***************************
 *  Arithmetic operations  *
 ***************************/

// All arguments are coprocessor register indexes, which must be integers in double quotes.
#define ECC_MUL(op1Reg, op2Reg, resReg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_MUL", cr"op2Reg", cr"op1Reg", cr"resReg", #0")
#define ECC_ADD(op1Reg, op2Reg, resReg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_ADD", cr"op2Reg", cr"op1Reg", cr"resReg", #0")
#define ECC_DIV(op1Reg, op2Reg, resReg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_DIV", cr"op2Reg", cr"op1Reg", cr"resReg", #0")
#define ECC_NEG(opReg, resReg)          asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_NEG", cr0, cr"opReg", cr"resReg", #0")


/************************
 *  Logical operations  *
 ************************/

// All arguments are coprocessor register indexes, which must be integers in double quotes.
#define ECC_OR(op1Reg, op2Reg, resReg)  asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_LOG", cr"op2Reg", cr"op1Reg", cr"resReg", #"__ECC_OPC2_OR)
#define ECC_AND(op1Reg, op2Reg, resReg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_LOG", cr"op2Reg", cr"op1Reg", cr"resReg", #"__ECC_OPC2_AND)
#define ECC_XOR(op1Reg, op2Reg, resReg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_LOG", cr"op2Reg", cr"op1Reg", cr"resReg", #"__ECC_OPC2_XOR)
// The operand parameter is named opReg, so the body refers to opReg (not op1Reg).
#define ECC_NOT(opReg, resReg)          asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_LOG", cr0, cr"opReg", cr"resReg", #"__ECC_OPC2_NOT)


/**********************
 *  Shift operations  *
 **********************/

// All arguments are coprocessor register indexes, which must be integers in double quotes.
#define ECC_LSL(op1Reg, op2Reg, resReg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_SFT", cr"op2Reg", cr"op1Reg", cr"resReg", #"__ECC_OPC2_LSL)
#define ECC_LSR(op1Reg, op2Reg, resReg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_SFT", cr"op2Reg", cr"op1Reg", cr"resReg", #"__ECC_OPC2_LSR)
#define ECC_ASR(op1Reg, op2Reg, resReg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_SFT", cr"op2Reg", cr"op1Reg", cr"resReg", #"__ECC_OPC2_ASR)


/***************************
 *  Comparison operations  *
 ***************************/

// All arguments are coprocessor register indexes, which must be integers in double quotes.
#define ECC_ZR(reg)             asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_CMP", cr0, cr"reg", cr0, #"__ECC_OPC2_ZR)
#define ECC_NZR(reg)            asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_CMP", cr0, cr"reg", cr0, #"__ECC_OPC2_NZR)
#define ECC_EQ(op1Reg, op2Reg)  asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_CMP", cr"op2Reg", cr"op1Reg", cr0, #"__ECC_OPC2_EQ)
#define ECC_NEQ(op1Reg, op2Reg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_CMP", cr"op2Reg", cr"op1Reg", cr0, #"__ECC_OPC2_NEQ)
#define ECC_LT(op1Reg, op2Reg)  asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_CMP", cr"op2Reg", cr"op1Reg", cr0, #"__ECC_OPC2_LT)
#define ECC_GT(op1Reg, op2Reg)  asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_CMP", cr"op2Reg", cr"op1Reg", cr0, #"__ECC_OPC2_GT)


/******************************
 *  Miscellaneous operations  *
 ******************************/

// All arguments are coprocessor register indexes, which must be integers in double quotes.
#define ECC_INC(opReg, resReg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_INC", cr0, cr"opReg", cr"resReg", #0")
#define ECC_DEC(opReg, resReg) asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_DEC", cr0, cr"opReg", cr"resReg", #0")
#define ECC_SSB(reg)           asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_SSB", cr0, cr"reg", cr0, #"__ECC_OPC2_SSB)
#define ECC_USB(reg)           asm volatile("cdp "__ECC_COPROC", #"__ECC_OPC1_USB", cr0, cr"reg", cr0, #"__ECC_OPC2_USB)


/**************************
 *  Data transfer macros  *
 **************************/

/* Load coprocessor register macros. Offset is in hexa. 'reg' is a coprocessor
   register index and must be a decimal integer in double quotes. 'Rt' and 'Rt2' are
   32-bit input variables. */
#define ECC_LOAD_0(Rt, Rt2, reg) asm volatile("mcrr "__ECC_COPROC", #0x0, %0, %1, cr"reg :: "rm"(Rt), "rm"(Rt2))
#if ECC_WORD_WIDTH > 64
#define ECC_LOAD_1(Rt, Rt2, reg) asm volatile("mcrr "__ECC_COPROC", #0x1, %0, %1, cr"reg :: "rm"(Rt), "rm"(Rt2))
#else
#define ECC_LOAD_1(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 128
#define ECC_LOAD_2(Rt, Rt2, reg) asm volatile("mcrr "__ECC_COPROC", #0x2, %0, %1, cr"reg :: "rm"(Rt), "rm"(Rt2))
#else
#define ECC_LOAD_2(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 192
#define ECC_LOAD_3(Rt, Rt2, reg) asm volatile("mcrr "__ECC_COPROC", #0x3, %0, %1, cr"reg :: "rm"(Rt), "rm"(Rt2))
#else
#define ECC_LOAD_3(Rt, Rt2, reg)
#endif
#if ECC_WORD_WIDTH > 256
#define ECC_LOAD_4(Rt, Rt2, reg) asm volatile("mcrr "__ECC_COPROC", #0x4, %0, %1, cr"reg :: "rm"(Rt), "rm"(Rt2))
#else
#define ECC_LOAD_4(Rt, Rt2, reg)
#endif
Appendix E
ECCo Big Number library
#ifndef ECC_WORD_H
#define ECC_WORD_H

#include <stdbool.h>

#include "ecc.h"

/* Length of array in word struct. Define here instead of ecc.h since it depends
   on array type. */
#define EW_LENGTH (ECC_WORD_WIDTH_BYTE/sizeof(int))

/* +4 to fit terminating '\0', leading '0b' and optional '-' sign.
   Parenthesized so the macro expands safely inside larger expressions. */
#define EW_STR_LENGTH (ECC_WORD_WIDTH+4)

/* ecc_word is the datatype to work with big numbers with the same width as
   the ECC coprocessor's word size. */
typedef struct {
    int word[EW_LENGTH];
    bool is_zero;
    bool is_negative;
} ecc_word_t;

/* String-type big enough to represent any number on either
   binary, decimal or hexadecimal format. */
typedef char ew_str_t[EW_STR_LENGTH];

/* Initializes a ecc_word. Returns a pointer to the given word. */
ecc_word_t *ew_init( ecc_word_t * );

/* Creates a new copy of an ecc_word. Returns a pointer to dst. */
ecc_word_t *ew_copy( const ecc_word_t *restrict src, ecc_word_t *restrict dst );


/**************************************************************
 *                                                            *
 *                      Content handlers                      *
 *                                                            *
 **************************************************************/

/* Sets the content of a ecc_word to 0. Returns a pointer to the given word. */
ecc_word_t *ew_zero( ecc_word_t * );

/* Set the value to an integer value. */
ecc_word_t *ew_set_int( ecc_word_t *, int );

/* Set the value of a word to a number represented by a string in hexadecimal
   (0x prefix) format. Return a pointer to the word, or NULL on failure. */
ecc_word_t *ew_set_str( ecc_word_t *, const char[] );

    w->is_zero = false;
    return w;
}

ecc_word_t *
ew_set_str( ecc_word_t *w, const char str[] )
{
    int shift, tmp;
    int *num = w->word;
    const char *c;

    /* Find the end of the string */
    for ( c = str; *c != '\0'; c++ )
        ;

    /* Check sign */
    if ( *str == '-' ) {
        w->is_negative = true;
        str++;
    }
    else
        w->is_negative = false;

    /* Sanity checks */
    if ( *str++ != '0' ) {
        MSG(( "ew_set_str: badly formatted string, must start with '0x' or '-0x'\n" ));
        return NULL;
    }
    if ( *str != 'x' ) {
        MSG(( "ew_set_str: badly formatted string, must start with '0x' or '-0x'\n" ));
        return NULL;
    }

    /* Set word to zero if non-zero */
    if ( !w->is_zero ) {
        do
            *num = 0;
        while ( ++num != w->word+EW_LENGTH );
        w->is_zero = true;
        num = w->word;
    }

    /* Parse from the last digit towards the 'x', eight hex digits per word */
    do {
        tmp = 0;
        for ( shift = 0; shift < 32 && --c != str; shift += 4 ) {
            switch ( *c ) {
            case 'f': case 'F':
                tmp ^= 0xf << shift;
                break;
            case 'e': case 'E':
                tmp ^= 0xe << shift;
                break;
            case 'd': case 'D':
                tmp ^= 0xd << shift;
                break;
            case 'c': case 'C':
                tmp ^= 0xc << shift;
                break;
            case 'b': case 'B':
                tmp ^= 0xb << shift;
                break;
            case 'a': case 'A':
                tmp ^= 0xa << shift;
                break;
            default:
                /* Reject anything that is not a decimal digit */
                if ( *c < '0' || *c > '9' ) {
                    MSG(( "ew_set_str: invalid character in string: %c", *c ));
                    return NULL;
                }
                tmp ^= ( *c - '0' ) << shift;
            }
        }
        if ( tmp && w->is_zero )
            w->is_zero = false;
        *num = tmp;
    } while ( c != str && ++num != w->word+EW_LENGTH );

    return w;
}

ecc_word_t *
ew_set_offs( ecc_word_t *w, int offs, int r1, int r2 )
{
    if ( w->is_zero )
        if ( r1 || r2 )
            w->is_zero = false;
    offs *= 2;
    w->word[offs] = r1;
    w->word[offs+1] = r2;
    return w;
}

char *
ew_to_str( const ecc_word_t *w, char s[], int sz )
{
    int i = 0, shift;
    const int *num = w->word+EW_LENGTH;
    unsigned char tmp;

    if ( sz < 4 ) {
        MSG(( "ew_to_str: too small string: sz = %d\n", sz ));
        return NULL;
    }
    if ( w->is_negative )
        s[i++] = '-';
    s[i++] = '0';
    s[i++] = 'x';

    /* Emit the most significant word first, one hex digit per nibble */
    while ( i < sz && num-- != w->word )
        for ( shift = 28; shift >= 0 && i < sz; shift -= 4, i++ )
            switch ( ( tmp = ( *num >> shift ) & 0xf ) ) {
            case 0xf:
                s[i] = 'f';
                break;
            case 0xe:
                s[i] = 'e';
                break;
            case 0xd:
                s[i] = 'd';
                break;
            case 0xc:
                s[i] = 'c';
                break;
            case 0xb:
                s[i] = 'b';
                break;
            case 0xa:
                s[i] = 'a';
                break;
            default:
                s[i] = ( tmp > 9 ) ? 'X' : tmp + '0';
            }

    if ( i < sz )
        s[i] = '\0';
    else {
        MSG(( "ew_to_str: too small string: sz = %d\n", sz ));
        return NULL;
    }
    return s;
}

/**************************************************************
 *                                                            *
 *                         Comparison                         *
 *                                                            *
 *************************************************************/

bool
ew_eq( const ecc_word_t *lhs, const ecc_word_t *rhs )
{
    const int *lw = lhs->word+EW_LENGTH;
    const int *rw = rhs->word+EW_LENGTH;

    if ( lhs->is_zero && rhs->is_zero )
        return true;
    /* Compare word by word, starting from the most significant word */
    while ( *--lw == *--rw )
        if ( lw == lhs->word )
            return true;
    return false;
}

/**************************************************************
 *                                                            *
 *                      Coprocessor load                      *
 *                                                            *
 *************************************************************/

#define _EW_LOAD_CR(N) void ew_load_cr##N( const ecc_word_t *w ) { \
    volatile register int r1, r2; \
    /* Offset 0 */ \
    EW_GET_0( r1, r2, w ); \
    ECC_LOAD_0( r1, r2, #N ); \
    /* Offset 1 */ \
    EW_GET_1( r1, r2, w ); \
    ECC_LOAD_1( r1, r2, #N ); \
    /* Offset 2 */ \
    EW_GET_2( r1, r2, w ); \
    ECC_LOAD_2( r1, r2, #N ); \
    /* Offset 3 */ \
    EW_GET_3( r1, r2, w ); \
    ECC_LOAD_3( r1, r2, #N ); \
    /* Offset 4 */ \
    EW_GET_4( r1, r2, w ); \
    ECC_LOAD_4( r1, r2, #N ); \
    /* Offset 5 */ \
    EW_GET_5( r1, r2, w ); \
    ECC_LOAD_5( r1, r2, #N ); \
    /* Offset 6 */ \
    EW_GET_6( r1, r2, w ); \
    ECC_LOAD_6( r1, r2, #N ); \
    /* Offset 7 */ \
    EW_GET_7( r1, r2, w ); \
    ECC_LOAD_7( r1, r2, #N ); \
    /* Offset 8 */ \
    EW_GET_8( r1, r2, w ); \
    ECC_LOAD_8( r1, r2, #N ); \
    /* Offset 9 */ \
    EW_GET_9( r1, r2, w ); \
    ECC_LOAD_9( r1, r2, #N ); \
    /* Offset a */ \
    EW_GET_10( r1, r2, w ); \
    ECC_LOAD_10( r1, r2, #N ); \
    /* Offset b */ \
    EW_GET_11( r1, r2, w ); \
    ECC_LOAD_11( r1, r2, #N ); \
    /* Offset c */ \
    EW_GET_12( r1, r2, w ); \
    ECC_LOAD_12( r1, r2, #N ); \
    /* Offset d */ \
    EW_GET_13( r1, r2, w ); \
    ECC_LOAD_13( r1, r2, #N ); \
    /* Offset e */ \
    EW_GET_14( r1, r2, w ); \
    ECC_LOAD_14( r1, r2, #N ); \
    /* Offset f */ \
    EW_GET_15( r1, r2, w ); \
    ECC_LOAD_15( r1, r2, #N ); \
    \
    if ( w->is_negative ) /* Set signed bit if negative */ \
        ECC_NEG( #N, #N ); \
    else /* Else make sure it's unset */ \
        ECC_USB( #N ); \
}

_EW_LOAD_CR(0)
_EW_LOAD_CR(1)
_EW_LOAD_CR(2)
_EW_LOAD_CR(3)
_EW_LOAD_CR(4)
_EW_LOAD_CR(5)
_EW_LOAD_CR(6)
_EW_LOAD_CR(7)
_EW_LOAD_CR(8)
_EW_LOAD_CR(9)
_EW_LOAD_CR(10)
_EW_LOAD_CR(11)
_EW_LOAD_CR(12)
_EW_LOAD_CR(13)
_EW_LOAD_CR(14)


/**************************************************************
 *                                                            *
 *                     Coprocessor store                      *
 *                                                            *
 *************************************************************/

#define _EW_STORE_CR(N) void ew_store_cr##N( ecc_word_t *w ) { \
    register int r1, r2; \
    unsigned mask; \
    \
    /* Check sign */ \
    ECC_STORE_0( r1, r2, ECC_STATUS_REG ); \
    mask = 1 << ( 0x10 + N ); \
    if ( r1 & mask ) { \
        w->is_negative = true; \
        ECC_NEG( #N, #N ); \
    } \
    else \
        w->is_negative = false; \
    \
    w->is_zero = true; \
    /* Offset 0 */ \
    ECC_STORE_0( r1, r2, #N ); \
    EW_SET_0( r1, r2, w ); \
    /* Offset 1 */ \
    ECC_STORE_1( r1, r2, #N ); \
    EW_SET_1( r1, r2, w ); \
    /* Offset 2 */ \
    ECC_STORE_2( r1, r2, #N ); \
    EW_SET_2( r1, r2, w ); \
    /* Offset 3 */ \
    ECC_STORE_3( r1, r2, #N ); \
    EW_SET_3( r1, r2, w ); \
    /* Offset 4 */ \
    ECC_STORE_4( r1, r2, #N ); \
    EW_SET_4( r1, r2, w ); \
    /* Offset 5 */ \
    ECC_STORE_5( r1, r2, #N ); \
    EW_SET_5( r1, r2, w ); \
    /* Offset 6 */ \
    ECC_STORE_6( r1, r2, #N ); \
    EW_SET_6( r1, r2, w ); \
    /* Offset 7 */ \
    /* Offset 6 */
    ECC_STORE_6( r1, r2, "15" );
    EW_SET_6( r1, r2, w );
    /* Offset 7 */
    ECC_STORE_7( r1, r2, "15" );
    EW_SET_7( r1, r2, w );
    /* Offset 8 */
    ECC_STORE_8( r1, r2, "15" );
    EW_SET_8( r1, r2, w );
    /* Offset 9 */
    ECC_STORE_9( r1, r2, "15" );
    EW_SET_9( r1, r2, w );
    /* Offset 10 */
    ECC_STORE_10( r1, r2, "15" );
    EW_SET_10( r1, r2, w );
    /* Offset 11 */
    ECC_STORE_11( r1, r2, "15" );
    EW_SET_11( r1, r2, w );
    /* Offset 12 */
    ECC_STORE_12( r1, r2, "15" );
    EW_SET_12( r1, r2, w );
    /* Offset 13 */
    ECC_STORE_13( r1, r2, "15" );
    EW_SET_13( r1, r2, w );
    /* Offset 14 */
    ECC_STORE_14( r1, r2, "15" );
    EW_SET_14( r1, r2, w );
    /* Offset 15 */
    ECC_STORE_15( r1, r2, "15" );
    EW_SET_15( r1, r2, w );
}
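
The string conversion in ew_set_str walks the hexadecimal string from its last digit towards the 0x prefix and packs eight digits into each 32-bit word, least significant word first. The block below is a minimal host-side sketch of that idea, not the ECCo library code: demo_word_t, DEMO_LENGTH, hex_digit and demo_set_str are hypothetical names introduced only for illustration, and the word count of 4 is an arbitrary assumption. Unlike the listing, the sketch clears the word unconditionally, rejects a prefix with no digits after it, and rejects strings that do not fit instead of truncating them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LENGTH 4 /* hypothetical: number of 32-bit words per value */

/* Hypothetical stand-in for ecc_word_t; word[0] is the least significant. */
typedef struct {
    uint32_t word[DEMO_LENGTH];
    bool is_negative;
    bool is_zero;
} demo_word_t;

/* Map one hex digit to its value, or return -1 for any other character.
   Assumes an ASCII-compatible character set for the letters a-f. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse "0x..." or "-0x..." into w. Returns w, or NULL if the string is
   malformed or needs more than DEMO_LENGTH words. */
demo_word_t *demo_set_str(demo_word_t *w, const char *str)
{
    assert(w != NULL && str != NULL);

    /* Always start from a cleared word, independent of its old state. */
    memset(w, 0, sizeof *w);
    w->is_zero = true;

    if (*str == '-') {
        w->is_negative = true;
        str++;
    }
    if (str[0] != '0' || str[1] != 'x' || str[2] == '\0')
        return NULL;
    str += 2;

    size_t ndigits = strlen(str);
    if (ndigits > 8u * DEMO_LENGTH)
        return NULL;

    /* Digit i (counted from the end of the string) lands in word i / 8,
       shifted left by 4 * (i % 8) bits. */
    for (size_t i = 0; i < ndigits; i++) {
        int d = hex_digit(str[ndigits - 1 - i]);
        if (d < 0)
            return NULL;
        w->word[i / 8] |= (uint32_t)d << (4 * (i % 8));
        if (d != 0)
            w->is_zero = false;
    }
    return w;
}
```

Using uint32_t for the words keeps the shift by 28 bits well defined, which is one reason a port of the listing might prefer unsigned storage over int.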
Appendix F
/**************************************************************
 *                                                            *
 *                       Control macros                       *
 *                                                            *
 *************************************************************/

//#define ONLY_HELLOW /* Only run a simple hello world */

/* Testing control macros */
//#define TEST_ARI       /* Test arithmetic module */
//#define TEST_ARI_NOADD /* Skip addition during arithmetic testing */
//#define TEST_ARI_NOMOD /* Skip multiplication during arithmetic testing */
//#define TEST_ARI_NODIV /* Skip division during arithmetic testing */
//#define TEST_ARI_NONEG /* Skip negation during arithmetic testing */
//#define TEST_REGS      /* Test register bank reading/writing */

/* Benchmarking control macros */
#define BENCHMARK                        /* Disable anything but the benchmarking code */
//#define BENCHMARK_ECC_ADDITION         /* Perform additions with ECCo with minimal extra code */
//#define BENCHMARK_ANSSI_ADDITION       /* Perform additions with ANSSI lib with minimal extra code */
//#define BENCHMARK_ECC_MULTIPLICATION   /* Perform multiplication with ECCo with minimal extra code */
#define BENCHMARK_ANSSI_MULTIPLICATION   /* Perform multiplication with ANSSI lib with minimal extra code */
//#define BENCHMARK_ITERATIONS 1         /* Number of iterations during benchmarking */
//#define BENCHMARK_ITERATIONS 10        /* Number of iterations during benchmarking */
#define BENCHMARK_ITERATIONS 100         /* Number of iterations during benchmarking */

/* ANSSI libecc control macros */
#define ANSSI_LIBECC

/* Sanity checks of macros */
#if ( defined(BENCHMARK_ECC_ADDITION) && ( defined(BENCHMARK_ANSSI_ADDITION) || defined(BENCHMARK_ECC_MULTIPLICATION) || defined(BENCHMARK_ANSSI_MULTIPLICATION) ) ) || \
    ( defined(BENCHMARK_ANSSI_ADDITION) && ( defined(BENCHMARK_ECC_ADDITION) || defined(BENCHMARK_ECC_MULTIPLICATION) || defined(BENCHMARK_ANSSI_MULTIPLICATION) ) ) || \
    ( defined(BENCHMARK_ECC_MULTIPLICATION) && ( defined(BENCHMARK_ANSSI_ADDITION) || defined(BENCHMARK_ECC_ADDITION) || defined(BENCHMARK_ANSSI_MULTIPLICATION) ) ) || \
    ( defined(BENCHMARK_ANSSI_MULTIPLICATION) && ( defined(BENCHMARK_ANSSI_ADDITION) || defined(BENCHMARK_ECC_MULTIPLICATION) || defined(BENCHMARK_ECC_ADDITION) ) )
#error ( "Only one BENCHMARK_ macro can be defined at a time" )
#endif
78 Appendix F. Benchmark & Test program

#if ( defined(BENCHMARK_ANSSI_ADDITION) || defined(BENCHMARK_ANSSI_MULTIPLICATION) ) && !defined(ANSSI_LIBECC)
#error ( "ANSSI_LIBECC must be defined for ANSSI benchmarks" )
#endif


/**************************************************************
 *                                                            *
 *                          Includes                          *
 *                                                            *
 *************************************************************/

/* ARM CM33 */
#include <arm_cmse.h>
#include <cm4ss.h>
#include <ee_printf.h>
#include <cm33/secure/trustzone_util.h>

/* stdlib */
#include <stdbool.h>
#include <string.h>

/* Coprocessor */
#include "ecc.h"
#include "ecc_word.h"
#include "division_data.h"
#include "modular_addition_data.h"
#include "modular_multiplication_data.h"

/* ANSSI libecc */
#ifdef ANSSI_LIBECC
#include "libarith.h"
#endif


/**************************************************************
 *                                                            *
 *                       Globals/Macros                       *
 *                                                            *
 *************************************************************/

/* TZ_START_NS: Start address of non-secure application */
#ifndef TZ_START_NS
#define TZ_START_NS (0x80000U)
#endif

#define CPACR_ADDR ( (unsigned *) 0xE000ED88U )


/**************************************************************
 *                                                            *
 *                         Test setup                         *
 *                                                            *
 *************************************************************/

/* Arithmetic test functions */
bool test_ari_multiplication( char (*)[DATAMUL16_NUM_HEADERS][DATAMUL16_NUM_CHARS+1] );
bool test_ari_addition( char (*)[DATAADD16_NUM_HEADERS][DATAADD16_NUM_CHARS+1] );
bool test_ari_division( char (*)[DATADIV16_NUM_HEADERS][DATADIV16_NUM_CHARS+1] );

/* ANSSI libecc helpers */
#ifdef ANSSI_LIBECC
static void nn_import_from_hexbuf( nn_t out_nn, const char *hbuf, u32 hbuflen );
#endif

/* Benchmark value strings */
    /**************************************************
     *  Benchmark modular multiplication in software  *
     *************************************************/

#ifdef BENCHMARK_ANSSI_MULTIPLICATION
    nn nn_op1, nn_op2, nn_mod;
    fp fp_op1, fp_op2;
    fp_ctx fp_ctx; /* Finite field context - size of field etc. */
    /* Initialize and set parameter values */
    nn_init_from_buf( &nn_op1, mul_op1_buf, BM_BUF_LEN );
    nn_init_from_buf( &nn_op2, mul_op2_buf, BM_BUF_LEN );
    nn_init_from_buf( &nn_mod, mul_mod_buf, BM_BUF_LEN );
    fp_ctx_init_from_p( &fp_ctx, &nn_mod );
    fp_init( &fp_op1, &fp_ctx );
    fp_init( &fp_op2, &fp_ctx );
    fp_op1.fp_val = nn_op1;
    fp_op2.fp_val = nn_op2;
    /* Perform N number of multiplications */
    for ( int i = 0; i < BENCHMARK_ITERATIONS; ++i )
        fp_mul( &fp_op1, &fp_op1, &fp_op2 );
#endif

#endif

#ifndef BENCHMARK
    MSG(( ">>>>>>>> Finished ECC firmware test.\n\n" ));
#endif

    finish_test( TEST_PASS );
    return 0; // This line will never execute as boot_nonsec_program never returns
}


/**************************************************************
 *                                                            *
 *                       Test functions                       *
 *                                                            *
 *************************************************************/

/***********************
 *  Arithmetic module  *
 **********************/

/* Modular addition */
bool
test_ari_addition( char (*data)[DATAADD16_NUM_HEADERS][DATAADD16_NUM_CHARS+1] )
{
    int i = 0;
    char (*entry)[DATAADD16_NUM_CHARS+1];
    ew_str_t mod_s, op1_s, op2_s, sol_s, res_s;
    ecc_word_t mod, op1, op2, sol, res;

    while ( i++ < DATAADD16_NUM_ENTRIES ) {
        entry = *data++;
        /* Set parameter values from data strings */
        if ( !ew_set_str( &mod, entry[0] ) ) goto error;
        if ( !ew_set_str( &op1, entry[1] ) ) goto error;
        if ( !ew_set_str( &op2, entry[2] ) ) goto error;
        if ( !ew_set_str( &sol, entry[3] ) ) goto error;
        /* Load parameters into CP registers */
        ew_load_cr0( &op1 );
        ew_load_cr1( &op2 );
        EW_LOAD_MOD( &mod );
        /* Perform addition */
        ECC_ADD( "0", "1", "2" );
        /* Verify result */
        ew_store_cr2( &res );
        if ( !ew_eq( &res, &sol ) )
            goto wrong;
        MSG(( "Test entry %d passed.\n", i ));
    }
    return true;

wrong:
    ew_to_str( &mod, mod_s, EW_STR_LENGTH );
    ew_to_str( &op1, op1_s, EW_STR_LENGTH );
    ew_to_str( &op2, op2_s, EW_STR_LENGTH );
    ew_to_str( &res, res_s, EW_STR_LENGTH );
    ew_to_str( &sol, sol_s, EW_STR_LENGTH );
    /* Print the expected value after "=" and the coprocessor result after "got" */
    MSG(( "  %s\n"
          "+ %s\n"
          "(mod %s)\n"
          "= %s\n"
          "got %s\n",
          op1_s, op2_s, mod_s, sol_s, res_s ));
error:
    MSG(( "Failed...\n" ));
    return false;
}

/* Modular multiplication */
bool
test_ari_multiplication( char (*data)[DATAMUL16_NUM_HEADERS][DATAMUL16_NUM_CHARS+1] )
{
    int i = 0;
    char (*entry)[DATAMUL16_NUM_CHARS+1];
    ew_str_t mod_s, op1_s, op2_s, sol_s, res_s;
    ecc_word_t mod, op1, op2, sol, res;

    while ( i++ < DATAMUL16_NUM_ENTRIES ) {
        entry = *data++;
        /* Set parameter values from data strings */
        if ( !ew_set_str( &mod, entry[0] ) ) goto error;
        if ( !ew_set_str( &op1, entry[1] ) ) goto error;
        if ( !ew_set_str( &op2, entry[2] ) ) goto error;
        if ( !ew_set_str( &sol, entry[3] ) ) goto error;
        /* Load parameters into CP registers */
        ew_load_cr0( &op1 );
        ew_load_cr1( &op2 );
        EW_LOAD_MOD( &mod );
        /* Perform multiplication */
        ECC_MUL( "0", "1", "2" );
        /* Verify result */
        ew_store_cr2( &res );
        if ( !ew_eq( &res, &sol ) )
            goto wrong;
        MSG(( "Test entry %d passed.\n", i ));
    }
    return true;

wrong:
    ew_to_str( &mod, mod_s, EW_STR_LENGTH );
    ew_to_str( &op1, op1_s, EW_STR_LENGTH );
    ew_to_str( &op2, op2_s, EW_STR_LENGTH );
    ew_to_str( &res, res_s, EW_STR_LENGTH );
    ew_to_str( &sol, sol_s, EW_STR_LENGTH );
    /* Print the expected value after "=" and the coprocessor result after "got" */
    MSG(( "  %s\n"
          "* %s\n"
          "(mod %s)\n"
          "= %s\n"
          "got %s\n",
          op1_s, op2_s, mod_s, sol_s, res_s ));
error:
    MSG(( "Failed...\n" ));
    return false;
}

/* Integer division */
bool
test_ari_division( char (*data)[DATADIV16_NUM_HEADERS][DATADIV16_NUM_CHARS+1] )
{
    int i = 0;
    char (*entry)[DATADIV16_NUM_CHARS+1];
    ew_str_t op1_s, op2_s, sol_s, res_s;
    ecc_word_t op1, op2, sol, res;

    while ( i++ < DATADIV16_NUM_ENTRIES ) {
        entry = *data++;
        /* Set parameter values from data strings */
        if ( !ew_set_str( &op1, entry[0] ) ) goto error;
        if ( !ew_set_str( &op2, entry[1] ) ) goto error;
        if ( !ew_set_str( &sol, entry[2] ) ) goto error;
        /* Load parameters into CP registers */
        ew_load_cr0( &op1 );
        ew_load_cr1( &op2 );
        /* Perform division */
        ECC_DIV( "0", "1", "2" );
        /* Verify result */
        ew_store_cr2( &res );
        if ( !ew_eq( &res, &sol ) )
            goto wrong;
        MSG(( "Test entry %d passed.\n", i ));
    }
    return true;

wrong:
    ew_to_str( &op1, op1_s, EW_STR_LENGTH );
    ew_to_str( &op2, op2_s, EW_STR_LENGTH );
    ew_to_str( &res, res_s, EW_STR_LENGTH );
    ew_to_str( &sol, sol_s, EW_STR_LENGTH );
    /* Print the expected value after "=" and the coprocessor result after "got" */
    MSG(( "  %s\n"
          "/ %s\n"
          "= %s\n"
          "got %s\n",
          op1_s, op2_s, sol_s, res_s ));
error:
    MSG(( "Failed...\n" ));
    return false;
}
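
The three test functions in this listing share one shape: parse the operands, run a single coprocessor operation, compare the result against the expected value, and report the first mismatch. The block below is a rough host-side sketch of that shape with the operation passed as a function pointer. It is an illustration under stated assumptions, not the thesis test program: vector_t, op_fn, run_vectors, mod_add and mod_mul are hypothetical names, the operands are plain 32-bit integers rather than ecc_word_t values, and the software reference functions stand in for the ECC_ADD and ECC_MUL instructions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical test vector: modulus, two operands, expected result. */
typedef struct {
    uint32_t mod, op1, op2, sol;
} vector_t;

/* Operation under test; stands in for one coprocessor instruction. */
typedef uint32_t (*op_fn)(uint32_t op1, uint32_t op2, uint32_t mod);

/* Software reference for modular addition, widened to 64 bits so the
   intermediate sum cannot overflow. */
uint32_t mod_add(uint32_t a, uint32_t b, uint32_t m)
{
    assert(m != 0);
    return (uint32_t)(((uint64_t)a + b) % m);
}

/* Software reference for modular multiplication, widened likewise. */
uint32_t mod_mul(uint32_t a, uint32_t b, uint32_t m)
{
    assert(m != 0);
    return (uint32_t)(((uint64_t)a * b) % m);
}

/* Run every vector through op. Returns the index of the first mismatch,
   or n if all vectors pass. sym is only used in the failure message. */
size_t run_vectors(const vector_t *v, size_t n, op_fn op, char sym)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t res = op(v[i].op1, v[i].op2, v[i].mod);
        if (res != v[i].sol) {
            printf("0x%" PRIx32 " %c 0x%" PRIx32 " (mod 0x%" PRIx32 "): "
                   "expected 0x%" PRIx32 ", got 0x%" PRIx32 "\n",
                   v[i].op1, sym, v[i].op2, v[i].mod, v[i].sol, res);
            return i;
        }
    }
    return n;
}
```

Sharing one runner keeps the failure report in a single place, so the expected value and the obtained value cannot drift apart between the addition, multiplication and division tests.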
References
[1] N. Koblitz, “Elliptic curve cryptosystems”, Math. Comp., vol. 48, pp. 203–
209, 1987, ISSN: 0025-5718. DOI: 10.1090/S0025-5718-1987-0866109-
5.
[2] V. S. Miller, “Use of elliptic curves in cryptography”, in Advances in
Cryptology — CRYPTO ’85 Proceedings, H. C. Williams, Ed., Berlin, Hei-
delberg: Springer Berlin Heidelberg, 1986, pp. 417–426, ISBN: 978-3-
540-39799-1.
[3] A. J. Menezes, S. A. Vanstone, and P. C. V. Oorschot, Handbook of Applied
Cryptography, 1st. Boca Raton, FL, USA: CRC Press, Inc., 1996, ISBN:
0849385237.
[4] W. Diffie and M. Hellman, “New directions in cryptography”, IEEE
Transactions on Information Theory, vol. 22, no. 6, pp. 644–654, Nov. 1976,
ISSN : 0018-9448. DOI : 10.1109/TIT.1976.1055638.
[5] Y. Kumar, R. Munjal, and H. Sharma, “Comparison of symmetric and
asymmetric cryptography with existing vulnerabilities and counter-
measures”, International Journal of Computer Science and Management Stud-
ies, vol. 11, no. 03, 2011.
[6] R. Tripathi and S. Agrawal, “Comparative study of symmetric and asym-
metric cryptography techniques”, International Journal of Advance Foun-
dation and Research in Computer (IJAFRC), vol. 1, no. 6, pp. 68–76, 2014.
[7] E. Rescorla. (2018). The transport layer security (tls) protocol version
1.3, [Online]. Available: https : / / tools . ietf . org / html / rfc8446
(visited on 11/09/2018).
[8] IEEE. (2017). Why we need low-power, low-latency devices, [Online].
Available: https://innovationatwork.ieee.org/why-we-need-low-
power-low-latency-devices/ (visited on 06/26/2019).
[9] M. Guerra. (2017). The power of iot devices, [Online]. Available: https:
//www.electronicdesign.com/power/power-iot-devices (visited on
06/26/2019).
[10] N. Shields. (2017). Here’s how 5g will revolutionize the internet of
things, [Online]. Available: https://www.businessinsider.com/how-
5g- will- revolutionize- the- internet- of- things- 2017- 6?r=US&
IR=T (visited on 06/26/2019).
[11] M. Hirth, Hardware acceleration of asymmetric elliptic curve cryptography,
2018.
[12] P. B. Bhattacharya, S. K. Jain, and S. Nagpaul, Basic abstract algebra, 2nd.
Cambridge University Press, 1994, ISBN: 0521460816.
86 REFERENCES