
On The Optimum Constructions of Composite Field For The AES Algorithm


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 53, NO. 10, OCTOBER 2006 1153

On the Optimum Constructions of Composite Field for the AES Algorithm
Xinmiao Zhang, Member, IEEE, and Keshab K. Parhi, Fellow, IEEE

Abstract—In the hardware implementations of the Advanced Encryption Standard (AES) algorithm, employing composite field arithmetic not only reduces the complexity but also enables deep subpipelining such that higher speed can be achieved. In addition, it is more efficient to employ composite field arithmetic only in the SubBytes transformation of the AES algorithm. Composite fields can be constructed by using different irreducible polynomials. Nevertheless, how the different constructions affect the complexity of the composite implementation of the SubBytes has not been analyzed in prior works. This brief presents 16 ways to construct the composite field $GF(((2^2)^2)^2)$ for the AES algorithm. Analytical results are provided for the effects of the irreducible polynomial coefficients on the complexity of each involved subfield operation. In addition, for each construction, there exist eight isomorphic mappings that map the elements in $GF(2^8)$ to those in composite fields. The complexities of these mappings vary. An efficient algorithm is proposed in this brief to find all isomorphic mappings. Based on the complexities of both the subfield operations and the isomorphic mappings, the optimum constructions of the composite field for the AES algorithm are selected to minimize gate count and critical path.

Index Terms—Advanced Encryption Standard (AES) algorithm, composite field, isomorphic mapping, multiplicative inversion.

I. INTRODUCTION

CRYPTOGRAPHY plays an important role in the security of data transmission. The development of computing technology imposes stronger requirements on the cryptography schemes. The Data Encryption Standard (DES) has been the U.S. government standard since 1977. However, now, it can be cracked quickly and inexpensively. In 2000, the Advanced Encryption Standard (AES) [1] replaced the DES to meet the ever-increasing requirements for security.

The AES algorithm has broad applications, such as smart cards and cell phones, WWW servers and automated teller machines, and digital video recorders. Numerous architectures have been proposed for the hardware implementations of the AES algorithm [2]–[11]. Among these architectures, the design in [2] can achieve the highest speed while it is more efficient than the prior designs. The key idea in [2] is to employ composite field arithmetic in the computation of the multiplicative inversion in the SubBytes/InvSubBytes transformation of the AES algorithm. As a result, deep subpipelining is enabled, and hardware complexity is reduced.

Different construction schemes for composite fields are proposed for the AES algorithm in [7], [8], and [11]. In the design in [7], $GF(2^8)$ is decomposed into $GF((2^4)^2)$, and composite field arithmetic is applied to all the transformations in the AES algorithm. The optimum construction scheme for $GF((2^4)^2)$ is selected based on minimizing the total gate count in the implementation of all transformations. However, it is more efficient to apply composite field arithmetic only in the computation of the multiplicative inversion in the SubBytes and InvSubBytes transformations [2]. In this case, the construction scheme selected in [7] is no longer optimum. The schemes proposed in [8] and [11] apply composite field arithmetic only to the multiplicative inversion. In [8], $GF(2^8)$ is decomposed into $GF(((2^2)^2)^2)$, while in [11], $GF(2^8)$ is decomposed into $GF((2^4)^2)$. Nevertheless, each of them proposed only one possible way to construct the composite field. There exist other construction schemes with smaller gate counts and shorter critical paths.

Different irreducible polynomials can be used to construct the composite fields of the same order. This brief presents 16 ways to construct $GF(((2^2)^2)^2)$. Using composite field arithmetic, the complicated multiplicative inversion in $GF(2^8)$ is mapped to operations in subfields. This brief provides the analytical results of how the coefficients in the irreducible polynomials affect the complexities of the subfield operations. In addition, for each construction scheme, there exist eight isomorphic mappings with various complexities to map the elements between $GF(2^8)$ and $GF(((2^2)^2)^2)$. An efficient algorithm is proposed in this brief to find all the isomorphic mappings. Moreover, the lowest mapping complexity is provided for each proposed composite field construction scheme. Based on the complexities of both the subfield operations and the isomorphic mappings, the optimum constructions of the composite field $GF(((2^2)^2)^2)$ for the AES algorithm are proposed. Other composite field construction optimization approaches have been published recently [12], [13]. However, the approach in [12] is optimized based only on the complexity of isomorphic mappings. The approach in [13] optimizes for overall area requirement. Nevertheless, the critical path issue is ignored in the optimization process.

The structure of this brief is as follows. In Section II, the architecture for the implementation of SubBytes using composite field arithmetic is introduced. Section III provides different construction schemes for the composite field $GF(((2^2)^2)^2)$. Section IV discusses how the coefficients of the field polynomials affect the complexity of each block in the SubBytes implementation. An efficient scheme is presented in Section V to find all the isomorphic mappings for each possible construction of the composite field. The lowest achievable mapping complexity for each construction scheme is listed, and some comparison results with prior works are provided. Section VI concludes this brief.

Manuscript received July 12, 2005; revised April 14, 2006. This work was supported by the Army Research Office under Grant W911NF-04-1-0272. This paper was recommended by Associate Editor C.-T. Lin.
X. Zhang is with the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106-7071 USA (e-mail: xinmiao.zhang@case.edu).
K. K. Parhi is with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: parhi@ece.umn.edu).
Digital Object Identifier 10.1109/TCSII.2006.882217
1057-7130/$20.00 © 2006 IEEE

II. COMPOSITE FIELD IMPLEMENTATIONS OF THE SUBBYTES IN AES

In the AES algorithm, the message is divided into blocks of 128 bits. Each block is divided into 16 bytes, and each byte is considered as an element of $GF(2^8)$. Although different irreducible polynomials can be used to construct $GF(2^8)$, the one specified for the AES algorithm is $P(x) = x^8 + x^4 + x^3 + x + 1$. The AES algorithm is carried out in a number of rounds. Each round in the encryption consists of four transformations, namely: 1) SubBytes; 2) ShiftRows; 3) MixColumns; and 4) AddRoundKey. The decryption consists of the inverse transformations. Among the four transformations involved in the encryption, the SubBytes is the most complicated. In this transformation, we need to compute the multiplicative inverse of each byte in $GF(2^8)$, followed by an affine transformation. Denoting each byte by $S$, the SubBytes can be described by

$$S' = M S^{-1} + C$$

where $M$ is an 8 × 8 binary matrix, and $C$ is an 8-bit binary vector.

Although two finite fields of the same order are isomorphic, the complexities of the field operations may heavily depend on the representations of the field elements. Composite field arithmetic can be employed to reduce the hardware complexity of the multiplicative inversion in $GF(2^8)$. We call two pairs $\{GF(2^n),\; Q(y) = y^n + \sum_{i=0}^{n-1} q_i y^i,\; q_i \in GF(2)\}$ and $\{GF((2^n)^m),\; P(x) = x^m + \sum_{i=0}^{m-1} p_i x^i,\; p_i \in GF(2^n)\}$ a composite field [14] if
• $GF(2^n)$ is constructed from $GF(2)$ by $Q(y)$;
• $GF((2^n)^m)$ is constructed from $GF(2^n)$ by $P(x)$.
Composite fields will be denoted by $GF((2^n)^m)$, and a composite field $GF((2^n)^m)$ is isomorphic to the field $GF(2^k)$ for $k = nm$. Additionally, composite fields can be built iteratively from lower order fields. For example, the composite field of $GF(2^8)$ can be built iteratively from $GF(2)$ using the irreducible polynomials

$$
\begin{aligned}
GF(2) \Rightarrow GF(2^2) &: \; P_0(x) = x^2 + x + 1 \\
GF(2^2) \Rightarrow GF((2^2)^2) &: \; P_1(x) = x^2 + x + \varphi \\
GF((2^2)^2) \Rightarrow GF(((2^2)^2)^2) &: \; P_2(x) = x^2 + x + \lambda
\end{aligned}
\tag{1}
$$

where $\varphi \in GF(2^2)$, $\lambda \in GF((2^2)^2)$, and the values of $\varphi$, $\lambda$ satisfy that $P_1(x)$ is irreducible over $GF(2^2)$ and $P_2(x)$ is irreducible over $GF((2^2)^2)$. Moreover, an isomorphic mapping function $f(x) = \delta \times x$ and its inverse $f^{-1}(x) = \delta^{-1} \times x$ need to be applied to map the representation of an element in $GF(2^8)$ to its composite field and vice versa. The 8 × 8 binary matrix $\delta$ is decided by the field polynomials of $GF(2^8)$ and its composite field.

In the composite field $GF((2^4)^2)$, an element can be expressed as $s_h x + s_l$, where $s_h, s_l \in GF(2^4)$, and $x$ is the root of $P_2(x)$. Using the extended Euclidean algorithm, the multiplicative inverse of $s_h x + s_l$ modulo $P_2(x)$ can be computed as

$$(s_h x + s_l)^{-1} = s_h \Theta x + (s_h + s_l)\Theta \tag{2}$$

where $\Theta = (s_h^2 \lambda + s_h s_l + s_l^2)^{-1}$. According to (2), the multiplicative inversion in $GF(2^8)$ can be carried out in $GF(2^4)$ by the architecture illustrated in Fig. 1 [2]. The inverse isomorphic mapping is combined with the affine transformation to reduce the hardware complexity.

Fig. 1. Implementation of the SubBytes transformation.

The multiplication in $GF(2^4)$ can be further decomposed into multiplications in $GF(2^2)$ to reduce complexity as shown in Fig. 2(a). In addition, based on further decomposition into $GF((2^2)^2)$, the bit expressions can be derived for other blocks in Fig. 1. Then, substructure sharing can be employed accordingly to reduce the complexities of these blocks. For example, it was proposed in [8] to use $\varphi = \{10\}_2$ and $\lambda = \{1100\}_2$, where $\{\cdot\}_2$ denotes the number in the braces in binary form. In this case, the squarer in $GF(2^4)$, the $\times\lambda$ block, and the $\times\varphi$ block can be implemented as illustrated in Fig. 2(c)–(e), respectively.

Fig. 2. Implementations of individual blocks. (a) Multiplier in $GF((2^2)^2)$. (b) Multiplier in $GF(2^2)$. (c) Squarer in $GF(2^4)$. (d) Constant multiplier ($\times\lambda$). (e) Constant multiplier ($\times\varphi$).

Composite fields can be constructed using different irreducible polynomials. In this brief, we only consider those in the form of (1). The values of $\varphi$ and $\lambda$ decide the complexities of subfield operations. In addition, for each fixed set of irreducible polynomials used in composite field construction, there exist multiple isomorphic mappings whose complexities vary.

III. CHOOSING THE COEFFICIENTS IN IRREDUCIBLE POLYNOMIALS

$P_0(x) = x^2 + x + 1$ is the only irreducible polynomial of degree two over $GF(2)$. Hence, it is the only possible choice for $P_0(x)$ in (1). The choices of $\varphi$ and $\lambda$ need to satisfy that $P_1(x) = x^2 + x + \varphi$ is irreducible over $GF(2^2)$ and $P_2(x) = x^2 + x + \lambda$ is irreducible over $GF((2^2)^2)$.

A polynomial over a finite field is irreducible if it cannot be factored into nontrivial polynomials other than itself over that field. Since the degree of $P_1(x)$ is two, if $P_1(x)$ can be factored into nontrivial polynomials, it will be factored into the form of $(x + a)(x + b)$, where $a, b \in GF(2^2)$. Therefore, the test of irreducibility can be carried out by examining if any elements of $GF(2^2)$ are roots of $P_1(x)$. Using this scheme, it can be derived that the only values of $\varphi$ that make $P_1(x)$ irreducible over $GF(2^2)$ are $\varphi = \{10\}_2$ and $\varphi = \{11\}_2$.

The values of $\lambda$, which make $P_2(x)$ irreducible over $GF((2^2)^2)$, can be derived in a similar way. Alternatively, we can evaluate $x^2 + x$ on each element of $GF((2^2)^2)$. The evaluation results are all the values of $\lambda$ that make $P_2(x)$ nonirreducible. For example, assume that evaluating $x^2 + x$ on an element $a$ gives $a^2 + a = \lambda$. Then, $P_2(a) = a^2 + a + \lambda = 0$. Hence, $P_2(x)$ has a root $a$. Therefore, $P_2(x)$ can be factored into $(x + a)(x + a + 1)$, which is nonirreducible. The values of $\lambda$, which make $P_2(x)$ irreducible, consist of the field elements not equal to any of the evaluation results. Different representations for field elements of $GF((2^2)^2)$ can be developed when $\varphi = \{10\}_2$ and $\varphi = \{11\}_2$, respectively. It can be derived that there are eight possible values of $\lambda$ that make $P_2(x)$ irreducible over $GF((2^2)^2)$ constructed by using each of $\varphi = \{10\}_2$ and $\varphi = \{11\}_2$. These values of $\lambda$ are

$$\{1000\}_2,\ \{1001\}_2,\ \{1010\}_2,\ \{1011\}_2,\ \{1100\}_2,\ \{1101\}_2,\ \{1110\}_2,\ \{1111\}_2.$$

All together, there are sixteen ways to construct the composite field $GF(((2^2)^2)^2)$ using irreducible polynomials in the form of (1).

In the composite field implementation of the SubBytes illustrated in Figs. 1 and 2, the coefficients $\varphi$ and $\lambda$ affect the complexities of the following blocks.
• The value of $\varphi$ affects the complexities of
1) constant multiplier $\times\varphi$;
2) squarer in $GF(2^4)$;
3) multiplier in $GF(2^4)$;
4) inversion in $GF(2^4)$;
5) constant multiplier $\times\lambda$;
6) isomorphic mapping and inverse.
• The value of $\lambda$ affects the complexities of
1) constant multiplier $\times\lambda$;
2) isomorphic mapping and inverse.
In the next section, analytical results are provided on how the coefficients $\varphi$ and $\lambda$ affect the complexity of each block.

IV. EFFECTS OF IRREDUCIBLE POLYNOMIAL COEFFICIENTS

Taking $\alpha$ as the root of $P_0(x)$, an element $a \in GF(2^2)$ can be expressed as $a_1 \alpha + a_0$, where $a_1, a_0 \in GF(2)$. It can be derived that

$$a \times \{10\}_2 = (a_1 \alpha + a_0)\alpha = (a_1 \oplus a_0)\alpha + a_1.$$

Hence, in the case of $\varphi = \{10\}_2$, the constant multiplier $\times\varphi$ can be implemented by one XOR gate. The constant multiplication by $\varphi = \{11\}_2$ also takes one XOR gate [2]. Therefore, the two choices of $\varphi$ lead to the same complexity for the $\times\varphi$ block. As can be observed from Fig. 2(a), the multiplier in $GF(2^4)$ consists of multipliers in $GF(2^2)$, a constant multiplier $\times\varphi$, and modulo 2 adders. The complexity of the $GF(2^2)$ multiplier does not change with $\varphi$ or $\lambda$. Accordingly, the complexity of the $GF(2^4)$ multiplier is the same for the two possible values of $\varphi$.

Compared to a general multiplier, the implementation of a squarer can be simplified. In the case $\varphi = \{11\}_2$, taking the 4 bits of $a \in GF(2^4)$ as $\{a_3 a_2 a_1 a_0\}_2$, it can be derived that each bit in $q = a^2$ can be computed as

$$q_3 = a_3, \quad q_2 = a_3 \oplus a_2, \quad q_1 = a_3 \oplus a_2 \oplus a_1, \quad q_0 = a_2 \oplus a_1 \oplus a_0.$$

Hence, the implementation of the squarer in $GF(2^4)$ takes 4 XOR gates with 2 XOR gates in the critical path by employing substructure sharing. In the case $\varphi = \{10\}_2$, the architecture for the squarer in $GF(2^4)$ is provided in [2]. It can be observed that the squarer has the same gate count and critical path when $\varphi = \{10\}_2$ and $\varphi = \{11\}_2$.

The complexity of the inversion in $GF(2^4)$ is only dependent on $\varphi$. In addition, the direct computation approach as in [2] can achieve lower complexity than those employing further decompositions as in [8]. In the case $\varphi = \{10\}_2$, the equations to compute each bit in the inverse can be found in [2]. Using substructure sharing, the inversion can be implemented by 14 XOR gates and 8 AND gates with 3 XOR gates and 2 AND gates in the critical path. It can be derived that in the case of $\varphi = \{11\}_2$, each bit in $a^{-1}$ can be computed by

[bit-level expressions of (3) not recoverable from the extracted text] (3)

Using substructure sharing, the equations in (3) can be computed by 14 XOR gates and 8 AND gates with 3 XOR gates and 2 AND gates in the critical path. Compared to the case when $\varphi = \{10\}_2$, the computation of the inverse in $GF(2^4)$ in the case of $\varphi = \{11\}_2$ requires the same number of gates and has the same critical path.
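The squarer complexity quoted in Section IV for $\varphi = \{11\}_2$ can be reproduced by hand. The derivation below is a sketch under stated assumptions: the symbols $\alpha$ (root of $P_0$), $\beta$ (root of $P_1$), and the bit ordering $\{a_3 a_2 a_1 a_0\}_2$ with $a_3$ as the coefficient of $\alpha\beta$ are illustrative choices, not necessarily the notation of the original brief.

```latex
\begin{align*}
a &= a_h \beta + a_l, \quad a_h = a_3\alpha + a_2,\quad a_l = a_1\alpha + a_0,
     \quad \alpha^2 = \alpha + 1,\quad \beta^2 = \beta + \varphi \\
a^2 &= a_h^2 \beta^2 + a_l^2 = a_h^2 \beta + (a_h^2 \varphi + a_l^2) \\
a_h^2 &= a_3 \alpha + (a_3 \oplus a_2), \qquad a_l^2 = a_1 \alpha + (a_1 \oplus a_0) \\
a_h^2 \varphi &= (a_3 \oplus a_2)\alpha + a_2 \qquad \text{for } \varphi = \{11\}_2 = \alpha + 1 \\
q_3 &= a_3, \quad q_2 = a_3 \oplus a_2, \quad
q_1 = a_3 \oplus a_2 \oplus a_1, \quad q_0 = a_2 \oplus a_1 \oplus a_0
\end{align*}
```

Reusing $a_3 \oplus a_2$ for both $q_2$ and $q_1$ gives four XOR gates in total with two XOR gates on the longest path, in agreement with the figures stated in the text. Repeating the same steps with $\varphi = \{10\}_2 = \alpha$ gives $q_1 = a_2 \oplus a_1$ and $q_0 = a_3 \oplus a_1 \oplus a_0$, again four XOR gates with a two-gate critical path.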

The complexity for the constant multiplier $\times\lambda$ is affected by both values of $\varphi$ and $\lambda$. Taking $\beta$ as a root of $P_1(x)$, an element $a \in GF((2^2)^2)$ can be expressed as $a_h \beta + a_l$, where $a_h, a_l \in GF(2^2)$. Similarly, $\lambda$ can be expressed as $\lambda_h \beta + \lambda_l$. Accordingly, the product can be computed as

$$a \lambda = (a_h \lambda_h + a_h \lambda_l + a_l \lambda_h)\beta + (a_h \lambda_h \varphi + a_l \lambda_l).$$

Therefore, two values need to be computed for the constant multiplication by $\lambda$, i.e.,

$$a_h \lambda_h + a_h \lambda_l + a_l \lambda_h \tag{4}$$

$$a_h \lambda_h \varphi + a_l \lambda_l \tag{5}$$

One way to simplify the computations in the above two equations is to make $\lambda_h$ or $\lambda_l$ zero such that the terms consisting of the multiplications with $\lambda_h$ or $\lambda_l$ can be eliminated from the computation. Alternatively, if $\lambda_h = \lambda_l$, the terms $a_h \lambda_h$ and $a_h \lambda_l$ can be cancelled out from (4). From the discussions in Section III, $\lambda_h$ can only be $\{10\}_2$ or $\{11\}_2$. Therefore, the values of $\lambda$, which minimize the complexity of $\times\lambda$, can be $\{1000\}_2$, $\{1100\}_2$, $\{1010\}_2$, or $\{1111\}_2$. Table I lists the gate counts for the implementations of $\times\lambda$ for all possible combinations of $\varphi$ and $\lambda$. The critical path for each implementation has two XOR gates. It can be observed from Table I that the combinations of $\varphi$ and $\lambda$, which minimize the cost of the constant multiplication by $\lambda$, are those listed in Table II. The results in Table II agree with the analytical results.

TABLE I
GATE COUNT FOR EACH $\times\lambda$ IMPLEMENTATION
[table body not present in the extracted text]

TABLE II
OPTIMUM VALUES OF $\varphi$ AND $\lambda$
[table body not present in the extracted text]

V. OPTIMUM ISOMORPHIC MAPPINGS

For a fixed set of irreducible polynomials in (1), there exist multiple isomorphic mappings between $GF(2^8)$ and $GF(((2^2)^2)^2)$. The complexities of these mappings vary. In this section, the lowest achievable complexity of the isomorphic mappings for each combination of $\varphi$ and $\lambda$ is provided.

An algorithm is proposed in [14, Ch. 2.2] to find the isomorphic mapping matrices when the involved field polynomials are primitive. However, the irreducible polynomial $P(x) = x^8 + x^4 + x^3 + x + 1$ specified for the AES algorithm is not primitive. Therefore, the algorithm in [14] cannot be applied directly. Another algorithm is proposed in [7] to find isomorphic mappings for nonprimitive field polynomials. However, this algorithm has very high complexity. The average number of checkings it needs to find a mapping for $GF(2^q)$ [expression not recoverable from the extracted text] is given in terms of Euler's totient function.

Assume $\gamma$ is a root of $P(x)$. Then, the set $\{1, \gamma, \gamma^2, \ldots, \gamma^7\}$ forms a standard basis for $GF(2^8)$. To construct isomorphic mappings for $GF(2^8)$, we need to find eight base elements $1, \omega, \omega^2, \ldots, \omega^7$ of $GF(((2^2)^2)^2)$ to which the base elements $1, \gamma, \gamma^2, \ldots, \gamma^7$ are mapped. The isomorphic mapping matrix is formed by taking the binary representation of $\omega^i$ as the entries of its $i$th column. Since $P(x)$ is not primitive, $\gamma$ is not a primitive element. Accordingly, the element $\omega$ to which $\gamma$ is mapped is not primitive. Assume $\delta$ is the isomorphic mapping matrix and $a, b, c \in GF(2^8)$. If $a = b + c$, then $\delta \mathbf{a} = \delta \mathbf{b} + \delta \mathbf{c}$, where $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are the column vectors formed by the bits in the standard basis representation of $a$, $b$, and $c$, respectively. Hence, additive homomorphism always holds for such isomorphic mappings. In addition, if the $\omega$ to which $\gamma$ is mapped satisfies $P(\omega) = 0$, then multiplicative homomorphism will also hold. Based on these discussions, the isomorphic mappings between $GF(2^q)$ constructed using irreducible polynomial $P(x)$ and its composite field can be found by the algorithm described below.

Algorithm 1

    initialization: t = 1, stop = 0
                    flag(i) = 0 for i = 1, 2, ..., 2^q − 1
    while stop == 0
    {
        ω = dectobin(t, q)
        compute P(ω)
        if P(ω) == 0
            mapping found, output = ω
            stop = 1
        else
            index(j) = bintodec(ω^(2^j)), for j = 0, 1, 2, ..., q − 1
            flag(index(j)) = 1, for j = 0, 1, 2, ..., q − 1
            find the minimum integer l > t, such that flag(l) = 0
            t = l
    }

In Algorithm 1, $\omega = \mathrm{dectobin}(t, q)$ means first convert the integer $t$ to a $q$-bit binary number, and then take these bits as the standard basis representation for $\omega$. Similarly, $\mathrm{index}(j) = \mathrm{bintodec}(\omega^{2^j})$ stands for taking the standard basis representation of $\omega^{2^j}$ as a $q$-bit binary number, and then converting this number into an integer as the value of $\mathrm{index}(j)$. It may be noted that the computation of $P(\omega)$ is carried out according to the field operations specified in the composite field. The basic idea of this algorithm is to test if an element $\omega$ of the composite field is a root of $P(x)$. If not, then none of the other elements in the same conjugacy class as $\omega$, namely $\omega^2, \omega^4, \ldots, \omega^{2^{q-1}}$, is a root of $P(x)$. The next element to be tested is selected excluding the elements in the conjugacy classes of all tested elements. It can be derived that, on average, [count not recoverable from the extracted text] checkings are needed to find isomorphic mappings for a field of order $2^q$.

The $\omega$ computed by Algorithm 1 is not the only element to which the root of $P(x)$ can be mapped. The root can also be mapped to the other elements in the same conjugacy class as $\omega$. It can be computed that for each of the combinations of $\varphi$ and $\lambda$, there are eight isomorphic mappings. The optimum isomorphic mapping can be selected based on minimizing gate count. Table III shows the complexity of the optimum isomorphic mapping and its inverse for each combination of $\varphi$ and $\lambda$.

TABLE III
COMPLEXITY OF OPTIMUM ISOMORPHIC MAPPING AND INVERSE
[table body not present in the extracted text]

The optimum constructions of the composite field for the AES algorithm can be selected by taking the complexities of the involved subfield operations and isomorphic mappings into account. From the discussions in Section IV and Section V, it can be concluded that the implementation of the multiplicative inversion and the affine transformation in SubBytes has the least gate count and the shortest critical path when $GF(((2^2)^2)^2)$ is constructed by using either of two combinations of $\varphi$ and $\lambda$ [values not recoverable from the extracted text]. The lowest complexity for isomorphic mapping and inverse can be achieved for these two constructions when the root of $P(x)$ is mapped to two specific composite field elements [values not recoverable from the extracted text], respectively.

Table IV shows some comparison results with prior works. Using the proposed optimum construction of composite field, the SubBytes can be implemented by 120 XOR gates and 35 AND gates with 19 XOR gates and 4 AND gates in the critical path. In [12], the complexity of the isomorphic mapping is measured by the number of 1s in the matrix. This is an incorrect approach when substructure sharing is employed. The matrix with the largest number of 1s may have more terms that can be shared, which may lead to a smaller total gate number. Considering substructure sharing, the optimum approach proposed in [12] uses only three fewer gates than that in [8], with two fewer gates in the critical path. The approach in [13] achieves minimum gate count by using normal basis for field element representation and sharing terms between multipliers with a common input. However, the critical path of this approach is much longer than that of the construction proposed in this brief. In addition, further area reduction can be achieved by introducing NOR gates and NOT gates in the design [13]. The numbers in Table IV are based on a theoretical analysis. These numbers are not the result of synthesizing by using any cell library.

TABLE IV
GATE COUNT AND CRITICAL PATH FOR COMPOSITE IMPLEMENTATIONS OF SUBBYTES
[table body not present in the extracted text]

VI. SUMMARY

In this brief, the optimum constructions of the composite field for the AES algorithm are presented. How the coefficients of the field polynomials affect each block in the composite field implementation of the SubBytes transformation is analyzed. In addition, an efficient algorithm is proposed to find isomorphic mappings when the involved field polynomials are not primitive. The optimum constructions are selected by considering the complexities of both the involved subfield operations and the isomorphic mappings. Future work will address composite field constructions using irreducible polynomials in other forms.

REFERENCES

[1] Advanced Encryption Standard (AES), FIPS PUB 197, Nov. 26, 2001, Federal Information Processing Standards publication 197.
[2] X. Zhang and K. K. Parhi, “High-speed VLSI architecture for the AES algorithm,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 9, pp. 957–967, Sep. 2004.
[3] K. U. Jarvinen, M. T. Tommiska, and J. O. Skytta, “A fully pipelined memoryless 17.8 Gbps AES-128 encryptor,” in Proc. Int. Symp. FPGA, Monterey, CA, Feb. 2003, pp. 207–215.
[4] G. P. Saggese, A. Mazzeo, N. Mazocca, and A. G. M. Strollo, “An FPGA based performance analysis of the unrolling, tiling and pipelining of the AES algorithm,” in Proc. FPL, Portugal, Sep. 2003, pp. 292–302.
[5] F. Standaert, G. Rouvroy, J. Quisquater, and J. Legat, “Efficient implementation of Rijndael encryption in reconfigurable hardware: Improvements and design tradeoffs,” in Proc. CHES, Cologne, Germany, Sep. 2003, pp. 334–350.
[6] X. Zhang and K. K. Parhi, “Implementation approaches for the advanced encryption standard algorithm,” IEEE Circuits Syst. Mag., vol. 2, no. 4, pp. 24–46, Fourth Quarter 2002.
[7] A. Rudra, P. K. Dubey, C. S. Jutla, V. Kumar, J. R. Rao, and P. Rohatgi, “Efficient implementation of Rijndael encryption with composite field arithmetic,” in Proc. CHES, Paris, France, May 2001, pp. 171–184.
[8] A. Satoh, S. Morioka, K. Takano, and S. Munetoh, “A compact Rijndael hardware architecture with S-box optimization,” in Proc. ASIACRYPT, Gold Coast, Australia, Dec. 2000, pp. 239–254.
[9] M. McLoone and J. V. McCanny, “Rijndael FPGA implementation utilizing look-up tables,” in Proc. IEEE Workshop Signal Process. Syst., Sep. 2001, pp. 349–360.
[10] H. Kuo and I. Verbauwhede, “Architectural optimization for a 1.82 Gbits/sec VLSI implementation of the AES Rijndael algorithm,” in Proc. CHES, Paris, France, May 2001, pp. 51–64.
[11] J. Wolkerstorfer, E. Oswald, and M. Lamberger, “An ASIC implementation of the AES S-boxes,” in Proc. RSA Conf., San Jose, CA, Feb. 2002, pp. 67–78.
[12] N. Mentens, L. Batina, B. Preneel, and I. Verbauwhede, “A systematic evaluation of compact hardware implementations for the Rijndael S-box,” in Proc. Topics Cryptology—CT-RSA, San Francisco, CA, Feb. 2005, pp. 323–333.
[13] D. Canright, “A very compact S-box for AES,” in Proc. Cryptographic Hardware and Embedded Syst., Edinburgh, U.K., Sep. 2005, pp. 441–455.
[14] C. Paar, “Efficient VLSI architecture for bit-parallel computations in Galois field,” Ph.D. dissertation, Inst. Exp. Math., Univ. Essen, Essen, Germany, 1994.
