High-Throughput Elliptic Curve Cryptography Using AVX2 Vector Instructions

Cheng, Hao; Großschädl, Johann; Tian, Jiaqi; Rønne, Peter B.; Ryan, Peter Y. A.

doi:10.1007/978-3-030-81652-0_27

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 12804)

Included in the following conference series:

International Conference on Selected Areas in Cryptography

Abstract

Single Instruction Multiple Data (SIMD) execution engines like Intel’s Advanced Vector Extensions 2 (AVX2) offer great potential to accelerate elliptic curve cryptography compared to implementations using only basic x64 instructions. All existing AVX2 implementations of scalar multiplication on, e.g., Curve25519 (and alternative curves) are optimized for low latency. We argue in this paper that many real-world applications, such as server-side SSL/TLS handshake processing, would benefit more from throughput-optimized implementations than from latency-optimized ones. To support this argument, we introduce a throughput-optimized AVX2 implementation of variable-base scalar multiplication on Curve25519 and fixed-base scalar multiplication on Ed25519. Both implementations perform four scalar multiplications in parallel, where each uses a 64-bit element of a 256-bit vector. The field arithmetic is based on a radix-$2^{29}$ representation of the field elements, which makes it possible to carry out four parallel multiplications modulo a multiple of $p = 2^{255} - 19$ in just 88 cycles on a Skylake CPU. Four variable-base scalar multiplications on Curve25519 require less than 250,000 Skylake cycles, which translates to a throughput of 32,318 scalar multiplications per second at a clock frequency of 2 GHz. For comparison, the best latency-optimized AVX2 implementation to date has a throughput of some 21,000 scalar multiplications per second on the same Skylake CPU.
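The radix-$2^{29}$ field arithmetic described in the abstract can be illustrated with a minimal scalar sketch in Python. It is not the authors' AVX2 code. It assumes nine 29-bit limbs per field element (since $9 \cdot 29 = 261 \ge 255$) and folds the upper half of the product using $2^{261} \equiv 64 \cdot 19 = 1216 \pmod{p}$, which keeps results congruent modulo $64p$, one possible reading of "modulo a multiple of $p$". The helper names `to_limbs`, `from_limbs`, and `mul_limbs` are hypothetical.

```python
P = 2**255 - 19
RADIX_BITS = 29
NUM_LIMBS = 9                     # assumption: 9 limbs, 9 * 29 = 261 bits
MASK = (1 << RADIX_BITS) - 1
FOLD = 64 * 19                    # 2^261 = 2^6 * 2^255 is congruent to 1216 mod p


def to_limbs(x):
    """Split an integer below 2^261 into nine 29-bit limbs, least significant first."""
    return [(x >> (RADIX_BITS * i)) & MASK for i in range(NUM_LIMBS)]


def from_limbs(limbs):
    """Recombine limbs into one integer (limbs may exceed 29 bits)."""
    return sum(limb << (RADIX_BITS * i) for i, limb in enumerate(limbs))


def mul_limbs(a, b):
    """Multiply two limb vectors; the result is congruent to a*b modulo 64*p."""
    # Schoolbook product: 81 limb products accumulated into 17 columns.
    cols = [0] * (2 * NUM_LIMBS - 1)
    for i in range(NUM_LIMBS):
        for j in range(NUM_LIMBS):
            cols[i + j] += a[i] * b[j]
    # Column k >= 9 has weight 2^261 * 2^(29*(k-9)), so fold it down with FOLD.
    # Python integers never overflow; code working on 64-bit lanes would have
    # to propagate carries before this step to stay within 64 bits.
    for k in range(NUM_LIMBS, 2 * NUM_LIMBS - 1):
        cols[k - NUM_LIMBS] += FOLD * cols[k]
    r = cols[:NUM_LIMBS]
    # One carry-propagation pass brings the limbs back to 29 bits.
    carry = 0
    for i in range(NUM_LIMBS):
        r[i] += carry
        carry = r[i] >> RADIX_BITS
        r[i] &= MASK
    # The carry out of the top limb has weight 2^261 and is folded into limb 0,
    # which may therefore end up slightly wider than 29 bits.
    r[0] += FOLD * carry
    return r


# Four independent multiplications, mimicking the four 64-bit lanes of a vector.
lanes_a = [to_limbs(3 + i) for i in range(4)]
lanes_b = [to_limbs(P - 1 - i) for i in range(4)]
lanes_r = [mul_limbs(a, b) for a, b in zip(lanes_a, lanes_b)]
```

In a vectorized version, limb `i` of all four operands would share one 256-bit register, so each Python-level limb operation above corresponds to a single vector instruction acting on four lanes at once.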

Notes

1. SVE registers can be between 128 and 2048 bits long, in steps of 128 bits.
2. The termination of SSL/TLS connections is often off-loaded to a so-called “reverse proxy,” which transparently translates SSL/TLS sessions to normal TCP sessions for back-end servers. The cryptographic performance of such reverse proxies can be significantly improved with dedicated hardware accelerators. Jang et al. introduced SSLShader, an SSL/TLS reverse proxy that uses a Graphics Processing Unit (GPU) to increase the throughput of public-key cryptosystems like RSA [18].
3. A throughput-optimized implementation of variable-base scalar multiplication on a 251-bit binary Edwards curve was presented by Bernstein [3]. This implementation uses bitslicing for the low-level binary-field arithmetic and is able to execute 30,000 scalar multiplications per second on an Intel Core 2 Quad Q6600 CPU.
4. A ($2 \times 2$)-way parallel AVX2 implementation (i.e. an implementation executing two field operations in parallel, each using two 64-bit elements of a 256-bit vector) cannot profit from a radix-$2^{29}$ representation since the limbs are processed in pairs and the number of limb-pairs is the same as for radix $2^{25.5}$, namely five.
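The limb-count argument in Note 4 can be checked with a few lines of Python. This is a back-of-envelope sketch, not the paper's cycle accounting: the function names are hypothetical, and the schoolbook product count is only a rough proxy for multiplication cost.

```python
import math
from fractions import Fraction

FIELD_BITS = 255  # bit length of p = 2^255 - 19


def num_limbs(field_bits, radix_bits):
    """Limbs needed to hold field_bits bits at radix 2^radix_bits."""
    return math.ceil(Fraction(field_bits) / Fraction(radix_bits))


def limb_pairs(limbs):
    """Pairs needed when two limbs of one operand share a vector (2x2-way)."""
    return (limbs + 1) // 2


def schoolbook_products(limbs):
    """Limb-by-limb products in a plain schoolbook multiplication."""
    return limbs * limbs


# Radix 2^25.5 is written as the exact fraction 51/2 to avoid float rounding.
for name, radix in (("2^29", Fraction(29)), ("2^25.5", Fraction(51, 2))):
    n = num_limbs(FIELD_BITS, radix)
    print(name, "limbs:", n, "pairs:", limb_pairs(n),
          "products:", schoolbook_products(n))
```

Both radices need five limb pairs, so a pairwise layout sees no difference between them. A layout that gives each operand its own lane handles limbs individually, so going from ten limbs to nine reduces the schoolbook product count from 100 to 81.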

References

1. Aoki, K., Hoshino, F., Kobayashi, T., Oguro, H.: Elliptic curve arithmetic using SIMD. In: Davida, G.I., Frankel, Y. (eds.) ISC 2001. LNCS, vol. 2200, pp. 235–247. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45439-X_16
2. Bernstein, D.J.: Curve25519: new Diffie-Hellman speed records. In: Yung, M., Dodis, Y., Kiayias, A., Malkin, T. (eds.) PKC 2006. LNCS, vol. 3958, pp. 207–228. Springer, Heidelberg (2006). https://doi.org/10.1007/11745853_14
3. Bernstein, D.J.: Batch binary Edwards. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 317–336. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03356-8_19
4. Bernstein, D.J., Birkner, P., Joye, M., Lange, T., Peters, C.: Twisted Edwards curves. In: Vaudenay, S. (ed.) AFRICACRYPT 2008. LNCS, vol. 5023, pp. 389–405. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68164-9_26
5. Bernstein, D.J., Duif, N., Lange, T., Schwabe, P., Yang, B.-Y.: High-speed high-security signatures. J. Cryptogr. Eng. 2(2), 77–89 (2012)
6. Bos, J.W., Montgomery, P.L., Shumow, D., Zaverucha, G.M.: Montgomery multiplication using vector instructions. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 471–489. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_24
7. Chou, T.: Sandy2x: new Curve25519 speed records. In: Dunkelman, O., Keliher, L. (eds.) SAC 2015. LNCS, vol. 9566, pp. 145–160. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31301-6_8
8. de Valence, H.: Accelerating Edwards curve arithmetic with parallel formulas. Blog post (2018). https://medium.com/hdevalence/accelerating-edwards-curve-arithmetic-with-parallel-formulas-ac12cf5015be
9. Faz-Hernández, A., López, J.: Fast implementation of Curve25519 using AVX2. In: Lauter, K., Rodríguez-Henríquez, F. (eds.) LATINCRYPT 2015. LNCS, vol. 9230, pp. 329–345. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22174-8_18
10. Faz-Hernández, A., López, J., Dahab, R.: High-performance implementation of elliptic curve cryptography using vector instructions. ACM Trans. Math. Softw. 45(3), 1–35 (2019)
11. Grabher, P., Großschädl, J., Page, D.: On software parallel implementation of cryptographic pairings. In: Avanzi, R.M., Keliher, L., Sica, F. (eds.) Selected Areas in Cryptography – SAC 2008. Lecture Notes in Computer Science, vol. 5381, pp. 35–50. Springer Verlag (2009)
12. Gueron, S., Krasnov, V.: Software implementation of modular exponentiation, using advanced vector instructions architectures. In: Özbudak, F., Rodríguez-Henríquez, F. (eds.) WAIFI 2012. LNCS, vol. 7369, pp. 119–135. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31662-3_9
13. Halfhill, T.R.: RISC-V vectors know no limits. Linley Newsletter (2020). https://www.linleygroup.com/newsletters/newsletter_detail.php?num=6154
14. Hankerson, D.R., Menezes, A.J., Vanstone, S.A.: Guide to Elliptic Curve Cryptography. Springer Verlag (2004). https://doi.org/10.1007/b97644
15. Hişil, H., Eğrice, B., Yassi, M.: Fast 4 way vectorized ladder for the complete set of Montgomery curves. Cryptology ePrint Archive, Report 2020/388 (2020). https://eprint.iacr.org
16. Hisil, H., Wong, K.K.-H., Carter, G., Dawson, E.: Twisted Edwards curves revisited. In: Pieprzyk, J. (ed.) ASIACRYPT 2008. LNCS, vol. 5350, pp. 326–343. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89255-7_20
17. Huang, J., Liu, Z., Hu, Z., Großschädl, J.: Parallel implementation of SM2 elliptic curve cryptography on Intel processors with AVX2. In: Liu, J.K., Cui, H. (eds.) ACISP 2020. LNCS, vol. 12248, pp. 204–224. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55304-3_11
18. Jang, K., Han, S., Han, S., Moon, S.B., Park, K.: SSLShader: cheap SSL acceleration with commodity processors. In: Andersen, D.G., Ratnasamy, S. (eds.) Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2011). USENIX Association (2011)
19. Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math. Comput. 48(177), 243–264 (1987)
20. Nath, K., Sarkar, P.: Efficient 4-way vectorizations of the Montgomery ladder. Cryptology ePrint Archive, Report 2020/378 (2020). https://eprint.iacr.org
21. Page, D., Smart, N.P.: Parallel cryptographic arithmetic using a redundant Montgomery representation. IEEE Trans. Comput. 53(11), 1474–1482 (2004)
22. Stephens, N., et al.: The ARM scalable vector extension. IEEE Micro 37(2), 26–39 (2017)

Acknowledgements

The source code of the presented software is available online at https://gitlab.uni.lu/APSIA/AVXECC under the GPLv3 license. This work was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 779391 (FutureTPM).

Author information

Authors and Affiliations

DCS and SnT, University of Luxembourg, 6, Avenue de la Fonte, L-4364, Esch-sur-Alzette, Luxembourg
Hao Cheng, Johann Großschädl, Jiaqi Tian, Peter B. Rønne & Peter Y. A. Ryan

Corresponding author

Correspondence to Hao Cheng.

Editor information

Editors and Affiliations

University of Haifa, Haifa, Israel
Orr Dunkelman
University of Calgary, Calgary, AB, Canada
Michael J. Jacobson, Jr.
Dalhousie University, Halifax, NS, Canada
Colin O'Flynn

Appendices

A Source Code of Vectorized Field Operations

B Source Code of ($4 \times 1$)-Way Point Operations

About this paper

Cite this paper

Cheng, H., Großschädl, J., Tian, J., Rønne, P.B., Ryan, P.Y.A. (2021). High-Throughput Elliptic Curve Cryptography Using AVX2 Vector Instructions. In: Dunkelman, O., Jacobson, Jr., M.J., O'Flynn, C. (eds) Selected Areas in Cryptography. SAC 2020. Lecture Notes in Computer Science, vol 12804. Springer, Cham. https://doi.org/10.1007/978-3-030-81652-0_27

DOI: https://doi.org/10.1007/978-3-030-81652-0_27
Published: 21 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81651-3
Online ISBN: 978-3-030-81652-0
eBook Packages: Computer Science, Computer Science (R0)