A Novel High-Performance Implementation of CRYSTALS-Kyber with AI Accelerator

  • Conference paper
Computer Security – ESORICS 2022 (ESORICS 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13556))

Abstract

Public-key cryptography, including conventional cryptosystems and post-quantum cryptography, involves computation-intensive workloads. Motivated by the extraordinary computing power of AI accelerators, in this paper we further explore the feasibility of introducing AI accelerators into high-performance cryptographic computing. Since AI accelerators are dedicated to machine learning and neural networks, the biggest challenge is how to transform cryptographic workloads into the operations they support, while ensuring the correctness of the results and delivering convincing performance gains.

After investigating and analysing the workload of the NVIDIA AI accelerator, Tensor Core, we choose to use it to accelerate polynomial multiplication, usually the most time-consuming part of lattice-based cryptography. We adapt the computation to the matrix-multiply-and-add mode of Tensor Core and make a trade-off between precision and performance, so that it serves as a high-performance NTT box performing NTT/INTT through the CUDA C++ WMMA APIs. We take CRYSTALS-Kyber, the candidate to be standardized by NIST, as a case study on an RTX 3080 with the Ampere Tensor Core. The empirical results show that the customized NTT of a polynomial vector (\(n=256,k=4\)) with our NTT box achieves a speedup of around 6.47x over the state-of-the-art implementation on the same GPU platform. Compared with the AVX2 implementation submitted to NIST, our Kyber-1024 achieves speedups of 26x, 36x, and 35x for its three phases, respectively.
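
The idea of casting NTT/INTT as a matrix multiply-and-add can be illustrated with a small CPU-only sketch. It is not the paper's implementation: it runs in plain Python rather than through the WMMA APIs, and the 8-bit limb split used to model the precision/performance trade-off is an assumption made here for illustration, since the abstract does not state the actual decomposition. The constants (q = 3329, root of unity 17, 128-point transforms on even and odd coefficients) follow the public Kyber specification. All function names are hypothetical.

```python
Q = 3329        # Kyber modulus
ZETA = 17       # primitive 256th root of unity mod Q, per the Kyber specification
HALF = 128      # Kyber's NTT handles the 128 even and 128 odd coefficients separately


def bitrev7(i):
    """Reverse the 7-bit binary representation of i."""
    return int(format(i, "07b")[::-1], 2)


def ntt_matrix():
    """W[i][j] = ZETA^((2*bitrev7(i)+1)*j) mod Q, so W @ f is a 128-point NTT."""
    return [[pow(ZETA, (2 * bitrev7(i) + 1) * j, Q) for j in range(HALF)]
            for i in range(HALF)]


def intt_matrix():
    """Inverse of ntt_matrix(): entry [j][i] is 128^-1 * ZETA^(-(2*bitrev7(i)+1)*j)."""
    inv_n = pow(HALF, Q - 2, Q)        # modular inverses via Fermat's little theorem
    inv_zeta = pow(ZETA, Q - 2, Q)
    return [[inv_n * pow(inv_zeta, (2 * bitrev7(i) + 1) * j, Q) % Q
             for i in range(HALF)] for j in range(HALF)]


def split_limbs(M):
    """Split every entry (below 2^12) into a low byte and a high part below 16."""
    lo = [[x & 0xFF for x in row] for row in M]
    hi = [[x >> 8 for x in row] for row in M]
    return lo, hi


def mma(A, B):
    """Plain integer matrix product; stands in for a low-precision accelerator tile op."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def matmul_mod_q(W, F):
    """Compute (W @ F) mod Q from four products of 8-bit limb matrices (assumed split)."""
    w_lo, w_hi = split_limbs(W)
    f_lo, f_hi = split_limbs(F)
    ll, lh = mma(w_lo, f_lo), mma(w_lo, f_hi)
    hl, hh = mma(w_hi, f_lo), mma(w_hi, f_hi)
    # Each partial sum is at most 128 * 255 * 255 < 2^31, so 32-bit accumulation suffices.
    return [[(ll[i][j] + ((lh[i][j] + hl[i][j]) << 8) + (hh[i][j] << 16)) % Q
             for j in range(len(F[0]))] for i in range(len(W))]


# Usage: each column of F holds the 128 even (or odd) coefficients of one polynomial,
# so a batch of polynomials is transformed with a single matrix product.
F = [[(31 * r + 7 * c + 1) % Q for c in range(4)] for r in range(HALF)]
F_hat = matmul_mod_q(ntt_matrix(), F)        # forward transform of the whole batch
F_back = matmul_mod_q(intt_matrix(), F_hat)  # inverse transform recovers F
```

Batching polynomials as matrix columns is what turns the transform into a matrix-matrix product, the shape of work a matrix-multiply unit is built for. The limb split costs four narrow products per transform in exchange for operands that fit in 8 bits.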

Acknowledgements

We would like to thank the anonymous reviewers for their careful reading of our manuscript and their many insightful comments and suggestions. We are grateful to Massimiliano Albanese for helping us to improve our paper. This work is supported in part by the National Key R&D Plan of China under Grant No. 2020YFB1005803, the National Natural Science Foundation of China under Grant No. 61902392, the CCF-Tencent Open Fund under Grant No. RAGR20210131, and the CCF-Huawei Populus euphratica Fund.

Author information

Corresponding author

Correspondence to Fangyu Zheng.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wan, L. et al. (2022). A Novel High-Performance Implementation of CRYSTALS-Kyber with AI Accelerator. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13556. Springer, Cham. https://doi.org/10.1007/978-3-031-17143-7_25

  • DOI: https://doi.org/10.1007/978-3-031-17143-7_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17142-0

  • Online ISBN: 978-3-031-17143-7

  • eBook Packages: Computer Science, Computer Science (R0)
