Abstract
In traditional multiple-precision large-integer multiplication algorithms, the number of additions required is roughly equal to the number of multiplications. On some platforms, these add instructions account for about half of the computing latency of the overall implementation. In this paper, we propose a multiplication algorithm that uses the separated multiply-add-with-carry instruction supported by NVIDIA GPUs. The algorithm reorders the computational sequence so that nearly all additions and carry-flag handling are combined with the multiplication instructions. The number of add instructions decreases from \(O(n^2)\) in the prevailing schoolbook algorithm to \(O(n)\). Our resulting 256-bit modular multiplication and modular squaring over a Mersenne prime achieve 3.3837 billion and 5.9928 billion operations per second, respectively, reaching 96 % of the GPU hardware limit. An elliptic curve point multiplication implementation using our algorithm achieves a 43.6 % speedup over the fastest existing work.
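The instruction-count argument can be illustrated with a small emulation. The following is a minimal sketch in Python, not the authors' actual instruction schedule: it assumes 32-bit limbs and models a multiply-add-with-carry instruction (in the spirit of PTX `mad.cc`/`madc`) with a hypothetical helper `mad_with_carry`. The names `to_limbs`, `from_limbs`, and `mul_fused` are also illustrative assumptions. Each row runs one carry chain over the low product halves and one over the high halves, so only a single standalone add per row is left to capture the carry between the two chains.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit limbs, like a GPU's 32-bit integer registers


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian 32-bit limbs."""
    return [(x >> (32 * k)) & MASK for k in range(n)]


def from_limbs(limbs):
    """Recombine little-endian 32-bit limbs into one integer."""
    return sum(v << (32 * k) for k, v in enumerate(limbs))


def mad_with_carry(a, b, c, carry_in, high):
    """Hypothetical model of one fused instruction:
    (low or high half of a*b) + c + carry_in, returning (result, carry_out)."""
    prod = a * b
    half = (prod >> 32) if high else (prod & MASK)
    t = half + c + carry_in
    return t & MASK, t >> 32


def mul_fused(a, b):
    """Row-wise product of two n-limb numbers, counting emulated instructions."""
    n = len(a)
    acc = [0] * (2 * n)
    counts = {"mad": 0, "add": 0}
    for i in range(n):
        # Chain 1: low halves of a[j]*b[i] accumulate into acc[i+j].
        carry = 0
        for j in range(n):
            acc[i + j], carry = mad_with_carry(a[j], b[i], acc[i + j], carry, high=False)
            counts["mad"] += 1
        # The only standalone add of the row: capture the carry out of chain 1.
        # acc[i+n] is still zero here, so this add cannot overflow.
        acc[i + n] = (acc[i + n] + carry) & MASK
        counts["add"] += 1
        # Chain 2: high halves accumulate one limb further up, into acc[i+j+1].
        carry = 0
        for j in range(n):
            acc[i + j + 1], carry = mad_with_carry(a[j], b[i], acc[i + j + 1], carry, high=True)
            counts["mad"] += 1
        assert carry == 0  # the partial product always fits in n+i+1 limbs
    return acc, counts


# Usage: a 256-bit by 256-bit product with n = 8 limbs.
x = (1 << 256) - 1
y = 0x0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF
limbs, counts = mul_fused(to_limbs(x, 8), to_limbs(y, 8))
assert from_limbs(limbs) == x * y
print(counts)  # {'mad': 128, 'add': 8}
```

In this accounting, an unfused version with the same row structure would issue each of the \(2n^2\) multiplications followed by its own add, whereas the fused version needs \(2n^2\) multiply-adds and only \(n\) standalone adds. The real GPU kernel may order and count instructions differently.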
F. Zheng—This work was partially supported by the National 973 Program of China under award No. 2013CB338001 and the Strategic Priority Research Program of Chinese Academy of Sciences under Grant XDA06010702.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zheng, F., Pan, W., Lin, J., Jing, J., Zhao, Y. (2015). Exploiting the Potential of GPUs for Modular Multiplication in ECC. In: Rhee, KH., Yi, J. (eds) Information Security Applications. WISA 2014. Lecture Notes in Computer Science, vol 8909. Springer, Cham. https://doi.org/10.1007/978-3-319-15087-1_23
DOI: https://doi.org/10.1007/978-3-319-15087-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15086-4
Online ISBN: 978-3-319-15087-1
eBook Packages: Computer Science (R0)