research-article

Montgomery Multiplication Scalable Systolic Designs Optimized for DSP48E2

Published: 27 January 2024

Abstract

This article presents an extensive study of the use of DSP48E2 slices in UltraScale FPGAs to build hardware implementations of the Montgomery Multiplication algorithm for accelerating modular multiplication. Our fully scalable systolic architectures yield a parallelized, DSP48E2-optimized scheduling of operations analogous to the FIOS block variant of Montgomery Multiplication. We explore the impact of different pipelining strategies within DSP blocks, operation scheduling, processing element configurations, and global design structures, along with their tradeoffs in performance and resource cost. We discuss the application of our methodology to multiple types of DSP primitives. We provide ready-to-use, fast, efficient, and fully parametrizable designs that can adapt to a wide range of requirements and applications. Implementations are scalable to any operand width. Our most efficient designs perform 128-, 256-, 512-, 1024-, 2048-, and 4096-bit Montgomery modular multiplications in 0.0992 μs, 0.2032 μs, 0.3952 μs, 0.7792 μs, 1.550 μs, and 3.099 μs using 4, 6, 11, 21, 41, and 82 DSP blocks, respectively.
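For readers unfamiliar with the FIOS (Finely Integrated Operand Scanning) variant mentioned above, the following is a minimal software sketch of a word-serial Montgomery multiplication in that style. It is not the hardware design described in the article: the function name `mont_mul_fios`, the 16-bit word width, and the use of a single wide carry (rather than separate carry propagation per partial product) are assumptions made for illustration only.

```python
def mont_mul_fios(a, b, n, w=16):
    """Return a * b * R^-1 mod n, with R = 2^(w*s) and s = ceil(bits(n) / w).

    Assumes n is odd and 0 <= a, b < n. The word width w is illustrative.
    Requires Python 3.8+ for pow(n, -1, m).
    """
    s = -(-n.bit_length() // w)            # number of w-bit words
    mask = (1 << w) - 1
    n0_inv = (-pow(n, -1, 1 << w)) & mask  # -n^-1 mod 2^w

    # Split operands into little-endian w-bit words.
    aw = [(a >> (w * j)) & mask for j in range(s)]
    bw = [(b >> (w * j)) & mask for j in range(s)]
    nw = [(n >> (w * j)) & mask for j in range(s)]

    t = [0] * (s + 1)                      # running result, s + 1 words
    for i in range(s):
        # First word: compute the quotient digit m that zeroes the low word.
        acc = t[0] + aw[0] * bw[i]
        m = ((acc & mask) * n0_inv) & mask
        acc += m * nw[0]                   # low w bits of acc are now zero
        carry = acc >> w
        # Remaining words: multiplication and reduction share one loop,
        # and the result is shifted down by one word as it is written.
        for j in range(1, s):
            acc = t[j] + aw[j] * bw[i] + m * nw[j] + carry
            t[j - 1] = acc & mask
            carry = acc >> w
        acc = t[s] + carry
        t[s - 1] = acc & mask
        t[s] = acc >> w

    result = sum(word << (w * j) for j, word in enumerate(t))
    # The loop keeps the result below 2n, so one subtraction suffices.
    return result - n if result >= n else result
```

The inner loop is the part that a systolic array of processing elements would distribute across hardware; here it simply runs sequentially. The output can be checked against a direct computation of `a * b * pow(R, -1, n) % n`.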


Cited By

  • (2024) High-Throughput Bilinear Pairing Processor for Server-Side FPGA Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32, 8, 1498–1511. DOI: 10.1109/TVLSI.2024.3402164. Online publication date: 1-Aug-2024

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 17, Issue 1
March 2024
446 pages
EISSN: 1936-7414
DOI: 10.1145/3613534
Editor: Deming Chen

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 January 2024
Online AM: 15 September 2023
Accepted: 04 September 2023
Revised: 23 August 2023
Received: 17 February 2023
Published in TRETS Volume 17, Issue 1

Author Tags

  1. Montgomery Multiplication
  2. hardware acceleration
  3. FPGA
  4. DSP
  5. systolic architecture
