Abstract
We present Adrastea, an efficient FPGA design environment for developing scientific machine learning applications. FPGA development is challenging: it involves board deployment, toolchain setup, programming methods, and kernel interfacing, and, more importantly, it requires exploring design space choices to obtain the best performance and area usage from a kernel design. Adrastea provides an automated and scalable design flow to parameterize, implement, and optimize complex FPGA kernels and their associated interfaces. We show how virtualizing the development environment with virtual machines simplifies toolchain setup and board deployment, and how it allows the automated design space exploration to scale across multiple machines concurrently. Adrastea provides an automated build and test environment for FPGA kernels. By exposing design space hyper-parameters, Adrastea can automatically search the design space in parallel to optimize the FPGA design for a given metric, usually performance or area. Adrastea also provides a simplified API for interfacing with FPGA kernels. To demonstrate the capabilities of Adrastea, we implement a complex random forest machine learning kernel with 10,000 input features that achieves extremely low computing latency without loss of prediction accuracy, as required by a scientific edge application at the Spallation Neutron Source (SNS). We also demonstrate Adrastea using an FFT kernel and show that, for both applications, Adrastea is able to systematically and efficiently evaluate different design options, reducing the time and effort required to develop the kernel from months of manual work to days of automatic builds.
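The parallel search over exposed hyper-parameters described in the abstract can be illustrated with a minimal sketch. This is not Adrastea's implementation or API: the parameter names (`unroll_factor`, `array_partition`, `clock_mhz`), the helper functions, and the analytic cost model in `build_and_measure` are all hypothetical stand-ins, and a real flow would launch toolchain builds (possibly on several machines) where the sketch calls a toy function.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical design space: names and values are illustrative only.
DESIGN_SPACE = {
    "unroll_factor": [1, 2, 4, 8],
    "array_partition": [1, 2, 4],
    "clock_mhz": [200, 300],
}


def build_and_measure(point):
    """Stand-in for one FPGA build-and-test run.

    A real flow would invoke the vendor toolchain here; this toy
    analytic model exists only to make the sketch runnable.
    """
    parallelism = point["unroll_factor"] * point["array_partition"]
    latency_us = 1000.0 / (parallelism * point["clock_mhz"])
    area_units = 10 * parallelism + point["clock_mhz"] // 100
    return {**point, "latency_us": latency_us, "area_units": area_units}


def enumerate_points(space):
    """Expand the hyper-parameter grid into a list of design points."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in product(*(space[k] for k in keys))]


def explore(space, area_budget, workers=4):
    """Evaluate every design point concurrently, then pick the
    lowest-latency point whose area fits within the budget."""
    points = enumerate_points(space)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(build_and_measure, points))
    feasible = [r for r in results if r["area_units"] <= area_budget]
    best = min(feasible, key=lambda r: r["latency_us"]) if feasible else None
    return results, best


if __name__ == "__main__":
    results, best = explore(DESIGN_SPACE, area_budget=200)
    print(len(results), "points evaluated; best:", best)
```

The sketch uses an exhaustive grid for clarity; with long-running builds, a sampled or model-guided search over the same exposed parameters would likely be preferable.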
Acknowledgments
This research used resources of the Experimental Computing Laboratory (ExCL) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Young, A.R., Miniskar, N.R., Liu, F., Blokland, W., Vetter, J.S. (2022). Adrastea: An Efficient FPGA Design Environment for Heterogeneous Scientific Computing and Machine Learning. In: Doug, K., Al, G., Pophale, S., Liu, H., Parete-Koon, S. (eds) Accelerating Science and Engineering Discoveries Through Integrated Research Infrastructure for Experiment, Big Data, Modeling and Simulation. SMC 2022. Communications in Computer and Information Science, vol 1690. Springer, Cham. https://doi.org/10.1007/978-3-031-23606-8_14
DOI: https://doi.org/10.1007/978-3-031-23606-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23605-1
Online ISBN: 978-3-031-23606-8
eBook Packages: Computer Science (R0)