
Showing 1–29 of 29 results for author: Blott, M

  1. arXiv:2403.18374  [pdf, other]

    cs.DC cs.AR

    Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL

    Authors: Marius Meyer, Tobias Kenter, Lucian Petrica, Kenneth O'Brien, Michaela Blott, Christian Plessl

    Abstract: Most FPGA boards in the HPC domain are well-suited for parallel scaling because of the direct integration of versatile and high-throughput network ports. However, the utilization of their network capabilities is often challenging and error-prone because the whole network stack and communication patterns have to be implemented and managed on the FPGAs. Also, this approach conceptually involves a tr…

    Submitted 7 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  2. arXiv:2312.11742  [pdf, other]

    cs.DC cs.AR cs.LG cs.NI

    ACCL+: an FPGA-Based Collective Engine for Distributed Applications

    Authors: Zhenhao He, Dario Korolija, Yu Zhu, Benjamin Ramhorst, Tristan Laan, Lucian Petrica, Michaela Blott, Gustavo Alonso

    Abstract: FPGAs are increasingly prevalent in cloud deployments, serving as Smart NICs or network-attached accelerators. Despite their potential, developing distributed FPGA-accelerated applications remains cumbersome due to the lack of appropriate infrastructure and communication abstractions. To facilitate the development of distributed applications with FPGAs, in this paper we propose ACCL+, an open-sour…

    Submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2311.12359  [pdf, other]

    cs.CV cs.AI cs.AR cs.LG cs.PF

    Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs

    Authors: Shivam Aggarwal, Hans Jakob Damsgaard, Alessandro Pappalardo, Giuseppe Franco, Thomas B. Preußer, Michaela Blott, Tulika Mitra

    Abstract: Post-training quantization (PTQ) is a powerful technique for model compression, reducing the numerical precision in neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point formats (FP8) in the context of PTQ for model inference. However, floating-point formats smaller than 8 bits and their relative comparison in terms of accuracy-hardware c…

    Submitted 5 July, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted in FPL (International Conference on Field-Programmable Logic and Applications) 2024 conference. Revised with updated results

  4. arXiv:2212.04703  [pdf, other]

    eess.SP cs.AR cs.CC cs.LG

    Implementing Neural Network-Based Equalizers in a Coherent Optical Transmission System Using Field-Programmable Gate Arrays

    Authors: Pedro J. Freire, Sasipim Srivallapanondh, Michael Anderson, Bernhard Spinnler, Thomas Bex, Tobias A. Eriksson, Antonio Napoli, Wolfgang Schairer, Nelson Costa, Michaela Blott, Sergei K. Turitsyn, Jaroslaw E. Prilepsky

    Abstract: In this work, we demonstrate the offline FPGA realization of both recurrent and feedforward neural network (NN)-based equalizers for nonlinearity compensation in coherent optical transmission systems. First, we present a realization pipeline showing the conversion of the models from Python libraries to the FPGA chip synthesis and implementation. Then, we review the main alternatives for the hardwa…

    Submitted 19 February, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: Invited paper at Journal of Lightwave Technology - IEEE

  5. arXiv:2209.14065  [pdf, other]

    cs.AR cs.LG physics.ins-det

    LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics

    Authors: Zhiqiang Que, Hongxiang Fan, Marcus Loo, He Li, Michaela Blott, Maurizio Pierini, Alexander Tapper, Wayne Luk

    Abstract: This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors, delivering unprecedented low latency performance. Incorporating FPGA-based GNNs into particle detectors presents a unique challenge since it requires sub-microsecond latency to deploy the networks for online event selection with a data rate of hundreds of terabytes p…

    Submitted 9 January, 2024; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: This paper has been accepted by ACM Transactions on Embedded Computing Systems (TECS)

  6. arXiv:2206.12180  [pdf, other]

    eess.SP cs.LG

    Towards FPGA Implementation of Neural Network-Based Nonlinearity Mitigation Equalizers in Coherent Optical Transmission Systems

    Authors: Pedro J. Freire, Michael Anderson, Bernhard Spinnler, Thomas Bex, Jaroslaw E. Prilepsky, Tobias A. Eriksson, Nelson Costa, Wolfgang Schairer, Michaela Blott, Antonio Napoli, Sergei K. Turitsyn

    Abstract: For the first time, recurrent and feedforward neural network-based equalizers for nonlinearity compensation are implemented in an FPGA, with a level of complexity comparable to that of a dispersion equalizer. We demonstrate that the NN-based equalizers can outperform a 1 step-per-span DBP.

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: Accepted Oral in the European Conference on Optical Communication (ECOC) 2022

  7. arXiv:2206.11791  [pdf, other]

    cs.LG cs.AR

    Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

    Authors: Hendrik Borras, Giuseppe Di Guglielmo, Javier Duarte, Nicolò Ghielmetti, Ben Hawks, Scott Hauck, Shih-Chieh Hsu, Ryan Kastner, Jason Liang, Andres Meza, Jules Muhizi, Tai Nguyen, Rushil Roy, Nhan Tran, Yaman Umuroglu, Olivia Weng, Aidan Yokuda, Michaela Blott

    Abstract: We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classificatio…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 15 pages, 7 figures, Contribution to 3rd Workshop on Benchmarking Machine Learning Workloads on Emerging Hardware (MLBench) at 5th Conference on Machine Learning and Systems (MLSys)

    Report number: FERMILAB-CONF-22-479-SCD

  8. arXiv:2206.07527  [pdf, other]

    cs.LG cs.AR cs.PL stat.ML

    QONNX: Representing Arbitrary-Precision Quantized Neural Networks

    Authors: Alessandro Pappalardo, Yaman Umuroglu, Michaela Blott, Jovan Mitrevski, Ben Hawks, Nhan Tran, Vladimir Loncar, Sioni Summers, Hendrik Borras, Jules Muhizi, Matthew Trahms, Shih-Chieh Hsu, Scott Hauck, Javier Duarte

    Abstract: We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantiz…

    Submitted 24 June, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: 9 pages, 5 figures, Contribution to 4th Workshop on Accelerated Machine Learning (AccML) at HiPEAC 2022 Conference

    Report number: FERMILAB-CONF-22-471-SCD

  9. arXiv:2202.02310  [pdf, other]

    cs.LG cs.AR

    EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators

    Authors: Lois Orosa, Skanda Koppula, Yaman Umuroglu, Konstantinos Kanellopoulos, Juan Gomez-Luna, Michaela Blott, Kees Vissers, Onur Mutlu

    Abstract: Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and inference of applications such as image segmentation and high-resolution image generation. Although these kernels have grown in popularity, they stress current compute systems due to their high memory intensity, exascale compute demands, and…

    Submitted 4 February, 2022; originally announced February 2022.

  10. arXiv:2201.11409  [pdf, ps, other]

    cs.AR

    On the RTL Implementation of FINN Matrix Vector Compute Unit

    Authors: Syed Asad Alam, David Gregg, Giulio Gambardella, Thomas Preusser, Michaela Blott

    Abstract: FPGA-based accelerators are becoming more popular for deep neural networks due to the ability to scale performance with increasing degree of specialization with dataflow architectures or custom data types. To reduce the barrier for software engineers and data scientists to adopt FPGAs, C++- and OpenCL-based design entries with high-level synthesis (HLS) have been introduced. They provide higher abs…

    Submitted 10 April, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: 22 pages, 7 tables, 16 figures

    ACM Class: B.5.0; B.2.5

  11. arXiv:2110.13041  [pdf, other]

    cs.LG cs.AR physics.data-an physics.ins-det

    Applications and Techniques for Fast Machine Learning in Science

    Authors: Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bahr, Jurgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomas E. Muller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J Hazelwood, et al. (62 additional authors not shown)

    Abstract: In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML ac…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: 66 pages, 13 figures, 5 tables

    Report number: FERMILAB-PUB-21-502-AD-E-SCD

    Journal ref: Front. Big Data 5, 787421 (2022)

  12. arXiv:2011.07317  [pdf, other]

    cs.AR

    Memory-Efficient Dataflow Inference for Deep CNNs on FPGA

    Authors: Lucian Petrica, Tobias Alonso, Mairin Kroes, Nicholas Fraser, Sorin Cotofana, Michaela Blott

    Abstract: Custom dataflow Convolutional Neural Network (CNN) inference accelerators on FPGA are tailored to a specific CNN topology and store parameters in On-Chip Memory (OCM), resulting in high energy efficiency and low inference latency. However, in these accelerators the shapes of parameter memories are dictated by throughput constraints and do not map well to the underlying OCM, which becomes an implem…

    Submitted 14 November, 2020; originally announced November 2020.

    Comments: To appear in FPT 2020 proceedings

  13. arXiv:2011.05873  [pdf, ps, other]

    cs.LG cs.CV

    FAT: Training Neural Networks for Reliable Inference Under Hardware Faults

    Authors: Ussama Zahid, Giulio Gambardella, Nicholas J. Fraser, Michaela Blott, Kees Vissers

    Abstract: Deep neural networks (DNNs) are state-of-the-art algorithms for multiple applications, spanning from image classification to speech recognition. While providing excellent accuracy, they often have enormous compute and memory requirements. As a result of this, quantized neural networks (QNNs) are increasingly being adopted and deployed especially on embedded devices, thanks to their high accuracy,…

    Submitted 11 November, 2020; originally announced November 2020.

  14. arXiv:2004.03021  [pdf, other]

    eess.SP cs.AR cs.LG

    LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications

    Authors: Yaman Umuroglu, Yash Akhauri, Nicholas J. Fraser, Michaela Blott

    Abstract: Deployment of deep neural networks for applications that require very high throughput or extremely low latency is a severe computational challenge, further exacerbated by inefficiencies in mapping the computation to hardware. We present a novel method for designing neural network topologies that directly map to a highly efficient FPGA implementation. By exploiting the equivalence of artificial neu…

    Submitted 6 April, 2020; originally announced April 2020.

  15. arXiv:2003.12449  [pdf, other]

    cs.DC cs.NE eess.SP

    Evolutionary Bin Packing for Memory-Efficient Dataflow Inference Acceleration on FPGA

    Authors: Mairin Kroes, Lucian Petrica, Sorin Cotofana, Michaela Blott

    Abstract: Convolutional neural network (CNN) dataflow inference accelerators implemented in Field Programmable Gate Arrays (FPGAs) have demonstrated increased energy efficiency and lower latency compared to CNN execution on CPUs or GPUs. However, the complex shapes of CNN parameter memories do not typically map well to FPGA on-chip memories (OCM), which results in poor OCM utilization and ultimately limits…

    Submitted 24 March, 2020; originally announced March 2020.

    Comments: To appear in GECCO 2020 Proceedings

  16. arXiv:1912.07394  [pdf, ps, other]

    eess.SP cs.CV cs.LG

    Efficient Error-Tolerant Quantized Neural Network Accelerators

    Authors: Giulio Gambardella, Johannes Kappauf, Michaela Blott, Christoph Doehring, Martin Kumm, Peter Zipf, Kees Vissers

    Abstract: Neural Networks are currently one of the most widely deployed machine learning algorithms. In particular, Convolutional Neural Networks (CNNs), are gaining popularity and are evaluated for deployment in safety critical applications such as self driving vehicles. Modern CNNs feature enormous memory bandwidth and high computational needs, challenging existing hardware platforms to meet throughput, l…

    Submitted 16 December, 2019; originally announced December 2019.

    Comments: 6 pages, 5 figures

    Journal ref: 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)

  17. arXiv:1910.04313  [pdf]

    physics.app-ph

    Real-Time Machine Learning Based Fiber-Induced Nonlinearity Compensation in Energy-Efficient Coherent Optical Networks

    Authors: Elias Giacoumidis, Yi Lin, Michaela Blott, Liam P. Barry

    Abstract: We experimentally demonstrate the first field-programmable gate-array-based real-time fiber nonlinearity compensator (NLC) using sparse K-means++ machine learning clustering in an energy-efficient 40-Gb/s 16-quadrature amplitude modulated self-coherent optical system. Our real-time NLC shows up to 3 dB improvement in Q-factor compared to linear equalization at 50 km of transmission.

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: Submitted to ECOC post-deadline, Sep. 2019, Dublin, Ireland

  18. arXiv:1909.05009  [pdf, other]

    cs.AR

    QuTiBench: Benchmarking Neural Networks on Heterogeneous Hardware

    Authors: Michaela Blott, Lisa Halder, Miriam Leeser, Linda Doyle

    Abstract: Neural Networks have become one of the most successful universal machine learning algorithms. They play a key role in enabling machine vision and speech recognition for example. Their computational complexity is enormous and comes along with equally challenging memory requirements, which limits deployment in particular within energy constrained, embedded environments. In order to address these imp…

    Submitted 17 November, 2019; v1 submitted 11 September, 2019; originally announced September 2019.

    Journal ref: ACM JETC 2019

  19. Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

    Authors: Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, Kurt Keutzer

    Abstract: Using FPGAs to accelerate ConvNets has attracted significant attention in recent years. However, FPGA accelerator design has not leveraged the latest progress of ConvNets. As a result, the key application characteristics such as frames-per-second (FPS) are ignored in favor of simply counting GOPs, and results on accuracy, which is critical to application success, are often not even reported. In th…

    Submitted 10 May, 2020; v1 submitted 21 November, 2018; originally announced November 2018.

    Comments: Update to the latest results

  20. arXiv:1809.04570  [pdf, other]

    cs.AR

    FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks

    Authors: Michaela Blott, Thomas Preusser, Nicholas Fraser, Giulio Gambardella, Kenneth O'Brien, Yaman Umuroglu

    Abstract: Convolutional Neural Networks have rapidly become the most successful machine learning algorithm, enabling ubiquitous machine vision and intelligent decisions on even embedded computing-systems. While the underlying arithmetic is structurally simple, compute and memory requirements are challenging. One of the promising opportunities is leveraging reduced-precision representations for inputs, activ…

    Submitted 12 September, 2018; originally announced September 2018.

    Comments: to be published in ACM TRETS Special Edition on Deep Learning

  21. arXiv:1807.10577  [pdf, other]

    cs.CV

    Accuracy to Throughput Trade-offs for Reduced Precision Neural Networks on Reconfigurable Logic

    Authors: Jiang Su, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Gianluca Durelli, David B. Thomas, Philip Leong, Peter Y. K. Cheung

    Abstract: Modern CNN are typically based on floating point linear algebra based implementations. Recently, reduced precision NN have been gaining popularity as they require significantly less memory and computational resources compared to floating point. This is particularly important in power constrained compute environments. However, in many cases a reduction in precision comes at a small cost to the accu…

    Submitted 17 July, 2018; originally announced July 2018.

    Comments: Accepted by ARC 2018

  22. arXiv:1807.04093  [pdf, other]

    cs.CV cs.AR cs.LG

    FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs

    Authors: Vladimir Rybalkin, Alessandro Pappalardo, Muhammad Mohsin Ghaffar, Giulio Gambardella, Norbert Wehn, Michaela Blott

    Abstract: It is well known that many types of artificial neural networks, including recurrent networks, can achieve a high classification accuracy even with low-precision weights and activations. The reduction in precision generally yields much more efficient hardware implementations in regards to hardware cost, memory requirements, energy, and achievable throughput. In this paper, we present the first syst…

    Submitted 11 July, 2018; originally announced July 2018.

    Comments: Accepted for publication, 28th International Conference on Field Programmable Logic and Applications (FPL), August, 2018, Dublin, Ireland

  23. arXiv:1807.03123  [pdf, other]

    cs.CV

    Scaling Neural Network Performance through Customized Hardware Architectures on Reconfigurable Logic

    Authors: Michaela Blott, Thomas B. Preusser, Nicholas Fraser, Giulio Gambardella, Kenneth OBrien, Yaman Umuroglu, Miriam Leeser

    Abstract: Convolutional Neural Networks have dramatically improved in recent years, surpassing human accuracy on certain problems and performance exceeding that of traditional computer vision algorithms. While the compute pattern in itself is relatively simple, significant compute and memory challenges remain as CNNs may contain millions of floating-point parameters and require billions of floating-point op…

    Submitted 26 June, 2018; originally announced July 2018.

  24. arXiv:1807.00301  [pdf, other]

    cs.CV

    SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks

    Authors: Julian Faraone, Nicholas Fraser, Michaela Blott, Philip H. W. Leong

    Abstract: Inference for state-of-the-art deep neural networks is computationally expensive, making them difficult to deploy on constrained hardware environments. An efficient way to reduce this complexity is to quantize the weight parameters and/or activations during training by approximating their distributions with a limited entry codebook. For very low-precisions, such as binary or ternary networks with…

    Submitted 1 July, 2018; originally announced July 2018.

    Comments: Published as a conference paper at the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  25. Inference of Quantized Neural Networks on Heterogeneous All-Programmable Devices

    Authors: Thomas B. Preußer, Giulio Gambardella, Nicholas Fraser, Michaela Blott

    Abstract: Neural networks have become established as a generic and powerful means to approach challenging problems such as image classification, object detection or decision making. Their successful employment rests on an enormous demand for compute. The quantization of network parameters and the processed data has proven a valuable measure to reduce the challenges of network inference so effectively that the feasi…

    Submitted 21 June, 2018; originally announced June 2018.

  26. arXiv:1709.06262  [pdf, other]

    cs.CV

    Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks

    Authors: Julian Faraone, Nicholas Fraser, Giulio Gambardella, Michaela Blott, Philip H. W. Leong

    Abstract: A low precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to achieve significant model compression for inference. Training involves three stages: network training using L2 regularization and a quantization threshold regularizer, quantization pruning, and finally retra…

    Submitted 9 October, 2017; v1 submitted 19 September, 2017; originally announced September 2017.

    Comments: To appear as a conference paper at the 24th International Conference On Neural Information Processing (ICONIP 2017)

  27. arXiv:1701.03400  [pdf, other]

    cs.CV cs.LG

    Scaling Binarized Neural Networks on Reconfigurable Logic

    Authors: Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers

    Abstract: Binarized neural networks (BNNs) are gaining interest in the deep learning community due to their significantly lower computational and memory cost. They are particularly well suited to reconfigurable logic devices, which contain an abundance of fine-grained compute resources and can result in smaller, lower power implementations, or conversely in higher classification rates. Towards this end, the…

    Submitted 27 January, 2017; v1 submitted 12 January, 2017; originally announced January 2017.

    Comments: To appear in the PARMA-DITAM workshop at HiPEAC 2017, January 2017

  28. arXiv:1612.07119  [pdf, other]

    cs.CV cs.AR cs.LG

    FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

    Authors: Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers

    Abstract: Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optim…

    Submitted 1 December, 2016; originally announced December 2016.

    Comments: To appear in the 25th International Symposium on Field-Programmable Gate Arrays, February 2017

  29. arXiv:1408.5387  [pdf]

    cs.OH

    High-Level Synthesis Case Study: Implementation of a Memcached Server

    Authors: Kimon Karras, Michaela Blott, Kees Vissers

    Abstract: High-Level Synthesis (HLS) aspires to raise the level of abstraction in hardware design without sacrificing hardware efficiency. It has so far been successfully employed in signal and video processing but has found only limited use in other areas. This paper utilizes a commercial HLS tool, namely Vivado(R) HLS, to implement the processing of a common data center application, the Key-Value Store (K…

    Submitted 21 August, 2014; originally announced August 2014.

    Comments: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423)

    Report number: FSP/2014/15