
Showing 1–4 of 4 results for author: Petrica, L

  1. arXiv:2403.18374  [pdf, other]

    cs.DC cs.AR

    Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL

    Authors: Marius Meyer, Tobias Kenter, Lucian Petrica, Kenneth O'Brien, Michaela Blott, Christian Plessl

    Abstract: Most FPGA boards in the HPC domain are well-suited for parallel scaling because of the direct integration of versatile and high-throughput network ports. However, the utilization of their network capabilities is often challenging and error-prone because the whole network stack and communication patterns have to be implemented and managed on the FPGAs. Also, this approach conceptually involves a tr…

    Submitted 7 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  2. arXiv:2312.11742  [pdf, other]

    cs.DC cs.AR cs.LG cs.NI

    ACCL+: an FPGA-Based Collective Engine for Distributed Applications

    Authors: Zhenhao He, Dario Korolija, Yu Zhu, Benjamin Ramhorst, Tristan Laan, Lucian Petrica, Michaela Blott, Gustavo Alonso

    Abstract: FPGAs are increasingly prevalent in cloud deployments, serving as Smart NICs or network-attached accelerators. Despite their potential, developing distributed FPGA-accelerated applications remains cumbersome due to the lack of appropriate infrastructure and communication abstractions. To facilitate the development of distributed applications with FPGAs, in this paper we propose ACCL+, an open-sour…

    Submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2011.07317  [pdf, other]

    cs.AR

    Memory-Efficient Dataflow Inference for Deep CNNs on FPGA

    Authors: Lucian Petrica, Tobias Alonso, Mairin Kroes, Nicholas Fraser, Sorin Cotofana, Michaela Blott

    Abstract: Custom dataflow Convolutional Neural Network (CNN) inference accelerators on FPGA are tailored to a specific CNN topology and store parameters in On-Chip Memory (OCM), resulting in high energy efficiency and low inference latency. However, in these accelerators the shapes of parameter memories are dictated by throughput constraints and do not map well to the underlying OCM, which becomes an implem…

    Submitted 14 November, 2020; originally announced November 2020.

    Comments: To appear in FPT 2020 proceedings

  4. arXiv:2003.12449  [pdf, other]

    cs.DC cs.NE eess.SP

    Evolutionary Bin Packing for Memory-Efficient Dataflow Inference Acceleration on FPGA

    Authors: Mairin Kroes, Lucian Petrica, Sorin Cotofana, Michaela Blott

    Abstract: Convolutional neural network (CNN) dataflow inference accelerators implemented in Field Programmable Gate Arrays (FPGAs) have demonstrated increased energy efficiency and lower latency compared to CNN execution on CPUs or GPUs. However, the complex shapes of CNN parameter memories do not typically map well to FPGA on-chip memories (OCM), which results in poor OCM utilization and ultimately limits…

    Submitted 24 March, 2020; originally announced March 2020.

    Comments: To appear in GECCO 2020 Proceedings