Showing 1–31 of 31 results for author: Verhelst, M

Searching in archive cs.
  1. CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories

    Authors: Man Shi, Steven Colleman, Charlotte VanDeMieroop, Antony Joseph, Maurice Meijer, Wim Dehaene, Marian Verhelst

    Abstract: Deep neural networks (DNN) use a wide range of network topologies to achieve high accuracy within diverse applications. This model diversity makes it impossible to identify a single "dataflow" (execution schedule) to perform optimally across all possible layers and network topologies. Several frameworks support the exploration of the best dataflow for a given DNN layer and hardware. However, switc…

    Submitted 14 June, 2024; originally announced June 2024.

    Journal ref: 2023 24th International Symposium on Quality Electronic Design (ISQED)

  2. arXiv:2406.09804 [pdf, other]

    cs.AR

    Optimizing Layer-Fused Scheduling of Transformer Networks on Multi-accelerator Platforms

    Authors: Steven Colleman, Arne Symons, Victor J. B. Jung, Marian Verhelst

    Abstract: The impact of transformer networks is booming, yet they come with significant computational complexity. It is therefore essential to understand how to optimally map and execute these networks on modern neural processor hardware. So far, literature on transformer scheduling optimization has been focusing on deployment on GPU and specific ASICs. This work enables extensive hardware/mapping explorat…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to ISQED2024

  3. HTVM: Efficient Neural Network Deployment On Heterogeneous TinyML Platforms

    Authors: Josse Van Delm, Maarten Vandersteegen, Alessio Burrello, Giuseppe Maria Sarda, Francesco Conti, Daniele Jahier Pagliari, Luca Benini, Marian Verhelst

    Abstract: Optimal deployment of deep neural networks (DNNs) on state-of-the-art Systems-on-Chips (SoCs) is crucial for tiny machine learning (TinyML) at the edge. The complexity of these SoCs makes deployment non-trivial, as they typically contain multiple heterogeneous compute cores with limited, programmer-managed memory to optimize latency and energy efficiency. We propose HTVM - a compiler that merges T…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Presented at DAC2023. Open-source code is available at https://github.com/KULeuven-MICAS/htvm

    ACM Class: D.3.4

    Journal ref: 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2023, pp. 1-6

  4. Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling

    Authors: Jiacong Sun, Pouya Houshmand, Marian Verhelst

    Abstract: In-Memory Computing (IMC) has emerged as a promising paradigm for energy-efficient, throughput-efficient and area-efficient machine learning at the edge. However, the differences in hardware architectures, array dimensions, and fabrication technologies among published IMC realizations have made it difficult to grasp their relative strengths. Moreover, previous studies have primarily focused on exp…

    Submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2308.00154 [pdf, other]

    cs.AR

    PATRONoC: Parallel AXI Transport Reducing Overhead for Networks-on-Chip targeting Multi-Accelerator DNN Platforms at the Edge

    Authors: Vikram Jain, Matheus Cavalcante, Nazareno Bruschi, Michael Rogenmoser, Thomas Benz, Andreas Kurth, Davide Rossi, Luca Benini, Marian Verhelst

    Abstract: Emerging deep neural network (DNN) applications require high-performance multi-core hardware acceleration with large data bursts. Classical networks-on-chip (NoCs) use serial packet-based protocols suffering from significant protocol translation overheads towards the endpoints. This paper proposes PATRONoC, an open-source fully AXI-compliant NoC fabric to better address the specific needs of multi…

    Submitted 31 July, 2023; originally announced August 2023.

    Comments: Accepted and presented at 60th DAC

  6. arXiv:2306.05060 [pdf, other]

    cs.LG

    Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference

    Authors: Matteo Risso, Alessio Burrello, Giuseppe Maria Sarda, Luca Benini, Enrico Macii, Massimo Poncino, Marian Verhelst, Daniele Jahier Pagliari

    Abstract: The need to execute Deep Neural Networks (DNNs) at low latency and low power at the edge has spurred the development of new heterogeneous Systems-on-Chips (SoCs) encapsulating a diverse set of hardware accelerators. How to optimally map a DNN onto such multi-accelerator systems is an open problem. We propose ODiMO, a hardware-aware tool that performs a fine-grain mapping across different accelerat…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted at 2023 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED)

  7. arXiv:2305.18335 [pdf, other]

    cs.AR eess.IV eess.SP

    Benchmarking and modeling of analog and digital SRAM in-memory computing architectures

    Authors: Pouya Houshmand, Jiacong Sun, Marian Verhelst

    Abstract: In-memory-computing is emerging as an efficient hardware paradigm for deep neural network accelerators at the edge, helping to break the memory wall and exploit massive computational parallelism. Two design models have emerged: analog in-memory-computing (AIMC) and digital in-memory-computing (DIMC), offering a different design space in terms of accuracy, efficiency and dataflow flexibility. This…

    Submitted 25 May, 2023; originally announced May 2023.

  8. arXiv:2304.12931 [pdf, other]

    cs.AR cs.AI

    SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators

    Authors: Victor J. B. Jung, Arne Symons, Linyan Mei, Marian Verhelst, Luca Benini

    Abstract: To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, SotA schedulers struggle to consistently provide optimum schedules in a reasonable time across all DNN-HW combinations. This paper proposes SALSA, a fast dual-engine scheduler to gen…

    Submitted 14 June, 2024; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: 5 pages, 6 figures, open-source at https://github.com/ZigZag-Project/zigzag

  9. arXiv:2304.04640 [pdf, other]

    cs.AI

    NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems

    Authors: Jason Yik, Korneel Van den Berghe, Douwe den Blanken, Younes Bouhadjar, Maxime Fabre, Paul Hueber, Denis Kleyko, Noah Pacik-Nelson, Pao-Sheng Vincent Sun, Guangzhi Tang, Shenqi Wang, Biyan Zhou, Soikat Hasan Ahmed, George Vathakkattil Joseph, Benedetto Leto, Aurora Micheli, Anurag Kumar Mishra, Gregor Lenz, Tao Sun, Zergham Ahmed, Mahmoud Akl, Brian Anderson, Andreas G. Andreou, Chiara Bartolozzi, Arindam Basu, et al. (73 additional authors not shown)

    Abstract: Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neu…

    Submitted 17 January, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Updated from whitepaper to full perspective article preprint

  10. TinyVers: A Tiny Versatile System-on-chip with State-Retentive eMRAM for ML Inference at the Extreme Edge

    Authors: Vikram Jain, Sebastian Giraldo, Jaro De Roose, Linyan Mei, Bert Boons, Marian Verhelst

    Abstract: Extreme edge devices or Internet-of-Things nodes require both ultra-low power always-on processing as well as the ability to do on-demand sampling and processing. Moreover, support for IoT applications like voice recognition, machine monitoring, etc., requires the ability to execute a wide range of ML workloads. This brings challenges in hardware design to build flexible processors operating in ult…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: Accepted in IEEE Journal of Solid-State Circuits

  11. arXiv:2212.10612 [pdf, other]

    cs.AR

    Towards Heterogeneous Multi-core Accelerators Exploiting Fine-grained Scheduling of Layer-Fused Deep Neural Networks

    Authors: Arne Symons, Linyan Mei, Steven Colleman, Pouya Houshmand, Sebastian Karl, Marian Verhelst

    Abstract: To keep up with the ever-growing performance demand of neural networks, specialized hardware (HW) accelerators are shifting towards multi-core and chiplet architectures. So far, these multi-accelerator systems exploit the increased parallelism by pipelining different NN layers across input batches on different cores to increase throughput. Yet, when pursuing this with non-batched layer-by-layer sc…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: 9 pages + references, 15 figures

  12. arXiv:2212.05344 [pdf, other]

    cs.AR cs.DC

    DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling

    Authors: Linyan Mei, Koen Goetschalckx, Arne Symons, Marian Verhelst

    Abstract: DNN workloads can be scheduled onto DNN accelerators in many different ways: from layer-by-layer scheduling to cross-layer depth-first scheduling (a.k.a. layer fusion, or cascaded execution). This results in a very broad scheduling space, with each schedule leading to varying hardware (HW) costs in terms of energy and latency. To rapidly explore this vast space for a wide variety of hardware archi…

    Submitted 14 June, 2024; v1 submitted 10 December, 2022; originally announced December 2022.

    Comments: Accepted by HPCA 2023

  13. arXiv:2212.00873 [pdf, other]

    cs.AR

    CONVOLVE: Smart and seamless design of smart edge processors

    Authors: M. Gomony, F. Putter, A. Gebregiorgis, G. Paulin, L. Mei, V. Jain, S. Hamdioui, V. Sanchez, T. Grosser, M. Geilen, M. Verhelst, F. Zenke, F. Gurkaynak, B. Bruin, S. Stuijk, S. Davidson, S. De, M. Ghogho, A. Jimborean, S. Eissa, L. Benini, D. Soudris, R. Bishnoi, S. Ainsworth, F. Corradi, et al. (3 additional authors not shown)

    Abstract: With the rise of Deep Learning (DL), our world braces for AI in every edge device, creating an urgent need for edge-AI SoCs. This SoC hardware needs to support high throughput, reliable and secure AI processing at Ultra Low Power (ULP), with a very short time to market. With its strong legacy in edge solutions and open processing platforms, the EU is well-positioned to become a leader in this SoC…

    Submitted 2 May, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

  14. arXiv:2210.13184 [pdf, other]

    cs.AR

    DPU-v2: Energy-efficient execution of irregular directed acyclic graphs

    Authors: Nimish Shah, Wannes Meert, Marian Verhelst

    Abstract: A growing number of applications like probabilistic machine learning, sparse linear algebra, robotic navigation, etc., exhibit irregular data flow computation that can be modeled with directed acyclic graphs (DAGs). The irregularity arises from the seemingly random connections of nodes, which makes the DAG structure unsuitable for vectorization on CPU or GPU. Moreover, the nodes usually represent…

    Submitted 20 October, 2022; originally announced October 2022.

  15. arXiv:2208.12694 [pdf, other]

    cs.CV

    Hardware-aware mobile building block evaluation for computer vision

    Authors: Maxim Bonnaerens, Matthias Freiberger, Marian Verhelst, Joni Dambre

    Abstract: In this work we propose a methodology to accurately evaluate and compare the performance of efficient neural network building blocks for computer vision in a hardware-aware manner. Our comparison uses Pareto fronts based on randomly sampled networks from a design space to capture the underlying accuracy/complexity trade-offs. We show that our approach makes it possible to match the information obtained by pr…

    Submitted 26 August, 2022; originally announced August 2022.

  16. arXiv:2204.03479 [pdf, other]

    cs.CL cs.LG

    Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention

    Authors: Zuzana Jelčicová, Marian Verhelst

    Abstract: Multi-head self-attention forms the core of Transformer networks. However, their quadratically growing complexity with respect to the input sequence length impedes their deployment on resource-constrained edge devices. We address this challenge by proposing a dynamic pruning method, which exploits the temporal stability of data across tokens to reduce inference cost. The threshold-based method onl…

    Submitted 20 March, 2022; originally announced April 2022.

  17. arXiv:2112.05660 [pdf, other]

    cs.AR cs.DC eess.SY

    DPU: DAG Processing Unit for Irregular Graphs with Precision-Scalable Posit Arithmetic in 28nm

    Authors: Nimish Shah, Laura Isabel Galindez Olascoaga, Shirui Zhao, Wannes Meert, Marian Verhelst

    Abstract: Computation in several real-world applications like probabilistic machine learning, sparse linear algebra, and robotic navigation, can be modeled as irregular directed acyclic graphs (DAGs). The irregular data dependencies in DAGs pose challenges to parallel execution on general-purpose CPUs and GPUs, resulting in severe under-utilization of the hardware. This paper proposes DPU, a specialized pro…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: IEEE Journal of Solid-State Circuits

  18. Taxonomy and Benchmarking of Precision-Scalable MAC Arrays Under Enhanced DNN Dataflow Representation

    Authors: Ehab M. Ibrahim, Linyan Mei, Marian Verhelst

    Abstract: Reduced-precision and variable-precision multiply-accumulate (MAC) operations provide opportunities to significantly improve energy efficiency and throughput of DNN accelerators with no/limited algorithmic performance loss, paving a way towards deploying AI applications on resource-constrained edge devices. Accordingly, various precision-scalable MAC array (PSMA) architectures were proposed recentl…

    Submitted 17 January, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

    Journal ref: IEEE Transactions on Circuits and Systems I: Regular Papers (Early Access) (2022)

  19. GRAPHOPT: constrained-optimization-based parallelization of irregular graphs

    Authors: Nimish Shah, Wannes Meert, Marian Verhelst

    Abstract: Sparse, irregular graphs show up in various applications like linear algebra, machine learning, engineering simulations, robotic control, etc. These graphs have a high degree of parallelism, but their execution on parallel threads of modern platforms remains challenging due to the irregular data dependencies. The execution performance can be improved by efficiently partitioning the graphs such tha…

    Submitted 16 February, 2022; v1 submitted 5 May, 2021; originally announced May 2021.

    Journal ref: IEEE Transactions on Parallel and Distributed Systems 2020

  20. Acceleration of probabilistic reasoning through custom processor architecture

    Authors: Nimish Shah, Laura I. Galindez Olascoaga, Wannes Meert, Marian Verhelst

    Abstract: Probabilistic reasoning is an essential tool for robust decision-making systems because of its ability to explicitly handle real-world uncertainty, constraints and causal relations. Consequently, researchers are developing hybrid models by combining Deep Learning with probabilistic reasoning for safety-critical applications like self-driving vehicles, autonomous drones, etc. However, probabilistic…

    Submitted 27 February, 2021; originally announced March 2021.

    Journal ref: Design, Automation & Test in Europe Conference & Exhibition (DATE) 2020

  21. arXiv:2103.00216 [pdf, other]

    cs.AR cs.LG math.NA

    ProbLP: A framework for low-precision probabilistic inference

    Authors: Nimish Shah, Laura I. Galindez Olascoaga, Wannes Meert, Marian Verhelst

    Abstract: Bayesian reasoning is a powerful mechanism for probabilistic inference in smart edge-devices. During such inferences, a low-precision arithmetic representation can enable improved energy efficiency. However, its impact on inference accuracy is not yet understood. Furthermore, general-purpose hardware does not natively support low-precision representation. To address this, we propose ProbLP, a fram…

    Submitted 27 February, 2021; originally announced March 2021.

    Journal ref: Proceedings of the 56th Annual Design Automation Conference (DAC) 2019

  22. arXiv:2009.09675 [pdf, other]

    cs.CV

    Feed-Forward On-Edge Fine-tuning Using Static Synthetic Gradient Modules

    Authors: Robby Neven, Marian Verhelst, Tinne Tuytelaars, Toon Goedemé

    Abstract: Training deep learning models on embedded devices is typically avoided since this requires more memory, computation and power than inference. In this work, we focus on lowering the amount of memory needed for storing all activations, which are required during the backward pass to compute the gradients. Instead, during the forward pass, static Synthetic Gradient Modules (SGMs) predict gradients for…

    Submitted 21 September, 2020; originally announced September 2020.

  23. arXiv:2007.11360 [pdf, other]

    cs.DC

    ZigZag: A Memory-Centric Rapid DNN Accelerator Design Space Exploration Framework

    Authors: Linyan Mei, Pouya Houshmand, Vikram Jain, Sebastian Giraldo, Marian Verhelst

    Abstract: Building efficient embedded deep learning systems requires a tight co-design between DNN algorithms, memory hierarchy, and dataflow. However, owing to the large degrees of freedom in the design space, finding an optimal solution through the implementation of individual design points becomes infeasible. Recently, several estimation frameworks for fast design space exploration (DSE) have emerged, ye…

    Submitted 11 August, 2020; v1 submitted 22 July, 2020; originally announced July 2020.

    Comments: 14 pages, 20 figures. Source code is available at https://github.com/ZigZag-Project/zigzag

    ACM Class: C.1.4; C.3; C.4

  24. arXiv:2003.04821 [pdf, other]

    cs.PF cs.LG

    Benchmarking TinyML Systems: Challenges and Direction

    Authors: Colby R. Banbury, Vijay Janapa Reddi, Max Lam, William Fu, Amin Fazel, Jeremy Holleman, Xinyuan Huang, Robert Hurtado, David Kanter, Anton Lokhmotov, David Patterson, Danilo Pau, Jae-sun Seo, Jeff Sieracki, Urmish Thakker, Marian Verhelst, Poonam Yadav

    Abstract: Recent advancements in ultra-low-power machine learning (TinyML) hardware promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted benchmark for these systems. Benchmarking allows us to measure and thereby systematically compare, evaluate, and improve the performance of systems and is therefore fundamental to a field re…

    Submitted 29 January, 2021; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: 6 pages, 1 figure, 3 tables

  25. arXiv:1812.06672 [pdf, other]

    eess.AS cs.SD

    A multi-layered energy consumption model for smart wireless acoustic sensor networks

    Authors: Gert Dekkers, Fernando Rosas, Steven Lauwereins, Sreeraj Rajendran, Sofie Pollin, Bart Vanrumste, Toon van Waterschoot, Marian Verhelst, Peter Karsmakers

    Abstract: Smart sensing is expected to become a pervasive technology in smart cities and environments of the near future. These services are improving their capabilities due to integrated devices shrinking in size while maintaining their computational power, which can run diverse Machine Learning algorithms and achieve high performance in various data-processing tasks. One attractive sensor modality to be u…

    Submitted 17 December, 2018; originally announced December 2018.

  26. arXiv:1804.05554 [pdf]

    cs.DC cs.NE

    BinarEye: An Always-On Energy-Accuracy-Scalable Binary CNN Processor With All Memory On Chip in 28nm CMOS

    Authors: Bert Moons, Daniel Bankman, Lita Yang, Boris Murmann, Marian Verhelst

    Abstract: This paper introduces BinarEye: a digital processor for always-on Binary Convolutional Neural Networks. The chip maximizes data reuse through a Neuron Array exploiting local weight Flip-Flops. It stores full network models and feature maps and hence requires no off-chip bandwidth, which leads to a 230 1b-TOPS/W peak efficiency. Its 3 levels of flexibility - (a) weight reconfiguration, (b) a progra…

    Submitted 16 April, 2018; originally announced April 2018.

    Comments: Presented at the 2018 IEEE Custom Integrated Circuits Conference (CICC). Presentation is available here: https://www.researchgate.net/publication/324452819_Presentation_on_Binareye_at_CICC

  27. arXiv:1803.04840 [pdf, other]

    cs.CV

    Resource aware design of a deep convolutional-recurrent neural network for speech recognition through audio-visual sensor fusion

    Authors: Matthijs Van keirsbilck, Bert Moons, Marian Verhelst

    Abstract: Today's Automatic Speech Recognition systems only rely on acoustic signals and often don't perform well under noisy conditions. Performing multi-modal speech recognition - processing acoustic speech signals and lip-reading video simultaneously - significantly enhances the performance of such systems, especially in noisy environments. This work presents the design of such an audio-visual system for…

    Submitted 13 March, 2018; originally announced March 2018.

    Comments: Tech. report

  28. arXiv:1711.00215 [pdf, other]

    cs.NE cs.AR cs.LG

    Minimum Energy Quantized Neural Networks

    Authors: Bert Moons, Koen Goetschalckx, Nick Van Berckelaer, Marian Verhelst

    Abstract: This work targets the automated minimum-energy optimization of Quantized Neural Networks (QNNs) - networks using low precision weights and activations. These networks are trained from scratch at an arbitrary fixed point precision. At iso-accuracy, QNNs using fewer bits require deeper and wider network architectures than networks using higher precision operators, while they require less complex ari…

    Submitted 23 November, 2017; v1 submitted 1 November, 2017; originally announced November 2017.

    Comments: preprint for work presented at the 51st Asilomar Conference on Signals, Systems and Computers

  29. arXiv:1606.05094 [pdf]

    cs.AR

    A 0.3-2.6 TOPS/W Precision-Scalable Processor for Real-Time Large-Scale ConvNets

    Authors: Bert Moons, Marian Verhelst

    Abstract: A low-power precision-scalable processor for ConvNets or convolutional neural networks (CNN) is implemented in a 40nm technology. Its 256 parallel processing units achieve a peak 102GOPS running at 204MHz. To minimize energy consumption while maintaining throughput, this work is the first to both exploit the sparsity of convolutions and to implement dynamic precision-scalability enabling supply-…

    Submitted 16 June, 2016; originally announced June 2016.

    Comments: Published at the Symposium on VLSI Circuits, 2016, Honolulu, HI, US

    Report number: paper C17p1

  30. Energy-Efficient ConvNets Through Approximate Computing

    Authors: Bert Moons, Bert De Brabandere, Luc Van Gool, Marian Verhelst

    Abstract: Recently ConvNets or convolutional neural networks (CNN) have come up as state-of-the-art classification and detection algorithms, achieving near-human performance in visual detection. However, ConvNet algorithms are typically very computation and memory intensive. In order to be able to embed ConvNet-based classification into wearable platforms and embedded systems such as smartphones or ubiquito…

    Submitted 22 March, 2016; originally announced March 2016.

    Comments: Published in IEEE Winter Conference on Applications of Computer Vision (WACV 2016)

  31. Understanding interdependency through complex information sharing

    Authors: Fernando Rosas, Vasilis Ntranos, Christopher J. Ellison, Sofie Pollin, Marian Verhelst

    Abstract: The interactions between three or more random variables are often nontrivial, poorly understood, and yet, are paramount for future advances in fields such as network information theory, neuroscience, genetics and many others. In this work, we propose to analyze these interactions as different modes of information sharing. Towards this end, we introduce a novel axiomatic framework for decomposing t…

    Submitted 15 September, 2015; originally announced September 2015.

    Comments: 39 pages, 4 figures