Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators

Lübeck, Konstantin; Jung, Alexander Louis-Ferdinand; Wedlich, Felix; Müller, Mika Markus; Peccia, Federico Nicolás; Thömmes, Felix; Steinmetz, Jannik; Biermaier, Valentin; Frischknecht, Adrian; Bernardo, Paul Palomero; Bringmann, Oliver

arXiv:2409.08595 (cs)

[Submitted on 13 Sep 2024]

Abstract: Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their performance characteristics when executing the intended AI workload. To facilitate this, we present an approach for automatically generating fast performance models that accurately estimate the latency of a DNN mapped onto systematically modeled and concisely described accelerator architectures. Using our accelerator architecture description method, we modeled representative DNN accelerators: Gemmini, UltraTrail, a Plasticine-derived architecture, and a parameterizable systolic array. Together with DNN mappings for those modeled architectures, we perform a combined DNN/hardware dependency graph analysis, which enables us, in the best case, to evaluate only 154 loop kernel iterations to estimate the performance for 4.19 billion instructions, achieving a significant speedup. We outperform regression and analytical models in terms of mean absolute percentage error (MAPE) compared to simulation results, while being several orders of magnitude faster than an RTL simulation.
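
For readers unfamiliar with the accuracy metric named in the abstract, the following is a minimal sketch of how a mean absolute percentage error between estimated and simulated latencies could be computed. The function name `mape` and the latency values are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
def mape(estimated, reference):
    """Mean absolute percentage error of estimates against reference values, in percent."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be nonzero")
    # Average the relative deviations, then scale to percent.
    total = sum(abs(e - r) / abs(r) for e, r in zip(estimated, reference))
    return 100.0 * total / len(reference)


# Hypothetical latencies in cycles (illustrative values only, not from the paper).
simulated = [1000.0, 2000.0, 4000.0]   # e.g. reference simulation results
estimated = [1100.0, 1900.0, 4000.0]   # e.g. performance model estimates

print(f"MAPE: {mape(estimated, simulated):.2f} %")
```

A lower value means the model's latency estimates lie closer to the simulation results on average.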

Comments:	Accepted version for: ACM Transactions on Embedded Computing Systems
Subjects:	Performance (cs.PF); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Cite as:	arXiv:2409.08595 [cs.PF]
	(or arXiv:2409.08595v1 [cs.PF] for this version)
	https://doi.org/10.48550/arXiv.2409.08595

Submission history

From: Konstantin Lübeck
[v1] Fri, 13 Sep 2024 07:27:55 UTC (3,664 KB)
