Author: Herkersdorf, Andreas

research-article

Open Access

XCS with dynamic sized experience replay for memory constrained applications

GECCO '24 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion, Pages 1807–1814, https://doi.org/10.1145/3638530.3664148

The eXtended Classifier System (XCS) is the most widely studied classifier system in the community. It is a form of interpretable AI that has shown a strong capability to master various classification and regression tasks. It has also shown strong ...

poster

Hardware Assist for Linux IPC on an FPGA Platform

CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers, Pages 322–323, https://doi.org/10.1145/3649153.3652998

Specialized hardware units often accelerate compute-intensive or memory-heavy functions. In previous publications, we proposed concepts to assist Linux with a hardware unit for managing waiting threads to improve blocking inter-process communication (IPC)...

research-article

Open Access

HASIIL: Hardware-Assisted Scheduling to Improve IPC Latency in Linux

CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers, Pages 80–87, https://doi.org/10.1145/3649153.3649197

Inter-process communication (IPC) is essential for multi-threaded applications to execute efficiently. Synchronization through IPC can become a bottleneck for these applications. The effectiveness of IPC is determined by both its latency and ...

research-article

HW-FUTEX: Hardware-Assisted Futex Syscall

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (ITVL), Volume 32, Issue 1, Pages 16–29, https://doi.org/10.1109/TVLSI.2023.3317926

Efficient thread synchronization primitives are crucial in modern computer systems for the performant execution of interdependent code segments. In Linux, the futex() syscall is used to construct blocking synchronization primitives such as mutexes or ...

poster

LCT-DER: Learning Classifier Table with Dynamic-Sized Experience Replay for Run-time SoC Performance-Power Optimization

GECCO '23 Companion: Proceedings of the Companion Conference on Genetic and Evolutionary Computation, Pages 331–334, https://doi.org/10.1145/3583133.3590573

Learning classifier tables (LCTs) are lightweight, classifier-based, hardware-implemented reinforcement learning (RL) building blocks that enable self-adaptivity and self-optimization properties in multicore systems. LCTs are deployed per core to ...

Article

CoLeCTs: Cooperative Learning Classifier Tables for Resource Management in MPSoCs

Architecture of Computing Systems, Pages 215–229, https://doi.org/10.1007/978-3-031-42785-5_15

The increasing complexity and unpredictability of emerging applications make it challenging for multiprocessor systems-on-chip to satisfy their performance requirements while keeping power consumption within bounds. In order to tackle this ...

Article

GAE-LCT: A Run-Time GA-Based Classifier Evolution Method for Hardware LCT Controlled SoC Performance-Power Optimization

Architecture of Computing Systems, Pages 271–285, https://doi.org/10.1007/978-3-031-21867-5_18

Learning classifier tables (LCTs) are classifier-based, lightweight hardware reinforcement learning building blocks that inherit the concepts of learning classifier systems. LCTs are used as per-core low-level controllers to learn and ...

research-article

SmartNIC-based Load Management and Network Health Monitoring for Time Sensitive Applications

NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, Pages 1–6, https://doi.org/10.1109/NOMS54207.2022.9789863

Time-sensitive network applications, for example in intra-vehicular networks, aim to give predictable end-to-end latency guarantees. As a consequence, processing resources of the involved host systems remain partially unused, because they are reserved for ...

research-article

Fine-Grained Power Modeling of Multicore Processors Using FFNNs

International Journal of Parallel Programming (IJPP), Volume 50, Issue 2, Pages 243–266, https://doi.org/10.1007/s10766-022-00730-9

To minimize power consumption while maximizing performance, today’s multicore processors rely on fine-grained run-time dynamic power information, both in the time domain (e.g., μs to ms) and in the space domain (e.g., core level). The state-of-the-art for ...

poster

Precise real-time monitoring of time-critical flows

CoNEXT '21: Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies, Pages 489–490, https://doi.org/10.1145/3485983.3493356

Ethernet is increasingly used in areas where time-critical and safety-relevant data are transported over the network along with best-effort flows, for example in intra-vehicle networks or industrial networks. The resulting complex network architectures, ...

research-article

Protection switching schemes and mapping strategies for fail-operational hard real-time NoCs

Microprocessors & Microsystems (MSYS), Volume 87, Issue C, https://doi.org/10.1016/j.micpro.2021.104385

Communication infrastructures designed for mixed-critical MPSoCs must provide isolation of traffic, hard real-time guarantees, and fault-tolerance. In previous work, we proposed the combination of protection-switching with a hybrid ...

research-article

Exploring a Hybrid Voting-based Eviction Policy for Caches and Sparse Directories on Manycore Architectures

Microprocessors & Microsystems (MSYS), Volume 87, Issue C, https://doi.org/10.1016/j.micpro.2021.104384

In manycore systems, eviction decisions related to caches and memory coherence greatly impact system performance, thereby emphasizing their importance. Extensive research has produced numerous standalone eviction policies such as LRU, ...

short-paper

PEPERONI: Pre-Estimating the Performance of Near-Memory Integration

MEMSYS '21: Proceedings of the International Symposium on Memory Systems, Article No.: 9, Pages 1–6, https://doi.org/10.1145/3488423.3519329

Near-memory integration strives to tackle the challenge of low data locality and power consumption originating from cross-chip data transfers, now referred to as the locality wall. In order to keep costly engineering efforts bounded when ...

research-article

DynaCo: Dynamic Coherence Management for Tiled Manycore Architectures

International Journal of Parallel Programming (IJPP), Volume 49, Issue 4, Pages 570–599, https://doi.org/10.1007/s10766-020-00688-6

Embedded system applications, with their inherently limited parallelism, rarely exploit all available processing resources in large DSM-based manycore architectures. From a cache coherence perspective, this provides an opportunity to move away ...

research-article

DySHARQ: Dynamic Software-Defined Hardware-Managed Queues for Tile-Based Architectures

International Journal of Parallel Programming (IJPP), Volume 49, Issue 4, Pages 506–540, https://doi.org/10.1007/s10766-020-00687-7

The recent trend towards tile-based manycore architectures has helped to tackle the memory wall by physically distributing memories and processing nodes. However, this introduced a data-to-task locality challenge and inter-tile communication thus ...

research-article

Open Access

SEAMS: Self-Optimizing Runtime Manager for Approximate Memory Hierarchies

ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 5, Article No.: 48, Pages 1–26, https://doi.org/10.1145/3466875

Memory approximation techniques are commonly limited in scope, targeting individual levels of the memory hierarchy. Existing approximation techniques for a full memory hierarchy determine optimal configurations at design time, given a goal and ...

research-article

X-Centric: A Survey on Compute-, Memory- and Application-Centric Computer Architectures

MEMSYS '20: Proceedings of the International Symposium on Memory Systems, Pages 178–193, https://doi.org/10.1145/3422575.3422792

Big Data and machine learning have constituted the multifaceted challenge of computer engineering over the past decade. The meaningful processing of vast amounts of unstructured data from a myriad of sensors and devices is already a complicated endeavor. ...

research-article

Machine Learning Approaches for Efficient Design Space Exploration of Application-Specific NoCs

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 25, Issue 5, Article No.: 44, Pages 1–27, https://doi.org/10.1145/3403584

In many Multi-Processor Systems-on-Chip (MPSoCs), traffic between cores is unbalanced. This motivates the use of an application-specific Network-on-Chip (NoC) that is customized and can provide high performance at low cost in terms of power and area. ...

Article

Fine-Grained Power Modeling of Multicore Processors Using FFNNs

Embedded Computer Systems: Architectures, Modeling, and Simulation, Pages 186–199, https://doi.org/10.1007/978-3-030-60939-9_13

To minimize power consumption while maximizing performance, today’s multicore processors rely on fine-grained run-time dynamic power information, both in the time domain (e.g., μs to ms) and in the space domain (e.g., core level). The state-of-the-art for ...

research-article

Combinatorial Auctions for Temperature-Constrained Resource Management in Manycores

IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 31, Issue 7, Pages 1605–1620, https://doi.org/10.1109/TPDS.2020.2965523

Although manycore processors have plenty of cores, not all of them can run simultaneously at full speed, and some of them may even need to be power-gated in order to keep the chip within safe temperature limits. Hence, a resource management technique, ...
