- research-article, January 2025, JUST ACCEPTED
HeterogeneousRTOS: A CPU-FPGA Real-Time OS for Fault Tolerance on COTS at Near-Zero Timing Cost
ACM Transactions on Embedded Computing Systems (TECS), Just Accepted, https://doi.org/10.1145/3712062
Ionizing particles in the atmosphere may strike circuits causing Single Event Upsets (SEU), affecting the output correctness. Critical real-time systems are traditionally custom-designed, featuring redundancy for guaranteeing fault resilience. The ...
- research-article, August 2024
Transient Fault Detection in Tensor Cores for Modern GPUs
ACM Transactions on Embedded Computing Systems (TECS), Volume 23, Issue 5, Article No.: 82, Pages 1–29, https://doi.org/10.1145/3687483
Deep neural networks (DNNs) have emerged as an effective solution for many machine learning applications. However, the great success comes with the cost of excessive computation. The Volta graphics processing unit (GPU) from NVIDIA introduced a ...
- research-article, April 2024
Energy Management for Fault-tolerant (m,k)-constrained Real-time Systems That Use Standby-Sparing
ACM Transactions on Embedded Computing Systems (TECS), Volume 23, Issue 3, Article No.: 36, Pages 1–36, https://doi.org/10.1145/3648365
Fault tolerance, energy management, and quality of service (QoS) are essential aspects for the design of real-time embedded systems. In this work, we focus on exploring methods that can simultaneously address the above three critical issues under standby-...
- research-article, January 2024
Modeling and Analysis of ETC Control System with Colored Petri Net and Dynamic Slicing
ACM Transactions on Embedded Computing Systems (TECS), Volume 23, Issue 1, Article No.: 14, Pages 1–27, https://doi.org/10.1145/3633450
Nowadays, Electronic Toll Collection (ETC) control systems have been widely adopted to smoothen traffic flow on highways. However, as it is a complex business interaction system, there are inevitably flaws in its control logic process, such as the problem ...
- research-article, July 2023
Optimal Checkpointing Strategy for Real-time Systems with Both Logical and Timing Correctness
ACM Transactions on Embedded Computing Systems (TECS), Volume 22, Issue 4, Article No.: 66, Pages 1–21, https://doi.org/10.1145/3603172
Real-time systems are susceptible to adversarial factors such as faults and attacks, leading to severe consequences. This paper presents an optimal checkpoint scheme to bolster fault resilience in real-time systems, addressing both logical consistency and ...
- research-article, July 2023
A Methodology for Fault-tolerant Pareto-optimal Approximate Designs of FPGA-based Accelerators
ACM Transactions on Embedded Computing Systems (TECS), Volume 22, Issue 4, Article No.: 78, Pages 1–31, https://doi.org/10.1145/3568021
Approximate Computing Techniques (ACTs) take advantage of resilience computing applications to trade off among output precision, area, power, and performance. ACTs can lead to significant gains at affordable costs when efficiently implemented on Field ...
- research-article, April 2023
Reliability Assessment and Safety Arguments for Machine Learning Components in System Assurance
Yi Dong, Wei Huang, Vibhav Bharti, Victoria Cox, Alec Banks, Sen Wang, Xingyu Zhao, Sven Schewe, Xiaowei Huang
ACM Transactions on Embedded Computing Systems (TECS), Volume 22, Issue 3, Article No.: 48, Pages 1–48, https://doi.org/10.1145/3570918
The increasing use of Machine Learning (ML) components embedded in autonomous systems—so-called Learning-Enabled Systems (LESs)—has resulted in the pressing need to assure their functional safety. As for traditional functional safety, the emerging ...
- research-article, April 2023
Multi-bit Data Flow Error Detection Method Based on SDC Vulnerability Analysis
ACM Transactions on Embedded Computing Systems (TECS), Volume 22, Issue 3, Article No.: 49, Pages 1–30, https://doi.org/10.1145/3572838
One of the most difficult data flow errors to detect caused by single-event upsets in space radiation is the Silent Data Corruption (SDC). To solve the problem of multi-bit upsets causing program SDC, an instruction multi-bit SDC vulnerability prediction ...
- research-article, January 2023
A Contrastive Plan Explanation Framework for Hybrid System Models
ACM Transactions on Embedded Computing Systems (TECS), Volume 22, Issue 2, Article No.: 22, Pages 1–51, https://doi.org/10.1145/3561532
In artificial intelligence planning, having an explanation of a plan given by a planner is often desirable. The ability to explain various aspects of a synthesized plan to an end user not only brings in trust on the planner but also reveals insights of ...
- research-article, February 2022
Read Refresh Scheduling and Data Reallocation against Read Disturb in SSDs
ACM Transactions on Embedded Computing Systems (TECS), Volume 21, Issue 2, Article No.: 18, Pages 1–27, https://doi.org/10.1145/3495254
Read disturb is a circuit-level noise in flash-based Solid-State Drives (SSDs), induced by intensive read requests, which may result in unexpected read errors. The approach of read refresh (RR) is commonly adopted to mitigate its negative effects by ...
- research-article, January 2022
CORIDOR: Using COherence and TempoRal LocalIty to Mitigate Read Disturbance ErrOR in STT-RAM Caches
ACM Transactions on Embedded Computing Systems (TECS), Volume 21, Issue 1, Article No.: 2, Pages 1–24, https://doi.org/10.1145/3484493
In the deep sub-micron region, “spin-transfer torque RAM” (STT-RAM) suffers from “read-disturbance error” (RDE), whereby a read operation disturbs the stored data. Mitigation of RDE requires restore operations, which imposes latency and energy penalties. ...
- research-article, October 2021
Horizontal Auto-Scaling for Multi-Access Edge Computing Using Safe Reinforcement Learning
ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 6, Article No.: 109, Pages 1–33, https://doi.org/10.1145/3475991
Multi-Access Edge Computing (MEC) has emerged as a promising new paradigm allowing low latency access to services deployed on edge servers to avert network latencies often encountered in accessing cloud services. A key component of the MEC environment is ...
- research-article, September 2021
Declarative Power Sequencing
ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 5s, Article No.: 84, Pages 1–21, https://doi.org/10.1145/3477039
Modern computer server systems are increasingly managed at a low level by baseboard management controllers (BMCs). BMCs are processors with access to the most critical parts of the platform, below the level of OS or hypervisor, including control over ...
- research-article, September 2021
Tolerating Defects in Low-Power Neural Network Accelerators Via Retraining-Free Weight Approximation
ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 5s, Article No.: 85, Pages 1–21, https://doi.org/10.1145/3477016
Hardware accelerators are essential to the accommodation of ever-increasing Deep Neural Network (DNN) workloads on the resource-constrained embedded devices. While accelerators facilitate fast and energy-efficient DNN operations, their accuracy is ...
- research-article, September 2021
REPAIR: Control Flow Protection based on Register Pairing Updates for SW-Implemented HW Fault Tolerance
ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 5s, Article No.: 70, Pages 1–22, https://doi.org/10.1145/3477001
Safety-critical embedded systems may either use specialized hardware or rely on Software-Implemented Hardware Fault Tolerance (SIHFT) to meet soft error resilience requirements. SIHFT has the advantage that it can be used with low-cost, off-the-shelf ...
- research-article, September 2021
Learning to Train CNNs on Faulty ReRAM-based Manycore Accelerators
ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 5s, Article No.: 55, Pages 1–23, https://doi.org/10.1145/3476986
The growing popularity of convolutional neural networks (CNNs) has led to the search for efficient computational platforms to accelerate CNN training. Resistive random-access memory (ReRAM)-based manycore architectures offer a promising alternative to ...
- research-article, July 2021
Integrated Hardware Garbage Collection
ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 5, Article No.: 40, Pages 1–25, https://doi.org/10.1145/3450147
Garbage collected programming languages, such as Python and C#, have accelerated software development. These modern languages increase productivity and software reliability as they provide high-level data representation and control structures. Modern ...
- research-article, May 2021
Reliability-aware Scheduling and Routing for Messages in Time-sensitive Networking
ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 5, Article No.: 41, Pages 1–24, https://doi.org/10.1145/3458768
Time-sensitive Networking (TSN) on Ethernet is a promising communication technology in the automotive and industrial automation industries due to its real-time and high-bandwidth communication capabilities. Time-triggered scheduling and static routing ...
- research-article, March 2021
Precise Cache Profiling for Studying Radiation Effects
ACM Transactions on Embedded Computing Systems (TECS), Volume 20, Issue 3, Article No.: 25, Pages 1–25, https://doi.org/10.1145/3442339
Increased access to space has led to an increase in the usage of commodity processors in radiation environments. These processors are vulnerable to transient faults such as single event upsets that may cause bit-flips in processor components. Caches in ...