Online Learning Algorithms for Context-Aware Video Caching in D2D Edge Networks
With the emergence of various short video platforms such as TikTok and Instagram, coupled with the accelerating pace of life, people are spending more time sharing and watching online videos than ever before, and they are gradually turning their ...
Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs
Stencil computations are widely used in high performance computing (HPC) applications. Many HPC platforms utilize the high computation capability of GPUs to accelerate stencil computations. In recent years, stencils have become more diverse in terms of ...
CLIC: An Extensible and Efficient Cross-Platform Data Analytics System
With the ever-increasing data volume and application diversity, a modern data analytics job is generally built as a workflow consisting of multiple tasks. For either specific functionalities or higher performance, tasks in a workflow may need to be ...
A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration
Large-scale deep neural network (DNN) accelerators are poised to facilitate the concurrent processing of diverse DNNs, imposing demanding challenges on the interconnection fabric. These challenges encompass overcoming performance degradation and energy ...
Demystifying the Cost of Serverless Computing: Towards a Win-Win Deal
Serverless is an emerging computing paradigm that greatly simplifies the development, deployment, and maintenance of cloud applications. However, due to potential cost issues arising from the widely adopted pricing model, it is difficult to answer how to use and ...
SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join Processor
The widespread adoption of the Internet of Things (IoT) has motivated the emergence of mixed workloads in smart cities, where fast-arriving geo-referenced big data streams are joined with archive tables, aiming at enriching streams with descriptive attributes ...
Hopscotch: A Hardware-Software Co-Design for Efficient Cache Resizing on Multi-Core SoCs
Following the trend of increasing autonomy in real-time systems, multi-core System-on-Chips (SoCs) have enabled devices to better handle the large streams of data and intensive computation required by such autonomous systems. In modern multi-core SoCs, ...
Enabling Streaming Analytics in Satellite Edge Computing via Timely Evaluation of Big Data Queries
Internet-of-Things (IoT) applications from many industries, such as transportation (maritime, road, rail, air) and fleet management, offshore monitoring, and farming, are located in remote areas without cellular connectivity. Such IoT applications ...
US-Byte: An Efficient Communication Framework for Scheduling Unequal-Sized Tensor Blocks in Distributed Deep Learning
The communication bottleneck severely constrains the scalability of distributed deep learning, and efficient communication scheduling accelerates distributed DNN training by overlapping computation and communication tasks. However, existing approaches ...
Flexible and Efficient Memory Swapping Across Mobile Devices With LegoSwap
This article presents LegoSwap, a cross-device memory swapping mechanism for mobile devices. It exploits the unbalanced utilization of memory resources across devices. With LegoSwap, remote memory is utilized in a seamless plug-and-play manner. It ...
Enabling Efficient Erasure Coding in Disaggregated Memory Systems
Disaggregated memory (DM) separates compute and memory resources to build a huge memory pool. Erasure coding (EC) is expected to provide fault tolerance in DM with low memory cost. In DM with EC, objects are first coded in compute servers, then directly ...
Batch Jobs Load Balancing Scheduling in Cloud Computing Using Distributional Reinforcement Learning
In cloud computing, reasonably allocating computing resources for batch jobs to ensure load balance across dynamic clusters while meeting user requests is an important and challenging task. Most existing studies are based on the deep Q-network, which utilizes ...
PaVM: A Parallel Virtual Machine for Smart Contract Execution and Validation
The performance bottleneck of blockchain has shifted from consensus to serial smart contract execution in transaction validation. Previous works predominantly focus on inter-contract parallel execution, but they fail to address the inherent limitations of ...