Author: Cao, Ting

research-article

Free

JUST ACCEPTED

Anatomizing Deep Learning Inference in Web Browsers

ACM Transactions on Software Engineering and Methodology (TOSEM), Just Accepted https://doi.org/10.1145/3688843

Web applications have increasingly adopted Deep Learning (DL) through in-browser inference, wherein DL inference is performed directly within Web browsers. The actual performance of in-browser inference and its impacts on the quality of experience (QoE) ...

research-article

Open Access

Hybrid SLM and LLM for Edge-Cloud Collaborative Inference

EdgeFM '24: Proceedings of the Workshop on Edge and Mobile Foundation Models, June 2024, Pages 36–41, https://doi.org/10.1145/3662006.3662067

Edge-Cloud collaboration for deep learning inference has been actively studied, to enhance the inference performance by leveraging both Edge and Cloud resources. However, traditional Edge-Cloud collaboration based on model partitioning or confidence ...

research-article

Open Access

Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization

MOBISYS '24: Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, June 2024, Pages 438–450, https://doi.org/10.1145/3643832.3661892

Web is increasingly becoming the primary platform to deliver AI services onto edge devices, making in-browser deep learning (DL) inference more prominent. Nevertheless, the heterogeneity of edge devices, combined with the underdeveloped state of Web ...

short-paper

Open Access

Poster: Design of Elastic Deep Neural Network Candidate Spaces for Inference on Diverse Devices

MOBISYS '24: Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, June 2024, Pages 734–735, https://doi.org/10.1145/3643832.3661445

Deep Neural Network (DNN) inference on edge devices is now a common practice. However, tailoring a model for multiple devices involves a lot of time and effort. While elastic models, also known as weight-sharing models, have been proposed as an efficient ...

research-article

Open Access

FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices

ACM MobiCom '24: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, May 2024, Pages 709–723, https://doi.org/10.1145/3636534.3649391

Due to the popularity of deep neural networks (DNNs) and considerations over network overhead, data privacy, and inference latency, there is a growing interest in deploying DNNs to edge devices in recent years. However, the limited memory becomes a major ...

research-article

PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, April 2024, Pages 879–896, https://doi.org/10.1145/3620665.3640376

DRAM-based processing-in-memory (DRAM-PIM) has gained commercial prominence in recent years. However, their integration for deep learning acceleration poses inherent challenges. Existing DRAM-PIMs are limited in computational capabilities, primarily ...

research-article

Open Access

ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores

PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, March 2024, Pages 333–347, https://doi.org/10.1145/3627535.3638476

Tensor Core Unit (TCU) is increasingly integrated into modern high-performance processors to enhance matrix multiplication performance. However, constrained to its over-specification, its potential for improving other critical scientific operations like ...

research-article

LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup

ACM MobiCom '23: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, October 2023, Article No.: 70, Pages 1–15, https://doi.org/10.1145/3570361.3613285

On-device Deep Neural Network (DNN) inference consumes significant computing resources and development efforts. To alleviate that, we propose LUT-NN, the first system to empower inference by table lookup, to reduce inference cost. LUT-NN learns the ...

research-article

Free

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2023, Pages 1280–1290, https://doi.org/10.1145/3580305.3599284

Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length. In this work, we propose a constraint-aware and ...

research-article

HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception

IEEE Transactions on Mobile Computing (ITMV), Volume 23, Issue 5, May 2024, Pages 4648–4664, https://doi.org/10.1109/TMC.2023.3294188

High-resolution depth estimation, with a minimum resolution of $1280\times 960$ ...

research-article

Pavement Crack Detection Based on 3D Edge Representation and Data Communication With Digital Twins

IEEE Transactions on Intelligent Transportation Systems (ITS-TRANSACTIONS), Volume 24, Issue 7, July 2023, Pages 7697–7706, https://doi.org/10.1109/TITS.2022.3194013

With digital information applied in intelligent transportation systems, pavement crack detection with digital twins has drawn wide attention over the past several years. However, it is still a challenging task to accomplish crack detection satisfactory ...

research-article

NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors

MobiSys '23: Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, June 2023, Pages 70–83, https://doi.org/10.1145/3581791.3596870

Mobile devices are increasingly equipped with heterogeneous multiprocessors, e.g., CPU + GPU + DSP. Yet existing Neural Network (NN) inference fails to fully utilize the computing power of the heterogeneous multi-processors due to the sequential ...

research-article

Boosting DNN Cold Inference on Edge Devices

MobiSys '23: Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, June 2023, Pages 516–529, https://doi.org/10.1145/3581791.3596842

DNNs are ubiquitous on edge devices nowadays. With their increasing importance and use cases, it is not realistic to pack all DNNs into device memory and expect that each inference has been warmed up. Therefore, cold inference, the process to read, initialize, ...

research-article

Generated Pseudo-Labels Guided by Background Skeletons for Overcoming Under-Segmentation in Overlapping Particle Objects

IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 33, Issue 6, June 2023, Pages 2906–2919, https://doi.org/10.1109/TCSVT.2022.3230451

Unlike general image segmentation, highly complex particle images have significant challenges in labeling and segmentation due to the information occlusion and texture disturbance. Aiming at the highly under-segmentation problem caused by complex particle ...

research-article

Open Access

Hyperion: A Generic and Distributed Mobile Offloading Framework on OpenCL

SenSys '22: Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, November 2022, Pages 607–621, https://doi.org/10.1145/3560905.3568511

Despite the significant development of mobile device SoCs, they are still inefficient in computing computation-intensive workloads, such as high-resolution image processing and AR/VR applications. Offloading offers a promising way to leverage cloud or ...

research-article

Turbo: Opportunistic Enhancement for Edge Video Analytics

SenSys '22: Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, November 2022, Pages 263–276, https://doi.org/10.1145/3560905.3568501

Edge computing is being widely used for video analytics. To alleviate the inherent tension between accuracy and cost, various video analytics pipelines have been proposed to optimize the usage of GPU on edge nodes. Nonetheless, we find that GPU compute ...

research-article

SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, October 2022, Pages 3654–3663, https://doi.org/10.1145/3511808.3557139

Ad relevance modeling plays a critical role in online advertising systems including Microsoft Bing. To leverage powerful transformers like BERT in this low-latency setting, many existing approaches perform ad-side computations offline. While efficient, ...

research-article

Open Access

MobiDepth: real-time depth estimation using on-device dual cameras

MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking, October 2022, Pages 528–541, https://doi.org/10.1145/3495243.3560517

Real-time depth estimation is critical for the increasingly popular augmented reality and virtual reality applications on mobile devices. Yet existing solutions are insufficient as they require expensive depth sensors or motion of the device, or have a ...

research-article

Open Access

Romou: rapidly generate high-performance tensor kernels for mobile GPUs

MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking, October 2022, Pages 487–500, https://doi.org/10.1145/3495243.3517020

Mobile GPU, as a ubiquitous and powerful accelerator, plays an important role in accelerating on-device DNN (Deep Neural Network) inference. The frequent-upgrade and diversity of mobile GPUs require automatic kernel generation to empower fast DNN ...

research-article

Open Access

Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories

ACM Transactions on Computer Systems (TOCS), Volume 39, Issue 1-4, Article No.: 2, Pages 1–38, https://doi.org/10.1145/3511211

To process real-world datasets, modern data-parallel systems often require extremely large amounts of memory, which are both costly and energy inefficient. Emerging non-volatile memory (NVM) technologies offer high capacity compared to DRAM and low energy ...
