- research-article, August 2024, JUST ACCEPTED
Anatomizing Deep Learning Inference in Web Browsers
ACM Transactions on Software Engineering and Methodology (TOSEM), Just Accepted, https://doi.org/10.1145/3688843
Web applications have increasingly adopted Deep Learning (DL) through in-browser inference, wherein DL inference performs directly within Web browsers. The actual performance of in-browser inference and its impacts on the quality of experience (QoE) ...
- research-article, June 2024
Hybrid SLM and LLM for Edge-Cloud Collaborative Inference
EdgeFM '24: Proceedings of the Workshop on Edge and Mobile Foundation Models, June 2024, Pages 36–41, https://doi.org/10.1145/3662006.3662067
Edge-Cloud collaboration for deep learning inference has been actively studied, to enhance the inference performance by leveraging both Edge and Cloud resources. However, traditional Edge-Cloud collaboration based on model partitioning or confidence ...
- research-article, June 2024
Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization
- Fucheng Jia,
- Shiqi Jiang,
- Ting Cao,
- Wei Cui,
- Tianrui Xia,
- Xu Cao,
- Yuanchun Li,
- Qipeng Wang,
- Deyu Zhang,
- Ju Ren,
- Yunxin Liu,
- Lili Qiu,
- Mao Yang
MOBISYS '24: Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, June 2024, Pages 438–450, https://doi.org/10.1145/3643832.3661892
Web is increasingly becoming the primary platform to deliver AI services onto edge devices, making in-browser deep learning (DL) inference more prominent. Nevertheless, the heterogeneity of edge devices, combined with the underdeveloped state of Web ...
- short-paper, June 2024
Poster: Design of Elastic Deep Neural Network Candidate Spaces for Inference on Diverse Devices
MOBISYS '24: Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, June 2024, Pages 734–735, https://doi.org/10.1145/3643832.3661445
Deep Neural Network (DNN) inference on edge devices is now a common practice. However, tailoring a model for multiple devices involves a lot of time and effort. While elastic models, also known as weight-sharing models, have been proposed as an efficient ...
- research-article, May 2024
FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices
ACM MobiCom '24: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, May 2024, Pages 709–723, https://doi.org/10.1145/3636534.3649391
Due to the popularity of deep neural networks (DNNs) and considerations over network overhead, data privacy, and inference latency, there is a growing interest in deploying DNNs to edge devices in recent years. However, the limited memory becomes a major ...
- research-article, April 2024
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, April 2024, Pages 879–896, https://doi.org/10.1145/3620665.3640376
DRAM-based processing-in-memory (DRAM-PIM) has gained commercial prominence in recent years. However, their integration for deep learning acceleration poses inherent challenges. Existing DRAM-PIMs are limited in computational capabilities, primarily ...
- research-article, February 2024
ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores
- Yuetao Chen,
- Kun Li,
- Yuhao Wang,
- Donglin Bai,
- Lei Wang,
- Lingxiao Ma,
- Liang Yuan,
- Yunquan Zhang,
- Ting Cao,
- Mao Yang
PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, March 2024, Pages 333–347, https://doi.org/10.1145/3627535.3638476
Tensor Core Unit (TCU) is increasingly integrated into modern high-performance processors to enhance matrix multiplication performance. However, constrained to its over-specification, its potential for improving other critical scientific operations like ...
LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup
ACM MobiCom '23: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, October 2023, Article No.: 70, Pages 1–15, https://doi.org/10.1145/3570361.3613285
On-device Deep Neural Network (DNN) inference consumes significant computing resources and development efforts. To alleviate that, we propose LUT-NN, the first system to empower inference by table lookup, to reduce inference cost. LUT-NN learns the ...
- research-article, August 2023
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference
- Junyan Li,
- Li Lyna Zhang,
- Jiahang Xu,
- Yujing Wang,
- Shaoguang Yan,
- Yunqing Xia,
- Yuqing Yang,
- Ting Cao,
- Hao Sun,
- Weiwei Deng,
- Qi Zhang,
- Mao Yang
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2023, Pages 1280–1290, https://doi.org/10.1145/3580305.3599284
Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length. In this work, we propose a constraint-aware and ...
- research-article, July 2023
HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception
- Jinrui Zhang,
- Huan Yang,
- Ju Ren,
- Deyu Zhang,
- Bangwen He,
- Youngki Lee,
- Ting Cao,
- Yuanchun Li,
- Yaoxue Zhang,
- Yunxin Liu
IEEE Transactions on Mobile Computing (ITMV), Volume 23, Issue 5, May 2024, Pages 4648–4664, https://doi.org/10.1109/TMC.2023.3294188
High-resolution depth estimation, with a minimum resolution of $1280\times 960$ ...
- research-article, July 2023
Pavement Crack Detection Based on 3D Edge Representation and Data Communication With Digital Twins
IEEE Transactions on Intelligent Transportation Systems (ITS-TRANSACTIONS), Volume 24, Issue 7, July 2023, Pages 7697–7706, https://doi.org/10.1109/TITS.2022.3194013
With digital information applied in intelligent transportation system, pavement crack detection with digital twins has drawn widely attention since the past several years. However, it is still a challenge task to accomplish crack detection satisfactory ...
NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors
MobiSys '23: Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, June 2023, Pages 70–83, https://doi.org/10.1145/3581791.3596870
Mobile devices are increasingly equipped with heterogeneous multiprocessors, e.g., CPU + GPU + DSP. Yet existing Neural Network (NN) inference fails to fully utilize the computing power of the heterogeneous multi-processors due to the sequential ...
- research-article, June 2023
Boosting DNN Cold Inference on Edge Devices
MobiSys '23: Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, June 2023, Pages 516–529, https://doi.org/10.1145/3581791.3596842
DNNs are ubiquitous on edge devices nowadays. With its increasing importance and use cases, it's not likely to pack all DNNs into device memory and expect that each inference has been warmed up. Therefore, cold inference, the process to read, initialize, ...
- research-article, June 2023
Generated Pseudo-Labels Guided by Background Skeletons for Overcoming Under-Segmentation in Overlapping Particle Objects
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 33, Issue 6, June 2023, Pages 2906–2919, https://doi.org/10.1109/TCSVT.2022.3230451
Unlike general image segmentation, highly complex particle images have significant challenges in labeling and segmentation due to the information occlusion and texture disturbance. Aiming at the highly under-segmentation problem caused by complex particle ...
- research-article, January 2023
Hyperion: A Generic and Distributed Mobile Offloading Framework on OpenCL
SenSys '22: Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, November 2022, Pages 607–621, https://doi.org/10.1145/3560905.3568511
Despite the significant development of mobile device SoCs, they are still inefficient in computing computation-intensive workloads, such as high-resolution image processing and AR/VR applications. Offloading offers a promising way to leverage cloud or ...
- research-article, January 2023
Turbo: Opportunistic Enhancement for Edge Video Analytics
SenSys '22: Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, November 2022, Pages 263–276, https://doi.org/10.1145/3560905.3568501
Edge computing is being widely used for video analytics. To alleviate the inherent tension between accuracy and cost, various video analytics pipelines have been proposed to optimize the usage of GPU on edge nodes. Nonetheless, we find that GPU compute ...
- research-article, October 2022
SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, October 2022, Pages 3654–3663, https://doi.org/10.1145/3511808.3557139
Ad relevance modeling plays a critical role in online advertising systems including Microsoft Bing. To leverage powerful transformers like BERT in this low-latency setting, many existing approaches perform ad-side computations offline. While efficient, ...
- research-article, October 2022
MobiDepth: real-time depth estimation using on-device dual cameras
MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking, October 2022, Pages 528–541, https://doi.org/10.1145/3495243.3560517
Real-time depth estimation is critical for the increasingly popular augmented reality and virtual reality applications on mobile devices. Yet existing solutions are insufficient as they require expensive depth sensors or motion of the device, or have a ...
- research-article, October 2022
Romou: rapidly generate high-performance tensor kernels for mobile GPUs
MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking, October 2022, Pages 487–500, https://doi.org/10.1145/3495243.3517020
Mobile GPU, as a ubiquitous and powerful accelerator, plays an important role in accelerating on-device DNN (Deep Neural Network) inference. The frequent-upgrade and diversity of mobile GPUs require automatic kernel generation to empower fast DNN ...
- research-article, July 2022
Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories
- Lei Chen,
- Jiacheng Zhao,
- Chenxi Wang,
- Ting Cao,
- John Zigman,
- Haris Volos,
- Onur Mutlu,
- Fang Lv,
- Xiaobing Feng,
- Guoqing Harry Xu,
- Huimin Cui
ACM Transactions on Computer Systems (TOCS), Volume 39, Issue 1-4, Article No.: 2, Pages 1–38, https://doi.org/10.1145/3511211
To process real-world datasets, modern data-parallel systems often require extremely large amounts of memory, which are both costly and energy inefficient. Emerging non-volatile memory (NVM) technologies offer high capacity compared to DRAM and low energy ...