DOI: 10.1145/3575693.3575697
Research Article (Open Access)

Towards a Machine Learning-Assisted Kernel with LAKE

Published: 30 January 2023

Abstract

The complexity of modern operating systems (OSes), rapid diversification of hardware, and steady evolution of machine learning (ML) motivate us to explore the potential of ML to improve decision-making in OS kernels. We conjecture that ML can better manage tradeoff spaces for subsystems such as memory management and process and I/O scheduling, which currently rely on hand-tuned heuristics to provide reasonable average-case performance. We explore the replacement of heuristics with ML-driven decision-making in five kernel subsystems and consider the implications for kernel design, shared OS-level components, and access to hardware acceleration. We identify obstacles, address challenges, and characterize the tradeoffs that arise in kernel space around the benefits ML can provide. We find that the use of specialized hardware such as GPUs is critical to absorbing the additional computational load of ML-based decision-making, but that poor accessibility of accelerators in kernel space is a barrier to adoption. We also find that the benefits of ML and acceleration for OSes are subsystem-, workload-, and hardware-dependent, suggesting that using ML in kernels will require frameworks that help kernel developers navigate new tradeoff spaces. We address these challenges by building LAKE, a system for supporting ML and exposing accelerators in kernel space. LAKE includes APIs for feature collection and management across abstraction layers and module boundaries, mechanisms for managing the variable profitability of acceleration, and interfaces for mitigating contention for resources between user and kernel space. We show that an ML-backed I/O latency predictor can have its inference time reduced by up to 96% with acceleration.
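
To make the abstract's notion of "variable profitability of acceleration" concrete, the sketch below (in C) shows one way a caller might gate offload decisions on batch size, falling back to an existing heuristic when offload overhead would likely dominate. This is a hypothetical illustration, not the LAKE API: the names lake_infer_gpu, heuristic_predict, and PROFITABILITY_BATCH_THRESHOLD, and the threshold policy itself, are assumptions.

/*
 * Hypothetical sketch of profitability-gated acceleration (not the LAKE API).
 * Small batches of feature vectors stay on the CPU heuristic; large batches
 * are handed to an accelerator-backed inference routine.
 */
#include <stdio.h>
#include <stddef.h>

#define FEATURE_DIM 4
#define PROFITABILITY_BATCH_THRESHOLD 64   /* assumed tuning knob */

/* Stand-in for an accelerator-backed inference call. */
static void lake_infer_gpu(const float *features, size_t n, int *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = features[i * FEATURE_DIM] > 0.5f;  /* placeholder model */
}

/* Stand-in for an existing hand-tuned kernel heuristic. */
static void heuristic_predict(const float *features, size_t n, int *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = 0;  /* e.g., "always predict fast I/O" */
}

/* Gate acceleration on batch size: small batches stay on the CPU. */
static void predict_batch(const float *features, size_t n, int *out)
{
    if (n >= PROFITABILITY_BATCH_THRESHOLD)
        lake_infer_gpu(features, n, out);
    else
        heuristic_predict(features, n, out);
}

int main(void)
{
    float features[8 * FEATURE_DIM] = {0};
    int out[8];

    predict_batch(features, 8, out);   /* small batch -> CPU heuristic path */
    printf("prediction[0] = %d\n", out[0]);
    return 0;
}

In the paper's setting such a decision would sit behind a kernel-space interface rather than a user-space program; the user-space form here only keeps the sketch self-contained and compilable.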

Cited By

  • FetchBPF. Proceedings of the 2024 USENIX Annual Technical Conference (USENIX ATC '24), 369–378. 10.5555/3691992.3692014. Online publication date: 10 July 2024.
  • vLFS: Learning-based Scheduling for AI Chip Virtualization. 2024 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), 1393–1399. https://doi.org/10.1109/ISPA63168.2024.00187. Online publication date: 30 October 2024.


Published In

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
January 2023
947 pages
ISBN: 9781450399166
DOI: 10.1145/3575693
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPU
  2. ML for systems
  3. accelerators
  4. operating systems
  5. systems for ML

Conference

ASPLOS '23

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 1,434
  • Downloads (Last 6 weeks): 146
Reflects downloads up to 28 Feb 2025
