research-article
Open access

Accurate Data Race Prediction in the Linux Kernel through Sparse Fourier Learning

Published: 29 April 2024

Abstract

Testing for data races in the Linux kernel is challenging because the space of system calls and thread interleavings that can lead to racy concurrent executions is exponentially large. In this work, we introduce a new approach for modeling execution trace feasibility and apply it to race prediction in the Linux kernel. To address the fundamental scalability challenge posed by the exponentially large domain of possible execution traces, we decompose trace feasibility prediction into independent subtasks, each encoded as learning a Boolean indicator function for a specific memory access, and apply sparse Fourier learning to each subtask.
Boolean functions that are sparse in the Fourier domain can be learned efficiently by estimating the coefficients of their Fourier expansion. Because the feasibility of each memory access depends on only a few other relevant memory accesses or system calls (e.g., relevant inter-thread communications), we observe that trace feasibility functions often have this sparsity property. We combine the learned trace feasibility functions with conservative alias analysis to implement a kernel race-testing system, HBFourier, which uses sparse Fourier learning to model feasibility efficiently when making predictions. We evaluate our approach on a recent Linux development kernel and show that it finds 44 more races, with 15.7% more accurate race predictions, than the next-best-performing system in our evaluation, and identifies 5 new race bugs confirmed by kernel developers.
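
To make the idea of learning a Fourier-sparse Boolean function concrete, here is a minimal sketch. It is not HBFourier's implementation: it uses a simple low-degree enumeration with thresholding as a stand-in for whatever sparse learner the paper uses. The toy target `feasible`, the encoding of events as values in {-1, +1}, and all parameter values are assumptions made for illustration only. Each coefficient is estimated as the sample mean of f(x) times the parity of the chosen coordinates.

```python
import itertools
import random


def chi(subset, x):
    """Parity character: product of x[i] for i in subset, with x in {-1, +1}^n."""
    product = 1
    for i in subset:
        product *= x[i]
    return product


def estimate_coefficient(f, subset, samples):
    """Empirical Fourier coefficient: mean of f(x) * chi_S(x) over the samples."""
    return sum(f(x) * chi(subset, x) for x in samples) / len(samples)


def learn_sparse(f, n, max_degree, num_samples, threshold, rng):
    """Keep every coefficient of degree <= max_degree whose estimate is large."""
    samples = [
        tuple(rng.choice((-1, 1)) for _ in range(n)) for _ in range(num_samples)
    ]
    coeffs = {}
    for degree in range(max_degree + 1):
        for subset in itertools.combinations(range(n), degree):
            c = estimate_coefficient(f, subset, samples)
            if abs(c) >= threshold:
                coeffs[subset] = c
    return coeffs


def predict(coeffs, x):
    """Sign of the truncated Fourier expansion, as a +1/-1 prediction."""
    value = sum(c * chi(s, x) for s, c in coeffs.items())
    return 1 if value >= 0 else -1


def feasible(x):
    """Hypothetical target: an access is feasible (+1) only if events 0 and 3
    both occurred (+1); the other six inputs are irrelevant."""
    return 1 if x[0] == 1 and x[3] == 1 else -1


# Learn the toy function over 8 inputs from 2000 uniformly random samples.
coeffs = learn_sparse(
    feasible, n=8, max_degree=2, num_samples=2000, threshold=0.25,
    rng=random.Random(0),
)
for subset, value in sorted(coeffs.items()):
    print(subset, round(value, 2))
```

The toy target depends on only two of its eight inputs, so its exact expansion has four nonzero coefficients (on the empty set, {0}, {3}, and {0, 3}), each of magnitude 1/2. The learner needs to find only those four terms rather than tabulate all 256 inputs, which is the kind of sparsity the abstract relies on.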

Published In

Proceedings of the ACM on Programming Languages  Volume 8, Issue OOPSLA1
April 2024
1492 pages
EISSN:2475-1421
DOI:10.1145/3554316
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Race Prediction
  2. Linux Kernel Testing
  3. Sparse Fourier Learning

Funding Sources

  • NSF (National Science Foundation)
  • Google Faculty Gift
