Pairing Human and Artificial Intelligence: Enforcing Access Control Policies with LLMs and Formal Specifications

Published: 25 June 2024

Abstract

Large Language Models (LLMs), such as ChatGPT and Google Bard, have performed remarkably well at assisting developers with programming tasks, potentially making software construction faster and more convenient. This new approach significantly enhances efficiency, but it also raises the challenge of largely unsupervised code generation with limited security guarantees: LLMs excel at producing syntactically correct code, yet they are not specifically trained to guarantee its security. In this paper, we provide an initial exploration of using formal software specifications as a starting point for software construction, allowing developers to translate descriptions of security-related behavior into natural-language instructions for LLMs, i.e., prompts. In addition, we leverage automated verification tools to evaluate the generated code against those specifications, following a modular, step-by-step software construction process. For our study, we use Role-Based Access Control (RBAC), a mature security model, and the Java Modeling Language (JML), a behavioral specification language for Java, and we test our approach on several publicly available LLMs: OpenAI ChatGPT 4.0, Google Bard, and Microsoft Copilot. We describe two applications, a security-sensitive Banking application employing RBAC and an RBAC API module itself, along with their JML specifications, the prompts used, the generated code, the verification results, and a series of insights for practitioners interested in further exploring the use of LLMs for securely constructing applications.
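The approach described above pairs JML behavioral specifications for RBAC operations with LLM prompts and verification. As a rough illustration only (the class, method names, and the simplified JML-style clauses below are hypothetical sketches, not the paper's actual specifications or generated code), an RBAC permission check annotated with contract comments might look like:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical RBAC sketch: users are assigned roles, roles are granted
// permissions, and an access check succeeds iff some role of the user
// carries the requested permission. JML-style contracts appear as //@
// comments, in the spirit of spec-first construction; they are simplified
// and not drawn from the paper.
public class RbacSketch {
    private final Map<String, Set<String>> userRoles = new HashMap<>();
    private final Map<String, Set<String>> rolePerms = new HashMap<>();

    //@ requires user != null && role != null;
    //@ ensures userRoles.get(user).contains(role);
    public void assignRole(String user, String role) {
        userRoles.computeIfAbsent(user, u -> new HashSet<>()).add(role);
    }

    //@ requires role != null && perm != null;
    //@ ensures rolePerms.get(role).contains(perm);
    public void grantPermission(String role, String perm) {
        rolePerms.computeIfAbsent(role, r -> new HashSet<>()).add(perm);
    }

    // True iff some role assigned to the user grants the permission.
    //@ requires user != null && perm != null;
    public boolean checkAccess(String user, String perm) {
        for (String role : userRoles.getOrDefault(user, Set.of())) {
            if (rolePerms.getOrDefault(role, Set.of()).contains(perm)) {
                return true;
            }
        }
        return false;
    }
}
```

In the workflow the paper explores, specifications like these would be translated into natural-language prompts for an LLM, and a tool such as OpenJML would then check the generated implementation against the contracts.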



Published In

SACMAT 2024: Proceedings of the 29th ACM Symposium on Access Control Models and Technologies
June 2024, 205 pages
ISBN: 9798400704918
DOI: 10.1145/3649158


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. chatgpt
        2. formal specifications
        3. large language models
        4. prompt engineering
        5. software construction
        6. java modeling language

        Qualifiers

        • Research-article


        Acceptance Rates

        Overall Acceptance Rate 177 of 597 submissions, 30%
