Research article · Open access
DOI: 10.1145/3611643.3613083

Getting pwn’d by AI: Penetration Testing with Large Language Models

Published: 30 November 2023

Abstract

The field of software security testing, and more specifically penetration testing, requires high levels of expertise and involves many manual testing and analysis steps. This paper explores the potential use of large language models, such as GPT-3.5, to augment penetration testers with AI sparring partners. We explore two distinct use cases: high-level task planning for security testing assignments and low-level vulnerability hunting within a vulnerable virtual machine. For the latter, we implemented a closed feedback loop between LLM-generated low-level actions and a vulnerable virtual machine (connected through SSH), allowing the LLM to analyze the machine state for vulnerabilities and suggest concrete attack vectors, which were then automatically executed within the virtual machine. We discuss promising initial results, detail avenues for improvement, and close by deliberating on the ethics of AI sparring partners.
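The closed feedback loop described in the abstract can be pictured as a short driver script. The following Python sketch is an illustration under assumptions, not the authors' implementation: it pairs paramiko (SSH) with the OpenAI chat API as the LLM backend, and the host address, credentials, model, prompt wording, round limit, and success check are all hypothetical placeholders.

```python
# Minimal sketch of the closed feedback loop from the abstract: the LLM
# proposes one shell command per round, the command runs on the vulnerable
# VM over SSH, and its output is fed back into the next prompt.
# Host, credentials, model, prompt wording, and stop condition are
# hypothetical placeholders, not the authors' actual implementation.
import paramiko
from openai import OpenAI

llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("192.168.56.101", username="lowpriv", password="trainingpw")  # lab VM only

history = []  # (command, output) pairs shown back to the model

for _ in range(10):  # bound the number of feedback rounds
    transcript = "\n".join(f"$ {cmd}\n{out}" for cmd, out in history)
    prompt = (
        "You are a low-privilege user on a Linux machine. Based on the "
        "session so far, suggest exactly one shell command to help "
        "escalate privileges. Reply with the command only.\n\n" + transcript
    )
    resp = llm.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    command = resp.choices[0].message.content.strip()

    _, stdout, stderr = ssh.exec_command(command, timeout=30)
    output = stdout.read().decode(errors="replace") + stderr.read().decode(errors="replace")
    history.append((command, output))
    print(f"$ {command}\n{output}")

    if "uid=0(root)" in output:  # crude success check, e.g. output of `id`
        break

ssh.close()
```

Because the target in the paper is a deliberately vulnerable training VM, executing model-suggested commands automatically is acceptable there; against any real system, a pattern like this would need human review of every command before execution.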




Information

Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023, 2215 pages
ISBN: 9798400703270
DOI: 10.1145/3611643
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 30 November 2023


    Author Tags

    1. large language models
    2. penetration testing
    3. security testing

    Qualifiers

    • Research-article

    Conference

    ESEC/FSE '23

    Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions, 21%


Cited By

• (2024) Challenges and Limitations of Using LLMs in Software Security. In Application of Large Language Models (LLMs) for Software Vulnerability Detection, 439–464. DOI: 10.4018/979-8-3693-9311-6.ch012. Online publication date: 18-Oct-2024.
• (2024) Comparative Analysis of LLMs vs. Traditional Methods in Vulnerability Detection. In Application of Large Language Models (LLMs) for Software Vulnerability Detection, 335–374. DOI: 10.4018/979-8-3693-9311-6.ch009. Online publication date: 18-Oct-2024.
• (2024) Integration of LLMs With Traditional Security Tools. In Application of Large Language Models (LLMs) for Software Vulnerability Detection, 295–334. DOI: 10.4018/979-8-3693-9311-6.ch008. Online publication date: 18-Oct-2024.
• (2024) Ethical Considerations in the Use of LLMs for Vulnerability Detection. In Application of Large Language Models (LLMs) for Software Vulnerability Detection, 263–294. DOI: 10.4018/979-8-3693-9311-6.ch007. Online publication date: 18-Oct-2024.
• (2024) Performance Evaluation of LLM-Based Security Systems. In Application of Large Language Models (LLMs) for Software Vulnerability Detection, 131–166. DOI: 10.4018/979-8-3693-9311-6.ch005. Online publication date: 18-Oct-2024.
• (2024) Techniques and Approaches for Leveraging LLMs in Security Analysis. In Application of Large Language Models (LLMs) for Software Vulnerability Detection, 75–104. DOI: 10.4018/979-8-3693-9311-6.ch003. Online publication date: 18-Oct-2024.
• (2024) Foundations of Large Language Models in Software Vulnerability Detection. In Application of Large Language Models (LLMs) for Software Vulnerability Detection, 41–74. DOI: 10.4018/979-8-3693-9311-6.ch002. Online publication date: 18-Oct-2024.
• (2024) Harnessing the Power of Large Language Models for Cybersecurity. In Application of Large Language Models (LLMs) for Software Vulnerability Detection, 1–40. DOI: 10.4018/979-8-3693-9311-6.ch001. Online publication date: 18-Oct-2024.
• (2024) CIPHER: Cybersecurity Intelligent Penetration-Testing Helper for Ethical Researcher. Sensors, 24(21), 6878. DOI: 10.3390/s24216878. Online publication date: 26-Oct-2024.
• (2024) Digital Sentinels and Antagonists: The Dual Nature of Chatbots in Cybersecurity. Information, 15(8), 443. DOI: 10.3390/info15080443. Online publication date: 29-Jul-2024.
