DOI: 10.1145/3670105.3670190
research-article
Open access

WebShell Detection Based on CodeBERT and Deep Learning Model

Published: 29 July 2024

Abstract

WebShell attacks have long been a persistent problem for administrators. They have become a major security concern in cloud computing environments, since the scalability and distributed nature of cloud services can amplify the risks and impact of such attacks. In response, researchers have proposed numerous strategies to protect assets from WebShell intrusions. This study proposes a method that uses BPE (Byte Pair Encoding) for tokenization, CodeBERT to extract word embedding vectors from a given piece of source code, and a deep model (GRU or Bidirectional GRU) to determine whether the code contains a WebShell. The architecture is designed to detect WebShells in PHP code by analyzing the source code. Our experimental results indicate that the proposed method with GRU achieves the best performance, with an accuracy of 99.72%, a precision of 99.36%, and an F1-score of 99.36%. It also outperforms the methods from the prior related studies compared in the paper.
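
To make the tokenization step more concrete, the following is a minimal sketch of how Byte Pair Encoding learns and applies merge rules. It is a toy, character-level illustration written for this summary, not the tokenizer used in the paper: CodeBERT ships its own pretrained byte-level BPE vocabulary, and the function names (`learn_bpe`, `merge_pair`, `tokenize`), the three-line corpus, and the merge count are all assumptions chosen for the example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of text lines."""
    # Every whitespace-separated word starts as a tuple of single characters.
    words = Counter(tuple(word) for line in corpus for word in line.split())
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Apply the new rule to every word before counting again.
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def tokenize(word, merges):
    """Split `word` into subword tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus of PHP-like lines (illustrative only).
corpus = [
    "eval($_POST['cmd']);",
    "eval($_GET['cmd']);",
    "echo $name;",
]
merges = learn_bpe(corpus, num_merges=10)
print(tokenize("eval($_POST['x']);", merges))
```

In a pipeline like the one described above, the resulting token sequence would be mapped to vocabulary IDs and passed to CodeBERT, whose embedding vectors then feed the GRU or Bidirectional GRU classifier.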


Published In

CNIOT '24: Proceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things
May 2024
668 pages
ISBN: 9798400716751
DOI: 10.1145/3670105
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BPE
  2. Bidirectional GRU
  3. CodeBERT
  4. GRU
  5. WebShell

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • NSTC

Conference

CNIOT 2024

Acceptance Rates

Overall Acceptance Rate 39 of 82 submissions, 48%
