Challenges and Pitfalls in Generating Representative ICS Datasets in Cyber Security Research

  • Conference paper
Computer Security. ESORICS 2022 International Workshops (ESORICS 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13785)

Abstract

The increasing digitization of Industrial Control Systems (ICS) and their interconnection to the Internet make them an attractive target for sophisticated attacks by experienced, highly motivated, and well-resourced adversaries. As ICS combine decades-old devices and communication infrastructure with new-generation embedded devices offering computing capabilities and Ethernet-based communication protocols, integrating proactive security protection methods is very challenging. Thus, a major line of research focuses on developing reactive security solutions in the form of industrial intrusion detection systems (IIDS), which aim to detect anomalies in otherwise predictable “normal” ICS behavior. A crucial requirement for assessing the actual, real-world performance of these methods and for comparing them fairly is the existence of a representative dataset. Although the number of public ICS datasets is gradually increasing, it remains unclear to what extent these datasets can be considered representative.

In this work, we identify key properties that an ICS dataset should have to be designated as representative, based on typical IIDS evaluation scenarios. Our systematization of knowledge highlights that these properties are only partially represented in the existing public ICS datasets, which makes them unrepresentative, and sheds light on the need for new datasets for IIDS evaluation. We further take a step toward generating a representative dataset and present our ongoing work on the construction of a Hardware-in-the-Loop testbed of a real water distribution system. Our testbed replicates the operation of a real German medium-sized water supplier and allows for the collection of three different types of data sources, i.e., physical information, network data, and system logs. Our initial dataset contains more than 20 attacks, comprising both attacks aimed at distorting the underlying physical process and network- and system-based cyber attacks.

Notes

  1. The first dataset contains two labels indicating attack and no attack. The second dataset contains three labels indicating attack, no attack, or normal failure. The third dataset additionally includes a separate label for each attack.

  2. We omit the discussion of another group of gas pipeline and water storage tank datasets by Morris et al., as these datasets contain unintended patterns and, thus, are broken [20].

  3. In our work, we categorize these attacks as getting-access attacks.

  4. During the preparation of this work, the website [1] on which the dataset is published was not reachable, which hindered the investigation of this dataset.

References

  1. Electra dataset: Anomaly detection ICS dataset. https://perception.inf.um.es/ICS-datasets/

  2. HAI (HIL-based Augmented ICS) Security Dataset. https://github.com/icsdataset/hai

  3. Siemens communications overview. https://snap7.sourceforge.net/siemens_comm.html

  4. Gómez, Á.L.P., et al.: On the generation of anomaly detection datasets in industrial control systems. IEEE Access 7 (2019)

  5. Erba, A., et al.: Constrained concealment attacks against reconstruction-based anomaly detectors in industrial control systems. In: Annual Computer Security Applications Conference, ACSAC. ACM (2020)

  6. Ahmed, C.M., et al.: WADI: a water distribution testbed for research in the design of secure cyber physical systems. In: 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks, CySWATER. ACM (2017)

  7. Myers, D., et al.: Anomaly detection for industrial control systems using process mining. Comput. Secur. 78 (2018)

  8. Zizzo, G., et al.: Adversarial attacks on time-series intrusion detection for industrial control systems. In: 19th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom. IEEE (2020)

  9. Shin, H.K., et al.: Implementation of programmable CPS testbed for anomaly detection. In: 12th Workshop on Cyber Security Experimentation and Test, CSET. USENIX Association (2019)

  10. Shin, H.K., et al.: HAI 1.0: HIL-based augmented ICS security dataset. In: 13th Workshop on Cyber Security Experimentation and Test, CSET. USENIX Association (2020)

  11. Shin, H.K., et al.: Two ICS security datasets and anomaly detection contest on the HIL-based augmented ICS testbed. In: 14th Workshop on Cyber Security Experimentation and Test, CSET. ACM (2021)

  12. Giraldo, J., et al.: A survey of physics-based attack detection in cyber-physical systems. ACM Comput. Surv. 51(4) (2018)

  13. Suaboot, J., et al.: A taxonomy of supervised learning for IDSs in SCADA environments. ACM Comput. Surv. 53(2) (2020)

  14. Goh, J., Adepu, S., Junejo, K.N., Mathur, A.: A dataset to support research in the design of secure water treatment systems. In: Havarneanu, G., Setola, R., Nassopoulos, H., Wolthusen, S. (eds.) CRITIS 2016. LNCS, vol. 10242, pp. 88–99. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71368-7_8

  15. Conti, M., et al.: A survey on industrial control system testbeds and datasets for security research. IEEE Commun. Surv. Tutor. 23(4) (2021)

  16. Taormina, R., et al.: Battle of the attack detection algorithms: disclosing cyber attacks on water distribution networks. J. Water Resour. Plan. Manag. 144(8) (2018)

  17. Taormina, R., et al.: A toolbox for assessing the impacts of cyber-physical attacks on water distribution systems. Environ. Model. Softw. 112 (2019)

  18. Pan, S., et al.: Developing a hybrid intrusion detection system using data mining for power systems. IEEE Trans. Smart Grid 6(6) (2015)

  19. Adepu, S., Kandasamy, N.K., Mathur, A.: EPIC: an electric power testbed for research and training in cyber physical systems security. In: Katsikas, S.K., et al. (eds.) SECPRE/CyberICPS 2018. LNCS, vol. 11387, pp. 37–52. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-12786-2_3

  20. Adhikari, U., et al.: Industrial Control System (ICS) Cyber Attack Datasets (2022). https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets

  21. MITRE ATT&CK: ICS Matrix. https://attack.mitre.org/matrices/ics/

  22. Choi, S., Yun, J.-H., Kim, S.-K.: A comparison of ICS datasets for security research based on attack paths. In: Luiijf, E., Žutautaitė, I., Hämmerli, B.M. (eds.) CRITIS 2018. LNCS, vol. 11260, pp. 154–166. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05849-4_12

  23. Frazão, I., Abreu, P.H., Cruz, T., Araújo, H., Simões, P.: Denial of service attacks: detecting the frailties of machine learning algorithms in the classification process. In: Luiijf, E., Žutautaitė, I., Hämmerli, B.M. (eds.) CRITIS 2018. LNCS, vol. 11260, pp. 230–235. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05849-4_19

  24. Hink, R.C.B., Goseva-Popstojanova, K.: Characterization of cyberattacks aimed at integrated industrial control and enterprise systems: a case study. In: 17th International Symposium on High Assurance Systems Engineering, HASE. IEEE (2016)

  25. iTrust: Secure Water Treatment (SWaT) Testbed (2022). https://itrust.sutd.edu.sg/testbeds/secure-water-treatment-swat/

  26. Kravchik, M., Shabtai, A.: Efficient Cyber Attacks Detection in Industrial Control Systems Using Lightweight Neural Networks and PCA (2019)

  27. Kus, D., et al.: A false sense of security? Revisiting the state of machine learning-based industrial intrusion detection. In: 8th Workshop on Cyber-Physical System Security, CPSS. ACM (2022)

  28. Lemay, A., Fernandez, J.M.: Providing SCADA network data sets for intrusion detection research. In: 9th Workshop on Cyber Security Experimentation and Test, CSET. USENIX Association (2016)

  29. Lounge, G.: Capture files from 4SICS Geek Lounge. https://www.netresec.com/?page=PCAP4SICS

  30. Mehner, S., Schuster, F., Hohlfeld, O.: Lights on power plant control networks. In: Hohlfeld, O., Moura, G., Pelsser, C. (eds.) PAM 2022. LNCS, vol. 13210, pp. 470–484. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98785-5_21

  31. Microsoft: PowerShell Documentation. https://docs.microsoft.com/en-us/powershell/

  32. Ndonda, G.K., Sadre, R.: A Public Network Trace of a Control and Automation System. https://arxiv.org/pdf/1908.02118.pdf

  33. Peterson, D., Wightman, R.: S4x15 ICS Village PCAP Files. https://www.netresec.com/?page=DigitalBond_S4

  34. PROFIBUS & PROFINET International (PI): Profibus

  35. Pliatsios, D., et al.: A survey on SCADA systems: secure protocols, incidents, threats and tactics. IEEE Commun. Surv. Tutor. 22(3) (2020)

  36. Rodofile, N.R.: DNP3 Cyber-attack datasets. https://github.com/qut-infosec/2017QUT_DNP3

  37. Rodofile, N.R.: SCADA network attack datasets and process logs. https://github.com/qut-infosec/2017QUT_S7comm

  38. Rodofile, N.R.: Generating attacks and labelling attack datasets for industrial control intrusion detection systems. Ph.D. thesis, Queensland University of Technology (2013)

  39. Siemens: S7-400 Automation System, CPU Specifications (2009). https://cache.industry.siemens.com/dl/files/550/23904550/att_98310/v1/CPU_data_en_en-US.pdf?download=true

  40. Siemens: SIMULATIONUnit Manual (2022). https://cache.industry.siemens.com/dl/files/344/109475344/att_926827/v1/HelpEN.pdf

  41. Siemens: Software for the visualization of the future (2022). https://new.siemens.com/global/en/products/automation/simatic-hmi/wincc-unified/software.html

  42. Turrin, F., et al.: A statistical analysis framework for ICS process datasets. In: Joint Workshop on CPS & IoT Security and Privacy. ACM (2020)

  43. Zemanek, S., et al.: PowerDuck: a GOOSE data set of cyberattacks in substations. In: 15th Workshop on Cyber Security Experimentation and Test, CSET. ACM (2022)

Acknowledgments

This work has been funded by the German Federal Ministry of Education and Research (BMBF) under the project KISS_KI Simple & Scalable and by the EU and the state of Brandenburg under the EFRE StaF project INSPIRE.

Corresponding author

Correspondence to Asya Mitseva.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mitseva, A., Thierse, P., Hoffmann, H., Er, D., Panchenko, A. (2023). Challenges and Pitfalls in Generating Representative ICS Datasets in Cyber Security Research. In: Katsikas, S., et al. Computer Security. ESORICS 2022 International Workshops. ESORICS 2022. Lecture Notes in Computer Science, vol 13785. Springer, Cham. https://doi.org/10.1007/978-3-031-25460-4_22

  • DOI: https://doi.org/10.1007/978-3-031-25460-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25459-8

  • Online ISBN: 978-3-031-25460-4

  • eBook Packages: Computer Science, Computer Science (R0)
