Abstract
Context
In many domains such as healthcare and banking, IT systems need to fulfill various requirements related to security. The elaboration of security requirements for a given system is in part guided by the controls envisaged by the applicable security standards and best practices. An important difficulty that analysts have to contend with during security requirements elaboration is sifting through a large number of security controls and determining which ones have a bearing on the security requirements for a given system. This challenge is often exacerbated by the scarce security expertise available in most organizations.
Objective
In this article, we develop automated decision support for the identification of security controls that are relevant to a specific system in a particular context.
Method and Results
Our approach, which is based on machine learning, leverages historical data from security assessments performed on past systems in order to recommend security controls for a new system. We operationalize and empirically evaluate our approach using real historical data from the banking domain. Our results show that, when one excludes security controls that are rare in the historical data, our approach has an average recall of ≈ 94% and an average precision of ≈ 63%. Through a survey, we further examine how security analysts perceive the usefulness of the classification models derived from historical data.
Conclusions
The high recall indicates that only a few relevant security controls are missed, and the reasonable level of precision indicates that the effort required to confirm recommendations is not excessive. Together, these results suggest that our approach is a useful aid to analysts, both for identifying the relevant security controls more efficiently and for decreasing the likelihood that important controls are overlooked. Further, our survey results suggest that the generated classification models help provide a documented and explicit rationale for choosing the applicable security controls.
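To make the two metrics concrete, the following minimal sketch computes precision and recall for a single set of recommended controls. It is only an illustration of the standard definitions: the control identifiers and the function name `precision_recall` are hypothetical, and the sketch does not reproduce the evaluation procedure or the data used in the article.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one set of recommended controls.

    precision = fraction of recommended controls that are truly relevant
    recall    = fraction of truly relevant controls that were recommended
    """
    recommended, relevant = set(recommended), set(relevant)
    true_positives = len(recommended & relevant)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical control identifiers, chosen for illustration only.
relevant = {"C1", "C2", "C3", "C4", "C5"}
recommended = {"C1", "C2", "C3", "C4", "C6", "C7"}

precision, recall = precision_recall(recommended, relevant)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case, one relevant control is missed (lowering recall) and two recommendations would have to be rejected by an analyst (lowering precision), which mirrors the trade-off discussed above.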
Acknowledgements
Financial support for this work was provided by the Alphonse Weicker Foundation.
Additional information
Communicated by: Eric Knauss, Michael Goedicke and Paul Grünbacher
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Requirements Engineering for Software Quality (REFSQ)
About this article
Cite this article
Bettaieb, S., Shin, S.Y., Sabetzadeh, M. et al. Using machine learning to assist with the selection of security controls during security assessment. Empir Software Eng 25, 2550–2582 (2020). https://doi.org/10.1007/s10664-020-09814-x
DOI: https://doi.org/10.1007/s10664-020-09814-x