Using machine learning to assist with the selection of security controls during security assessment

Bettaieb, Seifeddine; Shin, Seung Yeob; Sabetzadeh, Mehrdad; Briand, Lionel C.; Garceau, Michael; Meyers, Antoine

doi:10.1007/s10664-020-09814-x

Published: 13 April 2020

Volume 25, pages 2550–2582, (2020)

Empirical Software Engineering

Seifeddine Bettaieb¹,
Seung Yeob Shin¹,
Mehrdad Sabetzadeh¹,²,
Lionel C. Briand¹,²,
Michael Garceau³ &
Antoine Meyers³

Abstract

Context

In many domains such as healthcare and banking, IT systems need to fulfill various requirements related to security. The elaboration of security requirements for a given system is in part guided by the controls envisaged by the applicable security standards and best practices. An important difficulty that analysts have to contend with during security requirements elaboration is sifting through a large number of security controls and determining which ones have a bearing on the security requirements for a given system. This challenge is often exacerbated by the scarce security expertise available in most organizations.

Objective

In this article, we develop automated decision support for the identification of security controls that are relevant to a specific system in a particular context.

Method and Results

Our approach, which is based on machine learning, leverages historical data from security assessments performed over past systems in order to recommend security controls for a new system. We operationalize and empirically evaluate our approach using real historical data from the banking domain. Our results show that, when one excludes security controls that are rare in the historical data, our approach has an average recall of ≈ 94% and average precision of ≈ 63%. We further examine through a survey the perceptions of security analysts about the usefulness of the classification models derived from historical data.
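To make the two reported metrics concrete, the following is a minimal sketch of how precision and recall could be computed for recommended security controls and then averaged. It is not the authors' evaluation procedure: the function names, the choice to average per control, and the example data are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def precision_recall(recommended: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a single security control.

    `recommended` holds the systems for which the control was recommended;
    `relevant` holds the systems for which analysts judged it relevant.
    """
    true_positives = len(recommended & relevant)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def average_scores(
    per_control: Dict[str, Tuple[Set[str], Set[str]]]
) -> Tuple[float, float]:
    """Average precision and recall over all controls (assumed averaging unit)."""
    scores = [precision_recall(rec, rel) for rec, rel in per_control.values()]
    count = len(scores)
    avg_precision = sum(p for p, _ in scores) / count
    avg_recall = sum(r for _, r in scores) / count
    return avg_precision, avg_recall


# Hypothetical data: control id -> (systems recommended for, systems truly relevant to)
example = {
    "C1": ({"s1", "s2", "s3", "s4"}, {"s1", "s2", "s3"}),
    "C2": ({"s1", "s2"}, {"s1", "s3"}),
}
avg_p, avg_r = average_scores(example)
print(f"average precision = {avg_p:.3f}, average recall = {avg_r:.3f}")
```

In this reading, high recall means few relevant controls go unrecommended, while precision reflects how many recommendations an analyst would need to review and reject.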

Conclusions

The high recall – indicating only a few relevant security controls are missed – combined with the reasonable level of precision – indicating that the effort required to confirm recommendations is not excessive – suggests that our approach is a useful aid to analysts for more efficiently identifying the relevant security controls, and also for decreasing the likelihood that important controls would be overlooked. Further, our survey results suggest that the generated classification models help provide a documented and explicit rationale for choosing the applicable security controls.

Acknowledgements

Financial support for this work was provided by the Alphonse Weicker Foundation.

Author information

Authors and Affiliations

SnT Centre, University of Luxembourg, Luxembourg, Luxembourg
Seifeddine Bettaieb, Seung Yeob Shin, Mehrdad Sabetzadeh & Lionel C. Briand
University of Ottawa, Ottawa, ON, Canada
Mehrdad Sabetzadeh & Lionel C. Briand
BGL BNP Paribas, Luxembourg, Luxembourg
Michael Garceau & Antoine Meyers

Corresponding author

Correspondence to Seung Yeob Shin.

Additional information

Communicated by: Eric Knauss, Michael Goedicke and Paul Grünbacher

This article belongs to the Topical Collection: Requirements Engineering for Software Quality (REFSQ)

About this article

Cite this article

Bettaieb, S., Shin, S.Y., Sabetzadeh, M. et al. Using machine learning to assist with the selection of security controls during security assessment. Empir Software Eng 25, 2550–2582 (2020). https://doi.org/10.1007/s10664-020-09814-x

Published: 13 April 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s10664-020-09814-x
