Abstract
This article offers several contributions to the interdisciplinary project of responsible research and innovation in data science and AI. First, it provides a critical analysis of current efforts to establish practical mechanisms for algorithmic auditing and assessment to identify limitations and gaps with these approaches. Second, it provides a brief introduction to the methodology of argument-based assurance and explores how it is currently being applied in the development of safety cases for autonomous and intelligent systems. Third, it generalises this method to incorporate wider ethical, social, and legal considerations, in turn establishing a novel version of argument-based assurance that we call ‘ethical assurance.’ Ethical assurance is presented as a structured method for unifying the myriad practical mechanisms that have been proposed. It is built on a process-based form of project governance that enlists reflective innovation practices to operationalise normative principles, such as sustainability, accountability, transparency, fairness, and explainability. As a set of interlocutory governance mechanisms that span across the data science and AI lifecycle, ethical assurance supports inclusive and participatory ethical deliberation while also remaining grounded in social and technical realities. Finally, this article sets an agenda for ethical assurance, by detailing current challenges, open questions, and next steps, which serve as a springboard to build an active (and interdisciplinary) research programme as well as contribute to ongoing discussions in policy and governance.
Notes
Readers who wish to jump straight to our positive proposal can begin with this section, but in doing so will skip over important details that explain the context and motivation for the proposal itself.
It is important to acknowledge that [11] recognise that these mechanisms alone are merely tools to support wider processes of governance; in Appendix III, they also suggest the need to pursue argument-based forms of assurance.
Ashmore et al. [4] also define key desiderata for each of the four stages of their “ML lifecycle”: data management, model learning, model verification, and model deployment.
For instance, consider the following statement from (Hawkins et al. [34], 13): “requirements such as security or usability should be defined as ML safety requirements only if the behaviours or constraints captured by these requirements influence the safety criticality of the ML output. ‘Soft constraints’ such as interpretability may be crucial to the acceptance of an ML component especially where the system is part of a socio‐technical solution. All such constraints defined as ML safety requirements must be clearly linked to safety outcomes.”
Ward and Habli do acknowledge that the first step in the process of developing an assurance case centred upon interpretability is to “ask why the project needs interpretability and set the desired requirements that the project should satisfy.” Therefore, it is possible that the pattern they offer may also serve to provide assurance for wider (interpretability-linked) normative goals.
The term ‘MLOps’ refers to the application of DevOps practices to ML pipelines. The term is often used in an inclusive manner to incorporate traditional statistical or data science practices that support the ML lifecycle, but are not themselves constitutive of machine learning (e.g. exploratory data analysis), as well as deployment practices that are important within business and operational contexts (e.g. monitoring KPIs).
There is some notable overlap between this stage of the project lifecycle and the ethical assurance methodology, as some approaches to model reporting often contain similar information that is used in building an ethical assurance case [4, 51], specifically in the process of establishing evidential claims and warrant (see Sect. 4.2).
Algorithmic aversion refers to the reluctance of human agents to incorporate algorithmic tools as part of their decision-making processes due to misaligned expectations of the algorithm’s performance (see [12]).
This does not mean that, if a system has been deployed with little to no oversight, without due consideration given to the transparency and accountability of the relevant processes, and ends up causing significant harm, those responsible should be able to claim that the harm was due to “unforeseeable risk.”
Harm to the environment can of course be incorporated into a broader notion of ‘safety,’ such that pollution generated in the everyday operations of a power station is factored into a safety assessment. However, the point we wish to address here is that the scope of concepts such as ‘safety’ and ‘reliability’ tends to reflect a domain-specific focus or set of priorities (e.g. compliance with technical or legal standards, rather than ethical principles).
In formal terms, we can describe the task of a classifier as trying to determine (or predict) the value of some unknown variable \(y_{i} \in Y\) based on an observed variable \(x_{i} \in X\). In the case of supervised learning, the ML algorithm is trained on a series of labelled data, taking the form \((x_{1},y_{1}),\ldots,(x_{n},y_{n})\), where each example is a pair \((x_{i},y_{i})\) of an instance \(x_{i}\) and a label \(y_{i}\). The goal is to learn an optimal mapping function (given certain pre-specified constraints) from the domain of possible values for \(X\) to the range of values that the target variable \(Y\) can assume. This formulation of the classification task covers many concrete examples and algorithm types at a high level of abstraction (e.g. risk assessment, automated credit scoring, object identification).
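To make this formulation concrete, the following is a minimal sketch of a supervised classifier in Python; the dataset, model class, and parameters are illustrative assumptions (a synthetic dataset and logistic regression via scikit-learn), not part of any case study discussed in this article.

```python
# A minimal sketch (not from the article) of the supervised classification
# task described above: learn a mapping from instances x_i in X to labels
# y_i in Y, given labelled examples (x_1, y_1), ..., (x_n, y_n).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic labelled data standing in for the (x_i, y_i) pairs.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a mapping from the domain of X to the range of Y, subject to the
# pre-specified constraints of the chosen model class (a linear decision rule).
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predict the unknown label y_i for previously unseen instances x_i.
y_pred = clf.predict(X_test)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.2f}")
```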
For the purpose of this illustration, we will not worry about the specific details of \(X\) or \(Y\). However, the general format of this case study is similar to many widely used scoring systems, which need not rely on ML to function (e.g. [64]).
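For illustration, a scoring system of this kind can be as simple as a set of hand-specified thresholds aggregated into a single score; the sketch below uses invented input variables, bands, and weights, and does not reproduce any published clinical score.

```python
# A hypothetical threshold-based scoring function, illustrating how a scoring
# system can work without machine learning. The variables, bands, and weights
# are invented for illustration only.
def risk_score(heart_rate: float, respiratory_rate: float) -> int:
    score = 0
    if heart_rate < 40 or heart_rate > 130:
        score += 3
    elif heart_rate > 110:
        score += 2
    if respiratory_rate < 9 or respiratory_rate > 24:
        score += 3
    elif respiratory_rate > 20:
        score += 2
    return score

print(risk_score(heart_rate=120, respiratory_rate=22))  # prints 4
```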
This is just a selection of considerations. We cannot hope to cover all other relevant topics here, such as the importance of ensuring that the fairness optimisation constraints are considered reasonable by the affected stakeholders.
This also connects with some possible future directions for ethical assurance that we discuss in §5.3, specifically the possibility of modularising ethical assurance to support the development of argument patterns or a model-based approach.
Diagnostic access bias arises when individuals differ in their geographic, temporal, and economic access to healthcare services; this variation may result in their exclusion from a study or dataset, lead to differential access to diagnostic tests, or affect the accuracy of the diagnostic tests themselves. This can cause under- or over-estimation of the true prevalence of a disease, and lead to worse treatment for socioeconomically deprived individuals.
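As a purely illustrative sketch of this mechanism (all numbers are assumptions), the following toy simulation shows how prevalence estimated only from tested individuals can under-estimate true prevalence when the group with higher disease risk also has worse access to testing.

```python
# A toy simulation (all numbers are illustrative assumptions) of diagnostic
# access bias: when the group with higher disease prevalence also has worse
# access to testing, prevalence estimated from tested individuals is too low.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Half the population is socioeconomically deprived, with both a higher
# underlying disease risk and a lower probability of being tested.
deprived = rng.random(n) < 0.5
has_disease = rng.random(n) < np.where(deprived, 0.15, 0.05)
tested = rng.random(n) < np.where(deprived, 0.2, 0.8)

print(f"True prevalence:               {has_disease.mean():.1%}")         # ~10%
print(f"Prevalence among those tested: {has_disease[tested].mean():.1%}")  # ~7%
```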
It also involves what epistemologists refer to as the transmission of justification across inference: a process whereby one belief (p) derives its justification from the justification that one has for a secondary belief (q) [52].
Those readers who are familiar with informal logic and argumentation theory will recognise that this structure is also heavily influenced by the work of Stephen Toulmin [69], whose research into the structure of arguments has been highly influential in the development of ABA.
The sufficiency of the overall assurance case will, of course, depend on 1 and 2.
References
Ananny, M., Crawford, K.: Seeing without knowing: limitations of the transparency ideal and its application to algorithmic accountability. New Media Soc 20(3), 973–989 (2018). https://doi.org/10.1177/1461444816676645
Andersson, E., McLean, S., Parlak, M., Melvin, G.: From fairy tale to Reality: Dispelling the myths around citizen engagement. Involve and the RSA (2013)
Arnold, M., Bellamy, R.K.E., Hind, M., Houde, S., Mehta, S., Mojsilović, A., Nair, R., et al.: FactSheets: increasing trust in AI services through supplier’s declarations of conformity. IBM J Res Dev 63(4/5), 1–13 (2019). https://doi.org/10.1147/JRD.2019.2942288
Ashmore, R., Calinescu, R., Paterson, C.: Assuring the machine learning lifecycle: desiderata, methods, and challenges. arXiv:1905.04223 [cs, stat] (2019). http://arxiv.org/abs/1905.04223
Beauchamp, T.L., DeGrazia, D.: Principles and principlism. In: Khushf, G. (ed.) Handbook of Bioethics, pp. 55–74. Springer, Dordrecht (2004). https://doi.org/10.1007/1-4020-2127-5_3
Beauchamp, T.L., Childress, J.F.: Principles of Biomedical Ethics, 7th edn. Oxford University Press, New York (2013)
Bender, E.M., Friedman, B.: Data statements for natural language processing: toward mitigating system bias and enabling better science. Trans Assoc Comput Linguist 6, 587–604 (2018). https://doi.org/10.1162/tacl_a_00041
Benjamin, R.: Race After Technology: Abolitionist Tools for the New Jim Code. Polity, Medford (2019)
Binns, R.: What can political philosophy teach us about algorithmic fairness? IEEE Secur. Privacy 16(3), 73–80 (2018). https://doi.org/10.1109/MSP.2018.2701147
Bloomfield, R., Bishop, P.: Safety and assurance cases: past, present and possible future an adelard perspective. In: Dale, C., Anderson, T. (eds.) Making Systems Safer, pp. 51–67. Springer, London (2010). https://doi.org/10.1007/978-1-84996-086-1_4
Brundage, M., Avin, S., Wang, J., Belfield, H., Krueger, G., Hadfield, G., Khlaaf, H., et al.: Toward trustworthy AI development: mechanisms for supporting verifiable claims (2020). arXiv:2004.07213 [Cs], http://arxiv.org/abs/2004.07213
Burton, S., Habli, I., Lawton, T., McDermid, J., Morgan, P., Porter, Z.: Mind the gaps: assuring the safety of autonomous systems from an engineering, ethical, and legal perspective. Artif. Intell. 279(February), 103201 (2020). https://doi.org/10.1016/j.artint.2019.103201
Cartwright, N., Hardie, J.: Evidence-based policy: a practical guide to doing it better. Oxford University Press, Oxford (2012)
CDEI.: The Roadmap to an Effective AI Ecosystem. Centre for Data Ethics and Innovation. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1039146/The_roadmap_to_an_effective_AI_assurance_ecosystem.pdf (2021)
Cleland, G.M., Habli, I., Medhurst, J., Health Foundation (Great Britain).: Evidence: using safety cases in industry and healthcare (2012)
Cobbe, J., Lee, M.S.A., Singh, J.: Reviewable automated decision-making: a framework for accountable algorithmic systems. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. ACM, Virtual Event Canada, pp. 598–609 (2021) https://doi.org/10.1145/3442188.3445921.
Collingridge, D.: The Social Control of Technology. St. Martin’s Press, New York (1980)
Collins, G.S., Reitsma, J.B., Altman, D.G., Moons, K.G.M.: Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann. Intern. Med. 162(1), 55 (2015). https://doi.org/10.7326/M14-0697
Law Commission.: Automated Vehicles: Summary of Consultation Paper 3: A Regulatory Framework for Automated Vehicles (2020)
GSN Community.: GSN Community Standard (Version 2). The Assurance Case Working Group (2018)
Diakopoulos, N.: Algorithmic accountability reporting: on the investigation of black boxes. Tow Center for Digital Journalism (2014)
Diakopoulos, N.: Algorithmic accountability: journalistic investigation of computational power structures. Digit. J. 3(3), 398–415 (2015). https://doi.org/10.1080/21670811.2014.976411
Dryzek, J.S., List, C.: Social choice theory and deliberative democracy: a reconciliation. Br. J. Political Sci. 33(1), 1–28 (2003)
Van Eemeren, F.H., Grootendorst, R.: A Systematic Theory of Argumentation: The Pragma-Dialectical Approach. Cambridge University Press, Cambridge (2004)
Fang, H., Miao, H.: Introducing the model card toolkit for easier model transparency reporting. Google AI Blog (2020)
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J.W., Wallach, H., Daumé III, H., Crawford, K.: Datasheets for datasets. In: Proceedings of the 5th Workshop on Fairness, Accountability, and Transparency in Machine Learning (2018). http://arxiv.org/abs/1803.09010
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J.W., Wallach, H., Daumé III, H., Crawford, K.: Datasheets for datasets (2019). arXiv:1803.09010 [Cs]. http://arxiv.org/abs/1803.09010
Habermas, J.: On the Pragmatics of Communication. MIT Press, Cambridge (1998)
Habli, I., Alexander, R., Hawkins, R.: Safety cases: an impending crisis? In: Safety-Critical Systems Symposium (SSS’21), 18 (2021)
Habli, I., Alexander, R., Hawkins, R., Sujan, M., McDermid, J., Picardi, C., Lawton, T.: Enhancing COVID-19 decision making by creating an assurance case for epidemiological models. BMJ Health Care Inform 27(3), e100165 (2020). https://doi.org/10.1136/bmjhci-2020-100165
Haddon-Cave, C., Great Britain, Parliament, and House of Commons.: The Nimrod Review: an independent review into the broader issues surrounding the loss of the RAF Nimrod MR2 Aircraft XV230 in Afghanistan in 2006. Stationery Office, London (2009)
Hao, K.: In 2020, Let’s Stop AI Ethics-Washing and Actually Do Something. MIT Technology Review (2019). https://www.technologyreview.com/2019/12/27/57/ai-ethics-washing-time-to-act/.
Hawkins, R., Habli, I., Kolovos, D., Paige, R., Kelly, T.: Weaving an assurance case from design: a model-based approach. In: 2015 IEEE 16th international symposium on high assurance systems engineering. IEEE, Daytona Beach Shores, pp. 110–17 (2015) https://doi.org/10.1109/HASE.2015.25.
Hawkins, R., Paterson, C., Picardi, C., Jia, Y., Calinescu, R., Habli, I.: Guidance on the Assurance of Machine Learning in Autonomous Systems. University of York: Assuring Autonomy International Programme (AAIP) (2021)
Ho, H.L.: The legal concept of evidence. In: Zalta, E.N. (Ed.) The Stanford Encyclopedia of Philosophy, Winter 2015. Metaphysics Research Lab, Stanford University
Holland, S., Hosny, A., Newman, S., Joseph, J., Chmielinski, K.: The dataset nutrition label: a framework to drive higher data quality standards (2018).
Horty, J.F.: Reasons as Defaults. Oxford University Press, New York (2014)
ICO.: Guidance on the AI Auditing Framework. Information Commissioner’s Office (2020)
ICO, and Alan Turing Institute.: Explaining Decisions Made with AI (2020)
Kalluri, P.: Don’t ask if artificial intelligence is good or fair, ask how it shifts power. Nature 583(7815), 169–269 (2020). https://doi.org/10.1038/d41586-020-02003-2
Kelly, T.P.: Arguing safety: a systematic approach to managing safety cases. Ph.D. thesis, Department of Computer Science, University of York (1998)
Kind, C.: The Term ‘Ethical AI’ Is Finally Starting to Mean Something | VentureBeat. VentureBeat (2020). https://venturebeat.com/2020/08/23/the-term-ethical-ai-is-finally-starting-to-mean-something/. Accessed 6 May 2021
Kroll, J.A.: Outlining traceability: a principle for operationalizing accountability in computing systems. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. ACM, Virtual Event Canada, pp. 758–71 (2021) https://doi.org/10.1145/3442188.3445937.
Leslie, D.: Understanding artificial intelligence ethics and safety. The Alan Turing Institute, London (2019)
Leslie, D.: The Secret Life of Algorithms in the Time of COVID-19. The Alan Turing Institute (2020) https://www.turing.ac.uk/blog/secret-life-algorithms-time-covid-19.
Leslie, D.: The arc of the data scientific universe. Harvard Data Sci Rev (2021). https://doi.org/10.1162/99608f92.938a18d7
Leslie, D., Rincon, C., Burr, C., Aitken, M., Katell, M., Briggs, M.: AI Sustainability in Practice: Part I. The Alan Turing Institute and the UK Office for AI (2022a)
Leslie, D., Rincon, C., Burr, C., Aitken, M., Katell, M., Briggs, M.: AI Sustainability in Practice: Part II. The Alan Turing Institute and the UK Office for AI (2022b)
Lucyk, K., McLaren, L.: Taking Stock of the Social Determinants of Health: a scoping review. Edited by Spencer Moore. PLoS One 12(5), e0177306 (2017). https://doi.org/10.1371/journal.pone.0177306
Lundberg, S.: slundberg/shap. GitHub repository (2020). https://github.com/slundberg/shap. Accessed June 2021
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I.D., Gebru, T.: Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency-FAT* ’19, pp. 220–29 (2019) https://doi.org/10.1145/3287560.3287596.
Moretti, L., Piazza, T.: Transmission of justification and warrant (2013).
Morley, J., Floridi, L., Kinsey, L., Elhalal, A.: From what to how: an initial review of publicly available AI ethics tools, methods and research to translate principles into practices. Sci. Eng. Ethics (2019). https://doi.org/10.1007/s11948-019-00165-5
Mökander, J., Floridi, L.: Ethics-based auditing to develop trustworthy AI. Mind. Mach. (2021). https://doi.org/10.1007/s11023-021-09557-8
O’Neill, O.: A Question of Trust. Cambridge University Press, Cambridge (2002)
Object Management Group.: Structured Assurance Case Metamodel (SACM), Version 2.0 (2018)
Owen, R., Bessant, J.R., Heintz, M. (eds.): Responsible Innovation. Wiley, Chichester (2013)
PAIR.: What-If Tool. People + AI Research (PAIR) (2020). https://pair-code.github.io/what-if-tool/
Picardi, C., Paterson, C., Hawkins, R., Calinescu, R., Habli, I.: Assurance argument patterns and processes for machine learning in safety-related systems. In: Proceedings of the Workshop on Artificial Intelligence Safety (SafeAI 2020), pp. 23–30. CEUR Workshop Proceedings (2020)
Raji, I.D., Smart, A., White, N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., Barnes, P.: Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* ’20). ACM (2020)
Rawls, J.: A Theory of Justice, Revised edn. Belknap Press of Harvard University Press, Cambridge (1999)
Reisman, D., Schultz, J., Crawford, K., Whittaker, M.: Algorithmic Impact Assessments: A Practical Framework for Public Accountability. AI Now (2018).
IBM Research.: Introducing AI Fairness 360, A Step Towards Trusted AI. IBM Research Blog (2018). https://www.ibm.com/blogs/research/2018/09/ai-fairness-360/
Royal College of Physicians.: National Early Warning Score (NEWS) 2. RCP London (2017). https://www.rcplondon.ac.uk/projects/outputs/national-early-warning-score-news-2
Selbst, A.D., Boyd, D., Friedler, S.A., Venkatasubramanian, S., Vertesi, J.: Fairness and abstraction in sociotechnical systems. In: Proceedings of the Conference on Fairness, Accountability, and Transparency-FAT* ’19. ACM Press, Atlanta, pp. 59–68 (2019) https://doi.org/10.1145/3287560.3287598.
Stilgoe, J., Owen, R., Macnaghten, P.: Developing a framework for responsible innovation. Res. Policy 42(9), 1568–1580 (2013). https://doi.org/10.1016/j.respol.2013.05.008
Sujan, M., Habli, I.: Safety cases for digital health innovations: can they work? BMJ Qual Saf, May, bmjqs-2021-012983 (2021). https://doi.org/10.1136/bmjqs-2021-012983.
Sweenor, D., Hillion, S., Rope, D., Kannabiran, D., Hill, T., O’Connell, M.: ML Ops: Operationalizing Data Science. O’Reilly Media (2020)
Toulmin, S.: The Uses of Argument, Updated edn. Cambridge University Press, Cambridge (2003)
Ward, F.R., Habli, I.: An assurance case pattern for the interpretability of machine learning in safety-critical systems. In: Casimiro, A., Ortmeier, F., Schoitsch, E., Bitsch, F., Ferreira, P. (Eds.) Computer Safety, Reliability, and Security. SAFECOMP 2020 Workshops, vol. 12235. Springer International Publishing, Cham, pp. 395–407 (2020). https://doi.org/10.1007/978-3-030-55583-2_30.
Acknowledgements
We wish to thank Ibrahim Habli, Zoe Porter, Geoff Keeling, Rosamund Powell, and Mike Katell for their insightful comments on earlier drafts of this article, and for offering suggestions for further research that took the article in valuable directions it would not otherwise have explored.
Funding
This research was supported by a grant from the UKRI Trustworthy Autonomous Systems Hub, awarded to Dr Christopher Burr. Additional funding was provided by the Engineering and Physical Sciences Research Council (EPSRC Grant # EP/T001569/1, EPSRC Grant # EP/W006022/1) and the Economic and Social Research Council (ESRC Grant # ES/T007354/1).
Ethics declarations
Conflict of interest
On behalf of all the authors, the corresponding author states that there is no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Burr, C., Leslie, D. Ethical assurance: a practical approach to the responsible design, development, and deployment of data-driven technologies. AI Ethics 3, 73–98 (2023). https://doi.org/10.1007/s43681-022-00178-0