DATA PROTECTION ENGINEERING
From Theory to Practice
JANUARY 2022
ABOUT ENISA
The European Union Agency for Cybersecurity, ENISA, is the Union’s agency dedicated to
achieving a high common level of cybersecurity across Europe. Established in 2004 and
strengthened by the EU Cybersecurity Act, the European Union Agency for Cybersecurity
contributes to EU cyber policy, enhances the trustworthiness of ICT products, services and
processes with cybersecurity certification schemes, cooperates with Member States and EU
bodies, and helps Europe prepare for the cyber challenges of tomorrow. Through knowledge
sharing, capacity building and awareness raising, the Agency works together with its key
stakeholders to strengthen trust in the connected economy, to boost resilience of the Union’s
infrastructure, and, ultimately, to keep Europe’s society and citizens digitally secure. More
information about ENISA and its work can be found here: www.enisa.europa.eu.
CONTACT
For contacting the authors please use isdp@enisa.europa.eu
For media enquiries about this paper, please use press@enisa.europa.eu
CONTRIBUTORS
Claude Castelluccia (INRIA),
Giuseppe D'Acquisto (Garante per la Protezione dei Dati Personali),
Marit Hansen (ULD),
Cedric Lauradoux (INRIA),
Meiko Jensen (Kiel University of Applied Science),
Jacek Orzeł (SGH Warsaw School of Economics)
Prokopios Drogkaris (European Union Agency for Cybersecurity).
EDITORS
Prokopios Drogkaris (European Union Agency for Cybersecurity),
Monika Adamczyk (European Union Agency for Cybersecurity).
ACKNOWLEDGEMENTS
We would like to thank the colleagues from the European Data Protection Board (EDPB),
Technology Subgroup and the colleagues from the European Data Protection Supervisor
(EDPS), Technology and Privacy Unit, for reviewing this report and providing valuable
comments.
We would also like to thank Kim Wuyts, Veronica Jarnskjold Buer, Konstantinos Limniotis, Paolo
Balboni, Stefan Schiffner, Jose M. del Alamo, Irene Kamara and the ENISA colleague Athena
Bourka for their review and valuable comments.
LEGAL NOTICE
This publication represents the views and interpretations of ENISA, unless stated otherwise. It
does not endorse a regulatory obligation of ENISA or of ENISA bodies pursuant to the
Regulation (EU) No 2019/881.
ENISA has the right to alter, update or remove the publication or any of its contents. It is
intended for information purposes only and it must be accessible free of charge. All references
to it or its use as a whole or partially must contain ENISA as its source.
Third-party sources are quoted as appropriate. ENISA is not responsible or liable for the content
of the external sources including external websites referenced in this publication.
Neither ENISA nor any person acting on its behalf is responsible for the use that might be made
of the information contained in this publication.
COPYRIGHT NOTICE
© European Union Agency for Cybersecurity (ENISA), 2022
This publication is licenced under CC-BY 4.0 “Unless otherwise noted, the reuse of this
document is authorised under the Creative Commons Attribution 4.0 International (CC BY 4.0)
licence (https://creativecommons.org/licenses/by/4.0/). This means that reuse is allowed,
provided that appropriate credit is given and any changes are indicated”.
For any use or reproduction of photos or other material that is not under the ENISA copyright,
permission must be sought directly from the copyright holders.
TABLE OF CONTENTS
1. INTRODUCTION 6
3.1 ANONYMISATION 10
3.2 k-ANONYMITY 11
7. CONCLUSIONS 34
8. REFERENCES 36
EXECUTIVE SUMMARY
The evolution of technology has brought forward new techniques to share, process and store
data. This has generated new models of data (including personal data) processing, but also
introduced new threats and challenges. Some of the evolving privacy and data protection
challenges associated with emerging technologies and applications include: lack of control and
transparency, possible reusability of data, data inference and re-identification, profiling and
automated decision making.
The implementation of the GDPR data protection principles in such contexts is challenging as
they cannot be implemented in the traditional, “intuitive” way. Processing operations must be
rethought, sometimes radically (similar to how radical the threats are), possibly with the
definition of new actors and responsibilities, and with a prominent role for technology as an
element of guarantee. Safeguards must be integrated into the processing with technical and
organisational measures. From the technical side, the challenge is to translate these principles
into tangible requirements and specifications by selecting, implementing and configuring
appropriate technical and organisational measures and techniques.
Data Protection Engineering can be perceived as part of data protection by Design and by
Default. It aims to support the selection, deployment and configuration of appropriate technical
and organizational measures in order to satisfy specific data protection principles. Undeniably it
depends on the measure, the context and the application and eventually it contributes to the
protection of data subjects’ rights and freedoms.
The current report took a broader look into data protection engineering with a view to support
practitioners and organizations with practical implementation of technical aspects of data
protection by design and by default. Towards this direction this report presents existing
(security) technologies and techniques and discusses possible strengths and applicability in
relation to meeting data protection principles as set out in Article 5 GDPR.
Based on the analysis provided in the report, the following conclusions and recommendations
for relevant stakeholders are provided below:
Regulators (e.g. Data Protection Authorities and the European Data Protection Board)
should discuss and promote good practices across the EU in relation to state-of-the-art
solutions of relevant technologies and techniques. EU Institutions could promote such
good practices by relevant publicly available documents.
Regulators (e.g. Data Protection Authorities and the European Data Protection Board)
and the European Commission should promote the establishment of relevant
certification schemes, under Article 42 GDPR, to ensure proper engineering of data
protection.
1. INTRODUCTION
Technological advancements over the last years have impacted the way our personal data is
being shared and processed. The evolution of technology has brought forward new techniques
to share, process and store data. This has generated new models of data (including personal
data) processing, but also introduced new threats and difficulties for the end user to understand
and control the processing. Continuous online presence of end users has resulted in an
increased processing of large amounts of personal data on a daily basis. Think of online shopping
or using a mobile application to navigate to a specific location or contact friends and family. The
whole data lifecycle has been augmented with many actors being involved and eventually end
users not being able to fully understand and control who, for how long and for what purpose has
access to their personal data.
These new technologies have often been introduced without a prior assessment of the impact
on privacy and data protection. In this context, processing of personal data is often
characterised by the absence of a predetermined purpose and by the discovery of new
correlations between the observed phenomena, for example in the case of big data or machine
learning. This modus operandi conflicts essentially with the principles of necessity and purpose
limitation, as these are stipulated by the GDPR. Blockchain and distributed ledger technologies,
as another example, offer the opportunity of replacing intermediation-based transactions, but at
the potential expense of a substantial loss of individuals’ control over their data, which remain
visible in the chain to all blockchain participants, as long as it is active or perhaps even beyond
that. This, depending of course on the use case, contradicts the GDPR principle of data
minimisation and constitutes a severe obstacle for the exercise of the right to deletion by data
subjects. Lastly, Artificial Intelligence systems might be empowered to take decisions with some
degree of autonomy to achieve specific goals, for example in credit score evaluation in the
finance domain. Such autonomy might very well be in conflict with the prerequisites of human
agency over machines and self-determination, both at the heart of personal data protection and
the GDPR.
As also discussed in [1], some of the evolving privacy and data protection challenges
associated with emerging technologies and applications include: lack of control and
transparency, incompatible reuse of data, data inference and re-identification, profiling and
automated decision making. The implementation of the GDPR data protection principles in such
contexts is challenging as they cannot be implemented in the traditional, “intuitive” way.
Processing operations must be rethought and redesigned, sometimes radically (similar to how
radical the threats and the attack vectors are), possibly with the definition of new actors and
responsibilities, and with a prominent role for technology as an element of guarantee.
Appropriate technical and organisational measures, as well as safeguards, must be considered
at the earliest stage possible and integrated into the processing. This is the scope of the notion
of data protection by design, enshrined in Article 25 of the GDPR.
1
It was brought forward by Dr. Ann Cavoukian and was also evident as a notion but not explicitly mentioned in the Directive
95/46/EC and the ePrivacy Directive. See also European Data Protection Supervisor (EDPS) [6] EDPS Opinion 5/2018
“Preliminary Opinion on privacy by design”
in very broad terms as a general principle; by researchers and engineers, on the other hand, it is
often equated with the use of specific Privacy Enhancing Technologies (PETs). However, privacy
by design is neither just a list of principles nor can it be reduced to the implementation of specific
technologies. In fact, it is a process involving various technological and organisational
components, which implement privacy and data protection principles through the proper and
timely deployment of technical and organisational measures that also include PETs.
The obligation described in Article 25 is for controllers to have effective data protection
designed and integrated into the processing of personal data, with appropriate default settings
configuration or otherwise available throughout the processing lifecycle. Further to the adoption
of the GDPR, the EDPB has published a set of guidelines [2] on Data Protection by Design and
by Default and provided guidance on their application. The core obligation is the implementation
of appropriate measures and necessary safeguards that provide effective implementation of the
data protection principles and, consequentially, data subjects’ rights and freedoms by design
and by default. Through the various examples provided, it is evident that proper and timely
development and integration of technical and organizational measures into the data processing
activities play a big role in the practical implementation of different data protection principles.
Engineering those principles relates not only to choices made with regard to designing the
processing operation but also to selecting, deploying, configuring and maintaining appropriate
technological measures and techniques. These techniques would support the fulfilment of the
data protection principles and offer a level of protection adequate to the level of risk the
personal data are exposed to. Data Protection Engineering can be perceived as part of data
protection by Design and by Default. It aims to support the selection, deployment and
configuration of appropriate technical and organisational measures in order to satisfy specific
data protection principles. Undeniably it depends on the measure, the context and the
application and eventually it contributes to the protection of data subjects’ rights and freedoms.
1.2 SCOPE AND OBJECTIVES
The overall scope of this report is to take a broader look into data protection engineering with a
view to support practitioners and organisations with the practical implementation of technical
aspects of data protection by design and by default. Towards this direction, this report attempts
to present existing (security) technologies and techniques and discuss possible strengths and
applicability in relation to meeting data protection principles. This work is performed in the
context of ENISA’s tasks under the Cybersecurity Act (CSA)2 to support Member States on
specific cybersecurity aspects of Union policy and law relating to data protection and privacy.
This work is intended to provide the basis for further and more specific analysis of the identified
categories of technologies and techniques while demonstrating their practical applicability.
2
Regulation (EU) 2019/881 of the European Parliament and of the Council of 17 April 2019 on ENISA (the European Union
Agency for Cybersecurity) and on information and communications technology cybersecurity certification and repealing
Regulation (EU) No 526/2013 (Cybersecurity Act) http://data.europa.eu/eli/reg/2019/881/oj
2. ENGINEERING DATA
PROTECTION
In its Preliminary Opinion 5/2018 [6] on privacy by design, the European Data Protection
Supervisor (EDPS) provided a detailed overview of privacy engineering methodologies as a
means to translate the principles of privacy by design and by default. In the same Preliminary
Opinion, the EDPS provided examples of methodologies to identify privacy and data protection
requirements and integrate them into privacy engineering processes with a view to
implementing appropriate technological and organisational safeguards. Some of these
methodologies define data protection goals directly from privacy and data protection principles,
such as those of the GDPR, or derive them from operational intermediate goals. Other
methodologies are driven by risk management.
3
http://pripareproject.eu/ . Relevant publication is available under https://www.slideshare.net/richard.claassens/pripare-
methodologyhandbookfinalfeb242016
impact assessment such as Privacy Impact Assessment (PIA)4 and relevant DPIA templates
and guidance made available by other National Data Protection Authorities5.
2.3 PRIVACY ENHANCING TECHNOLOGIES
Privacy Enhancing Technologies (PETs) cover the broader range of technologies that are
designed to support implementation of data protection principles at a systemic and fundamental
level. As described in [7], PETs are “a coherent system of ICT measures that protects privacy
by eliminating or reducing personal data or by preventing unnecessary and/or undesired
processing of personal data, all without losing the functionality of the information system”.
PETs, as technical solutions, can be perceived as building blocks towards meeting data
protection principles and the obligations under GDPR Art. 25 on data protection by design.
Therefore, they also comprise elements of the building blocks of data protection engineering.
As PETs can vary from a single technical tool to a whole deployment depending on the context,
scope and the processing operation itself, it is evident that there is not a one-size fits all
approach and there is a need for further categorization across different PETs. Towards this
direction ENISA has put forward a methodology [8] on analysing the maturity of PETs and a
framework on assessing and evaluating PETs in the context of online and mobile privacy tools.
As highlighted by the Spanish Supervisory Authority (SA) in [9], a number of initiatives exist on
classifications of PETs, either based on their technical characteristics or on the goals they
pursue (in relation to the data protection principles they can support).
With regards to specific tools and technologies, another categorization can be based on the
characteristics of the technology used in relation to the data being processed. More specifically,
these characteristics can be:
Further to these characteristics, additional categorization can be performed with regards to the
GDPR data protection principles that each category can support, at least in theory. Attempting
to perform such a taxonomy could be of great value to data controllers and processors, as it
would provide a reference model either of what purposes each tool or technique can serve or of
what is already achieved by already deployed tools and techniques. It should be
noted however that the overall analysis should always be performed per processing operation
and also include aspects such as nature, scope, context and purposes of processing, similar to
the notion of DPIA.
4
https://www.cnil.fr/en/privacy-impact-assessment-pia
5
https://www.datatilsynet.no/rettigheter-og-plikter/virksomhetenes-plikter/vurdere-personvernkonsekvenser/vurdering-av-
personvernkonsekvenser/
3. ANONYMISATION AND
PSEUDONYMISATION
Anonymisation and pseudonymisation are two very well-known techniques that are widely used
to practically implement data protection principles such as data minimisation. Pseudonymisation
is also explicitly mentioned in the GDPR as a technique that can support data protection by
design (Art. 25 GDPR) and security of personal data processing (Art. 32 GDPR). However, they
are often confused, when in fact there is an important difference between them and their
application in practice. As already pointed out by Working Party 29 [10] and according to GDPR
Recital (26), anonymous information refers to information which does not relate to an identified
or identifiable natural person - and, thus, anonymous data are not considered as personal data.
On the contrary, according to Art. 4 (5), pseudonymised data, which can be (re)attributed to a
natural person with the use of additional information, are personal data and GDPR data
protection principles apply to them. A common mistake is to consider pseudonymised data to be
equivalent to anonymised data.
In the area of pseudonymisation, ENISA has published over the last years a number of reports
[11], [12] & [13] that cover the notion and role of the technique under the GDPR, different
pseudonymisation techniques and models and a number of use cases where its applicability is
demonstrated in practice. To this end, the focus of this Section is primarily on data
anonymisation, aiming to discuss briefly k-anonymity and differential privacy as two possible
techniques to anonymise relational or tabular data.
In the case of non-tabular or sequential data, anonymisation might not be as easy or as
straightforward. For example, in the case of mobility data, relevant studies [14], [15] & [16] have
proved that knowing 3 or 4 spatio-temporal points of a trajectory is sufficient to re-identify,
with a high probability, a person in a population of several million individuals. Possible solutions
can be to publish only statistics on different trajectories, or to publish synthetic data, i.e.
trajectories artificially generated from the statistical characteristics of real trajectories [17].
Synthetic data are discussed further in Section 4.5.
3.1 ANONYMISATION
Data anonymisation is an optimization problem between two conflicting parameters: data utility
and re-identification protection. In fact, data anonymisation is achieved by altering the data,
either by adding noise or by generalising. Providing strong re-identification protection usually
requires strongly altering the dataset and therefore negatively impacts its utility. Data anonymisation
therefore entails finding the best trade-off between these two parameters; and this trade-off
often depends on the application and the context (i.e., how the dataset is distributed and used).
As also mentioned in [10], [18] & [19], we should not assume that a generic anonymisation
scheme can be applied to all use cases, or that such a scheme will be able to provide full and
unlimited protection. Each solution must be adapted according to the type of data, the
processing operation, the context and the possible attack models. This notion should be
considered as applicable to every technique and technology discussed within this document. As
mentioned in the Working Party 29 Opinion [10], “An effective anonymisation solution prevents
all parties from singling out an individual in a dataset, from linking two records within a dataset
(or between two separate datasets) and from inferring any information in such dataset.”
The next two sections provide a quick overview of the two most popular anonymisation
approaches, namely k-anonymity and ε-differential privacy. A more in-depth overview of existing
anonymisation schemes can be found in [10], [20] & [21].
3.2 k-ANONYMITY
The k-anonymity model was introduced in the early 2000s and it is built on the idea that by
combining sets of data with similar attributes, identifying information about any one of the
individuals contributing to that data can be obscured. As discussed in [22] a dataset is
considered to provide k-anonymity protection if the information for each data subject contained
in the dataset cannot be distinguished from at least k-1 data subjects whose information also
appears in the dataset. The key concept is to address the risk of re-identification of anonymized
data through linkage to other available datasets. For example, a sample data set is presented in
Table 1 below.

Table 1: Initial Dataset

Name         Gender   Zip Code   Year of Birth   Diagnosed Medical Condition
George S.    M        75016      1968            Depression
Martin M.    M        75015      1970            Diabetes
Marie J.     F        69100      1945            Heart rhythm disorders
Claire M.    F        69100      1950            Multiple sclerosis
Amelia F.    F        75016      1968            Nothing
Annes J.     F        75012      1964            Rheumatoid arthritis
Sophia C.    F        75013      1964            Blood Disorder
Simon P.     M        75019      1977            Sarcoidosis
Michael J.   M        75018      1976            Lymphoma
To anonymise the data of Table 1, several techniques are possible, such as deletion or
generalization6. In this example, the attribute Gender was kept unmodified as it was considered
important for the study of medical conditions. Furthermore, the Zip Code of the user’s address
and the Year of Birth attributes were generalized by retaining only the department zip code and
by using intervals of 10 years, respectively. Attempting to k-anonymize the data with a k value of
two (2) and with respect to the quasi-identifier {Zip Code, Year of Birth, Gender}, the initial data
set is transformed so that, for each triplet of values, there are at least two entries in the table
corresponding to it, as presented in Table 2 below.
Table 2: Anonymised dataset (k = 2)

Zip Code   Year of Birth   Gender   Diagnosed Medical Condition
75***      [1960-1970]     M        Depression
75***      [1960-1970]     M        Diabetes
69***      [1940-1950]     F        Heart rhythm disorders
69***      [1940-1950]     F        Multiple sclerosis
75***      [1960-1970]     F        Nothing
75***      [1960-1970]     F        Rheumatoid arthritis
75***      [1960-1970]     F        Blood Disorder
75***      [1970-1980]     M        Sarcoidosis
75***      [1970-1980]     M        Lymphoma
6
Generalization can also be achieved by deletion of an attribute (column)
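The generalisation step described above can be illustrated with a short sketch. The following is a minimal, hypothetical Python example (assuming the pandas library); the column names mirror Table 1, the generalisation levels (department zip code, ten-year intervals) follow the example in the text, and the snippet is not a complete anonymisation tool.

import pandas as pd

# Records mirroring Table 1 (illustrative only).
records = pd.DataFrame({
    "name":   ["George S.", "Martin M.", "Marie J.", "Claire M.", "Amelia F.",
               "Annes J.", "Sophia C.", "Simon P.", "Michael J."],
    "gender": ["M", "M", "F", "F", "F", "F", "F", "M", "M"],
    "zip":    ["75016", "75015", "69100", "69100", "75016", "75012", "75013", "75019", "75018"],
    "birth":  [1968, 1970, 1945, 1950, 1968, 1964, 1964, 1977, 1976],
    "condition": ["Depression", "Diabetes", "Heart rhythm disorders", "Multiple sclerosis",
                  "Nothing", "Rheumatoid arthritis", "Blood Disorder", "Sarcoidosis", "Lymphoma"],
})

anonymised = records.drop(columns="name")              # deletion of the identifying attribute
anonymised["zip"] = anonymised["zip"].str[:2] + "***"  # keep only the department part of the zip code
anonymised["birth"] = pd.cut(anonymised["birth"],      # ten-year intervals, as in Table 2
                             bins=[1940, 1950, 1960, 1970, 1980]).astype(str)

# k is the size of the smallest group sharing the same quasi-identifier values.
k = anonymised.groupby(["zip", "birth", "gender"]).size().min()
print(anonymised.sort_values(["zip", "birth", "gender"]))
print("k =", k)                                        # k = 2 for this example

Running the sketch on the sample records reproduces the equivalence classes of Table 2 and confirms that the smallest class contains two entries.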
k-anonymity suffers from several limitations. For example, the k-anonymity criterion does not
protect against homogeneity attacks, where all the records grouped in an equivalence class
have the same or similar sensitive value. Various extensions to the k-anonymity model have
been introduced to address this issue, such as l-diversity, which ensures that for each quasi-
identifier value corresponding to k data subjects, there will be at least l representative values for
the sensitive data [23] & [24]. Table 2 is 2-anonymous and 2-diverse, because there are always at
least two different medical conditions within a group of individuals with the same quasi-identifier.
However, if Simon P. had Lymphoma instead of Sarcoidosis, Table 2 would still be 2-
anonymous, but would no longer be 2-diverse. In this case, one could infer that Michael J., who
belongs to the group defined by (75***, [1970-1980], M), has Lymphoma, whereas before this
inference was possible only with a probability of 50% (1/2). Another weakness of k-anonymity
is that it does not compose, i.e. several k-anonymised datasets of the same individuals may be
combined to re-identify individuals [25]. It is therefore very difficult to give an a priori
guarantee on the risk of re-identification, which might depend on the adversary’s knowledge.
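A quick way to check whether a homogeneity attack is possible is to count the distinct sensitive values per equivalence class. The short helper below continues the hypothetical pandas sketch above; the column names are assumptions carried over from that example.

import pandas as pd

def l_diversity(df: pd.DataFrame, quasi_identifiers: list, sensitive: str) -> int:
    """Smallest number of distinct sensitive values within any equivalence class."""
    return int(df.groupby(quasi_identifiers)[sensitive].nunique().min())

# Applied to the generalised table from the previous sketch:
#   l_diversity(anonymised, ["zip", "birth", "gender"], "condition")  ->  2
# If one class held only "Lymphoma", the result would drop to 1 and the
# homogeneity weakness described above would become visible.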
The protection guarantee in k-anonymity depends on the value of k. Intuitively, a large k value
provides better protection than a smaller value, at the cost of data utility. To select a parameter
for a privacy definition, the link between the parameter value and the risk of a privacy incident
happening needs to be known. As shown previously, estimating quantitatively such risk, and
therefore the corresponding k value, in k-anonymity is very difficult [26]. In the healthcare
domain, when medical data is shared with a small number of people (typically for research
purposes), k is sometimes chosen between 5 and 15 [27]. However, this choice is very arbitrary
and ad hoc.
It is noteworthy that anonymisation based on Differential Privacy (DP) can come in two flavours:
global or local anonymisation. In the global model of differential privacy, data are collected by a
central aggregator that transforms them, typically by adding noise, with a differentially private
mechanism. This model requires fully trusting the aggregator. In the local model, instead, the
participating users apply a differentially private mechanism to their own data before sending
them to the aggregator. As a result, the aggregator does not need to be trusted anymore. Usually, the local
model requires adding more noise, and therefore reduces accuracy, although secure
aggregation techniques can sometimes be used to minimize this accuracy degradation [31].
One of the main benefits of Differential Privacy is that the privacy loss can be quantified, even if
a given dataset is anonymised several times for different purposes or by different entities (we say
that “Differential Privacy composes”). For example, the same dataset that is anonymised twice
(for example by two different entities), each time with a privacy parameter of ε, is still differentially
private but with a privacy parameter of 2ε. Another important property of differential privacy is
that post-processing is allowed. In other words, the result of processing differentially private data
through a fixed transformation remains differentially private.
The Differential Privacy model provides stronger protection than k-anonymity due to the added
randomness that is independent of the adversarial knowledge. As opposed to k-anonymity,
Differential Privacy does not need attack modelling and is secure no matter what the attacker
knows. It is therefore better adapted to the “release-and-forget” mode of publication. However,
DP is not well adapted for tabular data but is more suited for releasing aggregated statistical
information (counting queries, average values, etc.) about a dataset. Furthermore, a DP-based
anonymisation scheme often needs to be tailored to the data usage. It is challenging to
generate a DP-anonymised dataset that provides strong protection and good utility for different
purposes [33]. Furthermore, DP provides better performance for datasets where the number of
participants is large but each individual contribution is rather limited.
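As an illustration of the global model, the minimal sketch below releases a counting query with the Laplace mechanism; the dataset, the query and the choice of ε are assumptions made purely for the example, and the script is not a hardened differential privacy library.

import numpy as np

rng = np.random.default_rng()

def dp_count(values, predicate, epsilon: float) -> float:
    """Release a counting query with epsilon-differential privacy (Laplace mechanism).

    A count changes by at most 1 when one individual's record is added or removed
    (sensitivity = 1), so Laplace noise with scale 1/epsilon is sufficient.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 41, 29, 52, 47, 38, 61, 45]                # hypothetical attribute values
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5))   # noisy number of people aged 40+

# Sequential composition: answering the same query twice, each with epsilon = 0.5,
# consumes a total privacy budget of epsilon = 1.0 for this dataset.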
Masking is a broad term which refers to functions that, when applied to data, hide their true
value. The most prominent examples are encryption and hashing, but as the term is rather
broad, it also covers additional techniques, some of which will be discussed within this section.
The main applicability of masking with regard to data protection principles is integrity and
confidentiality (security); depending on the technique and the context of the processing
operation, it can also support accountability and purpose limitation.
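As a simple illustration of masking, the sketch below replaces an e-mail address with a keyed hash (HMAC), so that records can still be linked consistently without exposing the value itself. The field names and the secret key are hypothetical, and key management is deliberately out of scope.

import hmac, hashlib

SECRET_KEY = b"store-me-in-a-key-vault"   # hypothetical key, kept separate from the data

def mask_email(email: str) -> str:
    """Mask a value with a keyed hash: consistent for linkage, not directly reversible."""
    return hmac.new(SECRET_KEY, email.lower().encode(), hashlib.sha256).hexdigest()

record = {"customer": mask_email("alice@example.org"), "purchase": "book", "amount": 12.50}
print(record)

Note that such keyed hashing is a pseudonymisation-style masking: whoever holds the key (or the original values) can re-link the masked identifiers.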
There are two types of homomorphic encryption: partially and fully homomorphic [34]. Partially
Homomorphic Encryption (PHE) is where only a single type of operation can be performed on
ciphertext, for example addition or multiplication. Fully Homomorphic Encryption (FHE) on the
other hand can support multiple operations (currently addition and multiplication), allowing more
computation to be performed over encrypted data. Homomorphic encryption is currently a
balancing act between utility, protection and performance. FHE has good protection and utility
but poor performance: simple operations can take anywhere from seconds to hours depending
on the security parameters [35]. PHE on the other hand has good performance and protection,
but its utility is very limited.
The choice of the homomorphic encryption depends on the desired level of protection in
combination with the complexity of the computations to be performed over the encrypted data. If
the operations are complex, the encryption scheme will be more expensive. The complexity of
the computation is not measured as it is done classically in computer science (time and
memory) but it is measured by the diversity of operations (addition and multiplication) performed
on the inputs. If the computation only requires addition (like in the sum of some values) then
partially homomorphic encryption can be used. If the computation requires some addition and a
limited number of multiplications then somewhat homomorphic encryption, which is similar to
partially homomorphic encryption but with a limitation on the number of operations instead of
the types of operations, can be used. If the computation requires many additions and
multiplications, then fully homomorphic encryption needs to be used.
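To make the notion of additive homomorphism concrete, the snippet below implements a toy Paillier-style scheme with deliberately tiny, hard-coded primes. It is a sketch for illustration only (Python 3.9+), and must not be used as real encryption: there is no padding and the parameters provide no security.

import math, secrets

# Toy Paillier keypair with tiny primes (illustrative only, NOT secure).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)        # inverse of L(g^lam mod n^2) mod n

def encrypt(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:                     # r must be invertible mod n
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: multiplying ciphertexts adds the underlying plaintexts.
c1, c2 = encrypt(17), encrypt(25)
assert decrypt((c1 * c2) % n2) == 42
print("17 + 25 computed on ciphertexts:", decrypt((c1 * c2) % n2))

The sum is computed entirely on ciphertexts; only the holder of the decryption key learns the result, which is exactly the property exploited when outsourcing simple aggregations.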
Secure multiparty computation (SMPC) attempts to solve problems of mutual trust among a set
of parties, where no individual party can see the other parties’ data: SMPC protocols make it
possible to calculate functions over the input data of two parties, without revealing the input data
of one party to the other party. Prominent variations of SMPC include the Byzantine Agreement
[36], where the computation extends to multiple parties, and auctioning [37], where participants
can place bids for an auction without revealing their bids. The latter is already deployed as a
real-life application7 in Denmark, where Danish farmers determine sugar beet prices among
themselves without the need for a central auctioneer.

The most prominent example of SMPC is found in blockchain technology: a set of parties, called
“miners”, have to determine and agree upon the next block to append to the blockchain ledger.
This problem can be separated into two sub-tasks:

a) Determine a valid block (or set of blocks) to append to the blockchain, and
b) Agree with all other miners on the blockchain that this new block is the one to be
appended.
Task a) can be done individually by each miner. This task consists in finding a valid hash value
that fulfils a set of requirements by means of brute-force search (this is actually the power-
consuming part of the blockchain technology) and does not involve multiple parties yet. Once
a miner finds and announces such a valid hash value, the majority of miners have to reach
consensus that this hash – and its new block of transactions – is to be appended to the
blockchain. This task is similar to Lamport’s Byzantine agreement protocol, as some miners might
play false, or may propose a different hash value and block that also satisfies all requirements.
In general, secure multi-party computation protocols exist for every function that can be
computed among a set of parties. In other words, if there is a way that a set of parties can jointly
compute the output of the function (by exchanging some messages and calculating some local
intermediate results), then there always exists a secure multi-party protocol that solves that very
problem with the security guarantees required. Unfortunately, in many cases, such a secure
multi-party computation protocol can become very complex in application, and may easily
demand a huge network communication overhead. Hence, it may not be suited for application
scenarios with rapid real-time requirements.
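As a minimal illustration of the underlying idea, the sketch below uses additive secret sharing so that three parties can learn the sum of their private inputs without any party seeing another party's value. The inputs and the modulus are arbitrary assumptions chosen for the example.

import secrets

MOD = 2**61 - 1  # working modulus for additive secret sharing

def share(value: int, n_parties: int) -> list:
    """Split a private value into n additive shares that sum to it modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

# Three parties each hold a private salary; no single share reveals anything about it.
salaries = [52_000, 61_000, 47_000]
all_shares = [share(s, 3) for s in salaries]

# Each party locally sums the shares it received (one per input); combining the
# partial sums reveals only the total, never an individual value.
partials = [sum(col) % MOD for col in zip(*all_shares)]
total = sum(partials) % MOD
assert total == sum(salaries)
print("joint sum:", total)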
Depending on the exact protocol chosen, secure multi-party protocols support the privacy
protection goals of confidentiality (as the inputs of other parties are not revealed) and integrity
(as even insider or external attackers cannot easily change the protocol output). This distributes
the total power among all parties involved, which can be a huge number of entities in real-world
applications like blockchain. Thereby, it becomes unrealistic that any individual party may
decide and enforce its decision unilaterally against the other parties.
Moreover, given that the utilized secure multi-party protocol must be known to each party
involved, this approach fosters transparency as to what type of processing is applied to the
input data. On the downside of this approach, it is far from trivial to manually override the result
of a secure multi-party computation in case of errors. If, for instance, an e-mail address is
written to a block of the blockchain, and its hosting block is agreed upon by the consensus
protocol among the miners, it becomes a part of the blockchain forever. Removal of this e-mail
address from the blockchain later on is almost infeasible, as it would require every miner to
locally remove it from the block, and ignore the error this deletion causes to the hash values in
the modified blockchain – a direct violation of the blockchain protocol.
7
https://partisia.com/better-market-solutions/mpc-goes-live/
A Trusted Execution Environment (TEE) can play a key role in protecting personal data by
preventing unauthorized access, data breaches and the use of malware. It provides protection
against strong adversaries that get access, either physically or remotely, to the devices. With a
TEE, the processing of the data takes place internally in the enclave; it is then theoretically
impossible to obtain any data from outside the enclave.
TEEs are used widely in various devices, such as smartphones, tablets and IoT devices. TEEs
can also play an important role in securing servers. They can execute key functions such as
secure aggregation or encryption to limit the server’s access to raw data. They may also provide
opportunities for verifiable computation and increase trust. Indeed, TEEs enable clients
to attest and verify the code running on a given server. In particular, when the verifier knows
which binary code should run in the secure enclaves, TEEs can be used to verify that a device
is running the correct code (code integrity). For example, in a federated learning setting, TEEs
and remote attestations may be particularly helpful for clients to be able to efficiently verify key
functions running on the server, such as secure aggregation or shuffling.
There are two main models of private information retrieval. The first model is Computational
Private Information Retrieval, where there is only one server storing the database. This model is
considered to provide a better level of protection but has limitations with regards to the
connections that can be established to the server and the database. In the second model,
Information Theoretic Private Information Retrieval, the database is stored on several servers
which are controlled by different owners. This model allows for better communication complexity,
but it is assumed that the servers do not collude or exchange information. Additional information
on PIR is available in [39] and [40].
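The two-server, information-theoretic variant can be sketched in a few lines: the client sends each (non-colluding) server a random-looking query, and the XOR of the two answers yields exactly the requested record. The tiny bit database and the requested index below are assumptions for illustration only.

import secrets

# Two-server information-theoretic PIR over a tiny bit database (toy sketch).
database = [0, 1, 1, 0, 1, 0, 0, 1]     # both non-colluding servers hold the same copy
wanted = 5                               # index the client is interested in (kept private)

# Client: a uniformly random query for server A; server B gets the same vector
# with the bit at position `wanted` flipped. Each query alone looks random.
query_a = [secrets.randbelow(2) for _ in database]
query_b = list(query_a)
query_b[wanted] ^= 1

def answer(db, query):
    """Each server XORs together the records selected by its query vector."""
    result = 0
    for bit, selected in zip(db, query):
        if selected:
            result ^= bit
    return result

# XORing both answers cancels everything except the requested record.
assert answer(database, query_a) ^ answer(database, query_b) == database[wanted]
print("retrieved bit:", database[wanted])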
Many practical alternatives exist for generating synthetic data. The easiest option is drawing
samples from a known distribution. In this case, the outcome does not contain any original (and
personal) data and re-identification is an unlikely occurrence, mainly due to randomness. More
complex options rely on mixing real data and fake ones (the latter being still sampled from
known multivariate distributions, conditioned on the real observed data). In this case, some
disclosure of personal data and re-identification is possible due to the presence of true values
within the dataset. The practical generation of synthetic data today, due to the variety of
attributes involved and the oddness of the probability distributions, is based both on the use of
classical random number generation routines but also, more and more, on the application of
artificial intelligence and machine learning tools.
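A minimal sketch of the "sample from a fitted distribution" approach is shown below, assuming NumPy and two made-up attributes; real synthetic data generators are considerably more sophisticated, but the principle is the same.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" tabular data: age and monthly income of 1 000 individuals.
real = np.column_stack([rng.normal(45, 12, 1000), rng.normal(2800, 600, 1000)])

# Fit only aggregate statistics (mean vector and covariance matrix) ...
mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)

# ... and draw fresh synthetic records from the fitted multivariate distribution.
synthetic = rng.multivariate_normal(mean, cov, size=1000)

# The synthetic table mimics the real one statistically but contains no real record.
print(real.mean(axis=0), synthetic.mean(axis=0))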
There are pros and cons in the use of synthetic data and controllers must be aware of both. On
the benefits side, synthetic data are machine-generated data, and as such they are easy and
almost costless to reproduce. The burden of collection is removed for controllers, as is the
intrusiveness for data subjects. Furthermore, synthetic data can also cover situations in which it
may be very difficult or even unethical to collect (personal) data. For instance, in counterfactual
analyses where the goal is to study the causal effects of a specific intervention and
implementing this intervention may not be a practical option. Think of situations in which one is
interested in the effect of a new treatment on a pathology, or the consequences of an exposure
to a risk factor for human health. In all those circumstances it may not be possible to give the
new treatment to the entire population, or it may not be ethical to suspend a prior one, and it is
not ethical to deliberately expose an individual to a risk factor (e.g. pollution) to check its effects
on their health status.
Synthetic data may well help controllers overcome these difficulties and allow the execution of
many simulated experiments. Furthermore, synthetic data (used as a form of anonymisation)
might benefit from extended and potentially unconstrained retention periods.
While there is much truth in this, it is important to highlight that synthetic data can only mimic
real data, replicating specific properties of a phenomenon, meaning that they should by no
means be considered true measures. In addition, being simulated data, their quality and
accuracy very much depend on the quality of the input data, which sometimes comes from
disparate sources, and on the data fitting model. Synthetic data may also reflect the biases in both
the sources and the adopted models, similar to the biases in machine learning. Lastly, synthetic
data generation is not a once-and-for-all option. It requires time and effort. Even if they are easy
to create, synthetic data may need output control since data accuracy cannot be taken for
granted. Especially in complex scenarios, the best way to ensure output quality is by comparing
over time the results from synthetic data with authentic labelled data. Only in this way is it
possible to reduce the risks of inconsistencies. Most importantly, synthetic data, due to their
artificial nature, are not fit for processing operations affecting identified individuals (such as
profiling, or any legally binding decision), but more for general analyses and predictions.
The areas of application for synthetic data are already extensive and they are becoming
broader, in particular considering the need to train machine learning algorithms and artificial
intelligence systems with big volumes of data in the testing phase, before they become part of
services or of a productive process. Tabular synthetic data are numerical data that mirror real-
life data structured in tables. The meaning of data can range from health data to users’ web
behaviour or financial logs. One practical use case for tabular synthetic data is the scenario in
an enterprise in which true data cannot be circulated between departments, subsidiaries or
partners, due to internal policies or regulatory constraints, while their synthetized version might
enable predictive analyses.
5. ACCESS, COMMUNICATION AND STORAGE
Following the judgment C-311/18 (Schrems II)9, which concerned the personal data transfer of
an EU citizen to the US, the EDPB published its recommendation [43] on measures that
supplement transfer tools to ensure compliance with the EU level of protection of personal data.
Under Use Case 3 and the conditions mentioned therein, end-to-end encryption, combined with
transport layer encryption, is considered as a means to allow personal data transfers to non-EU
countries for specific scenarios.
8
Directive 2002/58/EC of the European Parliament and of the Council of 12 July 2002 concerning the processing of
personal data and the protection of privacy in the electronic communications sector (Directive on privacy and electronic
communications) , revised by Directive 2009/136/EC https://eur-lex.europa.eu/legal-
content/EN/TXT/?uri=CELEX:02009L0136-20201221
9
Judgment of 16 July 2020, Schrems, C-311/18, EU:C:2020:559,
https://curia.europa.eu/juris/document/document.jsf?text=&docid=228677&pageIndex=0&doclang=en&mode=lst&dir=&occ
=first&part=1&cid=40128973
One possible model to protect metadata is the use of an onion routing network (e.g. Tor11), which
supports anonymous communication over public networks. In onion routing, user traffic is routed
through a series of relay servers [45], and each relay server receives layered encrypted data
without knowing either the original sender or the final recipient. Such information is available
only to the entry and exit nodes [46]. However, Tor is vulnerable to attackers who can observe
traffic going into the entry and out of the exit nodes and correlate messages, as discussed in [47].
5.2 PRIVACY PRESERVING STORAGE
Privacy preserving storage has two goals: protecting the confidentiality of personal data at rest
and informing data controllers in case a breach occurs. Encryption is the main technique used
to protect the data confidentiality from unauthorized access. Depending on the constraints of
data controllers, it can be applied at three different levels: (i) storage-level, (ii) database-level
and (iii) application-level encryption.
Figure 2: Database Encryption Options
File system and disk level encryption mitigate the risks of an intruder getting physical access to
the disk storing the database. This approach has the advantage of being transparent to the users
of the database; however, it is an all-or-nothing approach, because it is not possible to encrypt
only certain parts of the database or to have a finer granularity than the file level. In this solution,
there is only one encryption key, which is managed by the system administrators of the database.
This key is held on the server hosting the database and must be protected by the highest
privileged access.
Encryption can also be done at the database level. This approach provides more flexibility than
the previous solution and can be applied at different granularities: tables, entries or fields. It can
also be applied when some data fields/attributes are more sensitive than others (political or
religious beliefs for instance). However, because the encryption keys need to be stored with the
10
Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL concerning the respect for
private life and the protection of personal data in electronic communications and repealing Directive 2002/58/EC
(Regulation on Privacy and Electronic Communications) https://eur-lex.europa.eu/legal-
content/EN/TXT/?uri=CELEX%3A52017PC0010
11
Tor Project https://www.torproject.org/
database, an adversary who can connect to the server hosting the database can use forensic
tools like Volatility12 to recover the keys directly from the volatile memory.
In application-level encryption, all data are encrypted by the client with its own encryption keys
and then stored. However, if several entries of the database are to be shared by different
clients, the cryptographic keys need to be exchanged, which can jeopardize their security. It is
possible to avoid this issue if specific encryption schemes are used, e.g. homomorphic
encryption. The encryption keys do not need to be shared anymore as it is possible to perform
computations over the encrypted data.
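A minimal sketch of application-level (field-level) encryption is shown below. It assumes the widely used Python cryptography package and a hypothetical customer record; key generation, sharing and rotation are deliberately left out of scope.

from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in practice generated and held by the client application
fernet = Fernet(key)

# Encrypt only the sensitive field before handing the record to the database layer.
record = {
    "customer_id": 1042,
    "email": fernet.encrypt(b"alice@example.org"),   # stored as ciphertext
    "newsletter": True,
}

# The application (key holder) can still recover the value when needed.
print(fernet.decrypt(record["email"]).decode())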
With regards to notification, canaries are a well-known security mechanism used to detect
software attacks and buffer overflows. The concept of a canary can be transposed to personal
data protection. Injecting a canary into a database entails inserting fake values into it which are
not supposed to be used by anyone. Access to these values must then be monitored in order to
detect a data breach. It is also important to note that these fake values must not be
distinguishable from the real ones. A possible implementation of such a scheme would have one
server storing the database and a distinct server handling the requests to the database. The
server handling the requests must have the capability to identify requests for canaries, thus
detecting a possible attack or breach. This model is particularly suitable for data controllers who
want to use third-party cloud-based storage. However, the data controller needs to find a good
balance between the number of real entries in the database and the number of canaries (fake
entries) and, in any case, such techniques cannot be considered a panacea for the timely
identification of data breaches.
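The front-end idea can be sketched in a few lines: a handful of fabricated "canary" records lives alongside the real ones, and the request handler raises an alert whenever one of them is looked up. All names, addresses and the alerting mechanism below are hypothetical.

import logging

logging.basicConfig(level=logging.WARNING)

# Fabricated entries planted in the customer table; nobody should ever request them.
CANARY_EMAILS = {"j.doe1984@example.org", "m.smith77@example.org"}

def handle_lookup(email: str, db_lookup):
    """Request handler sitting in front of the database (sketch)."""
    if email in CANARY_EMAILS:
        # A canary was touched: possible breach or misuse of the dataset.
        logging.warning("Canary record requested: %s - investigate possible data breach", email)
    return db_lookup(email)

# Example usage with a dummy backend.
fake_db = {"j.doe1984@example.org": {"plan": "gold"}, "real.user@example.org": {"plan": "basic"}}
handle_lookup("j.doe1984@example.org", fake_db.get)   # triggers the alert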
Depending on the context and the needs, certain access control mechanisms seem to be more
suitable than others. As also discussed in [48], in a scenario where processing of customers’
personal data for marketing purposes takes place through an on-line cloud storage provider,
Discretionary Access Control (DAC) can be used for accessing data for a specific service
request, such as a print and delivery service. Through DAC, an employee is able to specify, for
each user external to the organization, which data and what action(s) are permitted. DAC
provides users with advanced flexibility in setting up the desired access control properties;
however, on the negative side, it relies heavily on users’ awareness and understanding of the
associated risk. On the other hand, in a hospital information system, where each actor (doctor,
nurse, administrative personnel) is assigned different roles with different privileges (e.g. a doctor
can access patients’ medical data), Role Based Access Control (RBAC) seems to be more
appropriate.
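A minimal sketch of the role-based model from the hospital example might look as follows; the roles and permissions are assumptions chosen purely to illustrate the mechanism.

# Minimal role-based access control sketch for a hospital information system.
ROLE_PERMISSIONS = {
    "doctor": {"read_medical_record", "write_medical_record"},
    "nurse": {"read_medical_record"},
    "admin_staff": {"read_contact_details"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role is permitted to perform the requested action."""
    return action in ROLE_PERMISSIONS.get(role, set())

assert is_allowed("doctor", "read_medical_record")
assert not is_allowed("admin_staff", "read_medical_record")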
12
https://www.volatilityfoundation.org
Zero-knowledge proofs not only enforce confidentiality but, compared to other
authentication schemes such as username/password, they also enforce the data minimization
principle. In a password-based authentication scheme, a user sets a password and shares this
password with a server.
When the user wants to authenticate herself to the server, she needs to provide her password,
which is then compared to the one stored on the server. An adversary who wants to
impersonate a user can steal the password either from the user or from the server. In a zero-
knowledge proof authentication scheme, this risk is limited only to the user, because the server
does not know the secret used by the user to authenticate. The technique minimizes the
amount of information that the server knows about the user and consequently reduces the
attack surface. Zero-knowledge proofs are also a building block for many secure multi-party
computation protocols.
There are two variants of zero-knowledge proofs: interactive [51] and non-interactive [53].
Interactive zero-knowledge proofs require several communication rounds between the user and
the server. Non-interactive zero-knowledge proofs require only a single message from the prover
and no further interaction. Non-interactive zero-knowledge proofs are very popular in blockchain
applications.
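The classic Schnorr identification protocol is a compact, textbook example of an interactive zero-knowledge proof: the prover demonstrates knowledge of a secret exponent without ever transmitting it. The sketch below uses a toy group (a single Mersenne prime and an arbitrary base) purely for illustration; real deployments use standardised groups and multiple hardened rounds.

import secrets

# Toy group parameters (illustrative only; not a production protocol).
p = 2**127 - 1          # a Mersenne prime
g = 3

# Prover's long-term secret and the public value registered with the verifier.
x = secrets.randbelow(p - 2) + 1
y = pow(g, x, p)

# One round of the protocol: commit -> challenge -> response -> verify.
r = secrets.randbelow(p - 2) + 1
commitment = pow(g, r, p)                      # prover -> verifier
challenge = secrets.randbelow(2**64)           # verifier -> prover (random challenge)
response = (r + challenge * x) % (p - 1)       # prover -> verifier; x itself is never sent

assert pow(g, response, p) == (commitment * pow(y, challenge, p)) % p
print("verifier accepts: prover knows x without revealing it")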
13
IRMA app https://irma.app/
14
https://decodeproject.eu/
15
ISO/IEC 9798-5:2009 Information technology — Security techniques — Entity authentication — Part 5: Mechanisms
using zero-knowledge techniques https://www.iso.org/standard/50456.html
6. TRANSPARENCY,
INTERVENABILITY AND USER
CONTROL TOOLS
A key element in any data protection concept is the enablement of human individuals to
exercise their data protection rights themselves. This involves both access to information on
data processing (transparency) and the ability to influence the processing of their personal
information within the realm of a data controller or data processor (intervenability). In this
respect, a multitude of approaches and topics have emerged from the privacy research community
that can help implement these rights and correlated services at data processing institutions.
In this chapter, we present a selection of the most relevant ones.

Transparency on data processing is not only demanded by the GDPR, but is also necessary for
individuals to understand why their personal data is collected and how it is processed, e.g.
whether it is transferred to other parties. While system designers or data protection officers may
be able to understand and even demand detailed information about the data processing
systems and processes, most users are not able to grasp what is laid down in a technical
specification or legal documents and may even be overwhelmed when being presented with
basic information on personal data processing (Articles 13 & 14 GDPR).
The first recommendation from the Article 29 Data Protection Working Party on how to inform
online users about data protection issues stems from their recommendations published in 2004
[54] and stresses the possibility of a multi-layered approach, starting with essential information
and providing further information in additional layers, if desired by the user. The layered
approach is particularly helpful for presentation on mobile devices, where it would be
cumbersome, if not impossible, to read a lengthy text with full information on the processing of
personal data.
In 2017, the Article 29 Data Protection Working Party published guidelines on transparency [55]
which referred to the obligations laid down in the GDPR. In particular, the document explains
the meaning of the requirement of Article 12 (1) s. 1 GDPR: “The controller shall take
appropriate measures to provide any information referred to in Articles 13 and 14 and any
communication under Articles 15 to 22 and 34 relating to processing to the data subject in a
concise, transparent, intelligible and easily accessible form, using clear and plain language, in
particular for any information addressed specifically to a child.”
Providing accurate data protection information is not an easy task, because on the one hand
simplification may be necessary so that an average person can understand the information, but
on the other hand the simplification must not provoke misunderstanding. For natural language
information, several metrics have been proposed to measure the complexity and the
comprehensibility of such texts16.
When designing the presentation of the privacy policy, the users’ potential devices have to be
considered, e.g. the size of the screen. Also, the information should be accessible and designed
in a way that is compliant with assistive technologies so that people with disabilities are not
excluded. In specific situations, textual information is not appropriate, e.g. in phone calls or in
some contexts in a smart home or a connected car. Also, the entire HCI communication should
be checked concerning the information on data processing given to the users. Checks and tests
for comprehensibility (and the absence of dark patterns) could involve the data protection
officers and people with usability knowledge.
It has to be noted that the provision of information is not limited to one basic document such as
the privacy policy, but ad-hoc personalised information can also be given during the actual
usage as it is designed in the human-computer interface (HCI). This kind of information can
influence users in making up their minds concerning data-protection relevant matters (e.g. which
data to post in social networks) or on giving or withdrawing consent.17 In the report titled
“Deceived by Design”, the Norwegian Consumer Protection Council has pointed out that the
HCI design of many applications and services is not neutral, but employs so-called “dark
patterns” that nudge users towards disclosing more data and making precipitate decisions on
data processing methods [57].
Some controllers have already introduced self-designed icons that are presented in combination
with their privacy policy. In the absence of a standardized icon set they may be appreciated by
users. However, as soon as standardized solutions for icons and machine-readability are
published, they should be used. As already stated, there are several open issues concerning
privacy policies such as a lack of definition of good practice and the lack of standards
concerning machine-readability or a defined icon set. Furthermore, new technology scenarios
16
E.g. LIX formula on readability, developed by Carl Hugo Björnsson in 1971: LIX(text) = TotalWords/Sentences +
(LongWords x 100)/TotalWords;
CFP (Content Function Ratio) formula on informativity: CFR(text) = AmountOfContentWordTags /
AmountOfFunctionWordTags;
HIX (Hohenheimer Verständlichkeitsindex) formula on comprehensibility, developed by University of Hohenheim, based on
Amstad formula, 1. Neue Wiener Sachtext-Formel, SMOG-Index and LIX, https://klartext.uni-hohenheim.de/hix
17
Since consent means “freely given, specific, informed and unambiguous indication of the data subject's wishes by which
he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to
him or her;” (Art. 4 No. 11 GDPR), it requires sufficient information.
have not been well reflected yet, e.g. sensor technologies with restricted or missing user
interfaces, or complex or dynamic (and thereby hard to understand) data processing systems that
may encompass several controllers or steadily changing IT systems.
The most prominent proposal in this respect is the work on “sticky policies” [62] where policies
are “stuck” to the data and also travel with them in case of data transfer. Cryptographic methods
are being used to prevent recipients from ignoring the attached policies. However, no kind of
policy can fully exclude the possibility of misuse of personal data. Currently, there are no
standardized solutions for such machine-readable policies that also control data processing
operations.
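As a minimal illustration of the sticky-policy idea, the sketch below binds a machine-readable policy to the data it governs by using the policy as the associated data of an authenticated encryption operation: a recipient who alters or strips the attached policy can no longer decrypt the payload. This is only an illustrative construction using the Python cryptography package, not the scheme described in [62]; key distribution and actual policy enforcement are out of scope.

import json, os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def stick_policy(key: bytes, personal_data: bytes, policy: dict) -> dict:
    """Encrypt personal data and cryptographically bind a policy to it."""
    nonce = os.urandom(12)
    policy_bytes = json.dumps(policy, sort_keys=True).encode()
    ciphertext = AESGCM(key).encrypt(nonce, personal_data, policy_bytes)
    # The policy travels in the clear, but tampering with it breaks decryption.
    return {"nonce": nonce, "policy": policy_bytes, "ciphertext": ciphertext}

def unstick(key: bytes, package: dict) -> bytes:
    """Decryption only succeeds if the attached policy is unmodified."""
    return AESGCM(key).decrypt(package["nonce"],
                               package["ciphertext"],
                               package["policy"])

key = AESGCM.generate_key(bit_length=256)
pkg = stick_policy(key, b"alice@example.org",
                   {"purpose": "billing", "retention_days": 90, "share": False})
print(unstick(key, pkg))  # b'alice@example.org'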
Following the idea of expressing users' privacy-relevant wishes or demands in a machine-readable
format, several languages and protocols have been developed, mainly in research projects
and prototypes.
While these languages and protocols tend to be comprehensive and often complex approaches
with many features, for practical applications a more simplistic approach seemed to be
expedient. A prominent example was the “Do not track”-Standard (DNT)18 where users could
express via an HTTP header field if they didn’t want to be tracked. “DNT = 1” means “This user
prefers not to be tracked on this request.”, while “DNT = 0” stands for “This user prefers to allow
tracking on this request.” A third possibility is to refrain from sending a DNT header19 because
the user has not enabled this function. A deficiency of the DNT standard was the lack of
supporting legislation: if the question of “tracking” or “non-tracking” can only be expressed as a
“preference” instead of a clear demand, and if only “polite servers” react accordingly, this will not
help to achieve reliability and clarity for users or service providers.
18
https://www.w3.org/2011/tracking-protection/
19
https://www.w3.org/TR/tracking-dnt/
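The DNT mechanism itself was deliberately simple. The following sketch shows a client sending the header with the Python requests library and a server-side check; the respects_dnt helper and the example URL are illustrative assumptions only, and the standard has since been discontinued.

import requests

# Client side: express the tracking preference for this request.
response = requests.get("https://example.org/page", headers={"DNT": "1"})

# Server side (e.g. in a web framework view): read the preference, if present.
def respects_dnt(request_headers: dict) -> bool:
    """Return True if tracking should be disabled for this request."""
    dnt = request_headers.get("DNT")   # "1", "0" or absent
    return dnt == "1"                  # absent means: no preference expressed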
A follow-up standard is called “Global Privacy Control” (GPC)20. It enables users to send a “do-
not-sell-or-share” signal via their browser to a website in which the user is requesting that their
data not be sold to or shared with any party other than the one the user intends to interact with,
except as permitted by law. Since mid-2021, the GPC signal has been recognised under the
amended California Consumer Privacy Act (CCPA)21. Users who want to express a “do-not-sell-or-share”
signal can use one of the supported browsers or extensions. Under the European data
protection regime, service providers are currently not obliged to implement specific protocols that
interpret users’ privacy preference signals.
20
https://globalprivacycontrol.org/
21
https://oag.ca.gov/privacy/ccpa#collapse7b
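Honouring GPC on the server side essentially amounts to checking a single request header. The sketch below uses Flask to illustrate how a service might record a “do-not-sell-or-share” preference; the header name Sec-GPC follows the GPC proposal, while the set_sale_opt_out helper and the route are hypothetical placeholders for a controller's own logic.

from flask import Flask, request

app = Flask(__name__)

def set_sale_opt_out(user_id: str, opt_out: bool) -> None:
    """Hypothetical hook into the controller's preference store."""
    print(f"user={user_id} do-not-sell-or-share={opt_out}")

@app.route("/content")
def content():
    # The GPC proposal signals the preference via the "Sec-GPC: 1" request header.
    gpc = request.headers.get("Sec-GPC") == "1"
    set_sale_opt_out(user_id="session-123", opt_out=gpc)
    return {"gpc_honoured": gpc}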
In the era of the Internet of Things (IoT), the role of machine-readable policies as well as privacy
preference signals will become more significant. Providers of websites or web services should
support standardised privacy preference signals and take into account and respect the demands
expressed by users when deciding on the processing of personal data. However, it has to be
noted that Article 25(2) of the GDPR requires data protection by default without requiring users
to explicitly state that they do not agree with processing of their personal data such as
profiling, sharing or selling: the controller has to ensure “that, by default, only personal data
which are necessary for each specific purpose of the processing are processed. That obligation
applies to the amount of personal data collected, the extent of their processing, the period of
their storage and their accessibility.”
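One way to engineer this obligation into a service is to make the most data-minimising option the default value of every user-facing setting, so that any additional processing happens only after an explicit choice by the user. The dataclass below is a hypothetical sketch of such privacy-protective defaults, not a prescribed configuration.

from dataclasses import dataclass

@dataclass
class AccountSettings:
    """Defaults follow data protection by default: nothing optional is enabled."""
    profiling: bool = False             # no profiling unless the user opts in
    share_with_partners: bool = False   # no sharing or selling by default
    analytics_identifier: bool = False  # no persistent identifier by default
    retention_days: int = 30            # shortest retention the service supports

settings = AccountSettings()            # a new account starts with minimal processing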
If users' preferences, expressed in such a standardised technical way, cannot be fulfilled,
controllers should, for transparency reasons, inform them (e.g. in their privacy policy) why this is the
case. For instance, in some countries there may be laws that require longer retention periods
than the users would expect. In addition to standardised privacy preference signals, users'
browsers may contain privacy tools as add-ons or have a specific configuration concerning
tracking, data minimisation of identifiers, or script blockers. In case a restrictive configuration
prevents the correct functioning of a website or web service, the providers
should inform users about potential limitations and offer at least basic functionality to those
privacy-aware users. Effectively this means respecting privacy demands by users, no matter
whether they use one of the upcoming standards for privacy preference signals or other tools.
Today’s existing privacy dashboards are usually provided by controllers, who also decide to
what extent information is presented there, how much explanation is given (e.g. on
potential risks) and what options the user can employ to adapt settings or to change or
delete personal data. Users cannot expect to be informed about all kinds of data disclosures, e.g. if
law enforcement or public authorities demand access to the user’s personal data (lawful in their
jurisdiction) but prohibit notifying the user. Also, information on data breaches may
be excluded from the presentation. In case the controller's data processing encompasses
profiling, the privacy dashboard should clarify which personal data are being used for the user’s
profile and, at best, which information is being derived from the aggregated personal data.
Controllers should consider the implementation of usable privacy dashboards for data subjects
that fulfil the requirements of the GDPR. As discussed in Section 5.3, reliable authentication of
the user is necessary to prevent the disclosure of personal data to unauthorised persons. If
user-side privacy dashboards become widely used, controllers should check whether these tools
could be supported as transparency-enhancing technologies.
In the following sections two types of privacy dashboards are described: service-side privacy
dashboards and user-side privacy dashboards.
Privacy dashboards are also known as a functionality of citizen portals for services of the public
sector. Probably the first country to provide a tool that presents an overview of personal data
and who has accessed them is Estonia: the RIHA system23 shows, for each public database
and governmental information system, which personal data are stored for which purpose
and who can access them. Estonian citizens can see which officials have viewed their personal
data. This information is gathered from access log files. Citizens can monitor the access
activities, which must not happen without a justified reason, as regulated by national law [66].
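A service-side dashboard of this kind can be driven directly by access logs. The sketch below is a simplified illustration of filtering an access log so that a citizen sees who viewed their record, when, and for which stated reason; the log format and field names are assumptions, not the actual RIHA implementation.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class AccessLogEntry:
    subject_id: str      # whose record was accessed
    official_id: str     # who accessed it
    timestamp: datetime
    justification: str   # legally required reason for the access

def accesses_for(subject_id: str, log: list[AccessLogEntry]) -> list[AccessLogEntry]:
    """Return all log entries concerning one citizen, for display in the dashboard."""
    return [e for e in log if e.subject_id == subject_id]

log = [AccessLogEntry("EE-123", "official-7", datetime(2021, 11, 2, 9, 30),
                      "processing of a housing benefit application")]
for entry in accesses_for("EE-123", log):
    print(entry.official_id, entry.timestamp, entry.justification)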
This kind of privacy dashboard would profit from standardised machine-readable policies and
privacy preference signals, so that the presented overview of personal data processing could be based
on reliable information as provided by the controllers. It is to be expected that standardisation in
this area would further strengthen such dashboards.
22
Google account: myaccount.google.com
23
Riigi infosüsteemi haldussüsteem: https://www.riha.ee/
Privacy dashboards can be the means not only for presenting an overview of relevant
information on processing of personal data but they can also offer functionality for changing
privacy settings or for exercising data subjects’ rights.
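A minimal controller-side dashboard can expose this combination of overview and control through a handful of authenticated endpoints. The Flask sketch below is purely illustrative; the route names, the in-memory data store and the get_authenticated_user helper are assumptions, and a real deployment needs the reliable authentication discussed above.

from flask import Flask, jsonify, request, abort

app = Flask(__name__)
PROFILE_STORE = {"alice": {"email": "alice@example.org", "marketing_opt_in": False}}

def get_authenticated_user() -> str:
    """Hypothetical placeholder for a proper authentication mechanism."""
    return request.headers.get("X-Demo-User", "")

@app.route("/dashboard/data", methods=["GET"])
def show_my_data():
    user = get_authenticated_user()
    if user not in PROFILE_STORE:
        abort(401)
    return jsonify(PROFILE_STORE[user])     # overview of stored personal data

@app.route("/dashboard/settings", methods=["POST"])
def change_settings():
    user = get_authenticated_user()
    if user not in PROFILE_STORE:
        abort(401)
    PROFILE_STORE[user]["marketing_opt_in"] = bool(request.json.get("marketing_opt_in"))
    return jsonify(PROFILE_STORE[user])     # exercise control over settings

@app.route("/dashboard/erase", methods=["POST"])
def request_erasure():
    user = get_authenticated_user()
    PROFILE_STORE.pop(user, None)           # simplistic right-to-erasure handling
    return jsonify({"status": "erasure requested"})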
If different customers have agreed to different versions of terms of use at different times, their
legal basis for data processing may differ. Consent cannot be considered to be automatically
granted for every change made to data processing. It is therefore necessary to explicitly ask
existing customers to review the updated terms of use and provide their consent once again.
Depending on the type and implementation of the web service in question, this may become
an issue of its own.
Either way, in a realistic scenario it is inevitable that different customers use the same
service under different terms of use, and thereby under different consent coverage. Hence, it also becomes
inevitable to keep track of these differences, i.e. to record which customer operates
under which data processing consent. This is commonly referred to as a central aspect of
consent management.
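Keeping track of which customer operates under which consent can be as simple as storing one immutable record per consent event and comparing it against the currently applicable terms version. The sketch below is an illustrative data model under assumed field names, not a reference to any specific consent management product.

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class ConsentRecord:
    customer_id: str
    terms_version: str                      # exact version of the terms the customer agreed to
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None

CURRENT_TERMS_VERSION = "2022-01"

def needs_renewed_consent(record: ConsentRecord) -> bool:
    """Consent does not automatically carry over to changed processing terms."""
    return record.withdrawn_at is not None or record.terms_version != CURRENT_TERMS_VERSION

record = ConsentRecord("customer-42", "2021-06", datetime(2021, 6, 1, tzinfo=timezone.utc))
print(needs_renewed_consent(record))        # True: the customer must be asked again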
The common approach of presenting a terms-of-use document together with an “I agree” button has several drawbacks:
• Users tend to click the button without reading and understanding the document
(“consent fatigue” [68] & [69])
• Users with disabilities cannot read or understand the document
• Browser display issues may hinder users from reading the document
• Services that cannot be operated via browser cannot utilize this approach
• Services on embedded computers (e.g. cars, IoT devices) may not have a screen to
show the terms of use document
• Services on embedded computers may have neither a button nor another input device to
express agreement
Nevertheless, consent gathering is required even in such circumstances, and the expression of
consent must be given and documented validly, in order to be utilized as a legal basis for data
processing.
Multiple approaches for consent management have evolved in the medical domain, e.g. for documenting a one-time
consent to a medical operation or treatment, but they are mostly based on a long legal text with
the patient's hand-written signature below it. Digital equivalents utilise electronic documents,
authentication tokens such as personal ID cards, and technologies like blockchain for
permanently storing exact versions of consent documents along with the users' expressions of
consent [72]. Here, there is a strong set of security requirements for gathering and
documenting these electronic expressions of consent, e.g. with respect to integrity and
availability of the consent forms.
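Integrity of documented consent does not necessarily require a blockchain; a hash-chained, append-only log already makes later manipulation of stored consent expressions detectable. The following sketch is a simplified illustration of that idea, with an assumed event structure.

import hashlib, json

def append_consent(log: list[dict], event: dict) -> None:
    """Append a consent event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    body = {"event": event, "prev_hash": prev_hash}
    body["entry_hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append(body)

def verify(log: list[dict]) -> bool:
    """Recompute the chain; any modified or removed entry breaks verification."""
    prev_hash = "0" * 64
    for entry in log:
        body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != expected:
            return False
        prev_hash = entry["entry_hash"]
    return True

log: list[dict] = []
append_consent(log, {"patient": "p-17", "document_version": "surgery-consent-v3", "granted": True})
print(verify(log))   # True; altering any stored field makes this False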
Internet services, however, have slightly different needs with respect to consent gathering than
the medical scenario.
On the plus side, utilizing consent management systems reduces management efforts of the
data controller and the processors. Once the system is up and running, the task of consent
gathering is mostly automated, leaving the costly and scarce expert human resources free to
focus on other tasks, such as determining whether a new consent must be gathered or not. A
second clear advantage is the ability to integrate the consent management system with other
management tools, such as CRM systems, audit and certification support, or legal affairs. Here,
depending on the amount of integration, a huge potential for minimization of efforts exists, as
the alternative would be to implement manual management procedures, binding costly human
expertise.
On the downside, the efforts of implementing and integrating such a consent management
system can be substantial. Depending on the degree of integration, the ramp-up of installing
such a system may require extensive resources, and may pay off only partially later on. The
more integration is attempted, the higher the initial costs, but also the higher the resulting
savings in the long term. This (common) imbalance may keep smaller enterprises from
integrating such systems at all.
Such a right of access service would provide an interface for data subjects whose data is
processed by the organization in question. When triggered, the service automatically
iterates over the data stores within the organization, collects all personal data concerning the
requesting individual, and delivers the complete set of data collected this way to the data subject.
Ideally, the whole task is automated, so that no (or negligible) manual interaction on the side of
the organization is required.
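At its core, such a service is a loop over all registered data stores that can answer the question "what do you hold on this person?". The sketch below shows the idea with in-memory stores; the store registry, lookup functions and field names are assumptions for illustration.

from typing import Callable

# Each registered store exposes a lookup function: subject_id -> personal data found there.
DataStoreLookup = Callable[[str], dict]

STORE_REGISTRY: dict[str, DataStoreLookup] = {
    "crm":     lambda sid: {"name": "Alice", "email": "alice@example.org"} if sid == "ds-1" else {},
    "billing": lambda sid: {"invoices": 3} if sid == "ds-1" else {},
}

def right_of_access(subject_id: str) -> dict:
    """Iterate over all data stores and aggregate personal data about one data subject."""
    response: dict[str, dict] = {}
    for store_name, lookup in STORE_REGISTRY.items():
        found = lookup(subject_id)
        if found:                       # include only stores that actually hold data
            response[store_name] = found
    return response

print(right_of_access("ds-1"))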
At the same time, connecting each new data store, data sink or additional data processor that
receives an individual's data to the right of access service may also improve the data management
capabilities of the organization as a whole. Requests concerning data locations, data forwarding,
business partners involved in data processing, etc. can all be answered rather straightforwardly
just from the existing data flow infrastructure created and maintained for the right of access
service.
• Authorization: A human individual is only allowed to see and investigate their own
personal data, not that of other data subjects. Hence, there must be some (technical or
organisational) means of reliably verifying that the requesting individual is indeed the data
subject concerned.
• Risk of Data Breach: Disclosing the full set of personal data of one data subject to
another data subject without valid authorization is equivalent to a severe data breach,
which itself constitutes a violation of the GDPR. At the same time, there may be a
substantial interest in such right of access services by actors other than the concerned
data subject, such as hackers, media, law enforcement, or relatives. Hence, the
security risk of operating such a service is not negligible.
• Correctness: Similar to completeness, the data disclosed to the data subject must be
correct, hence it may not contain abbreviations, aggregations, internal censorship, or
other access-blocking means. Also, its integrity must be maintained when delivered to
the requesting individual. Hence, the implementation of such a right of access service
must utilise sound technical means to guarantee correctness and integrity of the data
contained in the response to the right of access demand.
• Volume: Personal profiles of active data subjects typically grow in size over the time of
utilisation of a service. Hence, the size of the response to a right of access request can
easily grow into huge amounts of data. This causes a technical challenge of delivering
the data to the requesting individual by reasonable means. For example, the maximum
allowed size of an e-mail attachment can easily be reached, rendering an e-mail
response to a right of access request infeasible. Print-outs are not just environmentally
problematic but also fulfil neither the common requirements concerning right of access
responses nor the demand for data portability as defined in Art. 20 GDPR. Common
solutions consist of web-based downloads of compressed data archives via HTTP(S)
or FTP(S), which also pose some technical challenges in low-bandwidth areas (see the
sketch after this list).
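For delivery, a common pattern is to serialise the aggregated response into a machine-readable file, compress it, and publish a checksum alongside the download link so that the data subject can verify integrity. The sketch below illustrates this with the Python standard library only; the file names and payload are arbitrary examples.

import hashlib, json, zipfile

def package_response(response: dict, archive_path: str) -> str:
    """Write the right of access response as a compressed JSON archive, return its SHA-256."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("personal_data.json", json.dumps(response, indent=2))
    with open(archive_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

digest = package_response({"crm": {"name": "Alice"}}, "access_response.zip")
print(f"Offer access_response.zip for download over HTTPS; SHA-256: {digest}")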
For the data subject, the advantage of such an infrastructure is that a single right of access
request suffices to get a full view on the whole data processing activity, across all sub-processor
borders. For data processors, the advantage of such data request infrastructure is its
compositionality: details on the exact network of sub-processors, suppliers, service providers
etc. can be easily hidden or masked in the single response sent back to the requesting
predecessor in the processing tree. This way, the exact identity of an organization’s business
partners may be hidden from the previous processors, if considered a business secret.
Nevertheless, the data collected for the original right of access request still is complete and
sufficient for the data subject’s needs.
The obvious drawback of this approach is again the implementation effort and complexity: the
delegated right of access services must be implemented and operated. Due to the recursive
nature, requests may require more computational resources and longer execution times
than a normal right of access response.
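In a delegated setup, each processor answers a right of access request by combining its own holdings with the (possibly masked) responses obtained from its sub-processors. The recursive sketch below illustrates this compositional idea; the Processor objects and the masking choice are assumptions for illustration only.

from dataclasses import dataclass, field

@dataclass
class Processor:
    name: str
    records: dict                     # personal data this processor holds itself
    sub_processors: list = field(default_factory=list)

    def right_of_access(self, subject_id: str) -> dict:
        """Collect own data plus delegated responses from all sub-processors."""
        response = {"data": self.records.get(subject_id, {})}
        delegated = [sub.right_of_access(subject_id) for sub in self.sub_processors]
        if delegated:
            # Pass the content upstream without revealing which partner supplied it.
            response["delegated"] = delegated
        return response

analytics = Processor("analytics-partner", {"ds-1": {"segments": ["returning customer"]}})
controller = Processor("web-shop", {"ds-1": {"email": "alice@example.org"}}, [analytics])
print(controller.right_of_access("ds-1"))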
Alternatively, a data custodian organization can itself also provide the service of data collection
to its data subjects. Unlike the recursive approach, in this case the task of individually
identifying and demanding right of access responses from all sub-processors in the processing
network is performed iteratively by the data custodian, on behalf and at the request of the data
subject. Once the collection is completed, the resulting aggregated right of access response is
returned to the demanding data subject. In this case, the advantage of getting a full
picture of the processing remains evident for the data subject, whereas the data controllers and
processors lose some control over what exactly is contained in such an aggregated right of
access response.
7. CONCLUSIONS
Data protection principles, as set out in Article 5 of the GDPR and elaborated in terms of
measures and safeguards in Article 25, are the goals that should be achieved when considering
the design, implementation and deployment of a processing operation. From the technical side,
the challenge is to translate these principles into tangible requirements and specifications by
selecting, implementing and configuring appropriate technical and
organizational measures and techniques over the complete lifecycle of the envisaged data
processing. Engineering data protection into practice is not that straightforward though;
depending on the level of risk, the context of the processing operation, the purposes of
processing, the types, scope and volumes of personal data, the means and scale of
processing, the state of the art and the cost, the translation into actionable requirements calls for a
multidisciplinary approach. In addition, the evolving technological landscape and emerging
technologies should also be taken into account as new challenges emerge such as lack of
control and transparency, possible reusability or purpose “creep” with use of data, data
inference and re-identification, profiling and automated decision making. The implementation of
data protection principles in such contexts is challenging as they cannot be implemented in the
traditional, “intuitive” way. Appropriate safeguards, both technical and organizational, must be
integrated into the processing from the very early steps, as dictated by the Data protection by
design obligation, and indeed the design process and related decision making needs to be
underpinned by this obligation also.
This report attempted to provide a short overview of existing (security) technologies and
techniques that can support the fulfilment of data protection principles, and to discuss their
possible strengths and applicability in different processing operations. The remainder of
this section presents the main conclusions to this end, together with specific recommendations
for relevant stakeholders.
Regulators (e.g. Data Protection Authorities and the European Data Protection Board),
the European Commission and the relevant EU institutions should disseminate the
benefits of such technologies and techniques and provide guidance on their applicability
and deployment.
Initiatives aimed at supporting engineers, such as the Internet Privacy Engineering Network
(IPEN)24, should be further supported by practitioners, researchers and academia.
24
https://edps.europa.eu/data-protection/ipen-internet-privacy-engineering-network_en
Regulators (e.g. Data Protection Authorities and the European Data Protection Board)
should discuss and promote good practices across the EU in relation to state-of-the-art
solutions of relevant technologies and techniques. EU Institutions could promote such
good practices through relevant publicly available documents.
Regulators (e.g. Data Protection Authorities and the European Data Protection Board)
and the European Commission should promote the establishment of relevant
certification schemes, under Article 42 GDPR, to ensure proper engineering of data
protection.
Regulators (e.g. Data Protection Authorities and the European Data Protection Board)
should ensure that regulatory approaches, e.g. as regards new technologies and
application sectors, take into account all possible entities and roles from the standpoint
of data protection, while remaining technologically neutral.
8. REFERENCES
[2] EDPB, "Guidelines 4/2019 on Article 25 Data Protection by Design and by Default," 2019.
[4] M. Hansen, M. Jensen and M. Rost, "Protection Goals for Privacy Engineering," in 2015
IEEE Security and Privacy Workshops, 2015.
[5] M. Colesky, J. H. Hoepman and C. Hillen, "A Critical Analysis of Privacy Design
Strategies," in 2016 IEEE Security and Privacy Workshops (SPW), 2016.
[6] European Data Protection Supervisor, "Opinion 5/2018 Preliminary Opinion on privacy by
design," 2018.
[7] ENISA, "Readiness Analysis for the Adoption and Evolution of Privacy Enhancing
Technologies," 2016.
[8] ENISA, "PETs controls matrix - A systematic approach for assessing online and mobile
privacy tools," 2016.
[9] Agencia Española de Protección de Datos (AEPD), "A Guide to Privacy by Design," 2019.
[13] ENISA, "Data Pseudonymisation: Advanced Techniques and Use Cases," 2021.
[15] O. Abul, F. Bonchi and M. Nanni, "Never Walk Alone: Uncertainty for Anonymity in Moving
Objects Databases," in IEEE 24th International Conference on Data Engineering (ICDE
08), 2008.
[17] R. Chen, G. Acs and C. Castelluccia, "Differentially private sequential data publication via
variable-length n-grams," in 2012 ACM conference on Computer and communications
security, 2012.
[21] B. Fung, K. Wang, R. Chen and P. Yu, "Privacy-preserving data publishing: A survey of
recent developments," ACM Computing Surveys, vol. 42, no. 4, pp. 1-53, 2010.
[26] A. Meyerson and R. Williams, "On the complexity of optimal K-anonymity," in 23rd ACM
SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2004.
[27] K. El Emam and F. K. Dankar, "Protecting privacy using k-anonymity," Journal of the
American Medical Informatics Association, vol. 15, no. 5, pp. 627-637, 2008.
[28] C. Dwork and A. Roth, "The Algorithmic Foundations of Differential Privacy," Foundations
and Trends in Theoretical Computer Science, 2014.
[29] C. Dwork, N. Kohli and D. Mulligan, "Differential Privacy in Practice: Expose your
Epsilons!," Journal of Privacy and Confidentiality, vol. 9, no. 2, 2019.
[31] G. Ács and C. Castelluccia, "I Have a DREAM! (DiffeRentially privatE smArt Metering)," in
International Workshop on Information Hiding (IH 2011), 2011.
[32] C. Clifton and T. Tassa, "On syntactic anonymity and differential privacy," in IEEE 29th
International Conference on Data Engineering Workshops (ICDEW), 2013.
[33] G. Acs and C. Castelluccia, "A case study: privacy preserving release of spatio-temporal
density in Paris," in 20th ACM SIGKDD international conference on Knowledge discovery
and data mining, 2014.
[34] M. A. Will and R. Ko, "A guide to homomorphic encryption," in The Cloud Security
Ecosystem, Syngress, 2015, pp. 101-127.
[36] L. Lamport, R. Shostak and M. Pease, "The Byzantine Generals Problem," SRI
International, 1982.
[38] D. Asonov, "Private Information Retrieval – An Overview and Current Trends," 2011.
[40] W. Gasarch, "A Survey on Private Information Retrieval," Bulletin of the EATCS, vol. 82,
pp. 72-107, 2004.
[42] N. Unger, S. Dechand, J. Bonneau, S. Fahl, H. Perl, I. Goldberg and M. Smith, "SoK:
Secure Messaging," in 2015 IEEE Symposium on Security and Privacy, 2015.
[44] European Data Protection Board, "Statement of the EDPB on the revision of the ePrivacy
Regulation and its impact on the protection of individuals with regard to the privacy and
confidentiality of their communications," 2018.
[45] Y. Gilad, "Metadata-Private Communication for the 99%," Communications of the ACM,
vol. 62, no. 9, pp. 86-93, 2019.
[46] M. Reed, P. Syverson and D. Goldschlag, "Anonymous connections and onion routing,"
IEEE Journal on Selected Areas in Communications, vol. 16, no. 4, pp. 482 - 494, 1998.
[47] M. Reed, P. Syverson and D. Goldschlag, "Anonymous connections and onion routing,"
IEEE Journal on Selected Areas in Communications, vol. 16, no. 4, pp. 482-494, 1998.
[48] ENISA, "Reinforcing trust and security in the area of electronic communications and online
services: Sketching the notion of “state-of-the-art” for SMEs in security of personal data
processing," 2019.
[51] S. Goldwasser, S. Micali and C. Rackoff, "The knowledge complexity of interactive proof-
systems," in Seventeenth annual ACM symposium on Theory of Computing (STOC 85),
1985.
[53] M. Blum, P. Feldman and S. Micali, "Non-interactive zero-knowledge and its applications,"
in Twentieth annual ACM symposium on Theory of computing (STOC 88), 1988.
[54] ARTICLE 29 Data Protection Working Party, "Opinion 10/2004 on More Harmonised
Information Provisions," 2004.
[56] Information Commissioner’s Office (ICO), "Age appropriate design: a code of practice for
online services," [Online]. Available: https://ico.org.uk/for-organisations/guide-to-data-
protection/ico-codes-of-practice/age-appropriate-design-a-code-of-practice-for-online-
services/.
[58] L. E. Holtz, K. Nocun and M. Hansen, "Towards Displaying Privacy Information with Icons,"
in Privacy and Identity 2010: Privacy and Identity Management for Life, 2010.
[59] L. Edwards and W. Abel, "The Use of Privacy Icons and Standard Contract Terms for
Generating Consumer Trust and Confidence in Digital Services," 2014.
[60] P. Balboni and K. Francis, "Maastricht University Data Protection as a Corporate Social
Responsibility (UM DPCSR) Research Project: UM DPCSR Icons Version 1.0," 2020.
[61] H. Habib, Y. Zou, Y. Yao, A. Acquisti, L. Cranor, J. Reidenberg, N. Sadeh and F. Schaub,
"Toggles, Dollar Signs, and Triangles: How to (In)Effectively Convey Privacy Choices with
Icons and Link Texts," in 2021 CHI Conference on Human Factors in Computing Systems,
2021.
[62] S. Pearson and M. Casassa Mont, "Sticky Policies: An Approach for Managing Privacy across Multiple
Parties," Computer, vol. 44, no. 9, pp. 60-68, 2011.
[63] M. Hils, D. W. Woods and R. Böhme, "Privacy Preference Signals: Past, Present and
Future," Proceedings on Privacy Enhancing Technologies, vol. 2021, no. 4, pp. 249-269,
2021.
[64] W3C, "The Platform for Privacy Preferences 1.1 (P3P1.1)," [Online].
Available: https://www.w3.org/standards/history/P3P11.
[65] W3C, "A P3P Preference Exchange Language 1.0 (APPEL1.0)," [Online]. Available:
https://www.w3.org/TR/2002/WD-P3P-preferences-20020415/.
[67] S. Fischer-Hübner, J. Angulo, F. Karegar and T. Pulls, "Transparency, Privacy and Trust–
Technology for Tracking and Controlling My Data Disclosures: Does This Work?," in Trust
Management X: 10th IFIP WG 11.11 International Conference, IFIPTM 2016, 2016.
[68] B. W. Schermer, B. Custers and S. v. d. Hof, "The crisis of consent: how stronger legal
protection may lead to weaker consent in data protection," Ethics and Information
Technology, vol. 16, pp. 171-184, 2014.
[71] M. R. Asghar, T. Lee, M. M. Baig, E. Ullah, G. Russello and G. Dobbie, "A Review of
Privacy and Consent Management in Healthcare: A Focus on Emerging Data Sources," in
2017 IEEE 13th International Conference on e-Science (e-Science), 2017.
[73] ENISA, "Security guidelines on the appropriate use of qualified electronic signatures,"
2017.
[74] J. Tolsdorf, M. Fischer and L. L. Iacono, "A Case Study on the Implementation of the Right
of Access in Privacy Dashboards," in Annual Privacy Forum 2021, 2021.
TP-01-22-045-EN-N
ISBN: 978-92-9204-556-2
DOI: 10.2824/09079