Background

Genome-wide association study (GWAS) compares genetic variants, single-nucleotide polymorphisms (SNP), to see if these variants are associated with a particular trait. The model used in GWAS is essentially logistic regression, evaluated one SNP at a time, corrected with covariates like age, height and weight. The number of SNPs analyzed can easily grow up to 30 million. It is estimated that it can take around 6 hours for 6000 samples and 2.5 million SNPs [1].

Some suggest that cloud computing could offer a cost-effective and scalable alternative that allows research to be done, given the exponential growth of genomic data and increasing computational complexity of genomic data analysis. However, privacy and security are primary concerns when considering these cloud-based solutions.

It was shown in 2004 by Lin et al. [2] that as little as 30 to 80 SNPs could identify an individual uniquely. Homer et al. [3] further demonstrated that even when DNA samples are mixed among 1000 other samples, individuals could be identified. In light of these discoveries, regulations concerning biological data are being updated [4]. The privacy and security of DNA-related data are now more important than ever.

Homomorphic Encryption (HE) is a form of encryption where functions, f, can be evaluated on encrypted data x1,…,xn, yielding ciphertexts that decrypt to f(x1,…,xn). Putting it in the context of GWAS, genomic data can be homomorphically encrypted and sent to a computational server. The server then performs the GWAS computations on the encrypted data, before sending the encrypted outcome to the data owner for decryption. We argue that this would ensure the privacy and security of genomic data: Throughout the entire process, there is no instance where the server can access the data in its raw, unencrypted form, preserving the privacy of the data. Additionally, since the data is encrypted, no adversary would be able to make sense of the ciphertexts. The data is thus secured on the computational server.

Motivated by these concerns, the iDASH Privacy & Security Workshop [5] has organized several competitions on secure genomics analysis since 2011. The aim of these competitions is to evaluate methods that provide data confidentiality during analysis in a cloud environment.

In this work, we provide a solution to Track 2 of the iDASH 2018 competition – Secure Parallel Genome-Wide Association Studies using Homomorphic Encryption. The challenge of this task was to implement the semi-parallel GWAS algorithm proposed by Sikorska et al. [6], which outperforms prior methods by about 24 times, with HE. This task seeks to advance the practical boundaries of HE, a continuation from last year’s HE task which was to implement logistic regression with HE.

We propose a modification of the algorithm by Sikorska et al. [6] for homomorphically encrypted matrices. We developed a caching system to minimize memory utilization while maximizing the use of available computational resources. Our solution also leverages on the complex space of the CKKS encoding to store the SNP matrix and this halved the computation time needed by doubling the number of SNPs processed each time.

Within the constraints of the competition, including a virtual machine with 16GB of memory and 200GB disk space, a security level of at least 128 bits and at most 24 hours of runtime, our solution reported a total computation time of 717.20 minutes. Our best implementation using a more efficient HE scheme, which was not available during the competition, achieved a runtime of 24.70 minutes.

In the following section, we will first define some notations used in this paper. We will begin by describing the CKKS homomorphic encryption scheme that was used to implement the GWAS algorithm. We describe our methods for manipulating homomorphic matrices that are crucial to our solution. We start with our implementation of logistic regression with HE. Following that, we adapted the GWAS algorithm using suitable approximations to simplify the computations for HE, while preserving the accuracy of the model. We also detail some optimizations that were used to accelerate the runtime. Finally, we present our results and provide some discussion about our results.

Notation

Notation for HE

Let N be a power-of-two integer and \(\mathcal {R} = \mathbb {Z}[x]/\langle x^{N}+1 \rangle \). For some integer , denote \(\mathcal {R}_{\ell } = \mathcal {R}/{2^{\ell }}\mathcal {R} = \mathbb {Z}_{2^{\ell }}[x]/\langle x^{N}+1 \rangle \). We let λ be the security parameter where attacks on the cryptosystem require approximately Ω(2λ) bit operations. We use \(z \leftarrow \mathcal {D}(Z)\) to represent sampling z from a distribution \(\mathcal {D}\) over some set Z. Let \(\mathcal {U}\) denote the uniform distribution and \(\mathcal {DG}(\sigma ^{2})\) denote the discrete Gaussian distribution with variance σ2.

Notation for GWAS

The number of samples, covariates and SNPs are denoted as n,d and k respectively. Matrices are denoted in bold font uppercase letters. Let the covariates matrix be denoted as X and the SNP matrix as S. The rows of X or S represent the covariates or SNPs from one sample respectively. We denote the rows as xi. Vectors are denoted in bold font lowercase letters. Let the response vector be denoted as y. The vector of weights from the logistic model is denoted as β and the corresponding vector of probabilities is denoted as p. The vector of SNP effects is denoted as s. The transpose of a vector v is denoted as \(\mathbf {v}^{\intercal }\). We let ⌈·⌉ denote rounding up to the nearest power-of-two.

Methods

Homomorphic encryption

HE was first proposed by Rivest et al. [7] more than 40 years ago while the first construction was proposed by Gentry [8] only a decade ago. For this work, we adopt the HE scheme proposed by Cheon et al. [9], referred to as CKKS, which enables computation over encrypted approximate numbers. As GWAS is a statistical function, the CKKS HE scheme is the prime candidate for efficient arithmetic.

Most HE schemes are based on “noisy” encryptions, which applies some “small” noise to mask messages in the encryption process. For HE, a noise budget is determined when the scheme is initialized and computing on ciphertexts depletes this pre-allocated budget. Once the noise budget is expended, decryption would return incorrect results. The CKKS scheme [9] treats encrypted numbers as having some initial precision, with the masking noise just smaller than the precision. However, subsequent operations on ciphertexts increase the size of noises and reduce the precision of the messages encrypted within. Thus, decrypted results are approximations of their true value.

The noise budget for the CKKS scheme is initialized with the parameter L. For every multiplication, the noise budget is subtracted by the integer p. The noise budget for a given ciphertext is denoted as . When the message is just encrypted, =L. When <p, the noise budget is said to be depleted.

We provide a brief description of the CKKS scheme and highly encourage interested readers to refer to [9] for the full details.

  • KeyGen(1λ):

    Let 2L be the initial ciphertext modulus. Let \(\mathcal {HWT}(h)\) denote the distribution that chooses a polynomial uniformly from \(\mathcal {R}_{2^{L}}\), under the condition that it has exactly h nonzero coefficients. Sample a secret \(s \leftarrow \mathcal {HWT}(h)\), random \(a \leftarrow \mathcal {U}(\mathcal {R}_{2^{L}})\) and error \(e \leftarrow \mathcal {DG}(\sigma ^{2})\). Set the secret key as sk←(1,s), public key as \(pk \leftarrow (b, a) \in \mathcal {R}_{L}^{2}\) where b=−a·s+e (mod L). Finally, sample \(a' \leftarrow \mathcal {U}(\mathcal {R}_{2^{L}})\), \(e' \leftarrow \mathcal {DG}(\sigma ^{2})\) and set the evaluation key evk←(b,a), where b=−a·s+e+L·s2 (mod 22L).

  • Encrypt(pk,m):

    For \(m \in \mathcal {R}\), sample \(v \leftarrow \mathcal {U}(\mathcal {R}_{2^{L}})\) and \(e_{0},e_{1} \leftarrow \mathcal {DG}(\sigma ^{2})\). Let v·pk+(m+e0,e1) (mod 2L) and output (v,L).

  • Decrypt(sk,ct):

    For \(ct = ((c_{0}, c_{1}), \ell) \in \mathcal {R}_{\ell }^{2}\), output c0+c1·s (mod 2)

  • Add(ct1,ct2):

    For ct1=((c0,1,c1,1),),ct2=((c0,2,c1,2),), compute \( (c^{\prime }_{0}, c^{\prime }_{1})\leftarrow = (c_{0,1}, c_{1,1}) + (c_{0,2}, c_{1,2}) \pmod {2^{\ell }}\) and output \((c^{\prime }_{0}, c^{\prime }_{1}), \ell)\).

  • Mult(ct1,ct2):

    For ciphertexts ct1=((c0,1,c1,1),) and ct2=((c0,2,c1,2),), let (d0,d1,d2)=(c0,1c0,2, c1,1c0,2+c0,1c1,2,c1,1c1,2) (mod 2). Compute \((c^{\prime }_{0}, c^{\prime }_{1}) \leftarrow (d_{0}, d_{1}) + \lfloor 2^{-L} \cdot d_{2} \cdot evk \pmod {2^{\ell }} \rceil \) and output \((c^{\prime }_{0}, c^{\prime }_{1}), \ell)\).

  • Rescale(ct,p):

    For a ciphertext ct=((c0,c1),) and an integer p, output \(\left (\left (c^{\prime }_{0}, c^{\prime }_{1}\right), \ell -p\right)\), where \((c^{\prime }_{0}, c^{\prime }_{1}) \leftarrow \lfloor 2^{-p} \cdot (c_{0},c_{1}) \rceil \pmod {2^{\ell -p}} \rceil \).

With the CKKS scheme, we are able to encode N/2 complex numbers into a single element in its message spaces, \(\mathcal {R}\). This allows us to view a ciphertext as an encrypted array of fixed point numbers. Let \(\phi : \mathbb {C}^{N/2} \rightarrow \mathcal {R}\),

  • Encode(\(z_{1}, z_{2}, \dots, z_{N/2}\)):

    Output \(m = \phi (z_{1}, z_{2}, \dots, z_{N/2})\).

  • Decode(m):

    Output \((z_{1}, z_{2}, \dots, z_{N/2}) = \phi ^{-1}(m)\).

Informally, ϕ(·) maps (z1,…,zN/2) to the vector \((\zeta _{j})_{j \in \mathbb {Z}^{\ast }_{N}}\), where ζj=⌊zj⌉ and \(\zeta _{N-j} = \lfloor \overline {z_{j}}\rceil \) for 1≤jN/2. This (ζj) is then mapped to an element of \(\mathcal {R}\) with the inverse of the canonical embedding map. ϕ−1(·) is straightforward, an element in \(\mathcal {R}\) is mapped to a N-dimensional complex vector with the complex canonical embedding map and then the relevant entries of the vector is taken to be the vector of messages.

The ability to encode multiple numbers into one ciphertext allows us to reduce the number of ciphertexts used and compute more efficiently. We refer to each number encoded as a slot of the ciphertext. This offers a SIMD-like structure where the same computation on all numbers within a ciphertext can be done simultaneously. This means that adding or multiplying two ciphertexts together would be equivalent to adding or multiplying each slot simultaneously.

The ciphertext of the CKKS scheme can also be transformed into another ciphertext whose slots are a permutation of the original ciphertext.

  • Rotate(ct,r): Outputs ct whose slots are rotated to the right by r positions.

Homomorphic matrix operations

In this section, we describe our method of encoding matrices with HE. The batching property of the CKKS scheme allows us to treat ciphertexts as encrypted arrays. With this, we propose 4 methods of encoding a matrix with ciphertexts.

Column-Packed (CP) matrices.

This is our primary method of encoding a matrix. We encrypt each column of a matrix in one ciphertext and therefore a matrix will be represented by a vector of ciphertexts. This method of encoding a matrix was suggested by Halevi and Shoup in [10].

We require a function, Replicate that takes a vector ν of size n and returns vectors ν1, ν2, \(\dots \), νn where νi for \(i = 1, \dots, n\), is ν[i] in all positions. This is shown in Fig. 1. We describe in Algorithm 1, a naive version of Replicate. The reader is advised to refer to [10] for details on implementing a faster and recursive variant.

Fig. 1
figure 1

Replicate

We first define matrix-vector multiplication between a CP matrix and a vector in Algorithm 2. First, we invoke Replicate on the vector. Next, we multiply each column in the left-hand side matrix with its corresponding νi. Finally, sum up all ciphertexts and this will give the matrix-vector product.

Matrix multiplication between CP matrices is defined as an iterative process over CP-MatVecMult between the left-hand side matrix and the columns of the right-hand side matrix. This is described in Algorithm 3.

Column-Compact-Packed (CCP) matrices.

In the case where the entries of a matrix can fit within a single vector, we concatenate its columns and encrypt that in one ciphertext. For this type of matrix, we are mainly concerned with the function colSum which returns a vector whose entries are the sum of each column. We present the pseudocode in Algorithm 4. This is achieved by a series of rotations and additions. However, we do not rotate for all slots of the vector, but rather log2(colSize), where colSize is the number of rows in the CCP matrix. We note here that the final sums are stored in every colSize slots, starting from the first slot.

Row-Packed (RP) matrices.

For this encoding, we encrypt rows of a matrix into a ciphertext, representing them with a vector of ciphertexts just like CP matrices. In this work, we only consider matrix-vector multiplication between an RP matrix and a vector. Multiplication of an RP matrix by a CP matrix is a lot like naive matrix multiplication.

To compute the multiplication of an RP matrix with a vector, we define the dot product between two vectors encoded in two ciphertexts in Algorithm 5. For that, we first multiply the ciphertexts together, which yields their component-wise products. Then, we apply rotations to obtain the dot product in every slot of the vector.

With DotProd, we apply it over the rows of the RP matrix with the vector, producing several ciphertexts that each contain the dot product between a row and said vector. Though a series of masks and additions, these separate ciphertexts are combined into the matrix-vector product between an RP matrix and a vector as shown in Algorithm 6.

Row-Expanded-Packed (REP) matrices.

This method of encoding a matrix is similar to RP matrices, except that each entry is repeated q times for some integer q that is a power of two. As with RP matrices, REP matrices are represented by vectors of ciphertexts. By encoding a matrix in this manner, we reduce the number of homomorphic operations when multiplying with other matrices. For this paper, we only consider matrix products between CP and REP matrices.

First, we define a function, Duplicate in Algorithm 7. Suppose that a ciphertext has k filled slots out of n, Duplicate fills the remaining slots with repetitions of the k slots. This is shown in Fig. 2. This can be realized using simple rotations and additions.

Fig. 2
figure 2

Duplicate

To compute matrix products between CP and REP matrices, we first apply Duplicate the columns of the CP matrix. Then, we multiply each column in the CP matrix with its corresponding row in the REP matrix. Finally, we sum all the ciphertexts and obtain the product of the matrices in a CCP matrix. This is shown in Algorithm 8.

Logistic regression with homomorphic encryption

The first step in the GWAS algorithm is to solve a logistic model for its weights β. There are several solutions [1115] that solve a logistic model with HE, given that it was one of the challenges in the iDASH 2017 competition.

Logistic regression.

Logistic regression estimates the parameters of a binary logistic model. Such models are used to predict the probability of an event occurring given some input features. These models assume that the logarithm of the odds ratio (log-odds) is a linear combination of the input features.

Let p denote the probability of an event occurring. The assumption above can be written as

$$ \log \left(\frac{p}{1-p} \right) = \beta_{0} + \beta_{1} x_{1} + \cdots + \beta_{d} x_{d}. $$
(1)

Rearranging Eq. (1), we get

$$ p(\mathbf{x}, \boldsymbol{\beta}) = \frac{1}{1+e^{-\boldsymbol{\beta}^{\intercal} \mathbf{x}}} $$
(2)

where \(\boldsymbol {\beta } = (\beta _{0}, \beta _{1}, \dots, \beta _{d})\) and \(\mathbf {x} = (1, x_{1}, \dots, x_{d})\). This is known as the sigmoid function.

Logistic regression estimates the regression coefficients β using maximum likelihood estimation (MLE). This likelihood is given as

$$ \begin{aligned} L(\mathbf{X}, \boldsymbol{\beta}) & = \prod_{i=1}^{n} P(y_{i}|\mathbf{x}_{i}) \\ & = \prod_{i=1}^{n} p(\mathbf{x}_{i},\boldsymbol{\beta})^{y_{i}} (1- p(\mathbf{x}_{i},\boldsymbol{\beta}))^{1-y_{i}}. \end{aligned} $$
(3)

where xi denotes the rows of the covariates matrix X. Often, MLE is performed with the log-likelihood

$$ \begin{aligned} \ell(\mathbf{X}, \boldsymbol{\beta}) & = \sum_{i=1}^{n} y_{i} \log(p(\mathbf{x}_{i},\boldsymbol{\beta})) \\ & \qquad + \sum_{i=1}^{n} (1-y_{i}) \log(1-p(\mathbf{x}_{i},\boldsymbol{\beta})) \end{aligned} $$
(4)
$$ \begin{aligned} & = \sum_{i=1}^{n} y_{i} \log \left(\frac{p(\mathbf{x}_{i},\boldsymbol{\beta})}{(1- p(\mathbf{x}_{i},\boldsymbol{\beta})} \right) \\ & \qquad + \sum_{i=1}^{n} \log \left(\frac{1}{e^{\boldsymbol{\beta}^{\intercal} \mathbf{x}}+1} \right) \end{aligned} $$
(5)
$$ \begin{aligned} & = \sum_{i=1}^{n} y_{i} (\boldsymbol{\beta}^{\intercal} \mathbf{x}_{i}) - \sum_{i=1}^{n} \log (e^{\boldsymbol{\beta}^{\intercal} \mathbf{x}}+1) \end{aligned} $$
(6)

Maximizing Eq. (6) requires an iterative process. Our implementation in solving the logistic model applies the Newton-Raphson method [16]. This is because the Newton-Raphson method is known to converge quadratically [17] and we wish to solve the model with as little iterations as possible.

The Newton-Raphson method iterates over the following equation

$$ \beta^{(t+1)} = \boldsymbol{\beta}^{(t)} - \mathbf{H}^{-1}(\boldsymbol{\beta}^{(t)}) \mathbf{g}(\boldsymbol{\beta}^{(t)}) $$
(7)

where g and H are given as

$$\begin{array}{*{20}l} \mathbf{g}(\boldsymbol{\beta}) &= \mathbf{X}^{\intercal}(\mathbf{y}-\mathbf{p}(\boldsymbol{\beta})), \end{array} $$
(8)
$$\begin{array}{*{20}l} \mathbf{H}(\boldsymbol{\beta}) &= \mathbf{X}^{\intercal} (\mathbf{W}(\boldsymbol{\beta})) \mathbf{X}. \end{array} $$
(9)

W(β) is defined to be a n by n diagonal matrix whose entries are pi(1−pi) for \(i = 1, \dots, n\). We remind the reader here that y is a n by 1 binary response vector that contains the truth labels of each individual. p(β) represents the vector of probabilities that is computed for each individual with Eq. (2) using β of the particular iteration.

A careful derivation of Eqs. (1) to (6) can be found in [18].

However, there are two non-HE friendly aspects in this algorithm. Firstly, for each iteration, H is re-computed with the iteration’s β. This is computationally expensive with homomorphic encryption. Secondly, the sigmoid function Eq. (2) contains the exponential function, ex which is not natively supported by HE schemes. Hence, we approximate the Hessian matrix and the sigmoid function in our implementation.

Hessian matrix approximation.

We use an approximation for all Hessian matrices as suggested by Böhning and Lindsay [19]. They proposed using

$$ \tilde{\mathbf{H}} = \frac{1}{4} \mathbf{X}^{\intercal} \mathbf{X} $$
(10)

as a lower bound approximation for all Hessian matrices in solving a logistic model with the Newton-Raphson method. This approximation is also used by Xie et al. [20] in their distributed privacy preserving logistic regression. We chose to precompute \(\left (\mathbf {X}^{\intercal } \mathbf {X} \right)^{-1}\) with an open source matrix library Eigen [21]. We then encrypt \(\left (\mathbf {X}^{\intercal } \mathbf {X} \right)^{-1}\) as an input to the GWAS algorithm.

Sigmoid function approximation.

We use the approximation from Kim et al. [12] who proposed polynomials of degree 3,5,7 as approximations of the sigmoid function. We chose the polynomial of degree 7:

$$ \begin{aligned} \sigma_{7}(x) &= 0.5 - 1.73496 \left(\frac{x}{8} \right) + 4.19407 \left(\frac{x}{8} \right)^{3} \\ &\quad- 5.43402 \left(\frac{x}{8} \right)^{5} + 2.50739 \left(\frac{x}{8} \right)^{7}. \end{aligned} $$
(11)

Our algorithm.

We described our algorithm for Logistic Regression with HE in Algorithm 9. We encrypt X and \(\left (\mathbf {X}^{\intercal } \mathbf {X} \right)^{-1}\) as CP matrices and y as a ciphertext. We initialize β in a ciphertext by encrypting a vector of zeros. We first compute Xβ with Algorithm 2 and apply Eq. (11) on to each slot in the ciphertext. Note here that Xβ is now a vector and is represented by one ciphertext. Instead of encrypting \(\mathbf {X}^{\intercal }\), we treat X as \(\mathbf {X}^{\intercal }\) encrypted as a RP matrix. We thus invoke RP matrix vector multiplication, Algorithm 6 with \(\mathbf {X}^{\intercal }\) and (yp). Finally, β is updated with Eq. (7).

In comparison with prior works that perform secure computation of logistic regression with HE [1115], our method is the first to use the Newton-Raphson method. Gradient descent was chosen to in maximizing the log-likelihood, Eq. (6), in other implementations.

In [13], a 1-bit gradient descent method was adopted, with the FV scheme [22]. Bootstrapping is required in this solution. [11] employed the CKKS scheme [9] with gradient descent. They shared two least squares approximations of the sigmoid function. The winning solution of iDASH 2018 [12] used a gradient descent variant - Nesterov Accelerated Gradient and introduced another approximation of the sigmoid function. [15] use bootstrapping to achieve logistic regression for datasets larger than any of the solutions published. A unique solution proposed in [14] attempts to approximate a closed form solution for logistic regression.

Semi-Parallel GWAS with homomorphic encryption

The semi-parallel GWAS algorithm proposed by Sikorska et al. [6] rearranges linear model computations and leverages fast matrix operations to achieve some parallelization and thus better performance. A logistic model is first solved with the covariates matrix. Let z be a temporary variable

$$ \mathbf{z} = \mathbf{X}\boldsymbol{\beta} + \mathbf{W}^{-1}(\mathbf{y}-\mathbf{p}) $$
(12)

where β is the weights of the logistic model, y is the response vector and p is the vector of probabilities from evaluating the sigmoid function Eq. (2) with β.

The SNP matrix S is then orthogonalized with

$$ \mathbf{S}^{*} = \mathbf{S} - \mathbf{X} \left(\mathbf{X}^{\intercal} \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^{\intercal} \mathbf{W} \mathbf{S} $$
(13)

and z is orthogonalized with

$$ \mathbf{z}^{*} = \mathbf{z} - \mathbf{X} \left(\mathbf{X}^{\intercal} \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^{\intercal} \mathbf{W} \mathbf{z}. $$
(14)

The estimated SNP effect s can then be computed with

$$ \mathbf{b} = \frac{(\mathbf{W} \mathbf{z}^{*})^{\intercal} \cdot \mathbf{S}^{*}}{\text{\texttt{colsum}}(\mathbf{W} (\mathbf{S}^{*})^{2})} $$
(15)

and the standard error can be computed with

$$ \mathbf{err} = \frac{1}{\text{\texttt{colsum}}(\mathbf{W} (\mathbf{S}^{*})^{2})}. $$
(16)

Division here denotes element-wise division between the vectors \((\mathbf {W} \mathbf {z})^{\intercal } \cdot \mathbf {S}^{*}\) and colsum(W(S)2).

The main obstacle for HE with the semi-parallel GWAS algorithm is matrix inversion. General matrix inversion is computationally expensive and inefficient in HE. This is mainly because integer division, which is used frequently in matrix inversion, cannot be efficiently implemented in HE. There are two instances where matrix inversion has to be computed. The first occurs in Eq. (12) and the second occurs in the orthogonal transformations Eqs. (13) and (14). In the following paragraphs, we will describe our method for implementing the semi-parallel GWAS algorithm with HE. We will also describe some optimizations that reduce memory consumption and accelerate computations to qualify within the competition requirements.

Inverse of W.

We exploit the nature of W to compute its inverse with the Newton-Raphson method in HE. Recall that W is a n by n diagonal matrix whose entries are pi(1−pi) for \(i = 1, \dots, n\). Firstly, we represent the diagonal matrix W by a vector w containing the diagonal entries to reduce storage and computational complexity. Secondly, the inverse of a diagonal matrix is can be obtained by inverting the entries along the main diagonal. This means that W−1 can be computed by inverting the slots of W. The entries of w are given as pi(1−pi), where pi∈[0,1]. We claim an upper bound of 0.25 on the slots of w. The proof is as follows: the derivative of pi(1−pi) is 1−2pi for which pi=0.5 gives a maximium. Substituting pi=0.5 provides the upper bound of 0.25.

We used this information to set a good initial guess of 3 in the Newton-Raphson method. This would reduce the number of iterations needed to obtain an accurate inverse. We describe this algorithm in Algorithm 10.

Modification of orthogonal transformations.

We propose modifications to Eqs. (13) and (14) as \(\left (\mathbf {X}^{\intercal } \mathbf {W} \mathbf {X} \right)^{-1}\) is too expensive to be computed in the encrypted domain.

We define a placeholder matrix M as

$$ \mathbf{M} = I - \mathbf{X} \left(\mathbf{X}^{\intercal} \mathbf{X} \right)^{-1} \mathbf{X}^{\intercal}. $$
(17)

We proposed a modification, inspired by the Hessian approximation in Eq. (10), to the orthogonal transformation of S with

$$ \begin{aligned} \mathbf{S}' & = \mathbf{M} \mathbf{S} \\ & = \mathbf{S} - \mathbf{X} \left(\mathbf{X}^{\intercal} \mathbf{X} \right)^{-1} \mathbf{X}^{\intercal} \mathbf{S} \end{aligned} $$
(18)

and z with

$$ \begin{aligned} \mathbf{z}' & =\mathbf{M} \mathbf{z} \\ & = \mathbf{z} - \mathbf{X} \left(\mathbf{X}^{\intercal} \mathbf{X} \right)^{-1} \mathbf{X}^{\intercal} \mathbf{z}. \end{aligned} $$
(19)

The estimated SNP effect is now computed with

$$ \mathbf{b}' = \frac{(\mathbf{W} \mathbf{z}')^{\intercal} \cdot \mathbf{S}'}{\text{\texttt{colSum}}(\mathbf{W} (\mathbf{S}')^{2})}. $$
(20)

and the standard error is

$$ \mathbf{err}' = \frac{1}{\text{\texttt{colsum}}(\mathbf{W} (\mathbf{S}')^{2})}. $$
(21)

Complex space of CKKS ciphertext

For our first optimization, we exploit the scheme’s native support for complex numbers to pack two SNPs into a single complex number, putting one SNP in the real part and another in the imaginary part. This allows us to fit twice as many SNPs in a single ciphertext and cut the runtime by half.

However, (Si′)2 in Eq. (20) is more difficult to compute with this packing method. Simply squaring the ciphertext does not yield the correct output as slots now contain complex numbers; for some complex number z=x+yi,

$$ (x+yi)^{2} = (x^{2} - y^{2}) + 2xyi. $$
(22)

Instead, we consider multiplying z by its complex conjugate \(\overline {z}=x-yi\). We have

$$ (x+yi)(x-yi) = x^{2} + y^{2} $$
(23)

Extracting the real parts of Eqs. (22) and (23), we get

$$ x^{2} = \frac{Re(z\overline{z} + z^{2})}{2} \qquad y^{2} = \frac{Re(z\overline{z}- z^{2})}{2} $$
(24)

Recall that Si′ is a CCP matrix which is represented by one ciphertext with each slot holding one complex numbers encoding two SNPs. Thus, we compute SiSi′ and \(\mathbf {S}_{i}'\overline {\mathbf {S}_{i}'}\). We assign

$$ \mathbf{S}_{even} = \frac{\mathbf{S}_{i}'\overline{\mathbf{S}_{i}'} + \mathbf{S}_{i}'\mathbf{S}_{i}'}{2} $$
(25)

and

$$ \mathbf{S}_{odd} = \frac{\mathbf{S}_{i}'\overline{\mathbf{S}_{i}'} - \mathbf{S}_{i}'\mathbf{S}_{i}'}{2} $$
(26)

Optimizations with HEAAN

There are two optimizations that we used with the HEAAN library to reduce the parameters needed and to improve runtime.

For the first optimization, we rescale the ciphertext by a value that is smaller than p after every plaintext multiplication. This means each plaintext multiplication is now “cheaper” than a ciphertext multiplication and hence the value of L when initialized can be lowered.

The second optimization would be to perform only power-of-two rotations. A rotation by τ slots is a composition of power-of-two rotations in the HEAAN library. The required power-of-two rotations are the 1s of the binary decomposition of τ. Thus, it would be more efficient if we only perform rotations by a power-of-two. We illustrate this with an example. A rotation by 245 slots would require 6 power-of-two rotations as the binary decomposition of 245 is 11110101. A rotation by 256 slots would require 1 power-of-two rotations as the binary decomposition of 256 is 100000000. This reduces the number of rotations in our implementation.

Batching SNPs

As S is too large to be stored in memory when encrypted, we propose to divide S column-wise and process batches of SNPs. We show how to compute the maximum number of SNPs that can fit within a batch. Let τ be the number of SNPs in a batch. Consider MS, a matrix product between a n by n and a n by τ matrix. By Algorithm 8, the result is a CCP matrix, whose ciphertext has to have enough slots for n×τ elements. For efficiency as described in the previous section, we round the size of each column to the nearest power-of-two and pad the columns with zeroes. Together with the complex space of the HEAAN ciphertext, the maximum number of SNPs that can be processed as a batch is given as

$$ \tau = \frac{2^{N-1}}{2 \times \lceil{n}\rceil} $$
(27)

Smart cache module

We consider the largest matrix in our implementation, M which is a n by n matrix. There is an instance where M will be stored as CCP matrix (See next section). This means that the ciphertext would need to have at least ⌈n2 slots. Consequentially, logN is at least 2×⌈n2. This results in a large set of parameters for the HE scheme which translate to a large amount of memory usage.

The next step requires this CCP matrix to be first converted into a CP matrix. This implies that we need to manage n ciphertexts where n is the number of individuals. This further increase the memory footprint of the algorithm.

Furthermore, the virtual machine that the iDASH organizers provide only has 16GB RAM. As a result, we choose to move ciphertexts to the hard disk when they are not used for computations.

We designed a cache module that exploits the vectorized ciphertext structure of encrypted matrices. There are 4 threads on the VM provided, of which 2 is used for reading ciphertexts from the disk while 1 is used to write ciphertext into a file. The last thread is used for computation. A ciphertext will be pre-fetched into memory before it is needed for computation, replacing a ciphertext that is no longer needed.

Our algorithm

We give a detailed walkthrough of our modified semi-parallel GWAS algorithm in Algorithm 11.

First, we perform logistic regression with X, y and \(\left (\mathbf {X}^{\intercal } \mathbf {X} \right)^{-1}\) as described in Algorithm 9. We use β from logistic regression, together with p from the previous iteration to compute w and z.

Next, compute the inverse of the slots elements in w with inverseSlots. Note that W−1(yp) is equivalent to multiplying the ciphertexts w−1 and (yp). We then compute z as given in Eq. (19). At this point, we have z and w which are both vectors, stored in a ciphertext each.

We construct a temporary variable \(\mathbf {M} = \mathbf {Id} - \mathbf {X} \left (\mathbf {X}^{\intercal } \mathbf {X} \right)^{-1}\) which is a CP matrix to facilitate computations. Here, we choose to encrypt \(\mathbf {X}^{\intercal }\) as a REP matrix. The reason for encrypting differently is because multiplying a CP matrix by a RP matrix requires the RP matrix to be first converted into REP form. This process is very inefficient homomorphically and hence we decided to encrypt it directly as a REP matrix. Thus, the product of \(\mathbf {X} \left (\mathbf {X}^{\intercal } \mathbf {X} \right)^{-1}\) with \(\mathbf {X}^{\intercal }\) is a CP-REP-MatMult as shown in Algorithm 8. At this point, we M is a CCP matrix. We then convert M into a CP matrix to compute MS.

As described earlier, we iterate over partial blocks of the SNP matrix, Si, divided column-wise. Next, compute its orthogonal transformation Si′. We remind the reader here again that MSi is computed with CP-REP-MatMult which produces a CCP matrix, S. We compute, separately, the numerator, numerator, and denominator, denominator, of Eq. (20) for each Si′.

For numerator, we multiply w−1 and (z) slots-wise and duplicate the slots for as many columns in the CCP matrix S. The vector-matrix product is now redefined as a ciphertext multiplication, followed by calling colSum over n slots.

For denominator, the computation is similar. After squaring the slots of the CCP matrix S, we duplicate w and perform a slot-wise multiplication. colSum of the resulting CCP matrix is exactly the second part of the vector-matrix product for numerator - the accumulation sum over every n slots.

We wish to highlight here that as stated in Q15 FAQ for the competition, it is acceptable to return numerator and denominator separately [23]. As such, we decrypt and concatenate all numerators and denominators respectively instead of performing a costly inversion of denominator. Finally, we divide the two vectors element-wise to obtain the estimated SNP effect, b.

Results

We used the provided dataset of 245 users with 4 covariates and 10643 SNPs.

The HE library used is the HEAAN library [24], commit id da3b98. The HE parameters used are logN=17, logL=2440 and logp=45. We observed that the HEAAN context based on these parameters utilizes about 3.5GB. The context can be thought of as the base memory needed for HE computations. Furthermore, we run Rescale on the output with p=45 for ciphertext-ciphertext multiplications and p=10 for ciphertext-plaintext multiplications to control noise growth. This gives us a security level of about 93 bits based on the LWE estimator provided by Albrecht et al. [25].

As described earlier, we require at least ⌈n2 slots, where n=245. We chose the minimum number of slots needed, 216 slots and set logN to be 17. We are able to process a total of τ=512 SNPs in each batch, based on Eq. (27). This gives us a total of ⌈10643/512⌉=21 batches. We set κ=3 for the number of iterations in HomLogisticRegression.

We have tabulated the number of sequential homomorphic computations of our modified GWAS algorithm in Table 1. These numbers represent the circuit depth of the GWAS algorithm. We find that a comparison of the number of these computations is a better measure of evaluating a HE program, independent of HE library used.

Table 1 Depth of Homomorphic Operations

We report the time taken and memory consumed on two servers: the VM provided by the iDASH organizers and our server.

The machine provided by the iDASH organizers is an Amazon T2 Xlarge or equivalent VM, which has 4 vCPU, 16GB memory, disk size around 200GB [23]. The results are shown in Table 2.

Table 2 Time Taken and Memory Consumption with iDASH server (4 cores) using HEAAN

For our server, the CPU model used is Intel Xeon Platinum 8170 CPU at 2.10GHz with 26 cores and the OS used is Arch Linux. The results are shown in Table 3.

Table 3 Time Taken and Memory Consumption with our server (22 cores) using HEAAN

We evaluated the accuracy of our results with two methods. The first method compares the vectors b and b, counting the number of entries that are not equal. However, since the CKKS scheme introduces some error upon decrypting, we are unable to get any identical entries. Instead, we opt to count the number of p-values for which our solution differs from the original algorithm by more than some error, e. This is shown in Table 4.

Table 4 HEAAN Accuracy

The second method would be to plot a scatter diagram whose x-axis represent b and y-axis represent b. Ideally, if b=b, the best fit line of the scatter plot should be y=x. We compute the line of best fit with the numpy.polyfit function from python [26] and compared against the line y=x. Our HEAAN based solution gives the line y=1.002x+0.0005317. The scatter plot is given in Fig. 3.

Fig. 3
figure 3

HEAAN Implementation Scatter Plot

We port our implementation to the SEAL library [27] which recently released a version of the CKKS scheme that does not require the 22L modulus. We implemented this with 22 cores on our machine. The parameters used are logN=17, logL=1680 and logp=50. The context generated in this instance is approximately 73.4GB. The results of this implementation is given in Table 5.

Table 5 Time Taken and Memory Consumption with our server (22 cores) using SEAL

The accuracy of the SEAL implementation based on the first method is tabulated in Table 6.

Table 6 SEAL Accuracy

Our SEAL based solution gives the line y=1.017x+0.007565. The scatter plot for the results is given in Fig. 4.

Fig. 4
figure 4

SEAL Implementation Scatter Plot

Discussion

In our submission, we miscalculated the security level, assuming that it fit the 128-bit requirements while it was actually about 93 bits. This is due to the use of the modulus 22L for the evaluation key, which is a quirk of the HEAAN library [24].

There is also a limit of 256 subjects with our implementation, due to our desire to pack the entire test dataset into a single ciphertext. For a larger number of subjects (up to 512), the matrix \(\mathbf {X}^{\intercal }\) will need at least 512 by 512 slots, which means that logN has to be at least 19.

We are aware of the limitations in HEAAN, namely the 22L modulus and slower homomorphic operations. However, it was the only publicly available HE library based on the CKKS scheme.

We can see that SEAL’s implementation of the CKKS scheme is superior in terms of runtime. This is because SEAL implemented an RNS-variant of the CKKS, which improves the speed of the algorithm by almost 8 times. The security level of this implementation based on the LWE estimator is about 230 bits.

However, we are unable to execute our GWAS algorithm with SEAL using κ=3. The set of parameters that supports the depth of the algorithm with κ=3 appears to be too large and caused our server to run out of memory. Hence, for the implementation with SEAL, we reduced κ to 1. This reduces the depth of the algorithm and hence the parameters that were used. Consequentially, the accuracy of the results has decreased from 98.42% to 61.84%.

Conclusions

In this paper, we demonstrated an implementation of a semi-parallel GWAS algorithm for encrypted data. We employed suitable approximations in adapting the semi-parallel GWAS algorithm to be HE-friendly. Our solution shows that the model trained over encrypted data is comparable to one trained over unencrypted data. Memory constraints are shown to be of little concern with our implementation of a smart cache, which reduced memory consumption to fit within the limits imposed. This signifies another milestone for HE, showing that HE is mature enough to tackle more complex algorithms.