Quantum machine learning for quantum anomaly detection

Nana Liu(1,2,*) and Patrick Rebentrost(3,†)

1 Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543
2 Singapore University of Technology and Design, 8 Somapah Road, Singapore 487372
3 Xanadu, 372 Richmond St W, Toronto, M5V 2L7, Canada

Anomaly detection is used for identifying data that deviate from 'normal' data patterns. Its usage on classical data finds diverse applications in many important areas like fraud detection, medical diagnoses, data cleaning and surveillance. With the advent of quantum technologies, anomaly detection of quantum data, in the form of quantum states, may become an important component of quantum applications. Machine learning algorithms are playing pivotal roles in anomaly detection using classical data. Two widely used algorithms are kernel principal component analysis and one-class support vector machine. We find corresponding quantum algorithms to detect anomalies in quantum states. We show that these two quantum algorithms can be performed using resources logarithmic in the dimensionality of the quantum states. For pure quantum states, these resources can also be logarithmic in the number of quantum states used for training the machine learning algorithm. This makes these algorithms potentially applicable to big quantum data applications.

I. Introduction

Quantum computing has achieved success in finding algorithms that offer speed-ups over classical algorithms for certain problems, like factoring and searching an unstructured database. It relies upon techniques like quantum phase estimation, quantum matrix inversion and amplitude amplification. These tools have recently been employed in quantum algorithms for machine learning [1-4], an area with applications across very broad disciplines.
In particular, they have proved useful in data fitting and classification problems that appear in pattern recognition [5-11], where quantum algorithms can offer speed-ups in both the dimensionality and the number of data used to train the algorithm.

In the presence of a large amount of input data, some coming from unreliable or unfamiliar sources, it is important to be able to detect outliers in the data. This is especially relevant when few examples of outliers are available to develop a prior expectation. These outliers can be indicative of some unexpected phenomenon emerging in a system that has never before been identified, like a faulty system or a malicious intruder. The subject of anomaly detection is learning how to accurately identify an outlier when training data of 'normal' examples are given. Machine learning algorithms for anomaly detection in classical data have been widely applied in areas as diverse as fraud detection, medical diagnoses, data cleaning and surveillance [12, 13].

Quantum data, which take the form of quantum states, are prevalent in all forms of quantum computation, quantum simulation and quantum communication. Anomaly detection of quantum states is thus expected to be an important direction in the future of quantum information processing and communication, in particular over the cloud or a quantum internet.

* Electronic address: nana.liu@quantumlah.org
† Electronic address: patrick@xanadu.ai

Since machine learning algorithms have proved successful for anomaly detection in classical data, a natural question is whether there exist quantum machine learning algorithms for detecting anomalies in quantum systems. While classical anomaly detection techniques can be applied to quantum states, this is only possible by first obtaining classical descriptions of these states through state tomography, which requires a large number of measurements [14, 15].
Thus, it would also be advantageous to reduce these resource overheads by using a quantum algorithm. In this paper we discuss quantum machine learning methods applied to the detection of anomalies in quantum states themselves, which we call quantum anomaly detection. Given a training set of $M$ unknown quantum states, each of dimension $d$, the task is to use a quantum computer to detect outliers in new data, arising for example from a faulty quantum device. Our schemes also do not require quantum random access memory [16], since the necessary data input is fully given by quantum states generated from quantum devices. In that sense, we present an instance of an algorithm for quantum learning [17-21].

We present two quantum algorithms for quantum anomaly detection: kernel principal component analysis (PCA) [22, 23] and one-class support vector machine (SVM) [24, 25]. We show that pure state anomaly detection algorithms can be performed using resources logarithmic in $M$ and $d$. For mixed quantum states, we show how this is possible using resources logarithmic in $d$ only. We note this can also be an exponential resource reduction compared to state tomography [26]. After introducing the classical kernel PCA and one-class SVM algorithms in Section II, we develop quantum anomaly detection algorithms based on kernel PCA and one-class SVM for pure quantum states in Section III. We generalise these algorithms to mixed quantum states in Section IV and discuss the implications of these results in Section V.

II. Anomaly detection in classical machine learning

Anomaly detection involves algorithms that can recognise outlier data compared to some expected or 'normal' data patterns. These unusual data signatures, or anomalies, can result from faulty systems, malicious intrusion into a system, or from naturally occurring novel phenomena that are too rare to have been captured and classed with their own training sets.
These algorithms generally provide proximity measures that quantify how 'far' the inspected data pattern is from the 'norm'. This is closely related to the change point detection problem, which involves finding the point at which an underlying probability distribution governing an observed phenomenon has changed [27, 28].

There are three broad classes of anomaly detection algorithms: supervised, unsupervised and semi-supervised [12, 13]. Supervised anomaly detection assumes labelled training sets for both 'normal' and anomalous data, and the usual supervised learning algorithms can be employed [43]. Unsupervised learning algorithms apply when neither 'normal' nor anomalous data are labelled, but assume that 'normal' cases occur far more frequently than anomalous ones. However, there are scenarios where the 'normal' data can be readily identified and gathered, but the anomalous data are too scarce to form a training set. Here semi-supervised methods are required, and we will focus on this latter scenario in this paper.

Anomaly detection algorithms can be applied to different types of anomalies: point, contextual and collective anomalies [12, 13]. Point anomalies are single data instances that can be classified as either 'normal' data or an anomaly. They are the simplest, most widely studied type of anomaly and will be the focus of this paper. Contextual anomalies are individual data instances that are anomalous only with respect to a particular context, common in time-series data. Collective anomalies, on the other hand, are collections of data instances that are unusual only with respect to the whole data set. These latter types of anomalies in the quantum domain will be explored in future work. For point anomaly detection alone, there are many different types of algorithms based on different techniques [12, 13].
Two of these algorithms, kernel PCA [22] and one-class SVM [25], most closely resemble existing quantum machine learning algorithms that provide speed-ups over their classical counterparts [5, 29]. We will use these classical algorithms, described below, as a foundation to develop quantum kernel PCA and quantum one-class SVM algorithms to detect anomalies in quantum data.

A. Kernel PCA

Suppose we are given a training set of $M$ vectors $\vec{x}_i \in \mathbb{R}^d$. The centroid of the training data is denoted by

$$\vec{x}_c = \frac{1}{M}\sum_{i=1}^{M} \vec{x}_i, \qquad (1)$$

and we can define $\vec{z}_i = \vec{x}_i - \vec{x}_c$ to be the centered data. The centered sample covariance matrix $C$ is then given by

$$C = \frac{1}{M-1}\sum_{i=1}^{M} \vec{z}_i \vec{z}_i^{\,T}. \qquad (2)$$

Anomaly detection via kernel PCA is then performed using a proximity measure

$$f(\vec{x}_0) = |\vec{z}_0|^2 - \vec{z}_0^{\,T} C \vec{z}_0, \qquad (3)$$

where $\vec{z}_0 = \vec{x}_0 - \vec{x}_c$. This measure compares the distance between the point $\vec{x}_0$ and the centroid of the training data with the variance of the training data along the direction $\vec{z}_0$, and thereby quantifies how anomalous the point $\vec{x}_0$ is compared to the training data. A larger $f(\vec{x}_0)$ thus implies a more anomalous datum than a smaller $f(\vec{x}_0)$.

This method also allows us to classify anomalies in non-linear feature spaces by performing the replacement $\vec{x}_i \rightarrow \phi(\vec{x}_i)$ with a feature map $\phi$. The inner products are then performed in an abstract linear feature space and can be represented by a kernel function $k(\vec{x}_i, \vec{x}_j)$, which can be taken to be a non-linear function, for example $k(\vec{x}_i, \vec{x}_j) = \phi(\vec{x}_i)^T\phi(\vec{x}_j)$. For some kernels, like radial basis function kernels, it has been shown that anomaly detection can be more effective using kernel PCA than one-class SVM [22]. While for pure quantum states we will focus on the linear kernel, this analysis might also be extended to polynomial kernels [5] and radial basis function kernels [30].

B. One-class SVM

One-class SVM [31, 32] has also been applied to anomaly detection in classical data.
For instance, it has been used as part of change point detection algorithms and for detecting novel signals in a time series [33]. We focus on the least-squares formulation of the one-class SVM [25] (using the linear kernel for now). This involves finding a hyperplane in feature space that both maximises the distance from the hyperplane to the origin of the feature space and minimises the least-squares distance $\xi_i$ between each data point $\vec{x}_i$ and the hyperplane. This is equivalent to extremising a Lagrangian $L$,

$$L = L(\vec{w}, r, \xi_i, \alpha_i) = \frac{1}{2}\|\vec{w}\|^2 + \frac{1}{2 P_T M}\sum_{i=1}^{M}\xi_i^2 - r - \sum_{i=1}^{M}\alpha_i(\vec{w}^{\,\dagger}\cdot\vec{x}_i + \xi_i - r), \qquad (4)$$

where $\vec{w}\cdot\vec{x} = r$ defines a hyperplane in feature space which 'best characterises' the set of training data, and $r$ is a bias term. In the presence of non-zero $\xi_i$, the hyperplane we want to find is subject to the constraint $\xi_i = r - \vec{w}^{\,\dagger}\cdot\vec{x}_i$. The Lagrange multipliers $\alpha_i$ can be represented in vectorised form as $\vec{\alpha}$, and $P_T$ denotes a 'threshold acceptance probability', related to the fraction of data expected to be outliers [31]. From the standard method, one arrives at the corresponding matrix equation

$$\tilde{F}\begin{pmatrix} -r \\ \vec{\alpha} \end{pmatrix} = \begin{pmatrix} 0 & \vec{e}^{\,T} \\ \vec{e} & K + P_T M \mathbb{1} \end{pmatrix}\begin{pmatrix} -r \\ \vec{\alpha} \end{pmatrix} = \begin{pmatrix} 1 \\ \vec{0} \end{pmatrix}, \qquad (5)$$

where $K$ is the matrix whose elements are the kernel functions $K_{ij} = k(\vec{x}_i, \vec{x}_j)$ and $\vec{e} = (1, \ldots, 1)^T$. The matrix inversion of $\tilde{F}$ applied to $(1, \vec{0})^T$ then reveals $(-r, \alpha_i)$. Once $\alpha_i$ and $r$ are determined, one can compute a proximity measure $f(\vec{x}_0)$ from the new datum $\vec{x}_0$ to the hyperplane,

$$f(\vec{x}_0) = |\vec{w}^{\,\dagger}\cdot\vec{x}_0 - r|, \qquad (6)$$

which can also be generalised to non-linear kernels. It is possible to consider restricted versions of this algorithm in which the bias $r$ is set to a constant number $c$. This reduces Eq. (5) to $(K + P_T M \mathbb{1})\vec{\alpha} = c\vec{e}$, and should produce the same ordering of proximity measures when only the relative distance to the hyperplane is relevant. For simplicity, in the rest of the paper we focus on the case $r = c = 1$ [44].
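The two classical proximity measures of Eqs. (3) and (6) can be prototyped in a few lines of numpy. The sketch below is illustrative only: the training cluster, the outlier and the threshold $P_T = 0.1$ are invented for the demonstration, and the bordered system of Eq. (5) is solved directly.

```python
import numpy as np

def kernel_pca_score(X, x0):
    """Kernel PCA proximity measure with a linear kernel, Eqs. (1)-(3):
    f(x0) = |z0|^2 - z0^T C z0."""
    M = X.shape[0]
    xc = X.mean(axis=0)                      # centroid, Eq. (1)
    Z = X - xc                               # centered training data
    C = Z.T @ Z / (M - 1)                    # centered covariance, Eq. (2)
    z0 = x0 - xc
    return z0 @ z0 - z0 @ C @ z0             # Eq. (3)

def one_class_svm_score(X, x0, PT=0.1):
    """Least-squares one-class SVM with a linear kernel: solve the
    bordered system of Eq. (5) for (-r, alpha), return f(x0) of Eq. (6)."""
    M = X.shape[0]
    K = X @ X.T                              # linear kernel matrix
    F = np.zeros((M + 1, M + 1))
    F[0, 1:] = 1.0                           # e^T
    F[1:, 0] = 1.0                           # e
    F[1:, 1:] = K + PT * M * np.eye(M)
    rhs = np.zeros(M + 1)
    rhs[0] = 1.0                             # right-hand side (1, 0)^T
    sol = np.linalg.solve(F, rhs)
    r, alpha = -sol[0], sol[1:]
    w = alpha @ X                            # w = sum_i alpha_i x_i
    return abs(w @ x0 - r)

# Invented demo data: a tight 'normal' cluster and one far outlier.
rng = np.random.default_rng(0)
X = rng.normal(1.0, 0.05, size=(30, 4))
normal, outlier = X[0], 10 * X[0]
print(kernel_pca_score(X, normal), kernel_pca_score(X, outlier))
print(one_class_svm_score(X, normal), one_class_svm_score(X, outlier))
```

In both measures the outlier receives a much larger score than a point drawn from the training cluster, which is the ordering the anomaly classification relies on.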
III. Pure state anomaly detection

We begin with the following set-up. Suppose we are given access to both the unitaries $\{U_i\}$ and the control unitary $U_C = \sum_{i=1}^{M} |i\rangle\langle i| \otimes U_i$, where $U_i|0\rangle = |\psi_i\rangle$ are the training quantum states. We also suppose that, given any two unitaries $u_1$ and $u_2$, it is possible to generate a control unitary of the form $|0\rangle\langle 0| \otimes u_1 + |1\rangle\langle 1| \otimes u_2$ [45]. The states $|\psi_i\rangle$ are labelled 'normal'. If we are also given $U_0$, where $U_0|0\rangle = |\psi_0\rangle$, our task in quantum anomaly detection is to quantify how anomalous the state $|\psi_0\rangle$ is compared to the 'normal' states. One scenario where this may be relevant is in the testing of manufactured quantum circuits.

The control unitary $U_C$ [46] allows one to create superpositions of training states of the form $\sum_{i=1}^{M} |\psi_i\rangle|i\rangle$. This is required for the generation of the necessary kernel density matrices, the computation of some normalisation constants and for the anomaly classification stage. With these assumptions, one can create multiple copies of the training states $\{|\psi_i\rangle\}$, the test state $|\psi_0\rangle$ and all the other states necessary to perform quantum anomaly detection. Requiring multiple identical copies of states is a purely quantum characteristic stemming from the no-cloning theorem and the imperfect distinguishability of non-orthogonal quantum states. For instance, multiple copies of two different states are required to improve the success probability of distinguishing between these two states, yet these multiple copies cannot be generated from a single copy due to the no-cloning constraint.

A. Quantum kernel PCA (pure state)

1. Algorithm using resources logarithmic in d

For a quantum version of the kernel PCA algorithm for anomaly detection, we first define the analogous centroid state of the training quantum data by

$$|\psi_c\rangle = N_c \sum_{i=1}^{M} |\psi_i\rangle = N_c \sum_{i=1}^{M}\sum_{j=1}^{d} (\psi_i)_j |j\rangle. \qquad (7)$$
Here $(\psi_i)_j$ denotes the $j$th component of the vector $\vec{\psi}_i$. We demonstrate in Appendix A that the normalisation $|N_c|^2 = 1/\sum_{i,j=1}^{M}\langle\psi_i|\psi_j\rangle$ can be found using $O(\log M)$ resources. We also show a method for preparing the centroid state in Appendix A.

We can write the centered quantum data as

$$|z_i\rangle = N_i\Big(|\psi_i\rangle - \frac{1}{M N_c}|\psi_c\rangle\Big) = N_i\sum_{j=1}^{d}(z_i)_j|j\rangle, \qquad (8)$$

where $|N_i|^2 = 1/(1 + 1/(M^2|N_c|^2) - (2/M)\mathrm{Re}[\langle\psi_i|\psi_c\rangle]/N_c)$ and $(z_i)_j$ is the $j$th component of the vector $\vec{z}_i = \vec{\psi}_i - (1/M)\sum_{k=1}^{M}\vec{\psi}_k$. If the new quantum state to be classified is $|\psi_0\rangle = \sum_{i=1}^{d}(\psi_0)_i|i\rangle$, we can denote its centered equivalent by $|z_0\rangle = N_0(|\psi_0\rangle - (1/(M N_c))|\psi_c\rangle) = N_0\sum_{j=1}^{d}(z_0)_j|j\rangle$, where $|N_0|^2 = 1/(1 + 1/(M^2|N_c|^2) - (2/M)\mathrm{Re}[\langle\psi_0|\psi_c\rangle]/N_c)$.

The centered sample covariance matrix can be rewritten in terms of quantum states as $C = (1/(M-1))\sum_{i=1}^{M}(1/|N_i|^2)|z_i\rangle\langle z_i|$. It is also proportional to a density matrix $\mathcal{C} = C/\mathrm{tr}(C)$, where $\mathrm{tr}(C) = (1/(M-1))\sum_{i=1}^{d}\sum_{j=1}^{M}(z_j)_i(z_j)_i^*$. By rewriting Eq. (3) in terms of the quantum states $|z_0\rangle$ and $\mathcal{C}$, and dividing the total expression by $|N_0|^2$, we can define a proximity measure $f(\psi_0)$, which obeys $0 \le f(\psi_0) \le 1$. This quantifies how anomalous the quantum state $|\psi_0\rangle$ is with respect to the set $\{|\psi_i\rangle\}$ and can be written as

$$f(\psi_0) = \langle z_0|(\mathbb{1} - C)|z_0\rangle = 1 - \mathrm{tr}(C)\,\mathrm{tr}(\mathcal{C}|z_0\rangle\langle z_0|) = 1 - \frac{\mathrm{tr}(\mathcal{C}|z_0\rangle\langle z_0|)}{|N_{\chi_c}|^2(M-1)} = 1 - \frac{1}{M-1}\sum_{i=1}^{M}\frac{|\langle z_i|z_0\rangle|^2}{|N_i|^2}. \qquad (9)$$

In the first line, we use the fact that $|z_0\rangle$ is a normalised quantum state, which is true in all cases except $|z_0\rangle = 0$. The special limit $|z_0\rangle = 0$ corresponds to all the training states and $|\psi_0\rangle$ being identical. In this case $f(\psi_0) = 0$, indicating no anomaly, as expected.

The expression in Eq. (9) is composed of a sum of $O(M)$ inner product terms $\langle\psi_i|\psi_\kappa\rangle$, where $i = 1, \ldots, M$ and $\kappa = 0, \ldots, M$. Since the phases of these inner products are also required, the standard swap test [34] is insufficient. Each inner product can instead be found using a modified swap test.
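The ancilla statistics of this modified swap test are easy to simulate classically. In the sketch below the Bernoulli sampling stands in for the quantum measurement; the state vectors are accessed directly only to compute the outcome probability, which the quantum device would produce physically (random test states and the shot count are invented for the demonstration).

```python
import numpy as np

def modified_swap_test(psi_i, psi_k, zeta, shots, rng):
    """Sample the ancilla of |Psi> = (1/2)[|0>(|psi_i> + zeta|psi_k>)
    + |1>(|psi_i> - zeta|psi_k>)].  For |zeta| = 1 the probability of
    outcome |1> is P = (1/2)(1 - Re(zeta <psi_i|psi_k>))."""
    p1 = 0.25 * np.linalg.norm(psi_i - zeta * psi_k) ** 2
    return rng.binomial(shots, p1) / shots   # Bernoulli estimate of P

rng = np.random.default_rng(2)
d = 8
psi_i = rng.normal(size=d) + 1j * rng.normal(size=d)
psi_i /= np.linalg.norm(psi_i)
psi_k = rng.normal(size=d) + 1j * rng.normal(size=d)
psi_k /= np.linalg.norm(psi_k)

shots = 200_000
# zeta = 1 recovers the real part; zeta = i recovers the imaginary part,
# via P = (1/2)(1 - Re(i <psi_i|psi_k>)) = (1/2)(1 + Im<psi_i|psi_k>)
re_est = 1 - 2 * modified_swap_test(psi_i, psi_k, 1.0, shots, rng)
im_est = 2 * modified_swap_test(psi_i, psi_k, 1j, shots, rng) - 1
overlap = np.vdot(psi_i, psi_k)              # exact <psi_i|psi_k>
print(abs(re_est - overlap.real), abs(im_est - overlap.imag))
```

With $N$ shots, the statistical error of each estimate scales as $O(1/\sqrt{N})$, consistent with the $P_{i\kappa}(1-P_{i\kappa})/\epsilon^2$ measurement count stated below.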
For this we prepare the state $|\Psi_{i\kappa}\rangle = (1/2)[|0\rangle(|\psi_i\rangle + \zeta|\psi_\kappa\rangle) + |1\rangle(|\psi_i\rangle - \zeta|\psi_\kappa\rangle)]$ by applying the control unitary $|0\rangle\langle 0|\otimes U_i + |1\rangle\langle 1|\otimes U_\kappa$ and then a Hadamard gate onto the state $(1/\sqrt{2})(|0\rangle + \zeta|1\rangle)|0\rangle$, where $\zeta$ is a complex number. The probability of measuring $|1\rangle$ in the ancilla of state $|\Psi_{i\kappa}\rangle$ is given by $P_{i\kappa} = (1/2)(1 - \mathrm{Re}(\zeta\langle\psi_i|\psi_\kappa\rangle))$. Then $\zeta = 1$ and $\zeta = i$ recover respectively the real and imaginary parts of $\langle\psi_i|\psi_\kappa\rangle$. The final measurement outcomes of the ancilla satisfy a Bernoulli distribution, so the number of measurements $N$ required to estimate $P_{i\kappa}$ to precision $\epsilon$ is $P_{i\kappa}(1 - P_{i\kappa})/\epsilon^2$, which is upper bounded by $O(1/\epsilon^2)$. Therefore $N \sim O(\mathrm{poly}(\log d))$ is sufficient if an error of order $O(1/\log d)$ is accepted. Each modified swap test requires $O(\mathrm{poly}(\log d))$ measurements and copies of $|\psi_\kappa\rangle$. The required normalisation constants $|N_i|^2$ can be computed using inner products between the training states and $N_c$, which costs $O(\mathrm{poly}(M \log d))$ resources. Thus the proximity measure $f(\psi_0)$ can be computed using resources scaling as $O(\mathrm{poly}(M \log d))$.

2. Algorithm using resources logarithmic in d and M

Next we show an alternative protocol for computing $f(\psi_0)$, now using $O(\log M)$ resources. This can be achieved by preparing a superposition of the centered data. A method to generate $\mathcal{C}$ is to create the state $|\chi_c\rangle = N_{\chi_c}\sum_{i=1}^{M}(1/N_i)|i\rangle|z_i\rangle$, where $|N_{\chi_c}|^2 = 1/\sum_{i=1}^{M}(1/|N_i|^2)$. In the reduced space of the second register we find

$$\mathrm{tr}_1(|\chi_c\rangle\langle\chi_c|) = |N_{\chi_c}|^2\sum_{i=1}^{M}(1/|N_i|^2)|z_i\rangle\langle z_i| = |N_{\chi_c}|^2(M-1)C = \mathcal{C}, \qquad (10)$$

where $\mathrm{tr}(C) = 1/(|N_{\chi_c}|^2(M-1))$ since $\mathrm{tr}(\mathrm{tr}_1(|\chi_c\rangle\langle\chi_c|)) = 1$. The proximity measure $f(\psi_0) = 1 - \mathrm{tr}(C)\,\mathrm{tr}(\mathcal{C}|z_0\rangle\langle z_0|)$ can then be measured using a standard swap test with $O(\log d)$ copies of $|z_0\rangle$ and $\mathcal{C}$.

B. Quantum one-class SVM (pure state)

Most classical data sets consist of real-valued numbers, and in such a setting it is often sufficient to use the kernel function $k(\vec{x}_i, \vec{x}_j) = \vec{x}_i^T\vec{x}_j$ or its non-linear variants. For quantum states, it is beneficial to use well-known fidelity measures such as $|\langle\psi_i|\psi_j\rangle|^2$ [26]. Thus, the kernel matrix corresponding to pure quantum states is defined by

$$K = \sum_{i,j=1}^{M} |\langle\psi_i|\psi_j\rangle|^2\,|i\rangle\langle j|. \qquad (11)$$

This kernel matrix is positive semi-definite, because the kernel function $|\langle\psi_i|\psi_j\rangle|^2$ can be represented as an inner product of an appropriately defined feature map on the states $|\psi_i\rangle$. Thus $\mathcal{K} = K/\mathrm{tr}(K)$ can be considered as a density matrix, where $\mathrm{tr}(K) = M$. See Appendix D for a derivation which also applies to mixed states. Similarly to the classical algorithm for one-class SVM, one requires a matrix inversion algorithm before computing the proximity measure, which we demonstrate below. We first present an algorithm that computes the relevant proximity measure via a swap test using resources logarithmic in the dimensionality. Then we show an algorithm potentially using resources logarithmic in both the dimensionality and the size of the training data, based on a quantum matrix inversion algorithm.

1. Algorithm using resources logarithmic in d

Suppose we look for an algorithm using $O(\mathrm{poly}(M \log d))$ resources to compute the proximity measure. Then it is possible to use the standard swap test to measure the fidelity $|\langle\psi_i|\psi_j\rangle|^2$, which costs $\sim O(\log d)$ resources for each pair $i, j = 1, \ldots, M$, or $\sim O(M^2\log d)$ resources for every pair. To find the constants $\alpha_i$ necessary to compute the proximity measure, one can then perform a classical matrix inversion at cost $O(M^3)$. Although this protocol does not achieve the use of only $O(\log M)$ resources, it does not require the assumption of access to $U_C$; the unitaries $\{U_i\}$ and $U_0$ suffice. If the unitary $U_C$ is available, we can potentially obtain a logarithmic dependence on the number of quantum data, as shown in the next section.
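The $O(\mathrm{poly}(M\log d))$ strategy of Section III B 1 can be mimicked end to end in numpy: sample standard swap-test statistics to estimate each fidelity $|\langle\psi_i|\psi_j\rangle|^2$, assemble the kernel matrix of Eq. (11), then invert classically. This is a sketch only; the random training states and shot count are invented, and the sampling stands in for the quantum measurement.

```python
import numpy as np

rng = np.random.default_rng(7)
M, d, PT, shots = 5, 32, 0.1, 100_000

psis = rng.normal(size=(M, d)) + 1j * rng.normal(size=(M, d))
psis /= np.linalg.norm(psis, axis=1, keepdims=True)

def swap_test_fidelity(a, b, shots, rng):
    """Standard swap test: the ancilla reads |0> with probability
    (1 + |<a|b>|^2)/2; invert that relation from sampled counts."""
    p0 = 0.5 * (1.0 + abs(np.vdot(a, b)) ** 2)
    return 2.0 * rng.binomial(shots, p0) / shots - 1.0

# Estimated kernel matrix of Eq. (11), one swap test per pair (i, j)
K = np.array([[swap_test_fidelity(psis[i], psis[j], shots, rng)
               for j in range(M)] for i in range(M)])

# Classical matrix inversion (O(M^3)) for the constants alpha_i, with r = 1
alpha = np.linalg.solve(K + PT * M * np.eye(M), np.ones(M))
print(alpha)
```

Because each entry is estimated from independent samples, the sampled $K$ is only approximately symmetric; the diagonal entries come out exactly 1 since a state has unit fidelity with itself.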
For methods of generating the states $|z_0\rangle$ and $|\chi_c\rangle$ using $O(\log M)$ resources, see Appendices B and C respectively.

2. Algorithm using resources logarithmic in d and M

a. Matrix inversion algorithm

The problem is given by the linear system

$$(K + P_T M \mathbb{1})\vec{\alpha} = r\vec{e}, \qquad (12)$$

where we set $r = 1$. We can convert this into its quantum counterpart, beginning with

$$(K + P_T M \mathbb{1})|\vec{\alpha}\rangle = |\vec{e}\rangle, \qquad (13)$$

where $|\vec{e}\rangle = (1/\sqrt{M})\sum_{i=1}^{M}|i\rangle$, $|\vec{\alpha}\rangle = N_\alpha\sum_{i=1}^{M}\alpha_i|i\rangle$ and $N_\alpha \equiv 1/\sqrt{\sum_{j=1}^{M}|\alpha_j|^2}$. We can then use the quantum matrix inversion algorithm (HHL) [35] to obtain $|\vec{\alpha}\rangle$ with a runtime of $O(\log M)$. This algorithm requires the efficient exponentiation of $K$, which we show in Section III B 2 b.

The performance of the quantum matrix inversion algorithm relies on four basic conditions discussed in [35] and also summarised in [36]. First, the right-hand side must be prepared efficiently. In the present case of the uniform superposition, this step is easily performed via a Hadamard operation on all relevant qubits. Second, the quantum matrix inversion algorithm uses phase estimation as a subroutine and thus requires controlled operations of the unitary generated by the matrix under consideration. Here the controlled operations are made possible in the following way. We first rescale the overall matrix by the trace $\mathrm{tr}(K) = M$, so we are now solving the problem

$$(\mathcal{K} + P_T\mathbb{1})|\vec{\alpha}\rangle = |\vec{e}\rangle. \qquad (14)$$

Note that $\|\mathcal{K} + P_T\mathbb{1}\| = O(1)$ since $P_T$ is a constant. Since $\mathcal{K}$ is a density matrix, we can simulate $\exp(-i\mathcal{K}t)$, with time $t$, via a variant of the quantum PCA algorithm [29], which we describe in Section III B 2 b. Simulating this gate to precision $\epsilon$ requires $O(t^2/\epsilon)$ copies of $\mathcal{K}$, as long as $\mathcal{K}$ is sufficiently low-rank. The evolution $\exp(iP_T\mathbb{1}t)$ is trivial to generate. As a third basic criterion, the condition number of the matrix should be at most $O(\log M)$. In our case the largest eigenvalue is $O(1 + P_T)$ while the smallest eigenvalue is $P_T$, since the kernel matrix is assumed to be low-rank with most eigenvalues being 0. Thus we require $1/P_T \lesssim O(\log M)$. The fourth criterion is that measuring the desired anomaly quantifier requires at most $O(\log M)$ repetitions, which we show in Section III B 2 c.

b. Exponentiating the kernel matrix

The kernel matrix $K$ can be related to another kernel matrix $K_0 = \sum_{i,j=1}^{M}|i\rangle\langle j|\langle\psi_i|\psi_j\rangle$, which has been used in previous discussions of the quantum support vector machine [5]. Note that

$$K = K_0^T * K_0, \qquad (15)$$

where $*$ is the element-wise (or Hadamard) product of two matrices, $(A * B)_{ij} = A_{ij}B_{ij}$. This fact allows us to simulate the matrix exponential $\exp(-i\mathcal{K}t)$ by using multiple copies of the density matrix $\mathcal{K}_0 = K_0/M$ in a modified quantum PCA method. With the efficiently simulable operator

$$S' = \sum_{j,k=1}^{M} |k\rangle\langle j| \otimes |j\rangle\langle k| \otimes |k\rangle\langle j|, \qquad (16)$$

we can perform

$$\mathrm{tr}_{1,2}\{e^{-iS'\Delta t}(\mathcal{K}_0 \otimes \mathcal{K}_0 \otimes \sigma)e^{iS'\Delta t}\} = \sigma - i[(\mathcal{K}_0^T * \mathcal{K}_0), \sigma]\Delta t + O(\Delta t^2). \qquad (17)$$

See Appendix E for a derivation. Concatenating these steps allows us to perform $\exp(-i\mathcal{K}t)$ to error $O(t^2/n)$ using $2n$ copies of $\mathcal{K}_0$. One preparation method for $\mathcal{K}_0$ [5] is to start with the state $|\chi\rangle = (1/\sqrt{M})\sum_{i=1}^{M}|i\rangle|\psi_i\rangle$. The partial trace of $|\chi\rangle\langle\chi|$ with respect to the second register then gives

$$\mathrm{tr}_2(|\chi\rangle\langle\chi|) = \frac{1}{M}\sum_{i,j=1}^{M}|i\rangle\langle j|\langle\psi_i|\psi_j\rangle = K_0/M = \mathcal{K}_0. \qquad (18)$$

The state $|\chi\rangle$ can be generated by applying $U_C$ to the uniform superposition,

$$\frac{1}{\sqrt{M}}\sum_{i=1}^{M}|i\rangle|0\rangle \rightarrow \frac{1}{\sqrt{M}}\sum_{i=1}^{M}|i\rangle|\psi_i\rangle = |\chi\rangle. \qquad (19)$$

c. Proximity measure computation

The proximity measure for the pure state $|\psi_0\rangle$ when $r = 1$ can be rewritten as

$$f(\psi_0) = \Big|\sum_{i=1}^{M}\alpha_i|\langle\psi_i|\psi_0\rangle|^2 - 1\Big| = |\langle\phi_1|\phi_2\rangle - 1|, \qquad (20)$$

where

$$|\phi_1\rangle = \frac{1}{\sqrt{M}}\sum_{i=1}^{M}|\psi_i\rangle|i\rangle|\psi_0\rangle, \qquad |\phi_2\rangle = \frac{1}{\sqrt{\sum_{j=1}^{M}|\alpha_j|^2}}\,|\psi_0\rangle\sum_{i=1}^{M}\alpha_i|i\rangle|\psi_i\rangle. \qquad (21)$$

The state $|\phi_1\rangle$ can be generated by acting with the unitary $\tilde{U}_1 = \sum_{j=1}^{M}U_j \otimes |j\rangle\langle j| \otimes U_0$ on the state $(1/\sqrt{M})\sum_{i=1}^{M}|0\rangle|i\rangle|0\rangle$. The state $|\phi_2\rangle$ can likewise be generated by acting with the unitary $\tilde{U}_2 = U_0 \otimes U_C$ on the state $|0\rangle|\vec{\alpha}\rangle|0\rangle$, where $U_0|0\rangle = |\psi_0\rangle$ and $U_i|0\rangle = |\psi_i\rangle$. See Fig. 1.
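The two algebraic facts this subsection relies on, the Hadamard-product identity of Eq. (15) and the overlap structure behind Eqs. (20)-(21), can be checked directly in numpy. A sketch with invented random states follows; note it keeps the normalisation prefactor $\sqrt{M}\,\|\vec\alpha\|$ of the explicitly constructed $|\phi_1\rangle$, $|\phi_2\rangle$ visible, and solves Eq. (12) classically in place of HHL.

```python
import numpy as np

rng = np.random.default_rng(3)
M, d, PT = 4, 8, 0.1
psis = rng.normal(size=(M, d)) + 1j * rng.normal(size=(M, d))
psis /= np.linalg.norm(psis, axis=1, keepdims=True)

# Eq. (15): K equals the Hadamard product K0^T * K0
K0 = psis.conj() @ psis.T                  # (K0)_ij = <psi_i|psi_j>
K = np.abs(K0) ** 2                        # kernel of Eq. (11)
assert np.allclose(K, K0.T * K0)

# Solve Eq. (12) with r = 1 for alpha (classically, in place of HHL)
alpha = np.linalg.solve(K + PT * M * np.eye(M), np.ones(M))

# Build |phi1> and |phi2> of Eq. (21) as explicit tensor products
psi0 = psis[0]                             # test a 'normal' state
e = np.eye(M)
phi1 = sum(np.kron(np.kron(psis[i], e[i]), psi0)
           for i in range(M)) / np.sqrt(M)
phi2 = sum(alpha[i] * np.kron(np.kron(psi0, e[i]), psis[i])
           for i in range(M)) / np.linalg.norm(alpha)

# Register-wise contraction gives sum_i alpha_i |<psi_i|psi_0>|^2,
# up to the explicit normalisation prefactor sqrt(M) * ||alpha||
target = np.sum(alpha * np.abs(psis.conj() @ psi0) ** 2)
assert np.allclose(np.vdot(phi1, phi2) * np.sqrt(M) * np.linalg.norm(alpha),
                   target)
print(abs(target - 1.0))                   # f(psi_0) of Eq. (20)
```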
We can compute $f(\psi_0)$ if we can find the overlap $\langle\phi_1|\phi_2\rangle$ in Eq. (20), using the same kind of modified swap test as in Section III A 1. With access to the state $|\tilde{\phi}_R\rangle = (1/\sqrt{2})(|0\rangle|\phi_1\rangle + |1\rangle|\phi_2\rangle)$ and applying a Hadamard on the ancilla qubit, the probability of measuring the ancilla qubit in state $|1\rangle$ is given by $P_R = (1/2)(1 - \mathrm{Re}\langle\phi_1|\phi_2\rangle)$. Similarly, with access to the state $|\tilde{\phi}_I\rangle = (1/\sqrt{2})(|0\rangle|\phi_1\rangle + i|1\rangle|\phi_2\rangle)$ and then applying a Hadamard on the ancilla qubit, the probability of measuring the ancilla qubit in state $|1\rangle$ is given by $P_I = (1/2)(1 + \mathrm{Im}\langle\phi_1|\phi_2\rangle)$.

FIG. 1: Quantum circuits for generating states $|\phi_1\rangle$ and $|\phi_2\rangle$. See text for more details.

The states $|\tilde{\phi}_R\rangle$ and $|\tilde{\phi}_I\rangle$ can be created by applying the control unitary $\tilde{U}_c = |0\rangle\langle 0| \otimes \tilde{U}_1(\mathbb{1}\otimes\tilde{H}\otimes\mathbb{1}) + |1\rangle\langle 1| \otimes \tilde{U}_2$ onto the states $|A_R\rangle = (1/\sqrt{2})(|0\rangle|0\rangle|0\rangle|0\rangle + |1\rangle|0\rangle|\vec{\alpha}\rangle|0\rangle)$ and $|A_I\rangle = (1/\sqrt{2})(|0\rangle|0\rangle|0\rangle|0\rangle + i|1\rangle|0\rangle|\vec{\alpha}\rangle|0\rangle)$ respectively, where $\tilde{H}|0\rangle = (1/\sqrt{M})\sum_{i=1}^{M}|i\rangle$. See Appendix F for a method of generating the states $|A_R\rangle$ and $|A_I\rangle$. Note that one way of implementing $\tilde{U}_c$ using $U_C$ is via a two-path interferometer, with $\tilde{U}_1(\mathbb{1}\otimes\tilde{H}\otimes\mathbb{1})$ acting along one path and $\tilde{U}_2$ acting along the other path.

An anomaly has been detected with error $\epsilon$ if we can find the value $f(\psi_0)$ to precision $\epsilon$. $f(\psi_0)$ depends only on the inner product $\langle\phi_1|\phi_2\rangle$, and this comes from binary measurement outcomes with probabilities $P_R$ and $P_I$. Since the measurement outcomes satisfy a Bernoulli distribution, the number of measurements $N_f$ needed to find $\mathrm{Re}\langle\phi_1|\phi_2\rangle$ and $\mathrm{Im}\langle\phi_1|\phi_2\rangle$ to precision $\epsilon$ is

$$N_f \sim \frac{P_R(1-P_R)}{\epsilon^2} = \frac{1}{4\epsilon^2}\big(1 - (\mathrm{Re}\langle\phi_1|\phi_2\rangle)^2\big) \lesssim \frac{1}{4\epsilon^2}, \qquad N_f \sim \frac{P_I(1-P_I)}{\epsilon^2} = \frac{1}{4\epsilon^2}\big(1 - (\mathrm{Im}\langle\phi_1|\phi_2\rangle)^2\big) \lesssim \frac{1}{4\epsilon^2}. \qquad (22)$$

Thus $N_f \lesssim 1/(4\epsilon^2) \sim O(\log(Md))$ if the error we tolerate is of order $O(1/\sqrt{\log(Md)})$.

IV. Mixed state anomaly detection

Up to this point, we have investigated anomaly detection for pure states. One can also consider the generalised problem of anomaly detection for mixed quantum states.
This can find applications in the presence of unknown noise sources, or when the output of a given quantum process is designed to be a mixed state, as in NMR [37]. The performance of pure state anomaly detection is potentially logarithmic in both the dimension of the Hilbert space and the number of pure states used for training. These exponential speed-ups are possible as long as conditional subroutines are available to prepare the pure states and the additional requirements for quantum phase estimation and matrix inversion are satisfied. However, the outputs of quantum devices are often statistical mixtures of pure states, so it is natural to extend anomaly detection to mixed states as described by density matrices.

For mixed states, we have to use state similarity measures different from the pure-state ones discussed above. For kernel PCA we propose an analogous measure for mixed states and an algorithm for finding it, and we show how this algorithm can be executed using $O(\log d)$ resources. For the one-class SVM classification, one requires $K$ to be a proper kernel measure and a positive semi-definite matrix; for this we show that a quantum state fidelity measure can be used as a kernel function. We show that, so long as one is interested only in an algorithm that runs using $O(\log d)$ resources and not necessarily efficiently in $O(\log M)$, the kernel matrix can be prepared. The proximity measures corresponding to mixed states can then be computed using $O(\log d)$ resources. However, for a big quantum data 'speed-up' in the number of training states, there are additional requirements to satisfy. For example, one needs an efficient way of generating multiple copies of a density matrix proportional to the kernel matrix for the mixed states, which takes a more complex form, and it is unclear how these can be generated with minimal assumptions. One might also require a simple experimental procedure for computing the corresponding proximity measure.
These remain interesting challenges to be explored in future work.

A. Quantum kernel PCA (mixed state)

Suppose we want to find the proximity measure between the training states, represented by density matrices $\rho_i$, $i = 1, \ldots, M$, and the test density matrix $\rho_0$. For $\kappa = 0, 1, \ldots, M$, we can define these density matrices as $\rho_\kappa = \sum_{l=1}^{N} E_l^{(\kappa)}|0\rangle\langle 0|E_l^{(\kappa)\dagger}$, where the $E_l^{(\kappa)}$, $l = 1, \ldots, N$, are Kraus operators obeying $\sum_{l=1}^{N} E_l^{(\kappa)\dagger}E_l^{(\kappa)} = \mathbb{1}$.

We note that the pure state proximity measure in Eq. (9) is composed of $O(\mathrm{poly}(M))$ inner product terms of the form $\langle\psi_i|\psi_\kappa\rangle$. We can denote a mixed state analogue of this inner product by $\langle\rho_i, \rho_\kappa\rangle$. Then a proximity measure that reproduces Eq. (9) in the pure state limit can be written

$$f(\rho_0) = 1 - \frac{|\tilde{N}_0|^2}{M-1}\sum_{i=1}^{M}\Big|\langle\rho_i, \rho_0\rangle - \frac{1}{M}\sum_{j=1}^{M}\big(\langle\rho_i, \rho_j\rangle + \langle\rho_j, \rho_0\rangle\big) + \frac{1}{M^2}\sum_{j,l=1}^{M}\langle\rho_j, \rho_l\rangle\Big|^2, \qquad (23)$$

where $|\tilde{N}_0|^2 = 1/(1 + 1/(M^2|\tilde{N}_c|^2) - (2/M)\mathrm{Re}[\sum_{i=1}^{M}\langle\rho_0, \rho_i\rangle/(M\tilde{N}_c)])$ and $|\tilde{N}_c|^2 = 1/\sum_{i,j=1}^{M}\langle\rho_i, \rho_j\rangle$, which reduce to $|N_0|^2$ and $|N_c|^2$ in the pure state limit. In the following, we define $\langle\rho_i, \rho_\kappa\rangle$ and propose a mixed state version of the modified swap test of Section III A 1 to find $\langle\rho_i, \rho_\kappa\rangle$.

It is important to observe that $\langle\rho_i, \rho_\kappa\rangle \neq \mathrm{tr}(\rho_i\rho_\kappa)$, since $\mathrm{tr}(\rho_i\rho_\kappa)$ reduces not to $\langle\psi_i|\psi_\kappa\rangle$ but to the fidelity $|\langle\psi_i|\psi_\kappa\rangle|^2$. Thus a simple swap test between $\rho_i$ and $\rho_\kappa$ is insufficient. Instead, we define

$$\langle\rho_i, \rho_\kappa\rangle \equiv \sum_{l=1}^{N}\langle 0|E_l^{(i)\dagger}E_l^{(\kappa)}|0\rangle, \qquad (24)$$

which obeys all the properties of an inner product. In the pure state limit, $N = 1$ and the Kraus operators become unitary, $E_l^{(\kappa)} \rightarrow U_\kappa$ and $E_l^{(i)} \rightarrow U_i$, so we recover $\langle\psi_i|\psi_\kappa\rangle$.

To measure $\langle\rho_i, \rho_\kappa\rangle$, we start with an initial state $|0\rangle\langle 0| \otimes |0\rangle\langle 0|$ and apply a Hadamard to the ancilla to get $\rho_{\mathrm{in}} = |+\rangle\langle +| \otimes |0\rangle\langle 0|$. Suppose we are given access to quantum operations $\mathcal{E}^{(\kappa,i)}$ such that $\mathcal{E}^{(\kappa,i)}(\rho_{\mathrm{in}}) = \sum_{l=1}^{N} P_l^{(\kappa,i)}\rho_{\mathrm{in}}(P_l^{(\kappa,i)})^{\dagger}$, where $P_l^{(\kappa,i)} = |0\rangle\langle 0|\otimes E_l^{(\kappa)} + |1\rangle\langle 1|\otimes E_l^{(i)}$. Here $\sum_{l=1}^{N}(P_l^{(\kappa,i)})^{\dagger}P_l^{(\kappa,i)} = \mathbb{1}\otimes\mathbb{1}$ follows from the completeness of the Kraus operators $E_l^{(\kappa)}$. After applying a Hadamard to the ancilla of $\mathcal{E}^{(\kappa,i)}(\rho_{\mathrm{in}})$, the probability of finding the ancilla in state $|1\rangle$ is $(1/2)(1 - \mathrm{Re}(\sum_{l=1}^{N}\langle 0|E_l^{(i)\dagger}E_l^{(\kappa)}|0\rangle))$. Similarly, the imaginary component $\mathrm{Im}(\sum_{l=1}^{N}\langle 0|E_l^{(i)\dagger}E_l^{(\kappa)}|0\rangle)$ can be found by applying $(1/\sqrt{2})\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}$ to the ancilla instead of the Hadamard gate. Since the final measurement outcomes of the ancilla satisfy a Bernoulli distribution, the number of measurements $N$ required to estimate $\langle\rho_i, \rho_\kappa\rangle$ to precision $\epsilon$ is upper bounded by $O(1/\epsilon^2)$; thus $N \lesssim O(\mathrm{poly}(\log d))$ if the error tolerated is of order $O(1/\log d)$. Since there are $O(\mathrm{poly}(M))$ of these terms to estimate for the new proximity measure $f(\rho_0)$, the total resource cost of the algorithm is $O(\mathrm{poly}(M\log d))$.

B. Quantum one-class SVM (mixed state)

We begin by identifying the kernel matrix with the superfidelity [38], which reduces to the fidelity in the pure state limit. It is possible to show that the superfidelity $F(\rho_i, \rho_j) = \mathrm{tr}(\rho_i\rho_j) + \sqrt{1 - \mathrm{tr}(\rho_i^2)}\sqrt{1 - \mathrm{tr}(\rho_j^2)}$ between states $\rho_i$ and $\rho_j$ obeys all the properties of a kernel matrix; see Appendix D. We can then extend the proximity measure in Eq. (20) for pure states to mixed states,

$$f(\rho_0) = \Big|\sum_{i=1}^{M}\alpha_i F_{i0}(\rho_i, \rho_0) - 1\Big|. \qquad (25)$$

If we can only access $O(\mathrm{poly}(M)\log d)$ resources when computing the proximity measure, we can apply the same method as for pure states in Section III B 1. It is possible to use the same swap test, but now with mixed state inputs, measuring $\mathrm{tr}(\rho_i\rho_j)$. This again costs $\sim O(\log d)$ resources for each pair $i, j = 1, \ldots, M$, or $\sim O(M^2\log d)$ resources for every pair. A classical algorithm for matrix inversion can then be used, costing $O(M^3)$ resources, to find the constants $\alpha_i$. As in the pure state case, we no longer require the assumption of access to $U_C$.
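Equation (24) can be evaluated classically from explicit Kraus operators. The sketch below (random unitaries and an isometry-split channel, all invented for the test) verifies the pure-state limit and the Kraus completeness relation.

```python
import numpy as np

def kraus_inner(Es_i, Es_k, ket0):
    """<rho_i, rho_k> = sum_l <0| E_l^(i)† E_l^(k) |0>, Eq. (24)."""
    return sum(np.vdot(Ei @ ket0, Ek @ ket0) for Ei, Ek in zip(Es_i, Es_k))

def random_unitary(rng, d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(A)
    return Q

rng = np.random.default_rng(5)
d = 6
ket0 = np.zeros(d)
ket0[0] = 1.0

# Pure-state limit: one unitary Kraus operator per state, so
# <rho_i, rho_k> reduces to <psi_i|psi_k> with |psi> = U|0>.
Ui, Uk = random_unitary(rng, d), random_unitary(rng, d)
assert np.allclose(kraus_inner([Ui], [Uk], ket0),
                   np.vdot(Ui @ ket0, Uk @ ket0))

# A genuinely mixed state: split a (2d x d) isometry into two Kraus
# operators; they satisfy sum_l E_l† E_l = 1 by construction.
A = rng.normal(size=(2 * d, d)) + 1j * rng.normal(size=(2 * d, d))
W, _ = np.linalg.qr(A)                     # orthonormal columns: W†W = 1
Es = [W[:d, :], W[d:, :]]
assert np.allclose(sum(E.conj().T @ E for E in Es), np.eye(d))
print(kraus_inner(Es, Es, ket0))           # inner product of a state with itself
```

The self-inner-product equals $\langle 0|\sum_l E_l^{\dagger}E_l|0\rangle = 1$, matching the normalisation an inner product should give on identical arguments.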
V. Discussion

We have shown that there exist quantum machine learning algorithms for anomaly detection that detect outliers in both pure and mixed quantum states. This can be achieved using resources only logarithmic in the dimension of the quantum states. For pure states, the resources can also be logarithmic in the number of training quantum states used. There is a wide variety of machine-learning-based anomaly detection algorithms that can be extended to the quantum realm and remain as yet unexplored. These new quantum algorithms might find applications not only in identifying novel quantum phenomena, but also in secure quantum data transfer, secure quantum computation and verification over the cloud. These questions may become even more important as cloud quantum computing systems evolve into a quantum internet.

Acknowledgements

The authors thank J. Fitzsimons, P. Wittek, R. Munoz-Tapia, J. Calsamiglia, M. Skotiniotis, G. Sentis and A. Mantri for interesting discussions. The authors also thank P. Wittek and T. Bromley for helpful comments on the manuscript. The authors acknowledge support from the National Research Foundation and Ministry of Education, Singapore. This material is based on research supported in part by the Singapore National Research Foundation under NRF Award No. NRF-NRFF2013-01.

A. Generating centroid quantum state $|\psi_c\rangle$

Assume we are given the state $|\chi\rangle = (1/\sqrt{M})\sum_{i=1}^{M} |\psi_i\rangle|i\rangle$. We can apply a Hadamard operation $H$ to each qubit of the second register, so $|\chi\rangle \to (1/M)\sum_{i=1}^{M} |\psi_i\rangle \sum_{j=1}^{M} (-1)^{i\cdot j}|j\rangle$. Then, by making a projective measurement $|0\dots 0\rangle\langle 0\dots 0|$ on the second register, we can recover the centroid quantum state $|\psi_c\rangle|0\dots 0\rangle$. The success probability $P_\chi$ of this measurement is

$$P_\chi = \mathrm{tr}\big(|0\dots 0\rangle\langle 0\dots 0|\, H^{\otimes \log M} |\chi\rangle\langle\chi| H^{\otimes \log M}\big) = \frac{1}{M^2}\sum_{i,j=1}^{M} \langle\psi_i|\psi_j\rangle. \qquad (A1)$$

The use of $O(\log M)$ Hadamard gates means that our gate resource count is $O(\log M)$. In the scenario where the covariance matrix $C$ is low rank, we expect most training samples to be rather similar, i.e., $\langle\psi_i|\psi_j\rangle \sim$ constant. This is also the case in anomaly detection, where we expect the training samples to belong to a single 'type' and hence expect high state fidelities between these states. In this case, $P_\chi \sim O(1)$. We observe that Eq. (A1) also allows us to recover $|N_c|^2 = 1/(M^2 P_\chi)$ using $O(\log M)$ resources.

B. Generating centered quantum states $|z_i\rangle$ and $|z_0\rangle$

Assume the state preparation operations as discussed in the main part. First we prepare

$$\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)|0\dots0\rangle \to \frac{1}{\sqrt{2}}\Big(|0\rangle|i\rangle - \frac{1}{\sqrt{M}}|1\rangle\sum_{j=1}^{M}|j\rangle\Big), \qquad (B1)$$

which we obtain by performing $|0\rangle \to |i\rangle$ conditioned on the first ancilla being in $|0\rangle$ and performing a Hadamard transform conditioned on the first ancilla being in $|1\rangle$. Note that the latter operation $|0\dots0\rangle \to (1/\sqrt{M})\sum_{j=1}^{M}|j\rangle$ requires $O(\log M)$ Hadamard gates, thus the gate resource count is $O(\log M)$. Now we use $U_C$ conditioned on the non-ancilla states to prepare

$$\frac{1}{\sqrt{2}}\Big(|0\rangle|i\rangle - \frac{1}{\sqrt{M}}|1\rangle\sum_{j=1}^{M}|j\rangle\Big)|0\dots0\rangle \to \frac{1}{\sqrt{2}}\Big(|0\rangle|i\rangle|\psi_i\rangle - \frac{1}{\sqrt{M}}|1\rangle\sum_{j=1}^{M}|j\rangle|\psi_j\rangle\Big). \qquad (B2)$$

Now perform another Hadamard conditioned on the ancilla being in $|1\rangle$ to arrive at

$$\to \frac{1}{\sqrt{2}}\Big(|0\rangle|i\rangle|\psi_i\rangle - \frac{1}{M}|1\rangle\sum_{j,k=1}^{M}(-1)^{j\cdot k}|k\rangle|\psi_j\rangle\Big). \qquad (B3)$$

Conditioned on the ancilla being in $|0\rangle$, uncompute the $|i\rangle$:

$$\to \frac{1}{\sqrt{2}}\Big(|0\rangle|0\rangle|\psi_i\rangle - \frac{1}{M}|1\rangle\sum_{j,k=1}^{M}(-1)^{j\cdot k}|k\rangle|\psi_j\rangle\Big). \qquad (B4)$$

Measure $|0\cdots0\rangle$ in the label register to obtain

$$\to \frac{1}{\sqrt{1 + \frac{1}{M^2}\sum_{j,j'=1}^{M}\langle\psi_{j'}|\psi_j\rangle}}\Big(|0\rangle|\psi_i\rangle - \frac{1}{M}|1\rangle\sum_{j=1}^{M}|\psi_j\rangle\Big). \qquad (B5)$$

The success probability of this measurement is then given by

$$P_{0\cdots0} = \frac{1}{2}\Big(1 + \frac{1}{M^2}\sum_{j,j'=1}^{M}\langle\psi_{j'}|\psi_j\rangle\Big). \qquad (B6)$$

As the final step, measure the ancilla in $(|0\rangle + |1\rangle)/\sqrt{2}$ to obtain

$$\to N_i\Big(|\psi_i\rangle - \frac{1}{M}\sum_{j=1}^{M}|\psi_j\rangle\Big) \equiv |z_i\rangle, \qquad (B7)$$

where $|N_i|^2 = 1/\big(1 - \frac{2}{M}\mathrm{Re}\big[\sum_{j=1}^{M}\langle\psi_i|\psi_j\rangle\big] + \frac{1}{M^2}\sum_{j,k=1}^{M}\langle\psi_j|\psi_k\rangle\big)$.
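The centroid construction of Appendix A is easy to check numerically. The following is a minimal NumPy sketch, with hypothetical random training states in place of the $|\psi_i\rangle$; it verifies that projecting the Hadamard-transformed label register onto $|0\dots0\rangle$ leaves the centroid and succeeds with the probability of Eq. (A1).

```python
import numpy as np

rng = np.random.default_rng(1)
d, M = 4, 8  # illustrative sizes; M a power of two so the label register is log2(M) qubits

def rand_state(d):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

psis = [rand_state(d) for _ in range(M)]

# |chi> = (1/sqrt(M)) sum_i |psi_i>|i>, with the label register last
chi = sum(np.kron(psis[i], np.eye(M)[i]) for i in range(M)) / np.sqrt(M)

# Hadamard on every label qubit: H^{otimes log2(M)}
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H
for _ in range(int(np.log2(M)) - 1):
    Hn = np.kron(Hn, H)
chi = np.kron(np.eye(d), Hn) @ chi

# Project the label register onto |0...0>: keep amplitudes with label 0
branch = chi.reshape(d, M)[:, 0]
P_chi = np.linalg.norm(branch) ** 2

# Eq. (A1): P_chi = (1/M^2) sum_{ij} <psi_i|psi_j>
G = np.array([[pi.conj() @ pj for pj in psis] for pi in psis])
assert abs(P_chi - G.sum().real / M**2) < 1e-10

# The surviving branch is proportional to the centroid sum_i |psi_i>
centroid = sum(psis)
assert np.allclose(branch, centroid / M)
```

For training states of a single 'type' the overlaps $\langle\psi_i|\psi_j\rangle$ are close to one, and the computed $P_\chi$ approaches $O(1)$, as claimed.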
The success probability of the final measurement is given by

$$P_0^{(i)} = \frac{1}{2\big(1 + \frac{1}{M^2}\sum_{j,j'=1}^{M}\langle\psi_{j'}|\psi_j\rangle\big)}\Big(1 - \frac{2}{M}\sum_{j=1}^{M}\mathrm{Re}(\langle\psi_i|\psi_j\rangle) + \frac{1}{M^2}\sum_{j,j'=1}^{M}\langle\psi_{j'}|\psi_j\rangle\Big). \qquad (B8)$$

Thus the total probability of success is $P_{0\cdots0}P_0^{(i)}$, which scales as $\sim O(1)$ following a similar argument to Appendix A. Note that in the limit where the training states are exactly the same, the state $|z_i\rangle$ does not exist, i.e., $|z_i\rangle = 0$, and thus the success probability $P_0^{(i)} = 0$, as expected.

C. Generating state $|\chi_c\rangle$ for kernel PCA

The state preparation of $|\chi_c\rangle$ works essentially the same way as in Appendix B, except that we need a superposition over the label $i$. We prepare it by calling the controlled state preparation operation and controlled Hadamards in the following way:

$$\frac{1}{\sqrt{2M}}\sum_{i=1}^{M}|i\rangle(|0\rangle - |1\rangle)|0\dots0\rangle|0\dots0\rangle \to \frac{1}{\sqrt{2M}}\sum_{i=1}^{M}|i\rangle\Big(|0\rangle|0\dots0\rangle|\psi_i\rangle - \frac{1}{\sqrt{M}}|1\rangle\sum_{j=1}^{M}|j\rangle|0\dots0\rangle\Big) \qquad (C1)$$

$$\to \frac{1}{\sqrt{2M}}\sum_{i=1}^{M}|i\rangle\Big(|0\rangle|0\dots0\rangle|\psi_i\rangle - \frac{1}{\sqrt{M}}|1\rangle\sum_{j=1}^{M}|j\rangle|\psi_j\rangle\Big)$$

$$\to \frac{1}{\sqrt{2M}}\sum_{i=1}^{M}|i\rangle\Big(|0\rangle|0\dots0\rangle|\psi_i\rangle - \frac{1}{M}|1\rangle\sum_{j,k=1}^{M}(-1)^{j\cdot k}|k\rangle|\psi_j\rangle\Big). \qquad (C2)$$

Here, like in Appendices A and B, we employed $O(\log M)$ Hadamard gates for the transformation $|0\dots0\rangle \to (1/\sqrt{M})\sum_{j=1}^{M}|j\rangle$; thus our gate resource count scales as $O(\log M)$. We then measure $|0\cdots0\rangle$ in the label register to obtain the state

$$\frac{\sum_{i=1}^{M}|i\rangle\big(|0\rangle|\psi_i\rangle - \frac{1}{M}|1\rangle\sum_{j=1}^{M}|\psi_j\rangle\big)}{\sqrt{M + \frac{1}{M}\sum_{j,j'=1}^{M}\langle\psi_j|\psi_{j'}\rangle}}. \qquad (C3)$$

The success probability of this measurement is given by

$$P_{0\cdots0} = \frac{1}{2} + \frac{1}{2M^2}\sum_{j,j'=1}^{M}\langle\psi_j|\psi_{j'}\rangle. \qquad (C4)$$

As the final step, measure the ancilla in $(|0\rangle + |1\rangle)/\sqrt{2}$ to obtain

$$\to N_c\sum_{i=1}^{M}|i\rangle\Big(|\psi_i\rangle - \frac{1}{M}\sum_{j=1}^{M}|\psi_j\rangle\Big) \equiv |\chi_c\rangle, \qquad (C5)$$

where $|N_c|^2 = 1/\big(M - \frac{1}{M}\sum_{j,j'=1}^{M}\langle\psi_j|\psi_{j'}\rangle\big)$. The success probability of this final measurement is then given by

$$P_0 = \frac{1 - \frac{1}{M^2}\sum_{j,j'=1}^{M}\langle\psi_{j'}|\psi_j\rangle}{2\big(1 + \frac{1}{M^2}\sum_{j,j'=1}^{M}\langle\psi_j|\psi_{j'}\rangle\big)}, \qquad (C6)$$

where $P_0 \sim O(1)$ following a similar argument to Appendix A.

D. Positive semidefinite kernel matrix K and fidelity measure

Let $k(A, B)$ be a kernel function, where $A$ and $B$ are matrices. This kernel function is called positive semidefinite if the kernel matrix $K_{ij} = k(\rho_i, \rho_j)$ is positive semidefinite for any training set $\{\rho_i\}$, where the $\rho_i$ are density matrices. To show that a kernel function corresponds to a positive kernel matrix, we have by definition to show that $\sum_{ij} c_i c_j K_{ij} \geq 0$ for all vectors $\vec{c}$ and all training sets. Alternatively, positivity holds if there exists some feature map $\phi(\rho_i)$ such that the kernel matrix can be defined as a matrix of inner products $K_{ij} = \phi(\rho_i)^\dagger \phi(\rho_j)$.

A large number of density matrix fidelity and distance measures have been discussed [38–41]. Here, we take the kernel function given by the superfidelity [38], defined by the symmetric function

$$F(\rho_i, \rho_j) = \mathrm{tr}(\rho_i \rho_j) + \sqrt{1 - \mathrm{tr}(\rho_i^2)}\sqrt{1 - \mathrm{tr}(\rho_j^2)}. \qquad (D1)$$

The matrix entries of the superfidelity are all real, since the expression $\mathrm{tr}(\rho_i \rho_j)$ is real and positive. We can see this by observing that there exist matrices $A, B$ such that $\rho_i = A^T A$ and $\rho_j = B^T B$, since $\rho_i, \rho_j$ are positive semidefinite. Then $\mathrm{tr}(\rho_i \rho_j) = \mathrm{tr}((BA^T)(AB^T)) = \mathrm{tr}(C^T C)$, where $C = AB^T$ and $C^T C$ is by definition positive semidefinite. It also turns out that the superfidelity can indeed be written via a feature map, by noting that

$$\mathrm{tr}(\rho_i \rho_j) = \vec{\rho}_i^{\,\dagger} \vec{\rho}_j, \qquad (D2)$$

where $\vec{\rho}$ is the vectorized form of a matrix $\rho$, obtained by stacking the columns of $\rho$. So the feature map is

$$\phi(\rho_i) = \begin{pmatrix} \vec{\rho}_i \\ \sqrt{1 - \mathrm{tr}(\rho_i^2)} \end{pmatrix} \qquad (D3)$$

and the inner product

$$\phi(\rho_i)^\dagger \phi(\rho_j) = \vec{\rho}_i^{\,\dagger}\vec{\rho}_j + \sqrt{1 - \mathrm{tr}(\rho_i^2)}\sqrt{1 - \mathrm{tr}(\rho_j^2)} = K_{ij} \qquad (D4)$$

leads to a valid positive kernel matrix.
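The claims of Appendix D can be verified numerically: a superfidelity kernel matrix built from random density matrices is symmetric positive semidefinite with unit diagonal, and $\mathrm{tr}(\rho_i\rho_j)$ matches the vectorized inner product of Eq. (D2). The following NumPy sketch is illustrative only; the random-state generation is our own choice.

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_density_matrix(d):
    # rho = A A^dag / tr(A A^dag) is positive semidefinite with unit trace
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

def superfidelity(r1, r2):
    # Eq. (D1): F = tr(r1 r2) + sqrt(1 - tr(r1^2)) sqrt(1 - tr(r2^2))
    purity = lambda r: np.trace(r @ r).real
    return np.trace(r1 @ r2).real + np.sqrt(1 - purity(r1)) * np.sqrt(1 - purity(r2))

d, M = 3, 6
rhos = [rand_density_matrix(d) for _ in range(M)]
K = np.array([[superfidelity(a, b) for b in rhos] for a in rhos])

# Eq. (D2): tr(rho_i rho_j) equals the inner product of column-stacked vectors
vec = lambda r: r.reshape(-1, order="F")
assert abs(np.trace(rhos[0] @ rhos[1]) - vec(rhos[0]).conj() @ vec(rhos[1])) < 1e-10

# The kernel matrix is symmetric positive semidefinite, as guaranteed by
# the feature map phi(rho) = (vec(rho), sqrt(1 - tr(rho^2)))
assert np.allclose(K, K.T)
assert np.linalg.eigvalsh(K).min() > -1e-10
```

The unit diagonal, $F(\rho, \rho) = \mathrm{tr}(\rho^2) + (1 - \mathrm{tr}(\rho^2)) = 1$, follows directly from Eq. (D1).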
E. Hadamard product and transpose

Given two matrices $\rho_1$ and $\rho_2$, the task is to apply the element-wise (Hadamard) product of these matrices as a Hamiltonian to another quantum state. This is for the purpose of simulating the kernel matrix related to the one-class SVM. The standard swap matrix for quantum PCA is $S = \sum_{j,k=1}^{M} |j\rangle\langle k| \otimes |k\rangle\langle j|$. Now, take the modified swap operator

$$S' = \sum_{j,k=1}^{M} |k\rangle\langle j| \otimes |j\rangle\langle k| \otimes |k\rangle\langle j|. \qquad (E1)$$

It is at most 1-sparse, because each $|jkj\rangle$ gets mapped to a unique $|kjk\rangle$, while states $|jkl\rangle$ with $l \neq j$ get mapped to zero. It is also efficiently computable, and thus efficiently simulatable as a Hamiltonian. With an arbitrary state $\sigma$, we can perform the operation

$$\mathrm{tr}_{1,2}\{e^{-iS'\Delta t}(\rho_1 \otimes \rho_2 \otimes \sigma)e^{iS'\Delta t}\}. \qquad (E2)$$

The trace is over the $\rho_1$ and $\rho_2$ subspaces. Expanding to $O(\Delta t^2)$ leads to

$$\mathrm{tr}_{1,2}\{e^{-iS'\Delta t}(\rho_1 \otimes \rho_2 \otimes \sigma)e^{iS'\Delta t}\} = \sigma - i\,\mathrm{tr}_{1,2}\{S'(\rho_1 \otimes \rho_2 \otimes \sigma)\}\Delta t + i\,\mathrm{tr}_{1,2}\{(\rho_1 \otimes \rho_2 \otimes \sigma)S'\}\Delta t + O(\Delta t^2). \qquad (E3)$$

The first $O(\Delta t)$ term is

$$\mathrm{tr}_{1,2}\{S'(\rho_1 \otimes \rho_2 \otimes \sigma)\} = \mathrm{tr}_{1,2}\Big\{\sum_{j,k=1}^{M} |k\rangle\langle j| \otimes |j\rangle\langle k| \otimes |k\rangle\langle j|\,(\rho_1 \otimes \rho_2 \otimes \sigma)\Big\} = \sum_{n,m,j,k=1}^{M} \langle n|k\rangle\langle j|\rho_1|n\rangle\,\langle m|j\rangle\langle k|\rho_2|m\rangle\,|k\rangle\langle j|\sigma = \sum_{j,k=1}^{M} \langle j|\rho_1|k\rangle\langle k|\rho_2|j\rangle\,|k\rangle\langle j|\sigma = (\rho_1^T * \rho_2)\sigma. \qquad (E4)$$

In the same manner we can show

$$\mathrm{tr}_{1,2}\{(\rho_1 \otimes \rho_2 \otimes \sigma)S'\} = \sigma(\rho_1^T * \rho_2). \qquad (E5)$$

Thus, in summary, we have shown that

$$\mathrm{tr}_{1,2}\{e^{-iS'\Delta t}(\rho_1 \otimes \rho_2 \otimes \sigma)e^{iS'\Delta t}\} = \sigma - i[(\rho_1^T * \rho_2), \sigma]\Delta t + O(\Delta t^2). \qquad (E6)$$

F. Generating states $|A_R\rangle$ and $|A_I\rangle$

In Sec. III B 2 c, we required the generation of the states $|A_R\rangle = (1/\sqrt{2})(|0\rangle|0\rangle|0\rangle|0\rangle + |1\rangle|0\rangle|\vec{\alpha}\rangle|0\rangle)$ and $|A_I\rangle = (1/\sqrt{2})(|0\rangle|0\rangle|0\rangle|0\rangle + i|1\rangle|0\rangle|\vec{\alpha}\rangle|0\rangle)$. We begin with a controlled-unitary operation creating the state

$$(|0\rangle + |1\rangle)|000\rangle|0\rangle \to |0\rangle|000\rangle|1\rangle + |1\rangle|0\vec{e}\,0\rangle R|0\rangle, \qquad (F1)$$

where $|\vec{e}\rangle = (1/\sqrt{M})\sum_{i=1}^{M}|i\rangle$ and $R$ is the controlled rotation on the ancilla (right-most qubit) used in the quantum matrix inversion algorithm [35]. Upon measuring this ancilla in the state $|1\rangle$,

$$|0\rangle|000\rangle|1\rangle + |1\rangle|0\vec{e}\,0\rangle R|0\rangle \to |0\rangle|000\rangle|1\rangle + |1\rangle|0\vec{\alpha}\,0\rangle|1\rangle. \qquad (F2)$$

Thus, by eliminating the ancilla in state $|1\rangle$, we can generate $|A_R\rangle$. Similarly, if we instead begin with the state $(1/\sqrt{2})(|0\rangle + i|1\rangle)|000\rangle|0\rangle$ in Eq. (F1), then we can generate the state $|A_I\rangle$.

[1] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature 549, 195 (2017).
[2] V. Dunjko, J. M. Taylor, and H. J. Briegel, Phys. Rev. Lett. 117, 130501 (2016).
[3] V. Dunjko and H. J. Briegel, arXiv:1709.02779 (2017).
[4] C. Ciliberto, M. Herbster, A. D. Ialongo, M. Pontil, A. Rocchetto, S. Severini, and L.
Wossnig, arXiv:1707.08561 (2017).
[5] P. Rebentrost, M. Mohseni, and S. Lloyd, Phys. Rev. Lett. 113, 130503 (2014).
[6] S. Lloyd, M. Mohseni, and P. Rebentrost, arXiv:1307.0411 (2013).
[7] N. Wiebe, D. Braun, and S. Lloyd, Phys. Rev. Lett. 109, 050505 (2012).
[8] N. Wiebe, A. Kapoor, and K. Svore, arXiv:1401.2142 (2014).
[9] M. Schuld, I. Sinayskiy, and F. Petruccione, Phys. Rev. A 94, 022342 (2016).
[10] E. Aïmeur, G. Brassard, and S. Gambs, in Proceedings of the 24th International Conference on Machine Learning (ACM, 2007), pp. 1–8.
[11] E. Aïmeur, G. Brassard, and S. Gambs, Machine Learning 90, 261 (2013).
[12] V. Chandola, A. Banerjee, and V. Kumar, ACM Computing Surveys (CSUR) 41, 15 (2009).
[13] M. A. Pimentel, D. A. Clifton, L. Clifton, and L. Tarassenko, Signal Processing 99, 215 (2014).
[14] S. Hara, T. Ono, R. Okamoto, T. Washio, and S. Takeuchi, Phys. Rev. A 89, 022104 (2014).
[15] S. Hara, T. Ono, R. Okamoto, T. Washio, and S. Takeuchi, Phys. Rev. A 94, 042341 (2016).
[16] V. Giovannetti, S. Lloyd, and L. Maccone, Phys. Rev. Lett. 100, 160501 (2008).
[17] A. Monràs, G. Sentís, and P. Wittek, Phys. Rev. Lett. 118, 190503 (2017).
[18] G. Sentís, J. Calsamiglia, R. Munoz-Tapia, and E. Bagan, Sci. Rep. 2, 708 (2012).
[19] A. Bisio, G. Chiribella, G. M. D'Ariano, S. Facchini, and P. Perinotti, Phys. Rev. A 81, 032324 (2010).
[20] M. Sasaki, A. Carlini, and R. Jozsa, Phys. Rev. A 64, 022317 (2001).
[21] F. Shahi and A. T. Rezakhani, arXiv:1704.01965 (2017).
[22] H. Hoffmann, Pattern Recognit. 40, 863 (2007).
[23] M.-L. Shyu, S.-C. Chen, K. Sarinnapakorn, and L. Chang, Tech. Rep., DTIC Document (2003).
[24] B. Schölkopf, R. C. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Platt, in Advances in Neural Information Processing Systems (2000), pp. 582–588.
[25] Y.-S. Choi, Pattern Recogn. Lett. 30, 1236 (2009).
[26] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, UK, 2010).
[27] M. Basseville, I. V. Nikiforov, et al., Detection of Abrupt Changes: Theory and Application, vol. 104 (Prentice Hall, Englewood Cliffs, 1993).
[28] G. Sentís, E. Bagan, J. Calsamiglia, G. Chiribella, and R. Munoz-Tapia, Phys. Rev. Lett. 117, 150502 (2016).
[29] S. Lloyd, M. Mohseni, and P. Rebentrost, Nature Phys. 10, 631 (2014).
[30] R. Chatterjee and T. Yu, arXiv:1612.03713 (2016).
[31] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, Neural Comput. 13, 1443 (2001).
[32] K. Muandet and B. Schölkopf, arXiv:1303.0309 (2013).
[33] J. Ma and S. Perkins, in Proceedings of the International Joint Conference on Neural Networks (IEEE, 2003), vol. 3, pp. 1741–1745.
[34] H. Buhrman, R. Cleve, J. Watrous, and R. de Wolf, Phys. Rev. Lett. 87, 167902 (2001).
[35] A. W. Harrow, A. Hassidim, and S. Lloyd, Phys. Rev. Lett. 103, 150502 (2009).
[36] S. Aaronson, Nature Phys. 11, 291 (2015).
[37] W. S. Warren, Science 277, 1688 (1997).
[38] J. A. Miszczak, Z. Puchała, P. Horodecki, A. Uhlmann, and K. Życzkowski, Quantum Inf. Comput. 9, 0103 (2009).
[39] R. Jozsa, J. Mod. Opt. 41, 2315 (1994).
[40] X. Wang, C.-S. Yu, and X. Yi, Phys. Lett. A 373, 58 (2008).
[41] Z. Puchała, J. A. Miszczak, P. Gawron, and B. Gardas, Quantum Inf. Process. 10, 1 (2011).
[42] M. Araújo, A. Feix, F. Costa, and Č. Brukner, New J. Phys. 16, 093026 (2014).
[43] For a simplified version of a supervised learning algorithm for quantum states, see [21]. Also see [21] for an unsupervised learning algorithm for quantum states.
[44] We note that the special case $c = 0$ can be interpreted as the situation when the training data are all clustered around the origin of the feature space. However, it is not an appropriate choice, since solving $(K + P_T M \mathbb{1})\vec{\alpha} = \vec{0}$ would imply that the positive semidefinite matrix $K$ has negative eigenvalues $-P_T M$, which is a contradiction.
[45] For example, in a photonic setting, one can act with $u_1$ along one path of an interferometer and $u_2$ along the other path.
[46] We note that $U_C$ can be implemented, for example, using passive linear optical elements [42].