Kernel Methods in Quantum Machine Learning
https://doi.org/10.1007/s42484-019-00007-4
REVIEW ARTICLE
Received: 28 April 2019 / Accepted: 22 September 2019 / Published online: 15 November 2019
© Springer Nature Switzerland AG 2019
Abstract
Quantum Machine Learning has established itself as one of the most promising applications of quantum computers and
Noisy Intermediate-Scale Quantum (NISQ) devices. In this paper, we review the latest developments regarding the use of
quantum computing for a particular class of machine learning algorithms known as kernel methods.
Fig. 1 The first letter in each box refers to whether the system under study is classical or quantum, while the second letter indicates whether a classical or quantum information processing device is used
…power of quantum computing to deal specifically with classically intractable kernels.

2 Kernel methods and SVM

Kernel methods (Theodoridis 2008) are classification algorithms that use a kernel function K in order to map data points, living in the input space V, to a higher-dimensional feature space V′. They avoid the explicit calculation of the point coordinates in the new space by means of the so-called kernel trick, which allows us to work in the feature space V′ by simply computing the kernel of pairs of data points in the input space (Theodoridis 2008).

Intuitively, the “trick” consists in considering the following scenario. Let φ : V → V′ be a map from the input space V to the enhanced feature space V′. Then a kernel K : V × V → R is a function

K(x_i, x_j) ≡ ⟨φ(x_i), φ(x_j)⟩,

representing the inner product ⟨·,·⟩ in V′, that must satisfy the Mercer condition (Mercer 1909; Mohri et al. 2012) of positive semi-definiteness, i.e., for all choices of n real numbers (c_1, ..., c_n) the following relation must hold:

Σ_{i=1}^n Σ_{j=1}^n K(x_i, x_j) c_i c_j ≥ 0.

Clearly, calculating the kernel K(x_i, x_j) is computationally cheaper than computing the coordinates of each new point φ(x); on the other hand, we are never required to explicitly compute φ(x_i) at any stage of the algorithm. The existence of a concrete mapping φ : V → V′ is guaranteed by the Mercer theorem (Mercer 1909; Mohri et al. 2012), provided that the kernel function K(x_i, x_j) gives rise to a kernel matrix obeying the Mercer condition.
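To make the trick concrete, here is a small numerical check (our own toy example, assuming NumPy is available; it is not from the original paper): an explicit quadratic feature map matched against its kernel, plus a spot check of the Mercer condition via the Gram matrix spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Explicit feature map R^2 -> R^3 realizing K(x, y) = (x . y)^2."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

def kernel(x, y):
    """The same kernel evaluated directly in the input space."""
    return np.dot(x, y) ** 2

X = rng.normal(size=(5, 2))

# Kernel trick: <phi(x), phi(y)> equals K(x, y) without ever building phi.
for x in X:
    for y in X:
        assert np.isclose(np.dot(phi(x), phi(y)), kernel(x, y))

# Mercer condition: the Gram matrix is positive semi-definite, i.e.,
# sum_ij K(x_i, x_j) c_i c_j >= 0 for every real vector c.
G = np.array([[kernel(x, y) for y in X] for x in X])
print(np.linalg.eigvalsh(G).min())  # >= 0 up to rounding error
```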
Support Vector Machine (SVM) is the best-known example of a kernel method. This supervised binary classifier finds the maximum-margin hyperplane separating the two classes, i.e., it solves

arg min_{w,b} ½‖w‖²   subject to   ∀i : y_i(w · x_i − b) ≥ 1,

where (x_i, y_i), with i = 1 ... M and y_i ∈ {−1, +1}, is the pair of training vector and label, w is the vector which is normal to the discriminative hyperplane, and b is the offset of the hyperplane.

An important extension of the SVM method described above is the so-called soft margin SVM, where the best hyperplane is the one that achieves the optimal trade-off between two factors: the maximization of the margin and the limitation of the deviation of points from the margin; the latter is expressed by means of slack variables ξ_i tuned by a hyper-parameter C. A soft margin SVM optimization problem is of the form

arg min_{w,ξ} ½‖w‖² + C Σ_{i=1}^M ξ_i

subject to the constraint

∀i : y_i(w · x_i − b) ≥ 1 − ξ_i,  ξ_i ≥ 0.   (1)

Usually it is convenient to switch to the dual form, where Lagrange multipliers α_i are introduced in order to include the constraint in the objective function, obtaining the formulation

arg max_{(α_i)} Σ_{i=1}^M α_i − ½ Σ_{i,j} α_i α_j y_i y_j (x_i^T x_j),

with w = Σ_i α_i y_i x_i, subject to Σ_i α_i y_i = 0 and ∀i : 0 ≤ α_i ≤ C.

It is worth noticing that only a sparse subset of the α_i is non-zero and that the corresponding x_i are the support vectors, which lie on the margin and determine the discriminant hyperplane.

In this context, a non-linear classification boundary for the SVM is obtained by replacing the term (x_i^T x_j) in the objective function with a kernel function K(x_i, x_j) ≡ φ(x_i)^T φ(x_j) satisfying the Mercer condition of positive semi-definiteness. The Lagrangian optimization problem for the soft margin SVM now becomes

arg max_{(α_i)} Σ_{i=1}^M α_i − ½ Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j),

subject to Σ_i α_i y_i = 0 with ∀i : 0 ≤ α_i ≤ C.
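In practice, the kernelized dual is solved by off-the-shelf quadratic programming. The sketch below (assuming scikit-learn is installed; the blob data and the polynomial kernel are our illustrative choices, not the paper's) uses a precomputed Gram matrix, which is exactly the interface a quantum-evaluated kernel would plug into.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Toy binary dataset: two Gaussian blobs, labels in {-1, +1}.
X = np.vstack([rng.normal(-1.0, 1.0, (20, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

def gram(A, B):
    """Gram matrix of the polynomial kernel K(x, y) = (x . y + 1)^2.
    Any Mercer kernel, including one estimated on a quantum device,
    can be plugged in here."""
    return (A @ B.T + 1.0) ** 2

clf = SVC(kernel="precomputed", C=1.0)  # C is the soft margin hyper-parameter
clf.fit(gram(X, X), y)

X_new = rng.normal(0.0, 1.0, (5, 2))
print(clf.predict(gram(X_new, X)))  # predicted labels in {-1, +1}
```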
Note that the dual form of the SVM optimization problem is quadratic in the parameters α_i, and it can be efficiently solved with quadratic programming algorithms.

An alternative version of SVM that has a central role in the quantum formulation of the problem is the least-squares support vector machine (LS-SVM) (Suykens and Vandewalle 1999). Here, the constraint defined in Eq. 1 is replaced by the equality constraint

∀i : y_i(w · x_i − b) = 1 − e_i,

with slack variables e_i, so that training amounts to solving a system of linear equations instead of a quadratic program.
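The practical consequence is that LS-SVM training is a single linear solve; this is essentially the system that the HHL-based QSVM discussed below addresses quantum-mechanically. A classical sketch (our own; the kernel, data, and regularization value γ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: M training vectors with labels in {-1, +1}.
M = 8
X = rng.normal(size=(M, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

gamma = 10.0                    # regularization hyper-parameter
K = (X @ X.T + 1.0) ** 2        # any Mercer kernel works here

# LS-SVM training: one (M+1) x (M+1) linear system
#   [ 0   1^T         ] [ b     ]   [ 0 ]
#   [ 1   K + I/gamma ] [ alpha ] = [ y ]
F = np.zeros((M + 1, M + 1))
F[0, 1:] = 1.0
F[1:, 0] = 1.0
F[1:, 1:] = K + np.eye(M) / gamma
sol = np.linalg.solve(F, np.concatenate(([0.0], y)))
b, alpha = sol[0], sol[1:]

def predict(x):
    return np.sign(alpha @ ((X @ x + 1.0) ** 2) + b)

print([predict(x) for x in X])  # should (approximately) match the labels y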
A different approach was proposed by Rebentrost, Mohseni and Lloyd (Rebentrost et al. 2014), who presented a completely new quantum algorithm that implements SVM on a circuit-based quantum computer. This formulation has become very popular in the last few years and it is often referred to as the Quantum SVM (QSVM) algorithm. In order to understand QSVM it is necessary to clarify that classical input training vectors x are represented by means of quantum states of the form

|x⟩ = (1/|x|) Σ_{k=1}^N (x)_k |k⟩.

In the training set basis, the solution state for the LS-SVM is

|b, α⟩ = (1/√(b² + Σ_{k=1}^M α_k²)) ( b|0⟩ + Σ_{k=1}^M α_k |k⟩ ).
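Classically, both of these encodings amount to nothing more than normalizing a vector of amplitudes; a minimal sketch of our own:

```python
import numpy as np

def amplitude_encode(v):
    """Amplitudes of the state (1/|v|) sum_k (v)_k |k>."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Classical vector -> amplitudes over the basis states
x = np.array([3.0, 0.0, 4.0])
print(amplitude_encode(x))  # [0.6, 0.0, 0.8]

# Solution state |b, alpha>: b on |0>, alpha_k on the training basis states |k>
b, alpha = 0.5, np.array([1.0, -2.0, 0.5])
print(amplitude_encode(np.concatenate(([b], alpha))))
```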
The process of classifying new data |x⟩ with the trained |b, α⟩ requires the implementation of the query oracle

|ũ⟩ = (1/√(b² + Σ_{k=1}^M α_k² |x_k|²)) ( b|0⟩|0⟩ + Σ_{k=1}^M α_k |x_k| |k⟩|x_k⟩ )   (3)

and also the query state

|x̃⟩ = (1/√(M|x|² + 1)) ( |0⟩|0⟩ + Σ_{k=1}^M |x| |k⟩|x⟩ ).   (4)

The classification is obtained by computing the inner product ⟨x̃|ũ⟩ via a swap test (Buhrman et al. 2001). This means that, with the help of an ancillary qubit, the state |ψ⟩ = (1/√2)(|0⟩_a|ũ⟩ + |1⟩_a|x̃⟩) is constructed and then measured in the state |φ⟩ = (1/√2)(|0⟩_a − |1⟩_a), with a success probability given by P = |⟨ψ|φ⟩|² = ½(1 − ⟨x̃|ũ⟩).

The probability P can be estimated to accuracy ε in O(P(1 − P)/ε²) repetitions. The class label is decided depending on the value of P: if it is greater than ½, then |x⟩ is labelled −1; if it is less than ½, then the label of |x⟩ is +1. The overall time complexity for both training and classification of the LS-SVM is O(log(NM)).
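The measurement statistics of this classification step can be reproduced classically. The sketch below (ours) samples the ancilla outcomes directly from the success probability P = ½(1 − ⟨x̃|ũ⟩) and inverts the relation to recover the overlap:

```python
import numpy as np

rng = np.random.default_rng(3)

def overlap_from_test(u, x, shots=100_000):
    """Estimate <x|u> for real unit vectors from the interference test:
    |psi> = (|0>_a |u> + |1>_a |x>)/sqrt(2) projected onto (|0>_a - |1>_a)/sqrt(2)
    succeeds with probability P = (1 - <x|u>)/2."""
    p = 0.5 * (1.0 - np.dot(x, u))
    p_hat = rng.binomial(shots, p) / shots   # sampled success frequency
    return 1.0 - 2.0 * p_hat                 # invert P = (1 - <x|u>)/2

u = np.array([1.0, 0.0])
x = np.array([np.cos(0.3), np.sin(0.3)])
print(overlap_from_test(u, x), "vs exact", np.dot(x, u))
# Decision rule: if the estimated P > 1/2 (overlap < 0), output label -1,
# otherwise +1.
```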
In the QSVM algorithm, kernelization can be achieved by acting on the training vector basis, i.e., by mapping each |x_i⟩ to a d-fold tensor product

|φ(x_i)⟩ = |x_i⟩_1 ⊗ |x_i⟩_2 ⊗ ... ⊗ |x_i⟩_d.

This allows us to obtain polynomial kernels of the form

K(x_i, x_j) ≡ ⟨φ(x_i)|φ(x_j)⟩ = ⟨x_i|x_j⟩^d

that can be computed in O(d ε⁻¹ log N). Note that in the QSVM, the kernel evaluation is directly performed in the high-dimensional quantum feature space, whereas in the classical SVM the kernel trick is used precisely to avoid such an expensive calculation. However, this is no problem in the quantum case thanks to the exponential quantum speed-up obtained in the evaluation of inner products.
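The d-fold tensor-product identity behind this polynomial kernel is easy to verify numerically (a classical sanity check of ⟨φ(x)|φ(y)⟩ = ⟨x|y⟩^d, not a demonstration of the quantum speed-up):

```python
import numpy as np

def d_fold(x, d):
    """|phi(x)> = |x> (x) |x> (x) ... (x) |x>, d copies via Kronecker products."""
    state = x
    for _ in range(d - 1):
        state = np.kron(state, x)
    return state

rng = np.random.default_rng(4)
x = rng.normal(size=4); x /= np.linalg.norm(x)
y = rng.normal(size=4); y /= np.linalg.norm(y)

d = 3
lhs = np.dot(d_fold(x, d), d_fold(y, d))  # inner product in the feature space
rhs = np.dot(x, y) ** d                   # polynomial kernel in the input space
print(np.isclose(lhs, rhs))               # True
```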
An experimental implementation of the QSVM has been shown in Li et al. (2015) and Patrick et al. (2018). Also, in Windridge et al. (2018), the authors propose a quantized version of Error Correcting Output Codes (ECOC) which extends the QSVM algorithm to the multi-class case and enables it to perform error correction on the label allocation.

… a quantum device. In this context, we can recognize two common threads. On one side, a hybrid classical-quantum learning model takes classical input and evaluates a kernel function on a quantum device, while classification is performed in the standard classical manner (e.g., employing an SVM algorithm). In the second approach, instead, a kernel-based variational quantum circuit is trained to classify input data. More specifically, a variational quantum circuit (McClean et al. 2016) is a hybrid quantum-classical algorithm employing a quantum circuit U(θ) that depends on a set of parameters θ, which are varied in order to minimize a given objective function (see Fig. 2). The quantum circuit is hence trained by a classical iterative optimization algorithm that at every step finds better candidate values of θ, starting from random (or pre-trained) initial values.
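The hybrid loop just described can be sketched in a few lines. Below is a deliberately tiny stand-in of our own: a one-qubit "circuit" simulated as a state vector, a ⟨Z⟩ objective, and finite-difference gradient descent in place of hardware estimation.

```python
import numpy as np

def circuit_state(theta):
    """Toy one-qubit 'variational circuit' U(theta)|0>: RY(a) then RZ(b)."""
    a, b = theta
    state = np.array([np.cos(a / 2), np.sin(a / 2)], dtype=complex)
    return state * np.exp(-1j * b / 2 * np.array([1.0, -1.0]))

def objective(theta):
    """Objective <psi|Z|psi>; minimizing it drives the state towards |1>."""
    s = circuit_state(theta)
    return float(np.abs(s[0]) ** 2 - np.abs(s[1]) ** 2)

theta = np.array([0.1, 0.0])  # random (or pre-trained) initial values
eps, lr = 1e-4, 0.4
for _ in range(100):
    grad = np.array([(objective(theta + eps * e) - objective(theta - eps * e))
                     / (2 * eps) for e in np.eye(2)])
    theta -= lr * grad        # classical update of the quantum parameters
print(objective(theta))       # close to -1 after training
```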
Schuld and Killoran recently explored these concepts (Schuld and Killoran 2019), remarking on the strict relation between quantum states and feature maps. The authors explain that the key element in both quantum computing and kernel methods is to perform computations in a high-dimensional (possibly infinite) Hilbert space via an efficient manipulation of inputs. In fact, it is possible to interpret the encoding of classical inputs x_i into a quantum state |φ(x_i)⟩ as a feature map φ which maps classical vectors to the Hilbert space associated with a system of qubits. As said before, two ways of exploiting this parallelism are described.

In the first approach, called by the authors implicit, a quantum device takes classical input and evaluates a kernel function as part of a hybrid classification model. This requires the use of a quantum circuit U_φ(x) implementing the mapping

φ : x → |φ(x)⟩ = U_φ(x)|00...0⟩,

which is able to produce a kernel

K(x_i, x_j) = ⟨00...0| U_φ†(x_i) U_φ(x_j) |00...0⟩.

In order for quantum computing to be helpful, such a kernel should not be efficiently simulable by a classical computer. The question is therefore posed of what type of feature map circuits U_φ lead to kernels that are powerful for classical learning models like SVM but at the same time classically intractable.
The authors suggest that a way to achieve such a goal is to employ non-Gaussian elements (e.g., a cubic phase gate or photon-number measurements) as part of the quantum circuit U_φ(x) implementing the mapping to the feature space.
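Although Schuld and Killoran's own examples are continuous-variable, the implicit pipeline itself is easy to sketch with qubits. In the toy below (entirely our construction: the two-qubit feature map, the dataset, and the use of the squared overlap as the kernel estimate are all illustrative assumptions), we simulate U_φ(x)|0...0⟩ as a state vector, build the Gram matrix, and hand it to a classical SVM.

```python
import numpy as np
from sklearn.svm import SVC

def feature_state(x):
    """Toy 2-qubit feature map |phi(x)> = (RY(x1) (x) RY(x2)) CNOT (H (x) H)|00>."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], float)
    def ry(t):
        return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                         [np.sin(t / 2),  np.cos(t / 2)]])
    psi0 = np.array([1.0, 0.0, 0.0, 0.0])
    return np.kron(ry(x[0]), ry(x[1])) @ CNOT @ np.kron(H, H) @ psi0

def gram(A, B):
    """K(x, y) = |<phi(x)|phi(y)>|^2, the overlap a device would estimate."""
    SA = np.array([feature_state(a) for a in A])
    SB = np.array([feature_state(b) for b in B])
    return np.abs(SA @ SB.T) ** 2

rng = np.random.default_rng(5)
X = rng.uniform(0.0, 2 * np.pi, (30, 2))
y = np.where(np.sin(X[:, 0]) * np.sin(X[:, 1]) > 0, 1, -1)

clf = SVC(kernel="precomputed").fit(gram(X, X), y)
print(clf.score(gram(X, X), y))  # training accuracy of the hybrid model
```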
The second approach, addressed in the paper as explicit, uses a variational quantum circuit to directly learn a decision boundary in the quantum Hilbert space. In their example, the authors first translate the classical input into a quantum squeezed state,

x → |φ(x)⟩ = (1/√cosh(c)) Σ_{n=0}^∞ (√((2n)!) / (2^n n!)) (−e^{ix} tanh(c))^n |2n⟩,

and then process it with a variational circuit. The gates composing such a circuit are, more explicitly, the beam splitter

BS(θ_1, θ_2) = e^{θ_1 (e^{iθ_2} â_1† â_2 − e^{−iθ_2} â_1 â_2†)},

with θ_1, θ_2 ∈ R and â, â† the annihilation and creation operators; the displacement

D(z) = e^{√2 i (Im(z) x̂ − Re(z) p̂)},

with complex displacement z; and finally the quadratic and cubic phase gates

P(u) = e^{i(u/2) x̂²}  and  V(u) = e^{i(u/3) x̂³}.

The probability of measuring the Fock state |n_1, n_2⟩ in the state |2, 0⟩ or |0, 2⟩ is interpreted as the probability that the classifier predicts class y = 0 or y = 1, respectively:

p(|2, 0⟩) = p(y = 0)  and  p(|0, 2⟩) = p(y = 1).

The authors trained such a model on the 'moons' dataset using stochastic gradient descent and showed that the training loss converges to zero after about 200 iterations.
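The squeezed-state encoding can be examined classically in a truncated Fock basis. The sketch below (ours; the squeezing parameter c and the cutoff are arbitrary choices) computes the amplitudes and the kernel they induce:

```python
import numpy as np
from math import factorial

def squeezed_state(x, c=1.0, cutoff=40):
    """Fock amplitudes of |phi(x)>: the coefficient of |2n> is
    sqrt((2n)!)/(2^n n!) * (-e^{ix} tanh c)^n / sqrt(cosh c), for n < cutoff."""
    n = np.arange(cutoff)
    coeff = np.array([np.sqrt(float(factorial(2 * k))) / (2.0 ** k * float(factorial(k)))
                      for k in range(cutoff)])
    return coeff * (-np.exp(1j * x) * np.tanh(c)) ** n / np.sqrt(np.cosh(c))

def kernel(x1, x2, c=1.0):
    """Squeezed-state kernel <phi(x1)|phi(x2)>, computed by truncation."""
    return np.vdot(squeezed_state(x1, c), squeezed_state(x2, c))

print(abs(kernel(0.3, 0.3)))  # ~1.0: the encoded states are normalized
print(abs(kernel(0.3, 1.5)))  # < 1.0: distinct inputs are not orthogonal
```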
Along the same path, simultaneously to Schuld and Killoran (2019), Havlicek et al. (2019) propose two classifiers that map classical data into the quantum feature Hilbert space in order to get a quantum advantage. Again, one SVM classifier is based on a variational circuit that generates a separating hyperplane in the quantum feature space, while the other classifier only estimates the kernel function on the quantum computer.

The two methods are tested on an artificial dataset x ∈ T ∪ S ≡ Ω ⊂ (0, 2π]², where T and S are respectively the training and test sets. This classical input is previously encoded as φ_S(x) ∈ R, where φ_S(x) = (π − x_1)(π − x_2). On the basis that, in order to obtain an advantage over classical approaches, feature maps need to be based on a circuit that is hard to simulate with classical means, the authors propose a feature map on n qubits generated by the unitary

U(x) = U_Φ(x) H^{⊗n} U_Φ(x) H^{⊗n},   U_Φ(x) = exp( i Σ_{S⊆[n]} φ_S(x) Π_{k∈S} Z_k ),

where Z_k is the Pauli-Z operator acting on qubit k and the sum runs over subsets S of the n qubits. Such a circuit acts on |0⟩^{⊗n} as the initial state and uses classical data previously encoded in φ_S(x). The exact classical evaluation of the inner product (i.e., the kernel) between two states obtained using a circuit U(x) is #P-hard, because it is associated with a Tutte partition function, which is hard to simulate classically (Goldberg and Guo 2017).
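For n = 2 qubits, this feature map circuit and its kernel can be written out directly as a state-vector computation. In the sketch below (our own construction), the single-qubit phases are chosen as φ_{k}(x) = x_k, following Havlicek et al., and the kernel is taken as the squared overlap that a device would estimate by sampling:

```python
import numpy as np

I2, Z = np.eye(2), np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

def u_phi(x):
    """Diagonal unitary exp(i sum_S phi_S(x) prod_{k in S} Z_k) for n = 2,
    with phi_{1}(x) = x1, phi_{2}(x) = x2, phi_{1,2}(x) = (pi - x1)(pi - x2)."""
    gen = (x[0] * np.kron(Z, I2) + x[1] * np.kron(I2, Z)
           + (np.pi - x[0]) * (np.pi - x[1]) * np.kron(Z, Z))
    return np.diag(np.exp(1j * np.diag(gen)))

def feature_circuit(x):
    """U(x)|00> with U(x) = U_phi(x) H^{(x)2} U_phi(x) H^{(x)2}."""
    H2 = np.kron(H, H)
    psi = np.array([1.0, 0.0, 0.0, 0.0], dtype=complex)
    return u_phi(x) @ H2 @ u_phi(x) @ H2 @ psi

def kernel(x, y):
    """K(x, y) = |<0..0| U(x)^dag U(y) |0..0>|^2."""
    return abs(np.vdot(feature_circuit(x), feature_circuit(y))) ** 2

x, y = np.array([0.5, 1.2]), np.array([2.0, 0.7])
print(kernel(x, x), kernel(x, y))  # 1.0 and a value in [0, 1]
```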
A different approach is taken in Di Pierro et al. (2017), where the same idea of using quantum computation to evaluate a kernel is discussed in the context of Topological Quantum Computation (TQC). TQC represents a model of quantum computing polynomially equivalent to the circuit-based one where, instead of using qubits and gates, the computation is performed by braiding two-dimensional quasi-particles called anyons (Pachos 2012). Moreover, it is well known that some computational problems, such as the approximation of the Jones polynomial, i.e., an invariant of links and knots, have a more straightforward implementation in TQC (Aharonov et al. 2006).

The approach proposed in Di Pierro et al. (2017) is based on an encoding of input classical data x, in the form of binary strings, into braids, which in TQC are expressed by means of evolution operators B.
Table 1 Summary of the results discussed in this review

Approach | Quantum technique | Article
Quantum version of SVM | Grover algorithm | Quantum optimization for training support vector machines (Anguita et al. 2003)
Quantum version of SVM | HHL algorithm | Quantum support vector machine for big data classification (Rebentrost et al. 2014)
Experimental | NMR 4-qubit quantum processor | Experimental implementation of a quantum support vector machine (Li et al. 2015)
Experimental | IBM Quantum Experience | Quantum algorithm implementations for beginners (Patrick et al. 2018)
Quantum version of SVM and ECOC | HHL algorithm | Quantum error-correcting output codes (Windridge et al. 2018)
Kernel methods | Variational quantum circuit | Quantum machine learning in feature Hilbert spaces (Schuld and Killoran 2019)
Kernel methods | Variational quantum circuit | Supervised learning with quantum-enhanced feature spaces (Havlicek et al. 2019)
Kernel methods | Topological quantum computation | Hamming distance kernelisation via topological quantum computation (Di Pierro et al. 2017)
This encoding is constructed by mapping the bit value 0 to the crossing operator σ_i, and the bit value 1 to the adjoint crossing operator σ_i†.
Hence, a given binary string of length n is uniquely represented by a pairwise braiding of 2n strands, i.e., by a braid B ∈ B_{2n}, as shown below.

[Braid diagram]

Therefore, applying the braiding B_u associated with the binary string u to the vacuum state |ψ⟩ of the anyonic quantum system defines an embedding φ into the Hilbert space H of the anyonic configurations:

φ : u → B_u |ψ⟩.

The authors finally show that the scalar product of anyonic quantum states obtained with such a mapping generates a kernel that depends on the Hamming distance between the input strings, as follows:

K(u, v) ≡ ⟨ψ| B_u† B_v |ψ⟩ = (⟨Hopf⟩/d)^{d_H(u,v)} = ((A⁴ + A⁻⁴)/(A² + A⁻²))^{d_H(u,v)},

where ⟨Hopf⟩ indicates the Kauffman polynomial (Kauffman 1987) in the variable A that is associated to the so-called Hopf link, d = A² + A⁻², and d_H(u, v) is the Hamming distance between the input strings u and v.

Although this example does not provide a computationally hard kernel, the authors suggest that a more complex braid mapping of the input may naturally lead to a classically intractable kernel, since the calculation of the Kauffman polynomial belongs to the #P-hard class (Goldberg and Guo 2017).
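Once the Kauffman variable A is evaluated at a number, this is simply a Hamming-distance kernel t^{d_H(u,v)}. The sketch below (ours; the choice A = e^{iπ/10}, a root of unity as used in TQC models, is an illustrative assumption) builds the Gram matrix over all 3-bit strings and checks positive semi-definiteness:

```python
import numpy as np
from itertools import product

def hamming(u, v):
    """Hamming distance between two bit strings."""
    return sum(a != b for a, b in zip(u, v))

# Evaluate the Kauffman variable A at a root of unity (an illustrative
# choice), which makes the ratio t a real number in (0, 1).
A = np.exp(1j * np.pi / 10)
t = ((A ** 4 + A ** -4) / (A ** 2 + A ** -2)).real  # ~0.38

def tqc_kernel(u, v):
    """Hamming-distance kernel K(u, v) = t^{d_H(u, v)}."""
    return t ** hamming(u, v)

strings = list(product([0, 1], repeat=3))  # all 3-bit inputs
G = np.array([[tqc_kernel(u, v) for v in strings] for u in strings])
print(np.linalg.eigvalsh(G).min() >= -1e-12)  # True: the Gram matrix is PSD
```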
5 Conclusion

In this paper, we have reviewed the main approaches to the design of algorithms for kernel methods in ML which exploit the power of quantum computing to achieve a computational advantage with respect to the classical approaches. We divided the literature on this problem into two main categories. On the one side, there are attempts to formulate quantum versions of the support vector machine running on a gate-model quantum computer. On the other side, we grouped different approaches whose core idea relies on the use of quantum computing techniques in order to deal with classically intractable kernels. In Table 1, we give a schematic description of the various results that we have discussed, together with the article in which they appear.

References

Agresti I et al (2019) Pattern recognition techniques for boson sampling validation. Phys Rev X 9:14
Aharonov D, Jones V, Landau Z (2006) A polynomial quantum algorithm for approximating the Jones polynomial. In: Proceedings of the 38th annual ACM symposium on theory of computing, pp 427–436
Aïmeur E et al (2013) Quantum speed-up for unsupervised learning. Mach Learn 90:261–287
Altaisky MV et al (2016) Towards a feasible implementation of quantum neural networks using quantum dots. Appl Phys Lett 108:103108
Amin MH et al (2018) Quantum Boltzmann machine. Phys Rev X 8:11
Anguita D et al (2003) Quantum optimization for training support vector machines. Neural Netw 16:763–770
Arunachalam S, de Wolf R (2017) A survey of quantum learning theory, arXiv:1701.06806
Barry J et al (2014) Quantum partially observable Markov decision processes. Phys Rev A 90:032311
Benedetti M et al (2019) Adversarial quantum circuit learning for pure state approximation. New J Phys 21:043023
Biamonte J et al (2017) Quantum machine learning. Nature 549:195–202
Bishop C (2016) Pattern recognition and machine learning, vol 738. Springer, New York
Bottarelli L et al (2018) Biclustering with a quantum annealer. Soft Comput 22:6247–6260
Buhrman H, Cleve R, Watrous J, De Wolf R (2001) Quantum fingerprinting. Phys Rev Lett 87:4
Canabarro A, Fernandes Fanchini F, Malvezzi AL, Pereira R, Chaves R (2019) Unveiling phase transitions with machine learning, arXiv:1904.01486
Ciliberto C et al (2018) Quantum machine learning: a classical perspective. Proc R Soc A: Math Phys Eng Sci 474:20170551
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Crawford D et al (2016) Reinforcement learning using quantum Boltzmann machines, arXiv:1612.05695
Di Pierro A et al (2017) Hamming distance kernelisation via topological quantum computation. In: Theory and practice of natural computing. Lect Notes Comput Sci 10687:269–280
Di Pierro A et al (2018) Homological analysis of multi-qubit entanglement. Europhys Lett 123:30006
Dong XY, Pollmann F, Zhang XF (2019) Machine learning of quantum phase transitions. Phys Rev B 99:121104
Dunjko V, Briegel HJ (2018) Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Rep Prog Phys 81:074001
Dunjko V et al (2016) Quantum-enhanced machine learning. Phys Rev Lett 117:6
Giovannetti V, Lloyd S, Maccone L (2008) Quantum random access memory. Phys Rev Lett 100:4
Goldberg LA, Guo H (2017) The complexity of approximating complex-valued Ising and Tutte partition functions. Computational Complexity 26:765–833
Gray J et al (2018) Machine-learning-assisted many-body entanglement measurement. Phys Rev Lett 121:6
Harrow AW, Hassidim A, Lloyd S (2009) Quantum algorithm for linear systems of equations. Phys Rev Lett 103:4
Havlicek V, Córcoles AD et al (2019) Supervised learning with quantum-enhanced feature spaces. Nature 567:209–212
Heim B et al (2015) Quantum versus classical annealing of Ising spin glasses. Science 348:215–217
Huembeli P et al (2019) Automated discovery of characteristic features of phase transitions in many-body localization. Phys Rev B 99:6
Iten R et al (2018) Discovering physical concepts with neural networks, arXiv:1807.10300
Kauffman LH (1987) State models and the Jones polynomial. Topology 26:395–407
Levine Y et al (2018) Deep learning and quantum entanglement: fundamental connections with implications to network design. In: International conference on learning representations
Li Z, Liu X, Xu N, Du J (2015) Experimental realization of a quantum support vector machine. Phys Rev Lett 114:5
Lloyd S, Mohseni M, Rebentrost P (2014) Quantum principal component analysis. Nat Phys 10:631–633
Lu S, Braunstein SL (2014) Quantum decision tree classifier. Quantum Inf Process 13:757–770
McClean JR, Romero J, Babbush R, Aspuru-Guzik A (2016) The theory of variational hybrid quantum-classical algorithms. New J Phys 18:023023
Mercer J (1909) Functions of positive and negative type and their connection with the theory of integral equations. Philos Trans R Soc Lond A 209:415–446
Mitchell T (1997) Machine learning. McGraw Hill, New York
Mohri M et al (2012) Foundations of machine learning, vol 432. MIT Press, Cambridge
Nielsen MA, Chuang IL (2011) Quantum computation and quantum information. Cambridge University Press, New York
O'Driscoll L et al (2019) A hybrid machine learning algorithm for designing quantum experiments. Quantum Mach Intell 1:1–11
Pachos JK (2012) Introduction to topological quantum computation. Cambridge University Press, New York
Patrick J et al (2018) Quantum algorithm implementations for beginners, arXiv:1804.03719
Perdomo-Ortiz A et al (2018) Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers. Quantum Sci Technol 3:030502
Rebentrost P, Mohseni M, Lloyd S (2014) Quantum support vector machine for big data classification. Phys Rev Lett 113:5
Schuld M, Killoran N (2019) Quantum machine learning in feature Hilbert spaces. Phys Rev Lett 122:6
Schuld M, Petruccione F (2018) Supervised learning with quantum computers, vol 287. Springer International Publishing, Berlin
Schuld M, Sinayskiy I, Petruccione F (2015) An introduction to quantum machine learning. Contemp Phys 56(2):172–185
Sergioli G et al (2018) A quantum-inspired version of the nearest mean classifier. Soft Comput 22:691–705
Stoudenmire E, Schwab DJ (2016) Supervised learning with tensor networks. In: Advances in neural information processing systems (NIPS) 29:4799–4807
Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9:293–300
Theodoridis S (2008) Pattern recognition, vol 984. Elsevier Academic Press, Cambridge
Wiebe N et al (2015) Quantum algorithms for nearest-neighbours methods for supervised and unsupervised learning. Quantum Info Comput 15:316–356
Windridge D, Mengoni R, Nagarajan R (2018) Quantum error-correcting output codes. Int J Quantum Info 16:1840003
Wittek P (2014) Quantum machine learning, vol 176. Elsevier Academic Press, Cambridge
Yu S, Albarrán-Arriagada F, Retamal JC, Wang YT, Liu W, Ke ZJ, Meng Y, Li ZP, Tang JS, Solano E, Lamata L, Li CF, Guo GC (2019) Adv Quantum Technol 2(7–8):1800074