Quantum Machine Learning in Feature Hilbert Spaces
kernel. This kernel can be fed into any classical kernel method such as a support vector machine. In
the second approach, we can use a variational quantum circuit as a linear model that classifies data
explicitly in Hilbert space. We illustrate these ideas with a feature map based on squeezing in a
continuous-variable system, and visualise the working principle with 2-dimensional mini-benchmark
datasets.
A central result of this paper is that the idea of embedding data into a quantum Hilbert space opens up a promising avenue to quantum machine learning, in which we can generically use quantum devices for pattern recognition. The implicit and explicit approaches are not only hardware-independent, but also suitable for intermediate-term quantum technologies, which allows us to test them on the generation of quantum computers that is currently being developed. Nonlinear feature maps also circumvent the need to implement nonlinear transformations on amplitude-encoded data, and thereby solve an outstanding problem in quantum machine learning which we will come back to in the conclusion.

A kernel is positive semi-definite if, for any finite set of inputs x^1, ..., x^M ∈ X and coefficients c_1, ..., c_M ∈ ℂ,

$\sum_{m,m'=1}^{M} c_m c_{m'}^* \, \kappa(x^m, x^{m'}) \geq 0 .$

By definition of the inner product, every feature map gives rise to a kernel.

Theorem 1. Let φ : X → F be a feature map. The inner product of two inputs mapped to feature space defines a kernel via

$\kappa(x, x') := \langle \phi(x), \phi(x') \rangle_{\mathcal{F}} ,$   (1)

where ⟨·, ·⟩_F is the inner product defined on F.
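As a concrete illustration of Theorem 1 (our addition, not part of the original text), the following Python sketch builds a kernel from a toy feature map and checks that the resulting Gram matrix is positive semi-definite; the particular map phi below is an arbitrary placeholder rather than one of the quantum feature maps discussed later.

    import numpy as np

    def phi(x):
        # a toy feature map into a complex "state" vector; by Theorem 1 any
        # such map defines a kernel through the inner product in feature space
        v = np.array([1.0, x, np.exp(1j * x)])
        return v / np.linalg.norm(v)

    def kernel(x1, x2):
        return np.vdot(phi(x1), phi(x2))          # <phi(x1), phi(x2)>

    xs = np.linspace(-1.0, 1.0, 8)
    K = np.array([[kernel(a, b) for b in xs] for a in xs])

    # positive semi-definiteness: sum_{m,m'} c_m c*_{m'} kappa(x^m, x^{m'}) >= 0
    # for every coefficient vector c, i.e. all eigenvalues of K are non-negative
    print(np.linalg.eigvalsh(K).min() >= -1e-12)   # K is Hermitian by construction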
Encoding an input x into a quantum state |φ(x)⟩ can be interpreted as a feature map φ : X → F, which we call a quantum feature map here. According to Theorem 1 we can derive a kernel κ from this feature map via Eq. (1). By virtue of Theorem 2, the kernel is the reproducing kernel of an RKHS R_κ as defined in Eq. (3). The functions in R_κ are the inner products of the 'feature-mapped' input data and a vector |w⟩ ∈ F, which defines a linear model

$f(x; w) = \langle w | \phi(x) \rangle .$   (5)

Note that we use Dirac brackets ⟨·|·⟩ instead of the inner product ⟨·,·⟩ to signify that we are calculating inner products in a quantum Hilbert space. Finally, the representer theorem (Theorem 3) guarantees that the minimiser min_w C(w, D) of the empirical risk

$C(w, \mathcal{D}) = \sum_{m=1}^{M} |f(x^m; w) - y^m|^2 + \|f\|_{\mathcal{R}_\kappa}$

can be expressed by Equation (4). The simple idea of interpreting x → |φ(x)⟩ as a feature map therefore allows us to make use of the rich theory of kernel methods and gives rise to machine learning models whose trained candidates can be expressed by inner products of quantum states. Note that if the state |φ(x)⟩ has complex amplitudes, we can always construct a real kernel by taking the absolute square of the inner product.
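Equation (4) itself is not reproduced in this excerpt, but the representer theorem guarantees that the minimiser of the empirical risk above is a kernel expansion f(x) = Σ_m α_m κ(x, x^m) over the training inputs. The sketch below illustrates this with ordinary kernel ridge regression (square loss plus a squared-norm regulariser) in NumPy; the Gaussian kernel, the toy data and the regularisation strength are placeholder choices of ours, not taken from the paper.

    import numpy as np

    def kappa(x1, x2, gamma=1.0):
        # placeholder kernel; a kernel derived from a quantum feature map via
        # Eq. (1) could be used here instead
        return np.exp(-gamma * (x1 - x2) ** 2)

    rng = np.random.default_rng(1)
    X = np.sort(rng.uniform(-3, 3, 20))            # toy 1-d training inputs
    y = np.sin(X) + 0.1 * rng.normal(size=20)      # noisy targets

    lam = 0.1                                      # regularisation strength
    K = np.array([[kappa(a, b) for b in X] for a in X])
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # expansion coefficients

    def f(x):
        # the trained model is a kernel expansion over the training inputs,
        # exactly the form guaranteed by the representer theorem
        return sum(a_m * kappa(x, x_m) for a_m, x_m in zip(alpha, X))

    print(f(0.5), np.sin(0.5))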
III. QUANTUM MACHINE LEARNING IN FEATURE HILBERT SPACE

Now let us enter the realm of quantum computing and quantum machine learning. We show how to use the ideas of Section II C to design two types of quantum machine learning algorithms and illustrate both approaches with an example from continuous-variable systems.

A. Feature-encoding circuits

From the perspective of quantum computing, a quantum feature map x → |φ(x)⟩ corresponds to a state preparation circuit Uφ(x) that acts on a ground or vacuum state |0...0⟩ of a Hilbert space F as Uφ(x)|0...0⟩ = |φ(x)⟩. We will call Uφ(x) the feature-embedding circuit. The models from Eq. (5) in the reproducing kernel Hilbert space from Definition 2 are inner products between |φ(x)⟩ and a general quantum state |w⟩ ∈ F. We therefore consider a second circuit W with W|0...0⟩ = |w⟩, which we call the model circuit. The model circuit specifies the hyperplane of a linear model in feature Hilbert space. If the feature state |φ(x)⟩ is orthogonal to |w⟩, then x lies on the decision boundary, whereas states with a positive [negative] inner product lie on the left [right] side of the hyperplane.

To show some examples of feature-embedding circuits and their associated kernels, let us have a look at popular input encoding techniques in quantum machine learning.

a. Basis encoding. Many quantum machine learning algorithms assume that the inputs x to the computation are encoded as binary strings represented by a computational basis state of the qubits [12, 21]. For example, x = 01001 is represented by the 5-qubit basis state |01001⟩. The computational basis state corresponds to a standard basis vector |i⟩ (with i being the integer representation of the bitstring) in a 2^n-dimensional Hilbert space F, and the effect of the feature-embedding circuit is given by

$U_\phi : x \in \{0, 1\}^n \rightarrow |i\rangle .$

This feature map maps each data input to a state from an orthonormal basis and is equivalent to the generic finite-dimensional case discussed in Appendix A. As shown there, the generic kernel is the Kronecker delta

$\kappa(x, x') = \langle i | j \rangle = \delta_{ij} ,$

which is a binary similarity measure that is only nonzero for two identical inputs.

b. Amplitude encoding. Another approach to information encoding is to associate normalised input vectors x = (x_0, ..., x_{N−1})^T ∈ ℝ^N of dimension N = 2^n with the amplitudes of an n-qubit state |ψ_x⟩ [8, 13],

$U_\phi : x \in \mathbb{R}^N \rightarrow |\psi_x\rangle = \sum_{i=0}^{N-1} x_i |i\rangle .$

As above, |i⟩ denotes the i'th computational basis state. This choice corresponds to the linear kernel,

$\kappa(x, x') = \langle \psi_x | \psi_{x'} \rangle = x^T x' .$

c. Copies of quantum states. With a slight variation of amplitude encoding we can implement polynomial kernels [9]. Taking d copies of an amplitude-encoded quantum state,

$U_\phi : x \in \mathbb{R}^N \rightarrow |\psi_x\rangle \otimes \cdots \otimes |\psi_x\rangle ,$

corresponds to the kernel

$\kappa(x, x') = \langle \psi_x | \psi_{x'} \rangle \cdots \langle \psi_x | \psi_{x'} \rangle = (x^T x')^d .$

d. Product encoding. One can also use a (tensor) product encoding, in which each feature of the input x = (x_1, .., x_N)^T ∈ ℝ^N is encoded in the amplitudes of one separate qubit. An example is to encode x_i as |φ(x_i)⟩ = cos(x_i)|0⟩ + sin(x_i)|1⟩ for i = 1, ..., N [22, 23]. This corresponds to a feature-embedding circuit with the effect

$U_\phi : x \in \mathbb{R}^N \rightarrow \begin{pmatrix} \cos x_1 \\ \sin x_1 \end{pmatrix} \otimes \cdots \otimes \begin{pmatrix} \cos x_N \\ \sin x_N \end{pmatrix} \in \mathbb{R}^{2^N} ,$

and implies a cosine kernel,

$\kappa(x, x') = \prod_{i=1}^{N} \cos(x_i - x_i') .$
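All four kernels above can be evaluated directly from the classical inputs. The following NumPy functions collect them side by side; the function names and the example inputs are ours and serve only as an illustration (the amplitude-encoded vectors must be normalised, as the encoding requires).

    import numpy as np

    def basis_kernel(x1, x2):
        # basis encoding: Kronecker delta on bit strings
        return 1.0 if x1 == x2 else 0.0

    def amplitude_kernel(x1, x2):
        # amplitude encoding of normalised vectors: linear kernel x^T x'
        return float(np.dot(x1, x2))

    def copies_kernel(x1, x2, d=2):
        # d copies of the amplitude-encoded state: polynomial kernel (x^T x')^d
        return float(np.dot(x1, x2)) ** d

    def product_kernel(x1, x2):
        # product encoding cos(x_i)|0> + sin(x_i)|1> per feature: cosine kernel
        return float(np.prod(np.cos(np.asarray(x1) - np.asarray(x2))))

    a = np.array([0.6, 0.8])    # unit length, as amplitude encoding requires
    b = np.array([0.8, 0.6])
    print(basis_kernel("01001", "01001"),
          amplitude_kernel(a, b),
          copies_kernel(a, b, d=3),
          product_kernel([0.1, 2.0], [0.4, 1.5]))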
FIG. 3. Illustration of the two approaches to use quantum feature maps for supervised learning. The implicit approach uses the quantum device to evaluate the kernel function as part of a hybrid or quantum-assisted model which can be trained by classical methods. In the explicit approach, the model is solely computed by the quantum device, which consists of a variational circuit trained by hybrid quantum-classical methods.

B. Building a quantum classifier

Having formulated the ideas from Section II C in the language of quantum computing, we can identify two different strategies of designing a quantum machine learning algorithm (see Figure 3). On the one hand, we can use the quantum computer to estimate the inner products κ(x, x′) = ⟨φ(x)|φ(x′)⟩ from a kernel-dependent model as in Eq. (4), which we call the implicit approach, since we use the quantum system to estimate distance measures on input space. This strategy requires a quantum computer that can do two things: to implement Uφ(x) for any x ∈ X and to estimate inner products between quantum states (for example using a SWAP test routine). The computation of the model from those kernel estimates, as well as the training algorithm, is left to a classical device. This is an excellent strategy in the context of intermediate-term quantum technologies [24], where we are interested in using a quantum computer only for small routines of limited gate count, and compute as much as possible on the classical hardware. Note that in the long term, quantum computers could also be used to learn the parameters α_m by computing the inverse of the kernel Gram matrix, which has been investigated in Refs. [9, 25].
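As a rough illustration of this strategy (our addition): a SWAP test on |φ(x)⟩ and |φ(x′)⟩ accepts with probability (1 + |⟨φ(x)|φ(x′)⟩|²)/2, so the real kernel |⟨φ(x)|φ(x′)⟩|² can be estimated from the acceptance frequency over many shots. The sketch below simulates this sampling classically, using the single-qubit product encoding from Section III A as a stand-in for the feature state.

    import numpy as np

    rng = np.random.default_rng(0)

    def feature_state(x):
        # toy single-qubit feature state cos(x)|0> + sin(x)|1>
        return np.array([np.cos(x), np.sin(x)])

    def estimate_kernel(x1, x2, shots=1000):
        overlap_sq = abs(np.vdot(feature_state(x1), feature_state(x2))) ** 2
        p_accept = 0.5 * (1.0 + overlap_sq)        # SWAP-test acceptance probability
        accepted = rng.binomial(shots, p_accept)   # simulated measurement record
        return 2.0 * accepted / shots - 1.0        # invert the relation

    x1, x2 = 0.2, 1.1
    exact = abs(np.vdot(feature_state(x1), feature_state(x2))) ** 2
    print("exact kernel value :", exact)
    print("estimate from shots:", estimate_kernel(x1, x2))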
On the other hand, and as motivated in the introduction, one can bypass the representer theorem and explicitly perform the classification in the 'feature Hilbert space' of the quantum system. We call this the explicit approach. For example, this can mean to find a |w⟩ that defines a model (5). To do so, we can make the model circuit trainable, W = W(θ), so that quantum-classical hybrid training [23, 26] of θ can learn the optimal model |w(θ)⟩ = W(θ)|0⟩. The ansatz for the model circuit's architecture defines the space of possible models and can act as regularisation (see also [22]). Below we will follow a slightly more general strategy and compute a state W(θ)Uφ|0...0⟩, from which measurements determine the output of the model. Depending on the measurement, this is not necessarily a linear model in feature Hilbert space. We could even go further and include postselection in the model circuit, which might give the classifier in feature Hilbert space even more power.

Using quantum computers for learning tasks with these two approaches is desirable in various settings. For example, the implicit approach may be interesting in cases where the quantum device evaluates kernels or models faster in terms of absolute runtime speed. Another interesting example is a setting in which the kernel one wants to use is classically intractable because the runtime grows exponentially or even faster with the input dimension. The explicit approach may be useful when we want to leave the limits of the RKHS framework and construct classifiers directly on Hilbert space.

In the remainder of this work we want to explore these two approaches with several examples. We use squeezing in continuous-variable quantum systems as a feature map, for which the Hilbert space F is an infinite-dimensional Fock space. This constructs a squeezing-based quantum machine learning classifier which can for example be implemented by optical quantum computers.

FIG. 4. Shape of the squeezing kernel function κ_sq(x, x′) from Equation (7) for different squeezing strength hyperparameters c (the three panels show c = 1.0, 1.5 and 2.0). The input x is fixed at (0, 0) and x′ is varied. The plots show the interval [−1, 1] on both horizontal axes.

C. Squeezing as a feature map

A squeezed vacuum state is defined as

$|z\rangle = \frac{1}{\sqrt{\cosh(r)}} \sum_{n=0}^{\infty} \frac{\sqrt{(2n)!}}{2^n n!} \left(-e^{i\varphi} \tanh(r)\right)^n |2n\rangle ,$

where {|n⟩} denotes the Fock basis and z = re^{iϕ} is the complex squeezing factor with absolute value r and phase ϕ. It will be useful to introduce the notation |z⟩ = |(r, ϕ)⟩. We can interpret x → |φ(x)⟩ = |(c, x)⟩ as a feature map from a one-dimensional real input space x ∈ ℝ into the Fock space of a single mode, with the squeezing strength c as a hyperparameter of the feature map.
For N-dimensional inputs x = (x_1, ..., x_N)^T ∈ ℝ^N we can use the multimode state

$x \rightarrow |(c, x_1)\rangle \otimes \cdots \otimes |(c, x_N)\rangle$   (6)

as a feature map, where F is now a multimode Fock space. We call this feature map the squeezing feature map with phase encoding. The kernel

$\kappa(x, x'; c) = \prod_{i=1}^{N} \langle (c, x_i) | (c, x_i') \rangle$   (7)

with

$\langle (c, x_i) | (c, x_i') \rangle = \sqrt{\frac{\operatorname{sech} c \, \operatorname{sech} c}{1 - e^{i(x_i' - x_i)} \tanh c \tanh c}} ,$   (8)

derived from this feature map [27] is easy to compute on a classical computer. It is plotted in Figure 4, where we see that the hyperparameter c determines the variance of the kernel function. Note that we can also encode the inputs in the absolute value of the squeezing rather than in its phase, a variant we refer to as the squeezing feature map with absolute value encoding.
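Because this kernel is classically computable, the implicit approach can be emulated end to end on a classical machine. The sketch below (our illustration, not the authors' code) evaluates Eqs. (7) and (8), takes the absolute square to obtain a real-valued kernel as noted in Section II C, and feeds the precomputed Gram matrix to scikit-learn's support vector machine; the 'moons' data, the choice c = 1.5 and the train/test split are arbitrary.

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    def single_mode_overlap(x, x_prime, c):
        # Eq. (8): overlap of two squeezed vacua with equal strength c
        num = 1.0 / np.cosh(c) ** 2                              # sech(c) sech(c)
        den = 1.0 - np.exp(1j * (x_prime - x)) * np.tanh(c) ** 2
        return np.sqrt(num / den)

    def squeezing_kernel(x, x_prime, c=1.5):
        # Eq. (7), followed by the absolute square to make the kernel real
        overlap = np.prod([single_mode_overlap(a, b, c) for a, b in zip(x, x_prime)])
        return np.abs(overlap) ** 2

    def gram(A, B, c=1.5):
        return np.array([[squeezing_kernel(a, b, c) for b in B] for a in A])

    X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
    X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]

    clf = SVC(kernel="precomputed").fit(gram(X_tr, X_tr), y_tr)
    print("test accuracy:", clf.score(gram(X_te, X_tr), y_te))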
Since the idea of a support vector machine is to find the maximum-margin hyperplane in feature space, we want to know whether we can always find a hyperplane for which the training accuracy is 1. In other words, we ask if the data becomes linearly separable in Fock space by the squeezing feature map. An easy way to do this is to apply a perceptron classifier to the data in feature space. The perceptron is guaranteed to find such a separating hyperplane if it exists. Figure 6 shows the performance of a perceptron classifier in the Fock space for the 'blobs' data. The data was mapped to this space by the squeezing feature map with phase encoding. As one can see, after 5000 epochs (runs through the dataset) the decision boundary perfectly fits the training data, achieving an accuracy of 1. The number of iterations to train the perceptron is known to increase with O(1/γ²), where γ is the margin between the two classes [28], and indeed we find in other simulations that the 'moons' and 'circles' data only take a few epochs until reaching full accuracy. Although the perfect fit to the training data is of course not useful for machine learning (as can be seen by the non-increasing accuracy on the test set), these results are a clue to the fact that the squeezing feature map makes data linearly separable in feature space, a fact that we prove in Appendix B.

While the results of the simulations are promising, a goal is to find more sophisticated kernels. Although quantum computers could offer constant speed advantages, they become indispensable if the feature map circuit is classically intractable. However, squeezed states are an example of so-called Gaussian states, and it is well known that Gaussian states (although living in an infinite-dimensional Hilbert space) can be efficiently simulated by a classical computer [29], which we used in the simulations. In order to do something more interesting, one needs non-Gaussian elements in the circuit. For example, one can extend a standard linear optical network of beamsplitters by a cubic phase gate [30, 31] or use photon number measurements [32]. To this end, let Vφ(x) be a non-Gaussian feature map circuit, i.e. a quantum algorithm that takes a vacuum state and prepares an x-dependent non-Gaussian state. The kernel

$\kappa(x, x') = \langle 0...0 | V_\phi^\dagger(x) V_\phi(x') | 0...0 \rangle$

can in general not be simulated by a classical computer any more. It is therefore an interesting open question what type of feature map circuits Vφ are classically intractable, but at the same time lead to powerful kernels for classical models such as support vector machines.

E. An explicit quantum classifier

In the explicit approach defined above, we use a parametrised continuous-variable circuit W(θ) to build a "Fock-space" classifier. For our squeezing example this can be done as follows. We start with two vacuum modes |0⟩ ⊗ |0⟩. To classify a data input x, first map the input to a quantum state |c, x⟩ = |c, x1⟩ ⊗ |c, x2⟩ by performing a squeezing operation on each of the modes. Second, apply the model circuit W(θ) to |c, x⟩. Third, interpret the probability p(n1, n2) of measuring a certain Fock state |n1, n2⟩ as the output of the machine learning model. Since this probability depends on the displacement and squeezing intensity, it is better to define two probabilities, say p(n1 = 2, n2 = 0) and p(n1 = 0, n2 = 2), as a one-hot encoded output vector (o0, o1). This output vector can be normalised [33] to a new vector

$\frac{1}{o_0 + o_1} \begin{pmatrix} o_0 \\ o_1 \end{pmatrix} = \begin{pmatrix} p(y = 0) \\ p(y = 1) \end{pmatrix} ,$

where p(y = 0), p(y = 1) can now be interpreted as the probability for the model to predict class y = 0 and y = 1, respectively. The final label is the class with the higher probability. We can interpret this circuit in the graphical representation of neural networks as shown at the top in Figure 7.

FIG. 7. a.) Representation of the Fock-space classifier in the graphical language of quantum neural networks. A vector (x1, x2)^T from the input space X gets mapped into the feature space F, which is the infinite-dimensional 2-mode Fock space of the quantum system. The model circuit, including photon detection measurement, implements a linear model in feature space and reduces the "infinite hidden layer" to two outputs. b.) The model circuit of the explicit classifier described in the text uses only 2 modes to instantiate this infinite-dimensional hidden layer. The variational circuit W(θ) consists of repetitions of a gate block. We use the gate block shown in c.) with the beamsplitter (BS), displacement (D), quadratic (P) and cubic phase gates (C) described in the text.
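A single forward pass of this Fock-space classifier can be sketched with Xanadu's Strawberry Fields simulator. The snippet below is a minimal sketch under the assumption that the installed version exposes the gates and Fock-backend methods named here (Sgate, BSgate, Dgate, Pgate, Vgate, fock_prob); it applies one gate block of the form shown in Figure 7 c.) with arbitrary parameters and is in no way the authors' trained model.

    import numpy as np
    import strawberryfields as sf
    from strawberryfields.ops import Sgate, BSgate, Dgate, Pgate, Vgate

    def classifier_probs(x, theta, c=1.0, cutoff=10):
        """Return (p(y=0), p(y=1)) for a 2-d input x and one gate block W(theta)."""
        prog = sf.Program(2)
        with prog.context as q:
            # feature map: squeeze each mode, encoding x_i in the squeezing phase
            Sgate(c, x[0]) | q[0]
            Sgate(c, x[1]) | q[1]
            # one block of the model circuit: beamsplitter, displacement,
            # quadratic phase and cubic phase gates
            BSgate(theta[0], theta[1]) | (q[0], q[1])
            Dgate(theta[2]) | q[0]
            Dgate(theta[3]) | q[1]
            Pgate(theta[4]) | q[0]
            Pgate(theta[5]) | q[1]
            Vgate(theta[6]) | q[0]
            Vgate(theta[7]) | q[1]
        eng = sf.Engine("fock", backend_options={"cutoff_dim": cutoff})
        state = eng.run(prog).state
        o0 = state.fock_prob([2, 0])       # p(n1 = 2, n2 = 0)
        o1 = state.fock_prob([0, 2])       # p(n1 = 0, n2 = 2)
        return np.array([o0, o1]) / (o0 + o1)

    probs = classifier_probs(x=[0.3, -0.6], theta=0.1 * np.ones(8))
    print("class probabilities:", probs, "-> predicted class:", int(np.argmax(probs)))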
Let us assume we could represent any possible quantum circuit in the feature Hilbert space with the circuit W(θ). Since the data in F is linearly separable, there is a W for which we obtain 100% accuracy on the training set, as we saw in Figure 6. However, the goal of machine learning is not to perfectly fit data, but to generalise from it. It is therefore not desirable to find the optimal decision boundary for the training data in F, but to find a good candidate from a class of decision boundaries that captures the structure in the data well. Such a restricted class of decision boundaries can be defined by using an ansatz for the model circuit W(θ) which cannot represent any circuit, yet is still flexible enough to reach interesting candidates. Figure 7 c.) shows such a model circuit for the 2 input modes in our continuous-variable example. The architecture consists of repetitions of a general gate block.
FIG. 8. Fock-space classifier presented in Figure 7 and the text for the 'moons' dataset. The shaded areas show the probability p(y = 1) of predicting class 1. The dataset consists of 150 training and 50 test samples, and the model has been trained for 5000 steps with stochastic gradient descent of batch-size 5, an adaptive learning rate and a square-loss cost function with a gentle l2 regularisation applied to all weights. The loss drops predominantly in the first 200 steps (left).

IV. CONCLUSION

In this paper we introduced a number of new ideas for the area of quantum machine learning based on the theory of feature spaces and kernels. Interpreting the encoding of inputs into quantum states as a feature map, we associate a quantum Hilbert space with a feature space. Inner products of quantum states in this feature space can be used to evaluate a kernel function. We can alternatively train a variational quantum circuit as an explicit classifier in feature space to learn a decision boundary. We introduced a squeezing feature map as an example and motivated with small-scale simulations that these two approaches can lead to interesting results.
Eq. (2). Consider first the discrete case:

$\psi(s_i) = \langle s_i | \psi \rangle = \sum_{s_j} \langle s_i | s_j \rangle \langle s_j | \psi \rangle = \langle \langle s_i | \cdot \rangle, \psi(\cdot) \rangle .$

We can identify ⟨s_i|s_j⟩ with the reproducing kernel. Since the basis is orthonormal, we have κ(s_i, s_j) = δ_{i,j}. The continuous case is more subtle. Inserting the identity, we get

$\psi(s) = \int ds' \, \langle s | s' \rangle \langle s' | \psi \rangle = \langle \langle s | \cdot \rangle, \psi(\cdot) \rangle ,$

which is the reproducing kernel property with the reproducing kernel κ(s, s′) = ⟨s|s′⟩. However, the "function" s ↦ δ(s − s′) is not square integrable, which means it is itself not part of H_sf, and the properties of Definition 3 are not fulfilled. This is no surprise, as the space of square integrable functions L² is a frequent example of a Hilbert space that is not an RKHS [37]. The inconsistency between Dirac's formalism and functional analysis is also a well-known issue in quantum theory, but usually glossed over in physical contexts [38]. If mathematical rigour is needed, physicists usually refer to the theory of rigged Hilbert spaces [39].

There are quantum systems with an infinite basis which naturally give rise to a reproducing kernel that is not the delta function. These systems are described by so-called generalised coherent states [40]. In the context of quantum machine learning, this has been discussed in Ref. [14]. Generalised coherent states are vectors |l⟩ in a Hilbert space H_c of finite or countably infinite dimension, where the index l is from some topological space L (allowing us to define a norm ‖|l⟩‖ = √⟨l|l⟩). They have two fundamental properties. First, |l⟩ is a strongly continuous function of l,

$\lim_{l' \to l} \| \, |l'\rangle - |l\rangle \, \| = 0, \qquad |l\rangle \neq 0 .$

Note that this excludes for example the discrete Fock basis {|n⟩}, but also any orthonormal set of states {|z⟩} with a continuous label z ∈ ℂ, since ½‖|z′⟩ − |z⟩‖² = 1 for z′ ≠ z. Second, there exists a measure µ on L so that we have a resolution of identity 1 = ∫_L |l⟩⟨l| dµ(l). This leads to a functional representation of the Hilbert space where a vector |ψ⟩ ∈ H_c is expressed via |ψ⟩ = Σ_l ψ(l)|l⟩ with ψ(l) = ⟨l|ψ⟩. Inserting the resolution of identity into the right-hand side of this expression yields

$\psi(l) = \int_{L} \langle l | l' \rangle \langle l' | \psi \rangle \, d\mu(l') ,$

which is exactly the reproducing property in Definition 3 with the reproducing kernel κ(l, l′) = ⟨l|l′⟩. Since there is a finite overlap between any two states from the basis, the kernel is not the Dirac delta function, and we do not run into the same problem as for continuous orthogonal bases. Hence, the Hilbert space of coherent states is an RKHS for the input set {l}.

The most well-known type of coherent states are optical coherent states

$|\alpha\rangle = e^{-\frac{|\alpha|^2}{2}} \sum_{n=0}^{\infty} \frac{\alpha^n}{\sqrt{n!}} |n\rangle ,$

which are the eigenstates of the non-Hermitian bosonic annihilation operator â, with the associated kernel

$\kappa(\alpha, \beta) = \langle \alpha | \beta \rangle = e^{-\frac{|\alpha|^2}{2} - \frac{|\beta|^2}{2} + \alpha^* \beta} ,$   (A1)

whose absolute square is a radial basis function or Gaussian kernel, as remarked in [14].
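A quick numerical check of Eq. (A1), and of the remark that its absolute square is a Gaussian kernel, can be done with truncated Fock vectors in NumPy (our addition; the values of α and β and the cutoff are arbitrary).

    import numpy as np
    from math import factorial

    def coherent_vec(alpha, cutoff=40):
        # truncated Fock amplitudes of the optical coherent state |alpha>
        return np.array([np.exp(-abs(alpha) ** 2 / 2) * alpha ** n / np.sqrt(factorial(n))
                         for n in range(cutoff)])

    alpha, beta = 0.7 + 0.3j, -0.2 + 0.5j
    numeric = np.vdot(coherent_vec(alpha), coherent_vec(beta))
    closed = np.exp(-abs(alpha) ** 2 / 2 - abs(beta) ** 2 / 2 + np.conj(alpha) * beta)

    print(abs(numeric - closed) < 1e-10)                                   # Eq. (A1)
    print(abs(abs(closed) ** 2 - np.exp(-abs(alpha - beta) ** 2)) < 1e-12) # Gaussian kernel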
Appendix B: Linear separability in Fock space

If we map the inputs of a dataset D to a new dataset

$\mathcal{D}' = \{ |(c, x^1)\rangle, ..., |(c, x^M)\rangle \} ,$

using the squeezing feature map with phase encoding from Eq. (6), the feature-mapped data vectors in D′ are always linearly separable, which means any assignment of two classes of labels to the data can be separated by a hyperplane in F (see Figure 1). To show this, first consider the following:

Proposition 1. A set of M vectors in ℝ^N is linearly separable if M − 1 of them are linearly independent.

The proof can be found in Appendix C. Proposition 1 tells us that if our data is linearly independent, it is linearly separable. This result is in fact known from statistical learning theory: the VC dimension – a measure of flexibility or expressive power – of linear models in K dimensions is K + 1, which means that a linear model can separate or "shatter" K + 1 data points if we can choose the strategy of how to arrange them, but not the strategy of how they are labelled.

If we can show that the squeezing feature map maps vectors to linearly independent states in Fock space, we know that any dataset becomes linearly separable in Fock space. To simplify, let us first look at the squeezing map of a single mode.

Proposition 2. Given a set of squeezing phases {ϕ^1, ..., ϕ^M} with ϕ^m ≠ ϕ^{m′} for m, m′ = 1, ..., M, m ≠ m′, and a hyperparameter c ∈ ℝ, the squeezed vacuum Fock states |(c, ϕ^1)⟩, ..., |(c, ϕ^M)⟩ are linearly independent.

The proof is found in Appendix D. A very similar proof confirms that the proposition also holds true for the squeezing map with absolute value encoding described in Section III C. Symbolic computation of the rank of the design matrix in feature space in Mathematica confirms this result for randomly selected squeezing factors up to M = 10 and a cutoff dimension that truncates Fock space to 40 dimensions.
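The Mathematica computation mentioned above can be mirrored in NumPy (our addition): build the truncated Fock-space design matrix of squeezed vacua with distinct phases and check that its rank equals the number of data points. The cutoff, the squeezing strength c and the phases below are arbitrary illustrative choices.

    import numpy as np
    from math import factorial

    def squeezed_vec(c, phi, cutoff=40):
        # truncated Fock amplitudes of the squeezed vacuum |(c, phi)>;
        # only even photon numbers are populated
        vec = np.zeros(cutoff, dtype=complex)
        for n in range(cutoff // 2):
            vec[2 * n] = (np.sqrt(factorial(2 * n)) / (2 ** n * factorial(n))
                          * (-np.exp(1j * phi) * np.tanh(c)) ** n)
        return vec / np.sqrt(np.cosh(c))

    M, c = 8, 1.5
    phases = 2 * np.pi * np.arange(M) / M              # M distinct phases
    design = np.array([squeezed_vec(c, p) for p in phases])

    # distinct phases should give M linearly independent feature vectors
    print(np.linalg.matrix_rank(design) == M)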
For the multimode feature map dealing with input data of dimension higher than one,

$|(c, \varphi^m)\rangle = |(c, \varphi^m_1)\rangle \otimes \ldots \otimes |(c, \varphi^m_N)\rangle ,$

and

$|(c, \varphi^{m'})\rangle = |(c, \varphi^{m'}_1)\rangle \otimes \ldots \otimes |(c, \varphi^{m'}_N)\rangle .$

We have

$\langle (c, \varphi^m) | (c, \varphi^{m'}) \rangle = \prod_{i=1}^{N} \langle (c, \varphi^m_i) | (c, \varphi^{m'}_i) \rangle ,$

which is 1 if ϕ^m_i = ϕ^{m′}_i for all i = 1, ..., N and a value other than zero else. The linear independence therefore carries over to multi-dimensional feature maps.

Appendix C: Proof of Proposition 1

Let

$\mathcal{D} = \{ (x^1, y^1), \cdots, (x^M, y^M) \}$

be a dataset of M vectors with x^m ∈ ℝ^N for all m = 1, ..., M, and y^m ∈ {−1, 1}. The vectors are guaranteed to be linearly separable if for any assignment of classes {−1, 1} to labels y^1, ..., y^M there is a hyperplane defined by parameters w_1, ..., w_N, b so that

$\mathrm{sgn}\Big( \sum_{i=1}^{N} w_i x^m_i + b \Big) = y^m \quad \forall m = 1, ..., M .$   (C1)

The sign function is a bit tricky, but if we can instead show that the stronger condition

$\sum_{i=1}^{N} w_i x^m_i + b = y^m \quad \forall m = 1, ..., M$   (C2)

holds for some parameters, Eq. (C1) must automatically be satisfied.

Equation (C2) defines a system of M linear equations with N + 1 unknowns (namely the variables w_1, ..., w_N and b). From the theory of linear algebra we know [41] that there is at least one solution if and only if the rank of the 'coefficient matrix'

$[X|1] = \begin{pmatrix} x^1_1 & \cdots & x^1_N & 1 \\ \vdots & \ddots & \vdots & \vdots \\ x^M_1 & \cdots & x^M_N & 1 \end{pmatrix}$

is equal to the rank of its augmented matrix [X|1|y]. Remember that the rank of a matrix is the number of linearly independent row (and column) vectors. If the data vectors are all linearly independent we have that N ≥ M (if N < M there would be some vectors that depend on others, because we have more vectors than dimensions), and the rank of X is min(M, N) = M. Augmenting X by stacking any number of column vectors simply increases N, which means that it does not change the rank of the matrix. It follows that for M linearly independent data points embedded in an N-dimensional space the system has a solution. The data is therefore linearly separable.

With this argument we can add more vectors that are linearly dependent until M = N. After this, we can in fact add one (but only one) more data point that linearly depends on the others, and still guarantee linear separability. That is because adding one data point makes the row number equal to the column number in [X|1], and adding more columns does not change the rank. In contrast, adding two data points means that we have more columns than rows in [X|1], and adding the column y to form [X|1|y] can indeed change the rank.

Appendix D: Proof of Proposition 2

Let us consider a matrix M where the squeezed states in the Fock basis form the rows:

$M_{jn} := \frac{1}{\sqrt{\cosh(r_j)}} \left( -e^{i\varphi_j} \tanh(r_j) \right)^n \frac{\sqrt{(2n)!}}{2^n n!} .$

We introduce two auxiliary diagonal matrices:

$D_1 := \mathrm{diag}\left\{ \sqrt{\cosh(r_j)} \right\} , \qquad D_2 := \mathrm{diag}\left\{ \frac{n!}{\sqrt{(2n)!}} \right\} .$

Multiplying, we find that the matrix V := D_1 M D_2 has matrix elements

$V_{jn} = \left( -\frac{1}{2} e^{i\varphi_j} \tanh(r_j) \right)^n .$

Importantly, V has the structure of a Vandermonde matrix. In particular, it has determinant

$\det(V) = \prod_{1 \leq i < j \leq M} \frac{1}{2} \left( -e^{i\varphi_i} \tanh(r_i) + e^{i\varphi_j} \tanh(r_j) \right) .$

The only way that det(V) = 0 is if
one of these factors vanishes, i.e. if e^{iϕ_i} tanh(r_i) = e^{iϕ_j} tanh(r_j) for some i ≠ j (and in Proposition 2 all squeezing amplitudes are equal to the hyperparameter c > 0). Thus, the only solution to this equation is ϕ_i = ϕ_j, which can only be true if the two feature vectors describe the same datapoint, which we excluded in Proposition 2. Thus, det(V) ≠ 0, which means that det(M) ≠ 0, and hence M is full rank. This means that the rows of M, which are our feature vectors, are linearly independent. Note that the same proof also shows that the squeezing feature map with absolute value encoding makes distinct data points linearly independent in Fock space.
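For completeness, a small NumPy check (our addition) that the rescaled matrix V = D_1 M D_2 has the Vandermonde form claimed above and a nonzero determinant for distinct phases; it compares det(V) against the standard Vandermonde identity ∏_{i<j}(x_j − x_i) with x_j = −(1/2) e^{iϕ_j} tanh(c). The matrix size, the squeezing strength and the phases are arbitrary illustrative choices.

    import numpy as np
    from math import factorial
    from itertools import combinations

    M, c = 5, 1.0
    phases = 2 * np.pi * np.arange(M) / M

    # design matrix of Appendix D, truncated to a square M x M block
    # (rows = data points, columns = even Fock amplitudes n = 0, ..., M-1)
    Mmat = np.array([[np.sqrt(factorial(2 * n)) / (2 ** n * factorial(n))
                      * (-np.exp(1j * p) * np.tanh(c)) ** n / np.sqrt(np.cosh(c))
                      for n in range(M)] for p in phases])

    D1 = np.diag([np.sqrt(np.cosh(c))] * M)
    D2 = np.diag([factorial(n) / np.sqrt(factorial(2 * n)) for n in range(M)])
    V = D1 @ Mmat @ D2

    x = -0.5 * np.exp(1j * phases) * np.tanh(c)        # V_{jn} = x_j^n
    print(np.allclose(V, [[xj ** n for n in range(M)] for xj in x]))

    det_formula = np.prod([x[j] - x[i] for i, j in combinations(range(M), 2)])
    print(np.isclose(np.linalg.det(V), det_formula), abs(det_formula) > 0)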