In this section, we present implementations of the examples of Section 2.2 using the differentiable transforms implemented in PennyLane. For each example, we detail a specific scenario in which differentiability yields significant advantages, provides new insights, or enables novel functionality. Code snippets are provided for each example, and the full implementations are available in a GitHub repository. Unless otherwise noted, all examples can be implemented in the most recent release of PennyLane (v0.27).
3.3.1 Optimizing Gradient Computation in a Noisy Setting.
While the parameter-shift rule applies to a large variety of gates in variational quantum algorithms, we occasionally encounter unitaries that we wish to train on hardware but that do not admit a parameter-shift rule. This could be for a variety of reasons: perhaps the unitary is not of the form \(e^{iGx}\), or the eigenvalue spectrum of its generator is unknown.
In such cases, we typically must fall back to numerical methods of differentiation on hardware, such as the method of finite differences. Previous work exploring finite differences in a noisy setting has shown that the optimal finite-difference step size for a first-order forward difference is of the form [24]
\[
h^{*} = \left(\frac{8\,\sigma_0}{N\,[f^{\prime\prime}(x)]^{2}}\right)^{1/4}, \tag{8}
\]
where \(N\) is the number of shots (samples) used to estimate expectation values, \(\sigma_0\) is the single-shot variance of the estimates, and \(f^{\prime\prime}(x)\) is the second derivative of the quantum function at the evaluation point. While for large \(N\) we can make the approximation \(h^*\approx N^{-0.25}\), for small \(N\) on hardware we must explicitly compute the second derivative of the quantum function to obtain a decent estimate of the step size; this introduces further error and adds a prohibitive number of additional quantum evaluations per optimization step. Instead, we can wrap the gradient computation in a quantum transform that learns an optimal finite-difference step size in the presence of noise.
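A short derivation of this expression, under the assumption that the two circuit estimates entering the forward difference are statistically independent, balances the squared bias of the estimator against its shot-noise variance:
\[
\hat{g}(h) = \frac{\hat{f}(x+h)-\hat{f}(x)}{h}, \qquad
\mathrm{MSE}(h) \approx \underbrace{\left(\tfrac{h}{2}\,f^{\prime\prime}(x)\right)^{2}}_{\text{bias}^2} + \underbrace{\frac{2\sigma_0}{N h^{2}}}_{\text{shot noise}},
\]
\[
\frac{d\,\mathrm{MSE}}{dh} = 0 \;\;\Longrightarrow\;\; h^{*} = \left(\frac{8\,\sigma_0}{N\,[f^{\prime\prime}(x)]^{2}}\right)^{1/4} \;\propto\; N^{-1/4}.
\]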
Consider a variational quantum function circuit that we would like to optimize on a noisy device using first-order forward finite differences and \(N=1,\!000\) shots. Rather than hard-coding a constant finite-difference step size, we can include the variance of the single-shot gradient as a quantity to minimize in the cost function by using the qml.gradients.finite_diff transform.
An implementation is presented in Figure 9; the full code example, including the circuit definition, is available on our GitHub repository. Starting with \(x=0.1\) and \(h=10^{-7}\) (the default step size of the finite_diff transform), we can now write an optimization loop that
(1) computes the cost value \(f(x, h)\) and an estimate of the quantum gradient \(\partial _x f(x,h)\) using single-shot finite differences with step size \(h\) (implemented together in cost_and_grad);
(2) using autodifferentiation, computes the partial derivative of the cost value with respect to the step size, \(\partial _h f(x, h)\); and
(3) applies a gradient descent step to both parameters \(x\) and \(h\), as sketched below.
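The following is a minimal sketch of such a loop (ours, not the code of Figure 9): the single-qubit circuit, the number of gradient samples, and the helper cost_and_grad are illustrative stand-ins, and the forward differences are computed explicitly rather than through qml.gradients.finite_diff to keep the example short.

```python
import pennylane as qml
from pennylane import numpy as np  # autograd-compatible NumPy

dev = qml.device("default.qubit", wires=1, shots=1)  # single-shot estimates

@qml.qnode(dev, diff_method="finite-diff")
def circuit(x):
    # Illustrative stand-in for the variational circuit of Figure 9
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

def cost_and_grad(x, h, n_samples=100):
    """Estimate f(x), the forward finite-difference gradient with step size h,
    and the variance of the single-shot gradient estimates."""
    value, g_mean, g_sq_mean = 0.0, 0.0, 0.0
    for _ in range(n_samples):
        value = value + circuit(x) / n_samples
        g = (circuit(x + h) - circuit(x)) / h  # single-shot forward difference
        g_mean = g_mean + g / n_samples
        g_sq_mean = g_sq_mean + g**2 / n_samples
    return value, g_mean, g_sq_mean - g_mean**2

x = np.array(0.1, requires_grad=True)
h = np.array(1e-7, requires_grad=True)
eta = 0.1  # learning rates for x and h would need separate tuning in practice

for step in range(20):
    value, grad_x, grad_var = cost_and_grad(x, h)                   # step (1)
    # Step (2): f(x) does not depend on h, so the derivative of the cost
    # (expectation value plus gradient variance) w.r.t. h is that of the variance.
    grad_h = qml.grad(lambda h_: cost_and_grad(x, h_)[2], argnum=0)(h)
    # Step (3): gradient descent updates for both the parameter and the step size
    x, h = x - eta * grad_x, h - eta * grad_h
```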
The results of this “adaptive step size” finite-difference optimization are compared to both a naïve finite-difference optimization (using the default step size of \(h=10^{-7}\)) and the optimal finite-difference optimization (computing Equation (8) at every step) in Figure 10. The naïve finite-difference optimization fails to converge to the minimum at all; a step size of \(10^{-7}\) results in a very large quantum gradient variance (as can be verified from Equation (8)). By contrast, both the adaptive and the optimal finite-difference optimizations converge to the minimum, with the optimal variant converging particularly quickly, as expected.
Nevertheless, the results show that knowledge of the underlying theoretical characteristics of a system (as in the optimal case) is not required. By simply encoding the quantities we wish to minimize—the expectation value and the gradient variance—the use of differentiable quantum transforms allows us to train the model hyperparameters to minimize error during gradient descent. Such approaches may also be viable in more complex models, where optimal or error-minimizing hyperparameter values are not known in advance.
3.3.2 Augmenting Differentiable Compilation Transforms with JIT Compilation.
The core difference between circuit compilation in PennyLane and other quantum software libraries is that nearly all of its compilation routines are quantum transforms that preserve differentiability. This is accomplished by applying only mathematical operations on the parameters through which gradients can be propagated in the underlying autodifferentiation frameworks. This occasionally requires framework-specific considerations, such as typecasting or the addition of very small numbers to avoid undefined gradients around critical values. As the framework-specific handling of operations is done within the transforms using qml.math, a user need only pass variables of the desired framework's type to a QNode and request that the circuit be compiled; the framework can be interchanged without any modification to the quantum functions themselves.
All compilation transforms are implemented as qfunc_transforms and, as such, manipulate a circuit at the level of its tape. Transforms include rotation merging, single-qubit gate fusion, inverse cancellation, and moving single-qubit gates through the control/target qubits of controlled operations. A top-level qml.compile transform is made available to the user to facilitate the creation of custom compilation pipelines. The right panel of Figure 11 displays an example pipeline that consists of pushing commuting gates left through the controls and targets of two-qubit gates, and then fusing all sequences of adjacent single-qubit gates into a single qml.Rot (general parametrized unitary) operation. This yields the reduced-size circuit in the left panel of Figure 11. The compiled circuit remains fully differentiable with respect to the input arguments, even though they may not appear directly as arguments of any gate in the compiled circuit.
At the end of the code listing of Figure 11, the method for obtaining the gradients with respect to the input parameters is shown. Differentiable compilation can reduce the quantum resources required to compute gradients, as the autodifferentiation framework keeps track of the changes in variables and can produce circuits with a reduced number of parameters. A disadvantage of applying transforms in this way is that every circuit execution involves feeding the quantum function through the transform pipeline. For large circuits and multi-step pipelines that involve more mathematically complex operations, such as full fusion of single-qubit gates, this can lead to significant temporal overhead. This is especially undesirable for gradient computations, as these by nature involve multiple executions of a quantum circuit.
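As a minimal sketch of this pattern (ours, not the listing of Figure 11; the circuit and weights below are illustrative), a compilation pipeline can be attached to a QNode as a decorator and the result differentiated with respect to the original input parameters:

```python
import pennylane as qml
from pennylane import numpy as np  # autograd-compatible NumPy

dev = qml.device("default.qubit", wires=2)

pipeline = [
    qml.transforms.commute_controlled(direction="left"),  # push commuting gates left
    qml.transforms.single_qubit_fusion,  # fuse adjacent single-qubit gates into qml.Rot
]

@qml.qnode(dev)
@qml.compile(pipeline=pipeline)
def circuit(weights):
    # Illustrative circuit; the tape is rebuilt (and re-compiled) on every call,
    # so gradients flow through the compilation pipeline.
    qml.RZ(weights[0], wires=0)
    qml.RY(weights[1], wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RZ(weights[2], wires=1)
    qml.RX(weights[3], wires=1)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

weights = np.array([0.1, 0.2, 0.3, 0.4], requires_grad=True)
grads = qml.grad(circuit)(weights)  # gradients w.r.t. the original input parameters
```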
Most of PennyLane’s circuit compilation transforms remain differentiable even after applying JIT compilation. As opposed to traditional (classical) compilation, which is performed prior to runtime, JIT compiles a program dynamically at runtime. For instance, in JAX [6], this is accomplished for jax.jit using tracing: On the first execution, a “tracer” object is sent through the code, keeping track of the operations performed on it. This is then compiled into a function in an intermediate representation and can be run with real inputs. The first execution of JIT-compiled code takes substantially longer due to the time required to trace; however, subsequent executions are much faster, as they use the optimized version. Other frameworks have analogous functionality, such as tf.function in TensorFlow, which first constructs a computational graph using a traced object.
Here we show an explicit example using the JAX interface and jax.jit. Enabling compatibility of quantum transforms with JIT requires different considerations than preserving differentiability with respect to arguments. In particular, conditional statements present a challenge for the tracer. For example, suppose that in the process of merging a sequence of \(RZ\) gates, a cumulative angle of 0 is obtained. When optimizing by hand, or in non-jitted code, whether to apply that gate can be decided using a conditional statement that checks the value of the angle. However, this leads to two branches of the code with different structure: one in which a gate is applied and one in which it is not. In JAX, one can use a special type of conditional (jax.lax.cond) that is recognized by the tracer; however, the output of both branches must have the same type and shape, and furthermore, it must be a JAX type such as an array (i.e., not a quantum gate). Thus, when jitting, one is limited to using the conditional branches to compute and return some function of a gate’s numerical parameters, and that gate runs even if the parameter is 0. PennyLane’s transforms contain separate conditional statements that first check whether a gate’s parameter is a tracer object prior to compiling. Then, jitted compilation can be done at the cost of occasionally applying a gate with a trivial parameter, whereas non-jitted compilation will always remove such gates. However, more sophisticated routines such as the two-qubit unitary decomposition are currently non-jittable: The structure of the circuit is computed as a function of the unitary and has four conditional branches with template circuits that depend on the computed number of CNOTs.
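The following is a minimal sketch (ours, not PennyLane's internal implementation) of the kind of tracer-aware conditional described above; the helper name is hypothetical, and qml.math.is_abstract is used to detect traced parameters:

```python
import pennylane as qml
from pennylane import math

def apply_merged_rz(theta, wire):
    """Hypothetical helper illustrating a tracer-aware conditional for a
    rotation-merging transform."""
    if not math.is_abstract(theta) and math.allclose(theta, 0.0):
        # Concrete zero angle (non-jitted path): safe to drop the gate entirely.
        return
    # Traced (jitted) parameter, or a nonzero concrete angle: always apply the
    # gate, possibly with a trivial parameter.
    qml.RZ(theta, wires=wire)
```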
Two-qubit decompositions notwithstanding, JIT compilation yields benefits in both the classical and quantum aspects of running an algorithm. The classical preprocessing, which involves applying the quantum transforms, becomes significantly faster after the first jitted evaluation, as it has been compiled to machine code that is then executed directly, bypassing the Python interpreter. We can then run the optimized circuit on quantum devices, which will typically have lower depth and fewer operations, without additional overhead. Furthermore, the same can be done for the computation of gradients: Not only may the number of quantum evaluations be reduced as a result of compiling the circuit, but the jitted gradient, after the first execution, will run significantly faster than the original.
Consider the circuit in Figure 12, which has 20 parameters. Using jax.grad to evaluate the gradient with the parameter-shift rule leads to 41 device executions (two per parameter, plus one for the initial forward pass). Running the same circuit but compiling with the qml.transforms.single_qubit_fusion transform merges all adjacent single-qubit gates. The compiled circuit has 15 effective parameters, as well as a lower circuit depth (expanding the rotations gives an original depth of 10 and a compiled depth of 8). Table 1 presents sample runtimes for computing the gradients of this circuit. Even though the number of quantum evaluations is lower, the application of the transform adds significant time overhead. However, jit can be applied to both the original and compiled gradient functions to speed up the computation substantially. New values of the parameters can be passed to the jitted and compiled gradient function without needing to run jit again. In variational algorithms, where a circuit and its gradient are evaluated on the order of thousands of times with different parameters over the course of the optimization process, this leads to a substantial speedup. While this is only a small example, it demonstrates that ensuring compilation transforms preserve differentiability of the input parameters can lead to benefits both when executing on a quantum device and on a simulator.
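A minimal sketch of this workflow (ours; the small circuit below stands in for the 20-parameter circuit of Figure 12, and runtimes will differ from Table 1) is:

```python
import jax
import jax.numpy as jnp
import pennylane as qml

dev = qml.device("default.qubit", wires=2)

def ansatz(params):
    # Small stand-in for the circuit of Figure 12
    qml.RZ(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RZ(params[2], wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RY(params[3], wires=1)
    return qml.expval(qml.PauliZ(1))

# Original and compiled QNodes built from the same quantum function
original = qml.QNode(ansatz, dev, interface="jax")
compiled_qfunc = qml.compile(pipeline=[qml.transforms.single_qubit_fusion])(ansatz)
compiled = qml.QNode(compiled_qfunc, dev, interface="jax")

params = jnp.array([0.1, 0.2, 0.3, 0.4])

# Jitted gradient functions: the first call traces and compiles; subsequent
# calls with new parameter values reuse the compiled version.
grad_original = jax.jit(jax.grad(original))
grad_compiled = jax.jit(jax.grad(compiled))

print(grad_original(params))
print(grad_compiled(params))
```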
3.3.3 Differentiable Error Mitigation.
In this section, we will implement fully differentiable ZNE (i.e., the extrapolated value is itself differentiable with respect to the input circuit parameters) with unitary folding [9]. In this method, a circuit U is first applied, followed by repetitions of \(U^{\dagger }U\). Following the example of the error-mitigation library mitiq [19], the number of such folds \(n_{f}\) is computed from a scale factor \(\lambda\) according to the expression \(n_{f} = (\lambda - 1)/2\), rounded to the nearest integer. A qfunc transform implementing this is shown in the left panel of Figure 13.
ZNE is naturally implemented as a batch transform, shown in the right panel of Figure 13. The transform creates multiple versions of the initial tape with different amounts of folding. It then returns those tapes, along with a processing function that performs the noise extrapolation on the results of executing the tapes (implemented in fit_zne).
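A rough sketch of this structure (ours, not the code of Figure 13) is shown below; it uses the built-in qml.transforms.fold_global for the folding and a closed-form linear fit, written with qml.math, so that the extrapolated value remains differentiable:

```python
import pennylane as qml

@qml.batch_transform
def zne(tape, scale_factors):
    """Return folded copies of `tape` and a processing function that
    extrapolates their results to zero noise (illustrative sketch)."""
    tapes = [qml.transforms.fold_global(tape, s) for s in scale_factors]

    def fit_zne(results):
        # Differentiable linear fit of result vs. scale factor; the intercept
        # is the zero-noise estimate.
        s = qml.math.stack(scale_factors)
        y = qml.math.stack([qml.math.squeeze(r) for r in results])
        s_mean, y_mean = qml.math.mean(s), qml.math.mean(y)
        slope = qml.math.sum((s - s_mean) * (y - y_mean)) / qml.math.sum((s - s_mean) ** 2)
        return y_mean - slope * s_mean

    return tapes, fit_zne
```

Decorating a QNode with @zne([1.0, 2.0, 3.0]) would then return the extrapolated value directly.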
The zne batch transform can be applied either to a quantum tape, to obtain the transformed tapes and processing function, or to a QNode, to directly obtain the mitigated value. In fact, once defined, a user can obtain error-mitigated results simply by adding a single line to their code: the @zne decorator. Pseudocode for this process is shown in Figure 14 (the full example is included in our GitHub repository). Furthermore, if the fit_zne function is implemented in a differentiable manner (using, e.g., the PennyLane math module), then the extrapolated value itself will be differentiable with respect to the circuit input parameters.
In addition to using the qml.transforms module to build differentiable ZNE methods from scratch, as of v0.25 PennyLane includes a built-in batch transform, mitigate_with_zne. The transform is accompanied by custom folding functions, as well as fitting options such as Richardson and polynomial extrapolation.
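As an illustration (a sketch with our own choice of circuit and scale factors; the default.mixed device below is a noiseless stand-in for noisy hardware), the built-in transform can be used as a decorator:

```python
import pennylane as qml
from pennylane import numpy as np
from pennylane.transforms import mitigate_with_zne, fold_global, richardson_extrapolate

# Stand-in device; in practice this would be noisy hardware or a noisy simulator
dev = qml.device("default.mixed", wires=2)

@mitigate_with_zne([1.0, 2.0, 3.0], fold_global, richardson_extrapolate)
@qml.qnode(dev)
def circuit(w):
    qml.RY(w[0], wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RY(w[1], wires=1)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

w = np.array([0.5, 0.8], requires_grad=True)
mitigated = circuit(w)        # error-mitigated expectation value
grads = qml.grad(circuit)(w)  # the mitigated value remains differentiable in w
```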
3.3.4 Learning Noise Parameters with Transforms.
The differentiability of transform parameters can be leveraged for characterization tasks such as learning noise parameters. Consider a simple noisy device where every single-qubit gate is depolarized by the same qubit-dependent amount. We can simulate this noise using a transform that applies an appropriate depolarization channel after every single-qubit gate, as shown in Figure 15.
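A minimal sketch of such a transform (ours, not the code of Figure 15) is given below; it assumes integer wire labels so that the per-wire probabilities p can be indexed directly, and the transformed circuit must run on a mixed-state simulator such as default.mixed:

```python
import pennylane as qml

@qml.qfunc_transform
def add_depolarizing_noise(tape, p):
    """Apply a depolarizing channel with per-wire probability p[w] after every
    single-qubit gate acting on wire w (illustrative sketch)."""
    for op in tape.operations:
        qml.apply(op)
        if len(op.wires) == 1:
            wire = op.wires[0]
            qml.DepolarizingChannel(p[wire], wires=wire)
    for mp in tape.measurements:
        qml.apply(mp)
```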
Suppose the set of true depolarization parameters is \(\mathbf {p} = [0.05, 0.02]\), i.e., \(p = 0.05\) for the first qubit and \(p = 0.02\) for the second. In the left panel of Figure 16, we choose a suitable circuit for experimentation and use a transform to learn both depolarization parameters. We first set up a representation of our noisy device and bind to it a QNode that applies the transform using the true depolarization parameters. Note that this step is merely for simulation purposes; it could be substituted with a noisy hardware device that does not apply any transforms at all.
Next, we optimize to learn these depolarization parameters, using a simple least-squares loss as the cost. It computes the difference between the output of a transformed QNode whose transform parameters we are trying to learn (which runs on an ideal, simulated device) and that of the noisy QNode (which runs on a device with the true amount of depolarization noise). Optimization was performed using gradient descent, and the results are plotted in the right panel of Figure 16 (the full code can be found in the GitHub repository). The learned parameters are close to the true values, with variability due to shot noise. While this is a simple example, it could be generalized to more complex noise models, such as gate- and qubit-dependent noise, other types of noise such as gate over-rotation, and compositions of noise channels.