1 Introduction

The Poisson equation is a second-order partial differential equation widely used in various fields of science and engineering. In general, in order to solve the Poisson equation numerically, projection methods such as collocation, spectral, and boundary element methods as well as finite-difference methods [2] are used. The core of these methods is to approximate the solution of the Poisson equation as the solution of a linear system. However, since the dimension of the linear system obtained from the discrete Poisson equation is generally very large, solving such a system demands much computational time. Therefore, the Poisson equation is a problem well suited to quantum computing, a faster and more powerful computation paradigm [3] than classical computing.

A series of quantum algorithms [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23] have been developed to solve linear equation systems, which have shown significant speedups over their classical counterparts. Recently, variational quantum algorithms (VQAs) [24,25,26,27], which have already shown some promise for use on so-called noisy intermediate-scale quantum (NISQ) devices [28], are adopted to solve the Poisson equation [19, 20]. From the experimental point of view, while VQAs-based approaches have some advantages, such as generally using relatively shallow quantum circuits or requiring fewer quantum measurements, they still have challenges in optimizing a set of parameters, especially on larger problems [19]. In addition, instead of producing the direct solution of the Poisson equation, these methods rely on an expectation of certain observables limiting them to be coupled with other general problems and thus may have a limited use case. A non-VQA-based iterative method for the HHL algorithm [12] has been proposed to solve linear system of equations in Ref. [21]. Even though they obtained a more accurate solution by increasing the number of iterations with the same number of measurements, they still have challenges in improving the error convergence speed compared to the state vector calculations. However, in the context of the quantum circuit model, Cao et al. [22] first used the original HHL algorithm [12] to solve the Poisson equation. Later, our co-author Wang et al. [23] pointed out a bottleneck of Cao’s algorithm which was the cost associated with the function used for controlled rotation. It was later reformulated by Wang et al. reducing its cost from \(O(m^4)\) to \(O(m^3)\), where m is the number of qubits of input register. Cao’s and Wang’s approaches are reviewed in Appendix A.

Even though these past works, including Cao’s and Wang’s methods, improved the quantum algorithm and circuit for the Poisson solver, they still either suffered from lack of accuracy and/or were limited to demonstrating only a very small size of the problem, and thus, their practical usage is limited. Some of these works focus on minimizing the error in their approaches or in the overall solutions without directly presenting the actual or direct solution of the Poisson equation [19,20,21], or some others appeared to suggest the feasibility of their methods on quantum hardware without even clearly discussing or validating their works on any hardware [19, 20].

Our previous work [1] serves as an initial outline of our research on the quantum Poisson equation solver. It presents a proof-of-concept demonstration of our quantum Poisson solver algorithm and validates preliminary results for a simple case of \(3\times 3\) problem. Here we delve into comprehensive research details, presenting the results on up to \(15\times 15\) problems that include step-by-step improvements in Poisson solutions, scaling performance, and experimental exploration.

Specifically, by solving different sizes of problems, this paper demonstrates the advancement of our algorithm for solving the Poisson equation in several aspects: (1) Improve the precision of phase estimation by increasing the accuracy of the eigenvalues. Unlike the previous approach [23] where only the integer part of the eigenvalues was encoded, we implement non-truncated eigenvalues through eigenvalue amplification. We will see that this has a clear impact in drastically reducing the error in the solution of the Poisson solver and compare that to the exact solution; (2) The rotation angles are calculated with full accuracy, which is also essential for ensuring the overall accuracy of the solution; (3) We present and analyze success probability results, highlighting the reliability of our quantum Poisson solver; (4) Without compromising any accuracy of the algorithm, during the run time, our implementation uses an optimized number of qubits representing the rotation angles. We also optimize the CNOT gates usage, which is one of the primary sources of error in an experiment; (5) Solutions of the Poisson equations with larger problem size to \(7\times 7\) and \(15\times 15\) are demonstrated. In fact, our implementation with dynamic allocation of qubits in different segments of the algorithm ensures easy adaptation of this method for solving real-world problems; (6) Additionally, we explore the scaling performance of our algorithm against the circuit depth and width, demonstrating how our approach scales with larger problem sizes. We also offer a multilevel strategy for how this algorithm might be further improved to explore much larger problems with greater performance; (7) The possibilities and difficulties of implementing the algorithm on real quantum hardware are discussed for the first time. This also includes experimenting with the circuit mapping, error mitigation, etc. on the NISQ devices and presenting a vision for near-term hardware; (8) Finally, the algorithm is implemented using the Qiskit package [29], which would bring advantages for practical use. We believe all these aspects are necessary to advance the study of quantum Poisson solvers.

We also want to make it clear that in this work our main focus is to demonstrate the advancement of our hybrid algorithm that accurately simulates the Poisson equation with realistic problem sizes and explore the experimental feasibility of those problems. In particular, we aim to push the scalability of our proposed Poisson solver to larger practical problems on both simulators and real quantum devices. While testing these problems, we also identify the key limiting factors against applying the algorithm to large problems and implement some optimization methods in terms of the number of qubits and gates. We explain that with the current state of the technology, it is difficult to realize a complete quantum description of the algorithm due to its high resource costs. However, we discuss pathways to further improve this hybrid approach in both simulation and experimental environments.

For demonstrating the circuit, we present both simulated and experimental results, discuss the sources of errors, and eliminate them. The Matrix Product State (MPS) simulator is used for simulation, and the experiment is done on IBM’s ibmq_manila and ibmq_brooklyn quantum backends [30]. We examine the measurement error mitigation on a small system and also discuss how the overall results of the Poisson equation on the currently available quantum hardware are dominated by the error in the CNOT gates.

We have extended the existing algorithm from Wang et al. beyond a single proof of concept to a fully dynamic, scalable body of work that can be used for numerous applications in mixed computing algorithms. Wang’s QRUNES [31]-based machine instructions have been abstracted to more usable Qiskit functions, allowing us to perform fine-tuning of different register sizes so that we can identify key areas of inaccuracy and compare qubit tradeoffs. This is important because the primary limiting factor in accuracy is the total number of qubits in the circuit, and qubits are at a premium in NISQ hardware. Our code is readable and easily usable, allowing a true “black box” approach to be taken to solving the most computationally intensive part of Poisson applications.

The paper is organized as follows. In Sect. 2, we adopt the finite- difference method to discretize the Poisson equation to obtain a linear system. In Sect. 3, we describe the quantum algorithm and circuit design for each module and our algorithm in detail. In Sect. 4, we explain the algorithm improvement and circuit optimization. We show simulated results of different sizes of problems and their improvements, and we discuss more about algorithm scaling and success probability in Sect. 5. In Sect. 6, we demonstrate our improved quantum circuit for the Poisson solver on IBM quantum hardware and discuss error mitigation. Finally, we conclude our works in Sect. 7.

2 Overview of the problem

The goal of this work is to implement an efficient quantum algorithm solving the multidimensional Poisson equation with boundary conditions. Let us consider the Poisson equation defined in an open bounded domain \(\Omega \subset \Re ^d\), where d is the number of spatial dimensions.

$$\begin{aligned}&-\nabla ^2 v(x) = b(x), \ x \ \text {in} \ \Omega \ \ \end{aligned}$$
(1)
$$\begin{aligned}&v(x) = 0, \ x \ \text {on} \ \delta \Omega \ \ \Omega = (0,1)^d \end{aligned}$$
(2)

where \(\delta \Omega \) is the boundary of \(\Omega \) and b(x) is a given smooth function representing different problem applications, such as charge or velocity distribution. One way to solve this problem is to discretize \(\Omega \) to \(N^\prime = N + 1\) grid points in each dimension, where N is an exponent of base 2. The solution v(x) is a vector of \((N-1)^d\) entries.

In this work, we focus on the one-dimensional Poisson equation with Dirichlet boundary conditions. Using the central difference approximation to discretize the second-order derivative, Eq. (1) can be converted to finite-difference form as

$$\begin{aligned} A \cdot \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_{N-1} \end{pmatrix} = \frac{1}{h^{2}} \begin{pmatrix} 2 &{} -1 &{} &{} 0 \\ -1 &{} \ddots &{} \ddots &{} \\ &{} \ddots &{} \ddots &{} -1 \\ 0 &{} &{} -1 &{} 2 \end{pmatrix} \cdot \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_{N-1} \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_{N-1} \end{pmatrix} \end{aligned}$$
(3)

We now have the \(N-1\) linear equation system, i.e., \(A |v\rangle = |b\rangle \) to be solved. Here A is a Hermitian matrix, and due to the zero boundary condition, its dimension is \((N-1) \times (N-1)\), and the mesh size h equals 1/N.

The eigenvalues of A are \(\lambda _j = 4N^2 \textrm{sin}^2(j \pi / 2N)\), and its corresponding eigenvectors are \(u_j (k) = \sqrt{2/N} \textrm{sin}(j \pi k /N)\) [32].

Fig. 1
figure 1

Overall circuit representation of the algorithm for solving the one-dimensional Poisson equation. The numbers of qubits of registers A, E, and B are l, m, and n, respectively. Here \(m=i+f\), where i and f number of qubits in reg. E hold the integer and fractional parts of the eigenvalue. \(|\omega _j\rangle \) is the angular coefficient evolved from the approximated eigenvalue \(|\lambda _j\rangle \), the output of the QPE. The input \(|b\rangle _n = \sum _{i=1}^{2^n-1} b_i|i\rangle \) is prepared and stored in register B

The best classical algorithms for solving this problem run polynomially with matrix size [33], so the run time increases exponentially with the dimension of the problem. In this paper, a quantum algorithm is used to produce a quantum state representing the normalized solution of the problem. Since this technique runs in polylog time, the curse of dimensionality can be broken. Thus, we can solve the linear system of equations based on the HHL algorithm [12]. Our algorithm exploits properties of matrix A to efficiently implement the HHL algorithm by simulating the unitary operator \(e^{i A t}\). Though we are presenting an algorithm for the one-dimensional Poisson equation, this can be easily extended to the d-dimensional case [19, 23] as

$$\begin{aligned} A^{(d)} = \underbrace{A \otimes I \otimes \cdots \otimes I}_d + I \otimes A \otimes I \otimes \cdots \otimes I + \cdots \nonumber \\ + I \otimes \cdots \otimes I \otimes A. \end{aligned}$$
(4)

with the exponential \(A^{(d)}\) expressed in the form

$$\begin{aligned} e^{iA^{(d)}t} = \underbrace{e^{iAt} \otimes e^{iAt} \otimes \cdots \otimes e^{iAt}}_d. \end{aligned}$$
(5)

So, the quantum circuit simulating \(e^{iA^{(d)}t}\) is just the parallel execution of the circuit simulating \(e^{iAt}\) along the d dimension. In the following sections, we will focus on the one-dimensional Poisson equation.

Fig. 2
figure 2

Overall circuit for quantum phase estimation (QPE) that uses both integer and fractional parts of the eigenvalues through reg. E. \(U^{2^k}\) represents the unitary operator of \(\exp {(i2\pi A/2^{m-k})}\). QFT\(^\dagger \) represents the inverse quantum Fourier transform

3 Quantum algorithm and circuit design

The overall circuit diagram of our algorithm for solving the one-dimensional Poisson equation is presented in Fig. 1 [1, 23]. As the figure shows, the algorithm consists of three stages: phase estimation, controlled rotation, and uncomputation. Its circuit diagram has three main registers—reg. B, reg. E, and reg. A.

  • Reg. B is used to encode the coefficients of the right-hand side of Eq. (1). Its number of qubits is \(n=\lceil \log (N^\prime ) \rceil \), where \(N^\prime \) is defined in Sect. 2.

  • Reg. E is used to store the approximated eigenvalues of matrix A. Its number of qubits is \(m = i + f\), where the first \(i = 2n + 2\) qubits hold the integer part and the remaining f qubits the fractional part of the eigenvalue.

  • Reg. A is used to store pre-calculated angular coefficients for the controlled rotation operation. Its number of qubits is chosen to be \(l \ge m\).

In this work, we assume that the input state \(|b\rangle \) of reg. B is prepared as \(\sum _i b_i|i\rangle \), where \(b_i\) is the value on the right-hand side of Eq. (3), and \(|i\rangle \) is the computational basis [34]. That is, the input \(|b\rangle \) contains the prerequisite state vector, the problem that we are trying to solve, which we then entangle with the approximated eigenvalues \({\lambda }_j\) on reg. E. The output of the algorithm thus is a quantum state that encodes the solutions of the Poisson equation as probability amplitudes on reg. B. Thus, this circuit is a process of quantum state preparation, with the output written as \(|v\rangle = A^{-1}|b\rangle = \sum _i\alpha _i|i\rangle \), where \(\alpha _i\) is the value of the solutions of the Poisson equation after normalization.

We will now discuss a few key steps of the algorithm.

3.1 Phase estimation

Through the quantum phase estimation (QPE) circuit shown in Fig. 2, we estimate the eigenvalues of the discretized matrix A and entangle the states encoding the eigenvalues with the corresponding eigenstates [35]. We will now discuss how the quantum states evolve through the QPE section of the circuit. The initial state of reg. E and reg. B is

$$\begin{aligned} |0\rangle ^{\otimes m}|b\rangle = \sum _{i=1}^{2^n - 1}b_i|0\rangle ^{\otimes m}|i\rangle = \sum _{j=1}^{2^n - 1}\beta _j|0\rangle ^{\otimes m}|u_j\rangle . \end{aligned}$$
(6)

where \(|i\rangle \) is the computational basis and \(|u_j\rangle \) is the \(j\)th eigenvector of matrix \(A\). Then the Hadamard gates across reg. E prepare a uniform superposition state, which the sequence of controlled \(U^{2^m}\) operation evolves as follows:

$$\begin{aligned} \sum _{k'=0}^{2^m-1}(|k'\rangle \langle k'| \otimes U^{k'}) \cdot {\frac{1}{\sqrt{2^m}}} \sum _{k=0}^{2^m-1}|k\rangle \otimes \sum _{j=1}^{2^n-1}\beta _j|u_j\rangle \nonumber \\ = \sum _{j=1}^{2^n-1}\beta _j \left[ \frac{1}{\sqrt{2^m}} \sum _{k=0}^{2^m-1}e^{2\pi i \frac{{\lambda }_j}{2^m} k}|k\rangle \right] |u_j\rangle . \end{aligned}$$
(7)

Note that the state in the square bracket of Eq. 7 is simply the output of the quantum Fourier transform acting on the state \(|{\lambda _j}\rangle \), so after the application of the inverse Fourier transform the states evolve to \(\sum _{j=1}^{2^n-1}\beta _j|{\lambda _j}\rangle |u_j\rangle \). This entangles the eigenvalues \(|\lambda _j\rangle \) with the eigenstates \(|u_j\rangle \) from reg. B.

Though there are methods [36, 37] available for simulating the time evolution of \(e^{iAt}\), Wang et al. take advantage of using specific properties of the tri-diagonal matrix A to reduce the complexity of the algorithm. They first decompose the unitary operator \(e^{iAt}\) with a Hermitian matrix S (S being an orthogonal matrix composed of the eigenvectors of A) and then diagonalize it via the sine transform, and finally use phase kickback [38] to operate it on the state \(|b\rangle \). We adopt Wang’s approach for phase estimation; its detailed circuit composition is available in Ref. [23].

3.2 Phase verification

An eigenvalue problem involving an arbitrary unitary operator A and its eigenvector \(|{v_j}\rangle \) and eigenvalue \(\lambda _j\) satisfies \(A|v_j\rangle =\lambda _j|v_j\rangle \). Using this, we can verify the correctness of the QPE part of the circuit. In fact, this would also implicitly verify the phase kickback operation, which, through the controlled U operations (in Fig. 2), entangles the eigenvalues of matrix A with the eigenstates associated with the input in reg. B. We can think of reg. B as containing the problem we are trying to solve for the HHL algorithm. Each eigenvalue of matrix A is associated with an eigenvector, so the first way to perform the verification is to input the individual eigenvectors as the input to reg. B and then measure reg. E before the controlled rotations. For example, a Qiskit simulation with \(A(3\times 3)\) in Eq. 3 acting on its eigenstates \( |v_j\rangle = \left( {\begin{matrix} 1 \\ \sqrt{2} \\ 1 \end{matrix}}\right) , \left( {\begin{matrix} -1 \\ 0 \\ 1 \end{matrix}}\right) , \left( {\begin{matrix} 1 \\ -\sqrt{2} \\ 1 \end{matrix}}\right) \) produces the eigenvalues \(\lambda _j = 9, 32, 54\) in binary (using only the integer part for simplicity) with \(100\%\) probability; this is shown in Fig. 3 (a-c). Further, for any input with an arbitrary combination of eigenvectors, for example, \(|v\rangle = \frac{1}{2}|v_1\rangle + \frac{1}{\sqrt{2}}|v_2\rangle + \frac{1}{2}|v_3\rangle \), the QPE circuit produces the combination of the eigenvalues with correct probabilities, as presented in Fig. 3d.

3.3 Controlled rotation

After the phase estimation \(\sum _{j=1}^{2^n-1}\beta _j|\lambda _j\rangle |u_j\rangle \) is obtained on regs. B and E, we perform the linear map taking the state of \(|{\lambda }_j\rangle \) to \((1/\lambda _j) |\lambda _j\rangle \). This process consists of two parts: calculating the rotation angular coefficients and performing the controlled \(R_y\) operation. The probability amplitude of \(1 / {\lambda }_j\) can be produced by implementing the controlled \(R_y\) rotation, that is, \(R_y (2 \theta _j) |0\rangle = \cos {\theta _j} |0\rangle + \sin {\theta _j} |1\rangle \), where the rotation angle \(\theta _j\) can be expressed in terms of \({\lambda }_j\) as

$$\begin{aligned} \sin {\theta _j} = 1/{\lambda }_j, \end{aligned}$$
(8)

which can be rewritten as

$$\begin{aligned} \cot {\theta _j} = \sqrt{{\lambda }^{2}_j - 1}, ~~~ \theta _j \in (0, \pi /2). \end{aligned}$$
(9)

Taking \(\theta _j = \omega _j \pi \), Eq. (9) becomes

$$\begin{aligned} \omega _j = \frac{1}{\pi } \textrm{arccot} \big (\sqrt{{\lambda }^{2}_j - 1} \big ), ~~~ \omega _j \in (0, 1/2), \end{aligned}$$
(10)

where \(\omega _j\) is the rotation angular coefficient. In this hybrid approach, we prepare \(\omega _j\) classically and encode them into the circuit.

Fig. 3
figure 3

Verification of the QPE section of the circuit. Panels ac show that for a given eigenstate of \(A(3\times 3)\), the QPE produces the corresponding eigenvalue with \(100\%\) probabilities. Panel d shows that inputting the combinations of eigenstates with arbitrary weights produces eigenvalues with similar weights

For an angular coefficient state \(|\omega _j\rangle \) in reg. A, the binary representation can be written as \(\omega _j = \omega _{j_1}\omega _{j_2}\cdots \omega _{j_l} = \sum _{k=1}^l 2^{-k}\omega _{j_k}\). Then, using \(R_y(2\theta _j) = e^{-i \theta _j Y}\), the \(R_y\) rotation can be expressed as [39],

$$\begin{aligned} R_y(2\omega _j\pi ) = e^{-i (\sum \limits _{k=1}^l\frac{\omega _{j_k}}{2^{-k}}) \pi Y} \nonumber \\ = \prod _{k=1}^l e^{-i \frac{\omega _{j_k}}{2^k}\pi Y} = \prod _{k=1}^l R_y^{\omega _{j_k}} (\frac{\pi }{2^{k-1}}). \end{aligned}$$
(11)

where \(\omega _{j_k}\) are the control qubits in reg. A. For a given k, if the bits of \(\omega _{j_k}\) for all j are zero, then the corresponding \(R_y^{\omega _{j_k}}\) operation has no effect on the Ancillary register. This allows us to further optimize the circuit by removing any control qubits with bit \(\omega _{j_k} = 0\) from reg. A. This is further discussed in the next section.

The workflow used in this work follows several steps and is presented in Algorithm 1.

Algorithm 1
figure a

Quantum Poisson Solver

4 Algorithm improvements, circuit optimization, and challenges

Wang’s method has already reduced its complexity to \(O(m^2)\) qubits and \(O(m^3)\) operations [23]. After implementing the algorithm as a quantum circuit, we look for options for further improving it so that even with a limited number of qubits on the quantum hardware, we are able to more accurately solve the Poisson equation for a realistic problem size, i.e., with a larger matrix A. Below, we discuss some shortcomings of the existing approaches and the ways we improve them:

4.1 Eigenvalue amplification

The first source of inaccuracy in the existing algorithm [22, 23] is the truncation of the eigenvalues of matrix A in the phase estimation. Wang’s implementation uses only the integer eigenvalues of the A matrix, presumably in order to save qubits. As more qubits become available in quantum devices, however, we can improve accuracy by using non-truncated values. Therefore, we extend the algorithm by taking into account both the integer and fractional parts of the eigenvalues. This is done by amplifying the eigenvalue by a factor of \(2^f\), which shifts the decimal point of the binary \(\lambda _j\) to right by an integer f. For example, for a given \(\lambda _j = 10111.11011011101011\), with no amplification, \(2^4\) amplification and \(2^8\) amplification, the circuit carries \(\lambda _j = 10111, 101111101\), and 1011111011011, respectively. Essentially, when we include fractional part for the eigenvalue, we are encoding a bitshifted/amplified eigenvalue that is still an integer but contains bits of the fractional part. This way, by using a large f, one actually includes more bits of the fractional part of the eigenvalue, and the shifted position of the decimal point of the eigenvalue is adjusted by a normalization factor \(2^{-f}\) in the controlled \(R_y\) operation of the circuit to match. Due to the dynamic nature of our code, we are able to experiment using any number of bits on the eigenvalues, taking a more accurate representation of the critical matrix A for our computation.

4.2 Rotation angular coefficient accuracy

The second source of inaccuracy is in the calculation of the rotation angular coefficient. The previous method [23] omitted the subtrahend 1 under the square root in Eq. (10); we instead include it. Additionally, we retain full accuracy in the calculation of the rotation angular coefficients by using the full eigenvalues. Furthermore, our implementation allows us to dynamically expand the number of qubits to represent rotation angular coefficients with higher accuracy. Finally, we are able to use the optimum number of qubits based on the convergence of the error in the solution, which is discussed in the next section.

4.3 Optimize the number of qubits used for rotation angles

While we initially presented a rotation on all bits of reg. A, in practice, this is not necessary. In Eq. (11), if the bits of \(\omega _{j_k}\) for all j are zero for a given k, then the corresponding \(R_y^{\omega _{j_k}}\) operation has no effect on the Ancillary register. In other words, if there is no information conveyed on a given qubit in reg. A by any of the rotation angular coefficients, then we can safely omit that qubit and its rotation without impacting the results. Intuitively, this makes sense as the controlled rotations do not happen if a given control bit is 0. Imagine a case where our \(\omega _j = 0.0000100110, 0.0000001010, 0.0000000101\). The first four qubits, as well as the sixth qubit, are 0 for all \(\omega _j\), so the respective \(R_y(2^{-1}\pi )\), \(R_y(2^{-2}\pi )\), \(R_y(2^{-3}\pi )\), \(R_y(2^{-4}\pi )\), and \(R_y(2^{-6}\pi )\) rotations never happen. We have no need to include these controlled rotations nor the qubits in reg. A that they correspond to. This allows us to further optimize the circuit by removing any control qubits with bit \(\omega _{j_k} = 0\) from reg. A. As a result, though at the beginning we chose \(l\ge m\) qubits for reg. A, after the circuit optimization, the register has fewer than l qubits.

4.4 Optimize CNOT gates usage

The rotation angular coefficient allows us to entangle the prepared state on reg. E with the controlled rotation on reg. A. This is achieved with multi-controlled multi-target (MCMT) gates controlled on the binary expansion of the eigenvalues on reg. E. However, MCMT gates transpile to many CNOT gates, which carry significant errors into the experiment. We want to minimize the number of controlled bits in this operation. At the end of the phase estimation, the qubits of reg. E are entangled; thus, the phase information can be accessed through fewer qubits in reg. E. This allows us to control our encoding of the rotation angular coefficients on only the unique most significant bits of reg. E. For example, in the \(3 \times 3\) case, if our eigenvalues are 9, 32, 54 (taking only the integer part for simplicity), then their binary encodings are 001001, 10000, 110110, respectively. It is then evident that the two most significant bits of the binary encodings are enough to differentiate between different eigenvalues: 00, 10, and 11. Controlling the angular rotations on only these two qubits allows us to reduce the number of CNOT gates in the circuit significantly.

4.5 Classical versus quantum approach to rotation angular coefficient

Preparing the angular coefficient state \(|\omega _j\rangle \) (see Eq. 10 here) using a quantum circuit presented in Ref. [23] has a cost that grows exponentially with the problem size. This proves to be a challenge because the current state of the simulator, and quantum hardware supports a limited number of qubits. Therefore, though from a theoretical standpoint, the quantum approach to \(|\omega _j\rangle \) is appealing, from a practical standpoint its classical treatment is the viable option. This is particularly true because our main goal is to scale the Poisson solver to realistic problem sizes, which requires us to appropriately allocate computational resources.

Therefore, in this work, we pre-calculate \(\omega _j\) classically and encode them into the circuit. These two steps of the workflow are already fast and for large number of \(w_j\)’s; the total processing time can be significantly reduced by parallelizing them over j on CPU or GPU hardware and thus avoids exponential processing time growth on large problems. Please note that this computation is required only once, independent of the number of repeated shots, and we make the process substantially more efficient by dynamically calculating it for any problem size.

However, even within this hybrid approach, dealing with very large problems, e.g., encoding \(10^8\) values of \(\omega _j\) for a \(10^8\times 10^8\) matrix, would be challenging. Such a large problem would make the circuit depth prohibitively large from an experimental standpoint. A multilevel solution to this problem is discussed in the subsequent section.

5 Simulated results and discussions

We constructed our algorithm in the Python programming language using IBM’s Qiskit package [29]. This allowed us to create our circuit in a modular fashion as well as use some of Qiskit’s abstractions, such as the MCMT gate and simple implementations of quantum Fourier transform.

Fig. 4
figure 4

(Left) Comparison of the Matrix Product State (MPS)-based simulated solution of the Poisson equation with exact and existing QRUNES [23] solutions for a \(3 \times 3\) problem size. (Right) The relative error in QRUNES and MPS simulated outputs with respect to the exact result

Fig. 5
figure 5

(Left) Step-by-step improvement in MPS-simulated solution by using accurate rotation angular coefficients \(\omega _j\) and encoding them into up to 16 qubits on reg. A. (Middle) relative error in the improved MPS-simulated solutions with respect to the exact result. (Right) relative errors at the level of individual states

To the best of our knowledge, previous work has not included the simulation of a solution of the one-dimensional quantum Poisson equation beyond a \(3\times 3\) matrix A. However, here we present the simulation of solutions of much larger problems, that is, for larger sizes of A. In fact, we will present that our algorithm and its circuit representation are capable of dynamically controlling problem size in NISQ devices. For simulation, we use IBM’s Matrix Product State (MPS) simulator since it supports a relatively large number of qubits (up to 100) necessary for presenting the circuit for larger problems while also maintaining reasonable accuracy. In this section, we analyze the source of error in the solution and accordingly demonstrate step-by-step improvements in the algorithm that secure higher accuracy in the solution.

Reproduce existing results As shown in Fig. 4 (left), we first produce the solution of a \(3 \times 3\) problem with \(|b\rangle = \frac{1}{\sqrt{2}}|01\rangle + \frac{1}{2}(|10\rangle + |11\rangle )\) being the right-hand side of the Poisson equation. In order to compare this with the existing QRUNES results [23], we use their same inputs, that is, only the integer part of \(\lambda _j\) and the approximated \(\omega _j\) encoded on 10 qubits of reg. A. Notice that in Fig. 4 (left), we show the vertical axis starting from 0.4, so that even any tiny differences in the heights of the histograms are clearly visible. Though our MPS-based simulated solution shows an excellent agreement with QRUNES, there are some discrepancies compared to the exact solution. To analyze further, the accuracy of our MPS-based result is depicted using the relative error in the MPS solution with respect to the exact result and here the relative error, e. g. , for MPS, is defined as \({\Vert \text {Exact} - \text {MPS}\Vert }_2/{\Vert \text {Exact}\Vert }_2\) [40] and shown in the right panel of Fig. 4. Relative errors in both QRUNES and MPS are virtually equivalent.

Improvements in results To improve our MPS-based result presented above and have a better agreement with the exact solution, we made the following two improvements: First, we used the accurate formula for \(\omega _j\) given in Eq. (10); then, we encoded these values in up to 16 qubits on reg. A. The results are shown in the left panel of Fig. 5, which displays the gradual improvements in the solutions as compared to the exact result. The improvements in solutions are clearly visible through the relative error presented in the middle panel of Fig. 5, and its right panel explicitly shows the components of those relative errors.

Next, we extend the problem size to \(7 \times 7\) with \(|b\rangle = \frac{1}{4}(|001\rangle + |010\rangle + |011\rangle + |100\rangle ) + \frac{1}{2}(|101\rangle + |110\rangle + |111\rangle )\) and further investigate the effects of using a more precise \(\omega _j\) by increasing the number of qubits in reg. A. As clear in Fig. 6, improvements in the solutions and reduction in the relative errors resulted as the qubit number increased from 12 to 20. The relative error of the \(7 \times 7\) problem with \(\omega _j\) encoded in 16 qubits is about \(0.88\%\), which is about 9 times larger than that of the \(3 \times 3\) problem. This is understood by the fact that the error due to the truncated eigenvalues \(\lambda _j\) used in phase estimation plays a major role here. This is because a larger problem requires a larger number of controlled-U operations (see Fig. 2), resulting in more error accumulation. The overall accuracy of the results is relatively similar when using 16 and 20 qubits; thus, we chose to fix \(\omega _j\) at 16 qubits as we investigated further improvements to the solutions.

To further reduce the error discussed in the previous paragraph, we used eigenvalue amplification (as discussed in Sect. 4.1) with a factor of \(2^f\) where f takes the value 0, 4, and 8. A larger f includes more number of bits in the fractional part of the eigenvalue and thus retains more accuracy in the solution. The effects of eigenvalue amplification on the \(7 \times 7\) problem are shown in Fig. 7, which confirms the significant reduction to the relative error when we use eigenvalue amplification. At \(2^8\) amplification, the relative error is \(0.18\%\), a 5-fold improvement in accuracy compared to using no amplification. We are confident that a higher amplification factor (\(f>8\)) would further reduce this error.

Fig. 6
figure 6

Effects of increasing the number of qubits on the rotation angular coefficients \(\omega _j\) on a \(7 \times 7\) problem size. (Left) Exact solution with the MPS result simulated by encoding \(\omega _j\) on different numbers of qubits of reg. A. (Right) Relative errors of the left panel results with respect to the exact solution

Fig. 7
figure 7

Effects of eigenvalue amplification on a \(7 \times 7\) problem size. (Left) Comparing solutions with varying levels of eigenvalue amplification while using 16 qubits for \(\omega _j\). (Right) Drastic reduction in relative errors of the solutions on the left panel with respect to the exact result

Fig. 8
figure 8

Solution of a \(15 \times 15\) problem size. (Left) Comparing the solutions with different levels of eigenvalue amplification (with a fixed \(\omega _j\) of 16 qubits) and exact result. (Right) Significant reduction in the relative errors with respect to the exact result

Fig. 9
figure 9

Success probability of obtaining the desired state on \(7 \times 7\) and \(15 \times 15\) problem sizes. Here (a), (b), and (c) correspond to the cases presented in Figs. 6, 7, and 8, respectively. The arrow line in (b) and (c) indicates the steady improvements in success probability toward their respective analytical values with the increase in eigenvalue amplification

To confirm the robustness of our algorithm and its accuracy in solving the Poisson equation for practical problem sizes, we present the solution for a \(15 \times 15\) problem, including its exact result, in Fig. 8. For an arbitrarily chosen input state \(|b\rangle \) (see Table 1 for its expression), the overall solution is encouraging. The relative error with respect to the exact result again quickly goes down as we apply eigenvalue amplification and increase its amplification factor.

Success probability Analytically, the success probability (SP) of the measurement is determined by the eigenvalue distribution and their levels of accuracy. As can be seen in the state before the measurement, i.e., \(|0\rangle \otimes \sum _{j=1}^{2^n - 1} \beta _j |u_j\rangle \big (\sqrt{1-C^2/\lambda _j^2}|0\rangle + C/\lambda _j|1\rangle \big )\) (C being a normalizing constant) [12], the SP is determined by the summation of the squares of reciprocals of eigenvalues. So the values of SP using the truncated (i.e., integer) eigenvalues of \(3\times 3\) and \(7\times 7\) problems are \(1.367\%\) and \(1.337\%\), respectively (SP is greater than 1 as \(C = 1\) is considered here). However, on the simulation side, we compute the SP by dividing the number of trials with correct output by the total number of repeated trials and then multiplying the factor by 100 [41]. When no eigenvalue amplification is used, compared to the analytical SP of \(1.367\%\) on a \(3\times 3\) problem, we computed an SP of \(1.103\%\), which is very close to the number \(1.120\%\) reported by Wang et al. [23]. On the \(7\times 7\) problem, as shown in Fig. 9a, the SP appears to vary between \(0.818\%\) and \(0.826\%\), but without showing any steady movement toward its analytical value \(1.337\%\) as more accurate \(|\omega _j\rangle \)’s were used by increasing the number of qubits in reg. A. This suggests that the SP is more sensitive to the other dominant source of error that involves the truncation of eigenvalues used in phase estimation. Therefore, controlling such error requires using eigenvalue amplification. Figure 9b, and c shows the SP on \(7\times 7\) and \(15\times 15\) problems plotted with different levels of amplification. As expected, both figures confirm the steady improvements in the SP rightly proceeding to their analytical values \(1.154\%\) and \(1.122\%\) (those calculated using the exact eigenvalues), respectively, with higher levels of amplification. Though we have no doubt that a higher amplification factor (\(f>8\)) would further improve the SP approaching it to its respective analytical value, we are unable to fully characterize the reason for two different variation trends of SP shown by the dotted arrow in Fig. 9b and c. In both cases, though we have taken a large number of trials (1.2 million for each), theoretically, a \(15\times 15\) problem requires a polynomially greater number of trials than that of a \(7\times 7\) problem as required by the relation of \(\kappa \) with N (the size of the discretized matrix A). In practice, one could still add more trials to the \(15\times 15\) case, but doing that only might not fix the problem. This is because, theoretically, both the amplification factor f and the number of trials need to be infinitely large to ensure an accurate and reproducible solution. To resolve this two-factor constraint, in the scaling section, we discuss a plan to extend our algorithm by combining it with an iterative method [21]. This will further optimize the number of qubits required for eigenvalue amplification with a higher f, resulting in achieving a converged solution using a relatively lower number of trials.

Also, as \(\kappa \) grows, matrix A becomes more and more difficult to invert, and the solutions become less stable [12]. Furthermore, the basic error of the solutions caused by the central difference approximation is related to the condition number as \(\kappa = O(\epsilon ^{-2\alpha })\) (\(\epsilon \) being error and \(\alpha \) being a smoothness parameter), and therefore, an additive preconditioner [42] may be used to reduce \(\kappa \).

Summarizing the input and output In Table  1, we present all the problems we discussed so far, along with each input state \(|b\rangle \) and Poisson solution \(|v\rangle \). Note that the relative errors shown in Table 1 gradually increase with the problem size. This may be explained by the fact that even a small inaccuracy in the encoded eigenvalues would cause the accumulation of a larger amount of error due to the extra controlled-U operations required for larger problems. Therefore, an optimum solution of a larger problem would require using an even higher amplification factor f, which ensures more number of bits (0 and 1) of the fractional part of the eigenvalues to be taken into account for this computation. In the current implementation, though it is possible to pre-set the value of f, one may not want to do that. Instead, it is desirable to increase f until the convergence of the solution with the desired accuracy is achieved. Also, for all of our simulations, even though we use angular coefficients encoded to a fixed number of qubits (16), encoding them to a higher number of qubits would certainly improve the accuracy of the solution.

Algorithm scaling and further improvement direction Compared to Cao’s algorithm [22], Wang’s method reduces the cost of the problem by one order, from \(O(m^4)\) to \(O(m^3)\), by performing the controlled rotation of HHL using the arc cotangent function, meaning the rotation angles are prepared directly from the eigenvalues instead of their reciprocals. While the theoretical complexity of the algorithm is discussed in Ref. [23]; in this work, we focus on both computational and experimental feasibility.

On the computational aspect, we not only ensure better accuracy of solutions that are lacking in the existing approaches [12, 22, 23] but also successfully demonstrate the scaling of the problem to larger matrices. Within our implementation, our codebase dynamically generates optimized circuits for any given size of the problem. The practicality of the algorithm is demonstrated through the scaling performance of problems for up to \(31 \times 31\) (see Fig. 10). During runtime, we recorded the total number of qubits used and the circuit depth on the basis of elementary gates after the circuit decomposition. As shown in Fig. 10, though the circuit depth grows exponentially as the problem size increases, the number of qubits scales linearly, which is encouraging. This is because, for an existing simulator or quantum hardware, relatively the total number of qubits is more critical factor than the circuit depth.

However, one may also point out that the exponential increase in the circuit depth may require a longer coherent time, which indeed is still a challenge to increase from the technological development point of view. Also, the exponential growth of the circuit depth may have a similar effect on the time complexity of the algorithm. The circuit depth issue and the overall scaling can be further optimized by: (1) Optimally mapping the logical to physical qubits when compiling quantum circuits onto hardware with restricted connectivity by trading off circuit depth and gate count [43]. We have already implemented this and discuss more about it later in the experimental section; (2) Combining our algorithm structure with an iterative method [21] would further optimize qubit usage, especially for the eigenvalue expression, while improving the computational speedup by requiring fewer repeated measurements. In fact, our algorithm is well suited for coupling with an iterative solution process, which would ensure even higher accuracy in results; and (3) Adapting a circuit knitting technique [44,45,46,47], which allows partitioning of large quantum circuits into subcircuits that fit on smaller devices and then knitting the results back together using a classical computer. Although there is some overhead associated with the knitting process, it would open a path to explore massive problems, including multidimensional ones. In our current implementation, due to the full circuit being processed in a single quantum processor, the section of the workflow is relatively slow, especially on large problems. Circuit knitting would require locating processing bottlenecks through profiling and accordingly distributing the tasks on multiple quantum processing units (QPUs), ensuring the tasks’ parallelism with load balancing, which would result in the speeding up of the whole computation. In fact, this is the path IBM takes in realizing their near-term hardware development by combining multiple QPUs [48,49,50] through circuit knitting techniques.

Table 1 One-dimensional Poisson equation, i.e., \(A|v\rangle =|b\rangle \)’s inputs and solution states, and relative errors in the solutions for different sizes of problems

If we extend this to d dimensions, the main difference would be the Hamiltonian simulation for \(e^{iA^{(d)}t}\), which can be parallelized across d (see Eq. 5). Therefore, for the multidimensional case, the complexity of our algorithm still grows linearly. This is encouraging as the cost of any classical algorithm solving the d-dimensional Poisson equation grows exponentially with d. The linear cost of the quantum algorithm makes it ideal for experiments solving the d-dimensional Poisson equation on near-term quantum hardware and achieving exponential speedup in terms of d.

Fig. 10
figure 10

An optimum number of qubits is used for constructing the circuit representing the Poisson solver algorithm. (Left) How the resources, that is, the number of qubits, scales with the size of the problem. The scaling is shown with varying levels of amplification. (Right) The circuit depth with the size of the problem

6 Circuit demonstration on quantum hardware

Qiskit allows for easy circuit optimization and the running of circuits on IBM’s quantum hardware [29]. Their ibmq_manila and ibmq_brooklyn systems containing 5 and 65 qubits, respectively, [30] are used to run our circuit experiments. These systems support only the CNOT, I, \(R_z\), \(\sqrt{X}\), and X gates, so any other gates used must be compiled down to these basic components, for example, the MCMT operation is compiled to CNOT gates. The circuit transformation is performed using Qiskit’s transpiler [51] with samplings that ensure a minimum depth of the optimized circuit.

One crucial part of experimenting with circuits on physical hardware is finding the optimal mapping of virtual qubits to physical qubits on the hardware [43, 52,53,54,55]. Qiskit does this automatically via stochastic mappings of virtual to physical qubits and offers different levels of transpilation for circuit optimization. We experimented with multiple levels of optimization, conducting stochastic searches of mappings in an effort to find an optimal mapping for our circuit. Our final circuit was scholastically sampled over 1500 times to find such an optimal mapping. However, as we will discuss, the accuracy of our experiments was ultimately limited by the accumulated error of the large number of CNOT gates required in our circuit.

6.1 Measurement error mitigation on \(|b\rangle \)

The current state of quantum hardware presents many challenges, particularly the short coherence time and accumulation of noise in experiments [56]. In addition, on physical devices like IBM’s ibmq_manila or ibmq_brooklyn, different pairs of qubits have different CNOT error rates, which also affects the ultimate accuracy of the system as many qubits are directly entangled with other qubits using CNOT gates in the course of an experiment [30]. Therefore, it makes sense to first set up a small system with a limited number of CNOT gates and to experiment on that.

Additionally, there are two more purposes for this experiment: (1) Setting up a test model with the exact input state used in the full circuit for the \(3\times 3\) problem (corresponding to Figs. 4 and 5), from which we get an estimated error related to the measurement part of the algorithm, and (2) Determining how much of that measurement error may be mitigated through the existing model and how the error associated with the relatively small number of CNOT gates affects the overall result.

Based on the available options for experimentation, we first investigate errors using a simple noise model generated from the properties of real device ibmq_manila from the IBM Quantum [30] and mitigate those errors on the measurement qubits [57,58,59,60]. To estimate the amount of error in our actual circuit, it was enough to use a test circuit involving only the input/output state \(|b\rangle \) (those acting on reg. B) where we do the measurement. A diagram of the circuit is shown in the top panel of Fig. 11. Following the circuit transformation through the transpiler for ibmq_manila, the circuit decomposes to a number of basis gates that includes 2 CNOT gates and a few single qubit gates. We experiment with the circuit on ibmq_manila with and without mitigating errors on both measurement qubits and compare those results with the MPS-simulated result. The results with two optimization levels 0 and 3 are shown in the bottom-left panel of Fig. 11. While there are some noticeable differences in probability for some states, the overall result appears to improve with the error mitigation and for higher levels of transpiler optimization. This is clearly evident in the bottom-right panel of the figure, where it shows the relative error with respect to the simulation. It confirms that to reduce the error in the experiment significantly, it is not enough just to tune the optimization levels; error mitigating is also essential on the NISQ hardware.

We want to mention that while this experiment does not completely represent the full circuit of the Poisson equation solver, we believe that it offers us a projection as to what one could expect if NISQ or near-term hardware could support the experiment of the full circuit. Our experimental results of this test system project a significant reduction in the relative error in the measurement part of the circuit, hence indicating the possibility of mitigating a similar magnitude of error on the full system.

6.2 Effects of CNOT error on the experiment of \(3\times 3\) problem on quantum hardware

One measure of the fidelity of quantum systems is in terms of their CNOT error rates, that is, the accuracy of individually entangled bits when performing a two-qubit CNOT gate [61, 62]. The average CNOT error on the ibmq_brooklyn system is \(8.094\text {e-}2\); in other words, they have an accuracy of about 0.92. Thus, we can estimate the overall accuracy of the experiment based on the final number of CNOT gates transpiled from the more abstract circuit, approximated by \(0.92^c\) where c is the final number of CNOT gates after transpilation. Ultimately, every Toffoli gate, as well as more complex gates such as MCMT, is transpiled into many CNOTs. After a series of transformations using different levels of transpiler optimization, our best circuit for the \(3 \times 3\) problem required roughly 5.5k CNOT gates. Unfortunately, this number is quite large given the experimental fidelity of current NISQ devices. As a result, the accumulated errors of the CNOT-gates result in the washing out of the experimental accuracy, which contributes to the artifact of a nonzero contribution for the \(|00\rangle \) state (see the figure in Ref. [1]). In general, experimenting with the circuit on different IBM hardware would end up with similar results, as the best CNOT accuracy on any system is less than 0.99. Therefore, the CNOT error rate appears to be the most dominant bottleneck in realizing the algorithm on NISQ hardware. This experiment helped us pinpoint this key limiting factor of the NISQ device. In spite of the instrumental difficulties involving the CNOT errors, for the first time, we showed that such a full circuit can easily be mapped (logical to physical gates) and experimented on the existing quantum hardware.

Fig. 11
figure 11

Measurement error mitigation of the simplified circuit built out of \(3\times 3\) problem. (Top) Quantum circuit built using the input state \(|b\rangle = \frac{1}{\sqrt{2}}|01\rangle + \frac{1}{2}(|10\rangle + |11\rangle )\) of the \(3 \times 3\) problem. (Bottom left) The MPS-simulated result is compared with the experimental result from the noisy IBM’s ibmq_manila device and after mitigating the error on the measurement qubits. Results are shown with the optimization levels 0 and 3 of transpiler. (Bottom right) The relative error in the experimental results with respect to MPS-simulated result

7 Conclusions

We have successfully demonstrated several crucial improvements and optimizations essential for scaling the Poisson Solver to larger problem sizes within a hybrid algorithm. By identifying two major sources of error accumulation in the algorithm, one in the phase estimation involving truncating eigenvalues and the other related to the accuracy of the rotation angular coefficients, we were able to build a circuit implementation that was dynamically tunable with respect to those two sources of inaccuracy. Adding accuracy to the eigenvalues through eigenvalue amplification yielded the best improvements and proved to be necessary when expanding to larger, unsolved problem sizes. Not only did we perform more accurate computations with these amplified eigenvalues, but we also were able to achieve a higher success probability on every single circuit than previously possible with truncated eigenvalues. We presented results on significantly larger problem sizes than previous works, as well as improved accuracy on existing problem sizes. These accuracy improvements also translated to the larger problems we demonstrated.

Clearly, our algorithm represents an advancement in accuracy and usability and more closely represents what will be put into real-world applications of this theory in the near future. Scalability is a critical step toward breaking the curse of dimensionality that currently plagues solving the Poisson equation, and our multilevel optimized circuit alleviates many of the pressures holding this technology back by dynamically controlling the problem size and register size of crucial segments of the algorithm.

While we were successful in demonstrating our advancements to the Quantum Poisson Solver on a simulator, current quantum hardware proved to be too error-prone to provide accurate results [56, 63, 64]. In spite of that, we were able to demonstrate the improvements in the experimental result on IBM’s ibmq_manila device by mitigating error on the measurement qubits on a simplified circuit built out of an exact input/output state of a \(3\times 3\) problem and including a small number of CNOT gates. Ultimately, the accumulated error of the large number of CNOT gates required in the full circuit for the Poisson solver, in conjunction with the number of qubits necessary for larger problems, was the limiting factor in our exploration. However, this work has laid the foundation for advanced algorithms that will become usable in the near future as hardware improvements continue. As we see the arrival of more accurate systems with lower CNOT error rates, our algorithm will become usable in larger and more practical problems.

We have also discussed a vision of how the problem size can be further extended while managing the circuit width and depth at a level suitable to the current technology. In this regard, we have prescribed multilevel solutions, including combining an iterative framework as proposed by Saito et al. [21], in order to ensure even higher accuracy in results with fewer repeated shots while requiring an optimum number of qubits. Encouraged by the industry’s near-term hardware development roadmap (such as IBM’s upcoming quantum-centric supercomputing hardware [48], for example), we proposed partitioning large circuits through circuit knitting techniques and then running the subcircuits on multiple QPUs in parallel. This would allow us to explore significantly larger problems, including multidimensional ones, with greater computational speed-up.