Unusual energy spectra of matrix product states

J. Maxwell Silvester¹ Giuseppe Carleo² Steven R. White¹ ¹Department of Physics and Astronomy, University of California, Irvine, California 92667, USA ²Institute of Physics, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland

(August 21, 2024)

Abstract

In the simulation of ground states of strongly-correlated quantum systems, the decomposition of an approximate solution into the exact eigenstates of the Hamiltonian—the energy spectrum of the state— determines crucial aspects of the simulation’s performance. For example, in approaches based on imaginary-time evolution, the spectrum falls off exponentially with the energy, ensuring rapid convergence. Here we consider the energy spectra of approximate matrix product state (MPS) ground states, such as those obtained with the density matrix renormalization group (DMRG). Despite the high accuracy of these states, contributions to the spectra are roughly constant out to surprisingly high energy, with an increase in bond dimension reducing the amplitude but not the extent of these high-energy tails. The unusual spectra, which appear to be a general feature of compressed wavefunctions, have a strong effect on sampling-based methods, yielding large fluctuations. For example, estimating the energy variance using sampling performs much more poorly than one might expect. Bounding the most extreme samples makes the variance estimate much less noisy but introduces a strong bias. However, we find that this biased variance estimator is an excellent surrogate for the variance when extrapolating the ground-state energy, and this approach outperforms competing extrapolation methods in both accuracy and computational cost.

^†^†preprint: DMRG_Sampled_Variance

Numerical simulations of a variety of sorts have become essential for the study of quantum many-body systems. Given an approximate ground state, $\ket{\psi}$ , we consider its decomposition into exact energy eigenstates, i.e., its energy spectrum,

\ket{\psi}=\sum_{n}c_{n}\ket{n},

(1)

where $\{\ket{n}\}$ are the eigenstates of the Hamiltonian with corresponding energies $\{E_{n}\}$ . For a number of numerical approaches based on imaginary-time evolution, e.g. quantum Monte Carlo, the coefficients $c_{n}^{2}$ fall off exponentially with the gap $E_{n}-E_{0}$ , where $E_{0}$ is the ground-state energy [1, 2]. This exponential fall-off provides strong guarantees on convergence rates, particularly where there are no near-degeneracies. Because the density matrix renormalization group (DMRG) can achieve very accurate ground-state energies, one might assume that the high-energy coefficients of the corresponding matrix product state (MPS) would also fall off rapidly as the energy gap increases [3, 4, 5, 6, 7, 8]. Since producing the spectrum of a DMRG state requires a full diagonalization of the Hamiltonian, it is only available on small test systems. Surprisingly, we find that the high-energy coefficients do not decrease exponentially with the gap. In fact, for a substantial energy range they do not decrease much at all.

Refer to caption — Figure 1: Energy spectra of approximate ground states from DMRG and imaginary-time evolution (ITE) for a small cluster with fully periodic boundary conditions, broadened by Gaussians with width $0.1$ . The states are matched to have the same energy, with $E-E_{0}=0.016$ , but the energy variance $\sigma_{\hat{H}}^{2}$ is much larger for DMRG, $0.119$ versus $0.0092$ .

In Fig. 1 we show spectra for a spin $S=1/2$ Heisenberg model on a $4\times 4$ square lattice cluster. A description of all the models we consider can be found in the supplementary materials (SM)[9]. We compare DMRG results to those from an exact imaginary-time evolution (ITE) starting with a Néel state. The imaginary-time duration, $\tau$ , is chosen to make the two states have approximately the same energy. Note that we choose fully periodic boundary conditions, for which DMRG requires larger bond dimension to achieve accurate results, in order to emulate a DMRG run on a larger system while still being able to fully diagonalize the Hamiltonian. A clear exponential tail is seen for the ITE spectrum, as expected. In contrast, the DMRG spectrum has very little weight in the first few excited states and a tail that is roughly constant out to surprisingly high energies.

In this Letter, we show that these high-energy contributions to DMRG wavefunctions appear quite generally across different lattice models and geometries, and explore important implications of such spectra for sampling-based methods.

Unusual spectra—Such unusual energy spectra are due to the compression of the state into an MPS, rather than some aspect of the DMRG energy optimization. This is shown in Fig. 2(a), where we compress an ITE state into an MPS. The spectrum flattens at an energy gap determined by the bond dimension, $m$ . Note that for this small cluster, the maximum bond dimension for any state to be represented exactly is $m=256$ , the Hilbert space dimension of half the system. It is remarkably that the flat tail is apparent even with a truncation to $m=250$ . While some details of the results of Fig. 2(a) are specific to the fully periodic boundaries, the qualitative behavior is independent of boundary conditions. This is illustrated in Fig. 2(b), where we compare DMRG spectra to spectra obtained from truncating the exact ground-state to the same bond dimension as the corresponding DMRG state. Regardless of bond dimension or boundary conditions, the spectra show similar qualitative features, extending to high energy. Moreover, in each case the DMRG spectrum matches closely the spectrum from truncation of the exact solution, confirming that indeed the spectra are a result of the compressed MPS representation of the wavefunction.

It is also interesting to consider the average energy of a DMRG state when the exact ground state is projected out, i.e., the energy of the excited part of the solution. Thus, we decompose the approximate ground state as

\ket{\psi}=\sum_{n}c_{n}\ket{n}=\cos\theta\ket{0}+\sin\theta\ket{r},

(2)

where the residue, $\ket{r}$ , is normalized. In Fig. 2(c) we show the average energy of the residue, i.e., $E_{r}=\braket{r}{\hat{H}}{r}$ , versus energy error as we vary the bond dimension in a DMRG calculation. Remarkably, $E_{r}$ increases sharply as the bond dimension is increased, eventually reaching about $2/3$ of the total energy bandwidth (dashed blue line). Of course, this increase in the residue’s energy is more than compensated for by a decrease in $\theta$ , so that the $E-E_{0}$ still approaches zero. In contrast, for the ITE state, with increasing $\tau$ , $E_{r}$ simply approaches $E_{1}=\braket{1}{\hat{H}}{1}$ , the energy of the first excited state.

The striking dissimilarity between the residues of the DMRG and ITE states provides a way to probe the energy spectra of larger systems, where exact diagonalization is unavailable. Note that for the special case when $\ket{r}=\ket{1}$ , i.e., when all of the error in the approximate ground-state is in the first excited state, the quantity

\tilde{v}=(E-E_{0})^{2}\cos\theta+(E-E_{1})^{2}\sin\theta.

(3)

is equal to the wavefunction’s energy variance, $\sigma_{\hat{H}}^{2}=\bra{\psi}(\hat{H}-E)^{2}\ket{\psi}$ . Importantly, calculating $\tilde{v}$ requires only quantities that can be easily estimated within DMRG, such as the lowest energy gap. The ratio $\sigma_{\hat{H}}^{2}/\tilde{v}$ serves as a measure of the high-energy contributions to the wavefunction. Specifically, for ITE states, this ratio reduces to unity as the ground state is approached. In contrast, for DMRG wavefunctions this ratio steadily increases as the MPS converges, reaching values of $10^{2}-10^{4}$ or higher. This prominent signal is seen across different geometries, boundary conditions, and models, as described in the SM [9], showing that high-energy residues are a quite-general feature of MPS solutions.

Why do MPS spectra have these high-energy tails? Suppose they did not; imagine instead that like an ITE state, an MPS state of low bond dimension was, to high accuracy, composed of a small number of excited states. The MPS representation of each of these excited states will have high bond dimension, since we do not consider them to be approximate. Then, in forming the low bond dimension DMRG state as a linear combination of the few large bond dimension states, we would need a precise cancellation of a huge number of parameters. This cancellation is reasonable only if the number of states we are combining is comparably large, therefore requiring that they extend to high energy. Note that this reasoning is not very specific to MPS states: not only would it apply to other tensor network states, but to any type of state that can be considered a general compression of the exponentially large space, e.g., a neural network state.

Consequences for sampling— To highlight the strong effect that MPS spectra can have on sampling-based methods, we now consider the specific task of calculating the energy variance of the state, first reviewing non-sampling approaches. Note that in the inset of Fig. 2(c), where the black and red dots represent successive DMRG sweeps or increasing $\tau$ , respectively, the DMRG state has a much higher variance than the ITE state, for fixed energy error, $E-E_{0}$ . While this discrepancy—a direct consequence of the high-energy tails—may make comparisons of different numerical methods based on energy variance misleading[10], the near-linear behavior shows that the variance may still be an ideal candidate for extrapolating the energy to the infinite bond dimension, zero variance limit. Unfortunately, computing the exact energy variance of a DMRG wavefunction is more expensive than a DMRG sweep by a factor of $w$ , the bond dimension of the Hamiltonian expressed as a matrix product operator. For systems with long-range interactions or two-dimensional (2D) geometry we may have $w\sim 30$ or greater.

Historically, it has instead been standard practice in DMRG calculations to perform extrapolations in the truncation error, i.e., the total probability of the states that are thrown away during sweeping to compress the state[5, 11, 12]. Not only do such extrapolations provide error bars, but non-linearity in the energy versus truncation error can also indicate that the initial wavefunction was far from the ground state and the system is not well converged. While the truncation error is calculated at almost zero computational cost during a DMRG sweep, it depends on details of the sweeping, and producing a truncation error usable for extrapolation requires care in the construction of the sweeping procedure. In cases with long-range interactions, truncation error extrapolations may be unreliable. Moreover, estimating the truncation error is unavailable in the single-site implementation of DMRG, which is cheaper than the two-site algorithm[13, 14].

More recently, the two-site variance has been introduced as a means to approximate the energy variance of the DMRG wavefunction at an affordable cost, enabling extrapolations when running single-site DMRG[15]. While this quantity is an accurate approximation of the variance for nearest-neighbor one-dimensional (1D) open chains, it can deviate strongly for 2D systems. However, its errors are systematic and the two-site variance vanishes when the state is an exact eigenstate. Thus, this quantity can be used to produce high-quality extrapolations. The cost of computing the two-site variance is similar to that of a DMRG sweep, roughly doubling run-time if the variance is calculated every sweep.

As another alternative to estimate the variance cheaply, we consider using sampling. Unlike in many quantum Monte Carlo algorithms, which construct a Markov chain with nonzero autocorrelation times, MPS states can be sampled with one perfectly independent sample per step. This so-called perfect sampling technique is the starting-point of both the minimally entangled typical thermal state algorithm and MPS-based variational Monte Carlo methods[16, 17, 18, 19]. A sample is a product state $\ket{s}$ , where the product is over states of each site, which is generated with probability $P_{s}=|\braket{\psi}{s}|^{2}$ . The product states form a complete basis, $\mathcal{B}$ , so that we may rewrite the energy variance as

\sigma^{2}_{\hat{H}}=\sum_{\ket{s}\in\mathcal{B}}\braket{\psi}{\hat{H}-E}{s}% \braket{s}{\hat{H}-E}{\psi},

(4)

where $E=\braket{\psi}{\hat{H}}{\psi}$ is the expectation value of the energy. The support, $\mathcal{S}$ , is the set of basis vectors for which $P_{s}$ is nonzero. For states in $\mathcal{S}$ , we define the local energy, $E_{s}^{L}=\frac{\braket{s}{\hat{H}}{\psi}}{\braket{s}{\psi}}$ ; outside $\mathcal{S}$ , the local energy is undefined. Then, splitting the sum in Eq. (4) into two parts, we have

\sigma^{2}_{\hat{H}}=\Delta_{s}+\sum_{\ket{s}\in\mathcal{S}}P_{s}|E_{s}^{L}-E|% ^{2},

(5)

where the local energy bias is

\Delta_{s}=\sum_{\ket{s}\notin\mathcal{S}}\|\braket{\psi}{\hat{H}}{s}\|^{2}.

(6)

Note that, alternatively, an unbiased estimator can be obtained by inserting the identity operator next to $(\hat{H}-E)^{2}$ in Eq. (4), instead of between the two copies of $(\hat{H}-E)$ . In that case the samples with $P_{s}=0$ do not contribute to the variance. However, in the cases we have studied, the sampling bias $\Delta_{s}$ is typically small, and it can be shown to rigorously vanish for exact eigenstates; usually, it can be ignored. We also find that the unbiased version of sampling is both more expensive and also prone to worse fluctuations, so we utilize the local energy sampling[9].

With $N_{s}$ samples, the sampling estimate of the energy variance is

\sigma^{2}_{\hat{H}}-\Delta_{s}\approx\frac{1}{N_{s}}\sum_{i=1}^{K}|E_{i}^{L}-% E|^{2}.

(7)

The cost of selecting a product state, $\ket{s}$ , in a perfect measurement, is $\mathcal{O}(m^{2}d+md^{2})$ , where $m$ is the bond dimension of the MPS and $d$ is the number of states on one site. Somewhat more expensive is a calculation of $E_{s}^{L}$ , which scales as $\mathcal{O}(m^{2}dw+md^{2}w^{2})$ , where $w$ is the bond dimension of the Hamiltonian MPO. This is a factor of the bond dimension $m$ less expensive than a DMRG sweep, which scales as $\mathcal{O}(m^{3}dw+m^{2}d^{2}w^{2})$ . Estimating the variance via sampling can cost less than or be comparable to a sweep, provided that we take a modest number of samples, i.e., $N_{s}<m$ , where typically $m\sim 1000-10000$ . In Fig. 3(a) we plot both the two-site and sampled variance of an approximate solution from DMRG, for a nearest-neighbor 2D Heisenberg cluster with periodic boundary conditions. With on the order of a thousand samples, our method provides a better variance estimate than the two-site variance, which is quite inaccurate due to the 2D geometry and periodic boundary conditions.

Despite its relative success in Fig. 3(a), the sampled variance is noisy, with large fluctuations even after hundreds of thousands of samples; near sample 200,000 is a large jump in the running average, due to a sample with a very large local energy deviation. We believe the large fluctuations we see in sampling the energy variance are closely tied to the unusual energy spectrum. In general, some error measure is a good candidate for extrapolation if it reduces smoothly and predictably to zero in the limit of large bond dimension. Because of these large fluctuations, the standard sampling of the variance does not in general produce good extrapolations. Even in cases such as Fig. 3(a) where the sampled variance is more accurate, the non-stochastic nature of the two-site variance makes it a better error measure for extrapolation.

To understand and potentially eliminate the noisiness, we investigate the sampling statistics in more detail. For small systems such as the one used for Fig. 3(a), it is possible to enumerate every sample and compute its local energy. Results from such an analysis are shown in Fig. 3(b) - (c). We see a dramatic difference in the local energy probability distributions. The DMRG distribution has samples with very small probability but substantial contributions to the variance. For example, consider the two samples near $P_{s}\sim 10^{-8}$ in Fig. 3(b): these contributed $\sim 10^{-4}$ to the variance each, so they must have $|E^{L}_{s}-E|^{2}\sim 10^{4}$ . Such extreme outliers for the DMRG state, as shown explicitly in Fig. 3(c), explain the sudden spikes in the running average.

Biased variance extrapolation—Given large fluctuations in the probability distributions, one is tempted to eliminate the outliers in some way. We consider bounding the local energy to within a symmetric interval of the exact energy, $E$ , which is known from the DMRG sweep. Letting $\epsilon=E_{s}^{L}-E$ , we define the piece-wise function

f(\epsilon)=\begin{cases}-\epsilon_{\text{max}}&\text{ if }\epsilon<-\epsilon_% {\text{max}}\\ \epsilon&\text{ if }-\epsilon_{\text{max}}\leq\epsilon\leq\epsilon_{\text{max}% }\\ \epsilon_{\text{max}}&\text{ if }\epsilon>\epsilon_{\text{max}}\end{cases}

(8)

where $\epsilon_{\text{max}}>0$ is some specified maximum local deviation. We replace $\epsilon$ by $f(\epsilon)$ for estimating the variance. This transformation of the data pushes in the long tails of the the local energy distribution to $\pm\epsilon_{\text{max}}$ , while leaving samples with smaller deviations untouched.

As one might expect, this transformation produces significantly biased variances; see Figure 4(a). We call this the cutoff bias, $\Delta_{c}$ . However, in the extrapolation of the energy to zero variance as the bond dimension is increased, $\Delta_{c}$ decreases smoothly with the variance, and we find that this bias does not interfere with extrapolation. In order to extrapolate data, we must have a systematic way of choosing $\epsilon_{\text{max}}$ at each bond dimensions. Note that given a $\epsilon_{\text{max}}$ , and a set of samples, we can measure the cutoff ratio, $c$ , the percentage of the local energies where $f(\epsilon)=\epsilon$ . A good approach for determining $\epsilon_{\text{max}}$ comes from be fixing $c$ . Then the question is: what is the optimal $c$ , and how well does this method extrapolate?

For a variety of models, geometries, and boundary conditions, we have found that setting $c=0.90$ provides excellent results; see the SM for the details[9]. As shown in Figure 4(a), as $c$ decreases the sampling fluctuations drastically decrease; by $c=0.9$ the biased variance is determined accurately with a few thousand samples. For this small cluster, we can explicitly compute the asymptotic values [9], as indicated by the dashed lines. In Figure 4(b) we show extrapolations of the ground-state energy for a $12\times 6$ cylindrical Heisenberg cluster, where we compare extrapolations using the exact and two-site variances, as well as from the sampled variance ( $N_{s}=2000$ samples) with and without the 90-th percentile cutoff. The higher bond dimension data points on the inset are not used to generate the best-fit lines, but rather to test their accuracy. We find that the $c=0.9$ extrapolations offer a significant improvement over the original $c=1$ data. Even more, this biased sampling usually extrapolates better than the energy variance calculated exactly without sampling. It appears that the fat tails of the variance distribution interfere with extrapolation, even when there is no sampling error. Of course, due to the stochastic nature of sampling, another batch of 2000 samples may do better or worse than this particular batch. Still, over many batches, sampling usually yields energy errors that are generally equal to or quite-often smaller than those from other methods. See the SM for a more details on the performance checks for this extrapolation technique and how it compares to two-site variance extrapolations [9].

Summary—We have found that high-energy tails are a universal feature of the energy spectra of MPS approximate ground states. We have argued that this energy structure is an inevitable consequence of the the compression of the ground state into a tensor network, although it would be useful to make this argument more mathematically precise. There are important implications to this unexpected result. For instance, when comparing a DMRG result to a result from another numerical approach, e.g., quantum Monte Carlo, the energy variance of the DMRG state will be significantly larger if the two results have similar energy errors. An even more significant consequence are large fluctuations in sampling-based approaches. The distribution of sampled local energies have fat tails, impeding the accurate determination of the energy variance. Similar large fluctuations may appear in other approaches, such as MPS-based variational Monte Carlo. However, in the case of the energy variance of DMRG wavefunctions, we found that a biased variance estimator made from bounding the fluctuations can be used to produce excellent extrapolations of the ground state energy at low cost. Importantly, the sampled variance can be computed even in the context of single-site DMRG, in the presence of long range interactions, and without careful design of the sweeping procedure, making it a powerful new tool for DMRG calculations.

Acknowledgements—This work was supported by the National Science Foundation under DMR-2110041, as well as by the Eddleman Quantum Institute. We would also like to thank Shengtao Jiang, Miles Stoudenmire, Uli Schollwöck, and Antoine Georges for insightful conversations.

References

Becca and Sorella [2017] F. Becca and S. Sorella, Quantum Monte Carlo Approaches for Correlated Systems (Cambridge University Press, 2017).
Shi et al. [2018] T. Shi, E. Demler, and J. I. Cirac, Ann. Phys. 390, 245 (2018).
White [1992] S. R. White, Phys. Rev. Lett. 69, 2863 (1992).
Schollwöck [2011] U. Schollwöck, Ann. Phys. 326, 96 (2011).
Stoudenmire and White [2012] E. Stoudenmire and S. R. White, Annu. Rev. Condens. Matter Phys. 3, 111–128 (2012).
White and Scalapino [2009] S. R. White and D. J. Scalapino, Phys. Rev. B 79, 220504 (2009).
Depenbrock et al. [2012] S. Depenbrock, I. P. McCulloch, and U. Schollwöck, Phys. Rev. Lett. 109, 067201 (2012).
S. Yan and White [2011] D. A. H. S. Yan and S. R. White, Science 332, 1173 (2011).
[9] See Supplemental Material at URL-will-be-inserted-by-publisher for description of lattice models, details about alternative sampling setups, analysis of energy spectra of larger systems, and performance checks of our new extrapolation method on a variety of systems.
Wu et al. [2023] D. Wu, R. Rossi, F. Vicentini, N. Astrakhantsev, F. Becca, X. Cao, J. Carrasquilla, F. Ferrari, A. Georges, M. Hibat-Allah, M. Imada, A. M. Läuchli, G. Mazzola, A. Mezzacapo, A. Millis, J. R. Moreno, T. Neupert, Y. Nomura, J. Nys, O. Parcollet, R. Pohle, I. Romero, M. Schmid, J. M. Silvester, S. Sorella, L. F. Tocchio, L. Wang, S. R. White, A. Wietek, Q. Yang, Y. Yang, S. Zhang, and G. Carleo, Variational benchmarks for quantum many-body problems (2023), arXiv:2302.04919 [quant-ph] .
LeBlanc et al. [2015] J. P. F. LeBlanc, A. E. Antipov, F. Becca, I. W. Bulik, G. K.-L. Chan, C.-M. Chung, Y. Deng, M. Ferrero, T. M. Henderson, C. A. Jiménez-Hoyos, E. Kozik, X.-W. Liu, A. J. Millis, N. V. Prokof’ev, M. Qin, G. E. Scuseria, H. Shi, B. V. Svistunov, L. F. Tocchio, I. S. Tupitsyn, S. R. White, S. Zhang, B.-X. Zheng, Z. Zhu, and E. Gull (Simons Collaboration on the Many-Electron Problem), Phys. Rev. X 5, 041041 (2015).
Ehlers et al. [2017] G. Ehlers, S. R. White, and R. M. Noack, Phys. Rev. B 95, 125125 (2017).
White [2005] S. R. White, Phys. Rev. B 72, 180403 (2005).
Hubig et al. [2015] C. Hubig, I. P. McCulloch, U. Schollwöck, and F. A. Wolf, Phys. Rev. B 91, 155115 (2015).
Hubig et al. [2018] C. Hubig, J. Haegeman, and U. Schollwöck, Phys. Rev. B 97, 045125 (2018).
Stoudenmire and White [2010] E. M. Stoudenmire and S. R. White, New J. Phys. 12, 055026 (2010).
Ferris and Vidal [2012] A. J. Ferris and G. Vidal, Phys. Rev. B 85, 165146 (2012).
Sandvik and Vidal [2007] A. W. Sandvik and G. Vidal, Phys. Rev. Lett. 99, 220602 (2007).
Wouters et al. [2014] S. Wouters, B. Verstichel, D. Van Neck, and G. K.-L. Chan, Projector quantum monte carlo with matrix product states, Phys. Rev. B 90, 045104 (2014).