
1 Introduction

Simulating atomistic evolution is essential for predicting materials properties in materials science, chemistry, and biology. Molecular dynamics (MD), the gold standard for atomistic simulations, simulates the interactions of atoms and molecules over a fixed period of time [1, 14]. The limited temporal scale is inherent in the method's sequential nature and poses great challenges for observing various transitions on MD time scales. Recent advances leverage statistics and massively parallel computers to accelerate the simulations, such as the accelerated molecular dynamics method [15, 18] and the parallel replica dynamics method (ParRep) [19]. In particular, the parallel trajectory splicing method (ParSplice) [12], a generalization of ParRep, has drawn much attention in recent years because of its superior computational performance, achieved by using large-scale high-performance computers to parallelize the state-to-state simulations in the time domain. One major burden, however, is the need to store all the local minimizers computed during the long MD simulation, since they are used as initial guesses for the next state simulation. Because the number of local minimizers is exponential in the number of atoms, storage space must be conserved, even though each individual minimizer is a small set of numbers (e.g., the 3D coordinates of the atom positions). Lossy compression of these local minimizers can therefore yield significant savings: by compressing the values, we can cache more of them locally on the compute node and send/receive less information from the global database. A critical step, however, is to identify the basins of local convergence for a particular numerical method. As long as the compressed data stays in the same basin, we can recover the local minimizer by applying the numerical method to the stored iterate.

In this work, for an atomic system with Newton's method as the underlying solver, we employ a Kantorovich-type theorem to study the basins of attraction of potential energy minima. We propose a mathematical framework to estimate bounds on the compression level of a minimizer when binary digit rounding is used as the compression routine. One critical step is to calculate the required information, namely, the local Lipschitz-type constant of the function's derivative. As the system grows, an analytic expression for this constant becomes intractable, so a robust method to approximate it is required. By exploring the topology of the potential energy landscape, we propose an efficient way to estimate the required constant. We prototype our method on the simple, yet common, Lennard-Jones potential function and validate our approach on various numerical examples.

2 Minimization of Lennard-Jones Potential

Given a configuration of n atoms in a cluster \(\mathbf{X}= \lbrace \mathbf{x}_1, \dots , \mathbf{x}_n \rbrace \), we consider the Lennard-Jones (LJ) potential function [8], a simplified model that simulates the potential energy of the cluster based on inter-atom distances:

$$\begin{aligned} \hat{V}_n(\mathbf{X}) = 4\varepsilon \sum _{i < j}\left( \left( \frac{\sigma }{\hat{d}(\mathbf{x}_i,\mathbf{x}_j)}\right) ^{12} - \left( \frac{\sigma }{\hat{d}(\mathbf{x}_i,\mathbf{x}_j)}\right) ^{6}\right) , \end{aligned}$$
(2.1)

where \(\hat{d}(\mathbf{x}_i,\mathbf{x}_j)\) is the Euclidean distance between atoms \(\mathbf{x}_i,\mathbf{x}_j\in \mathbb {R}^3\). The physical constants \(\varepsilon \) and \(\sigma \) are the depth of the potential well and the inter-atom reaction limit, respectively, both depending on the type of atom. Because of its importance in computational chemistry, finding optimal configurations that locally minimize the potential energy (2.1) remains an active research area. For example, Maranas et al. [10] proposed a specialized optimization algorithm for finding many stationary points, and Asenjo et al. [3] studied the mapping of the basins of attraction for various optimization algorithms.

In this work, we follow the longstanding convention in [7] and consider the reduced-unit optimization problem

$$\begin{aligned} \displaystyle \min _{\mathbf{X}}V_n(\mathbf{X}) = \sum _{i < j} \left( \frac{1}{d(\mathbf{x}_i,\mathbf{x}_j)^{12}} - \frac{1}{d(\mathbf{x}_i,\mathbf{x}_j)^6} \right) , \end{aligned}$$
(2.2)

whereby we set \(\displaystyle d(\mathbf{x}_i,\mathbf{x}_j) = \frac{\hat{d}(\mathbf{x}_i,\mathbf{x}_j)}{\sigma }\) and \(\displaystyle V_n = \frac{\hat{V}_n}{4\varepsilon }\) to eliminate the parameters in (2.1). The coordinates of the atoms \(\mathbf{x}_i=(x_i,y_i,z_i),\, i=1,\dots ,n\) are the unknown variables, and we fix atom 1 at (0, 0, 0), atom 2 at \((x_2,0,0)\), and atom 3 at \((x_3,y_3,0)\). In addition to eliminating the need to postprocess a collection of distances in order to visualize the system, using coordinates in this manner offers several mathematical advantages. First, for \(n \ge 3\) the function \(V_n\) has \(3n-6\) degrees of freedom (DOF), while the pairwise distance formulation has \(O(n^2)\) DOF. Second, using coordinates eliminates the need for both configuration-feasibility and non-negativity constraints on the variables, resulting in an unconstrained optimization problem. Third, fixing the coordinates of atoms 1–3 in the described manner removes translational and rotational invariance, ensuring distinguishable minimizers and reducing their number. Even so, the reduced LJ potential function \(V_n\) still has \(O(\exp (n^2))\) unique minimizers [7], and even enumerating the local minima of a cluster has been shown to be an NP-hard problem [21]. Furthermore, each unique minimizer has at least \(O(n^3)\) equivalent permutations (for example, by interchange of atom numbering and rotation).
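To make this parameterization concrete, the following Python sketch (our own illustration; the helper names `unpack` and `V` are not from the paper) evaluates \(V_n\) from the \(3n-6\) free variables with atoms 1–3 semi-fixed as described:

```python
import numpy as np

def unpack(v, n):
    """Map the (3n-6)-vector of free variables to full 3D coordinates.

    Atom 1 is fixed at the origin, atom 2 on the x-axis, and atom 3 in
    the xy-plane, following the convention above (assumes n >= 3).
    """
    X = np.zeros((n, 3))
    X[1, 0] = v[0]                       # atom 2: (x2, 0, 0)
    X[2, :2] = v[1:3]                    # atom 3: (x3, y3, 0)
    X[3:] = v[3:].reshape(n - 3, 3)      # atoms 4..n: fully free
    return X

def V(v, n):
    """Reduced Lennard-Jones potential V_n of Eq. (2.2)."""
    X = unpack(v, n)
    E = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = np.sum((X[i] - X[j]) ** 2)
            E += d2 ** -6 - d2 ** -3     # 1/d^12 - 1/d^6
    return E

# Example: the equilateral triangle with side 2^(1/6) (the V_3 minimizer)
s = 2 ** (1 / 6)
v3 = np.array([s, s / 2, s * np.sqrt(3) / 2])
print(V(v3, 3))                          # -0.75: three pairs at -1/4 each
```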

Fig. 1. Left: landscape of function \(V_2\) given \(x_2\) as its only free variable. Right: landscape of function \(V_3\) with reduced dimension by setting \(x_2 = 2^{1/6}\) (its equilibrium distance) with DOF \(x_3,y_3\).

Figure 1 illustrates the functions \(V_2\) and \(V_3\). We can see that minimizing even such small systems can be challenging. First, whenever a pairwise atom distance approaches zero, \(V_n\) has a singularity. Second, when any pairwise distance \(d(\mathbf{x}_i,\mathbf{x}_j)\) is large, the norms of the corresponding gradient components are (numerically) very small; that is, there can be an infinite number of (nearly) critical points as \(\Vert \mathbf{x}_i\Vert \rightarrow \infty \) for some i. As such, a careful choice of initial guess is necessary to ensure that the optimization algorithm converges to the desired minimizer. Many researchers (e.g., [17, 23]) leverage geometric principles in their numerical methods to generate initial iterates.

3 Method and Small System Validation

In this section, we survey existing theorems that guarantee local convergence for problem (2.2). Applying these theorems allows us to determine the maximal amount by which a local minimizer can be perturbed under a lossy compression technique while preserving its basin of attraction; that is, the given optimization algorithm applied to the compressed data point converges to the same optimal point. To quantify the amount of compression in terms of the acceptable perturbation, we use binary digit rounding as a simple compression technique.

For the optimization algorithm used to recover the minimizer, we use Newton's method to solve the first-order optimality conditions of (2.2), since convergence can be guaranteed under certain conditions when the iterates are near a root. Among the best-known convergence results for Newton's method, the Kantorovich Theorem (KT) [9] continues to draw attention from many researchers (e.g., [2, 13, 24]). The crux of the KT result is a guarantee that a solution \(\mathbf{X}^*\) of \(U_n(\mathbf{X}) = V_n'(\mathbf{X}) = 0\) exists in a neighborhood of a point \(\mathbf{X}^{(0)}\) under some assumptions, and that Newton's method converges to \(\mathbf{X}^*\) when started from \(\mathbf{X}^{(0)}\). Note that KT does not require knowledge of the optimal point, only information based on the initial iterate, a workflow opposite to our compression framework, where \(\mathbf{X}^*\) is given. Instead, we exploit a variant of KT, the ball of convergence about \(\mathbf{X}^*\), that assumes prior knowledge of \(\mathbf{X}^*\).

First, we define \(B(\mathbf{X}^*,r)\) as an open ball in a Banach space \(\mathcal {B}\) with center \(\mathbf{X}^*\) and radius \(r > 0\), that is, \(B(\mathbf{X}^*,r) = \lbrace \mathbf{X}\in \mathcal {B} : \Vert \mathbf{X}- \mathbf{X}^* \Vert < r\rbrace \). In our case, for any \(U_n\), \(n \ge 3\), we have \(\mathcal {B} = \mathbb {R}^{3n-6}\), and we choose the Euclidean norm on \(\mathcal {B}\), abbreviating \(\Vert \mathbf{X}\Vert := \Vert \mathbf{X}\Vert _2\). We use the same convention for the compatible matrix operator norm, namely, \(\displaystyle \Vert U_n'(\mathbf{X}) \Vert := \Vert U_n'(\mathbf{X}) \Vert _2 = \max _{\mathbf{Y}\in \mathcal {B}}\frac{\Vert U_n'(\mathbf{X})\mathbf{Y}\Vert _2}{\Vert \mathbf{Y}\Vert _2}\).

Theorem 1

(Rheinboldt [16]). For \(U_n: N \subseteq \mathbb {R}^m \rightarrow \mathbb {R}^m\), suppose \(U_n(\mathbf{X}^*) = 0\) and that \((U_n'(\mathbf{X}^*))^{-1}\) exists. Furthermore, for some \(r_1 > 0\), there exists a constant \(L_* > 0\) such that

$$\begin{aligned} \Vert (U_n'(\mathbf{X}^*))^{-1}(U_n'(\mathbf{X})-U_n'(\mathbf{Y})) \Vert \le L_* \Vert \mathbf{X}-\mathbf{Y}\Vert \end{aligned}$$
(3.3)

for all \(\mathbf{X},\mathbf{Y}\in B(\mathbf{X}^*,r_1)\). Then, if \(r_1 < \frac{2}{3L_*}\), Newton’s method converges uniquely to \(\mathbf{X}^*\) for any \(\mathbf{X}^{(0)} \in B(\mathbf{X}^*,r_1)\).

Alternatively, we have the following.

Theorem 2

(Wang [20]). For \(U_n: N \subseteq \mathbb {R}^m \rightarrow \mathbb {R}^m\), suppose \(U_n(\mathbf{X}^*) = 0\) and that \((U_n'(\mathbf{X}^*))^{-1}\) exists. Furthermore, for some \(r_2 > 0\), there exists a constant \(L_{**} > 0\) such that

$$\begin{aligned} \Vert (U_n'(\mathbf{X}^*))^{-1}U_n'(\mathbf{X})-I \Vert \le L_{**} \Vert \mathbf{X}-\mathbf{X}^* \Vert \end{aligned}$$
(3.4)

for all \(\mathbf{X}\in B(\mathbf{X}^*,r_2)\). Then, if \(r_2 \le \frac{2}{L_{**}}\), Newton’s method converges uniquely to \(\mathbf{X}^*\) for any \(\mathbf{X}^{(0)} \in B(\mathbf{X}^*,r_2)\).

A ball \(B(\mathbf{X}^*,r)\) satisfying either Theorem 1 or Theorem 2 is a ball of convergence for \(\mathbf{X}^*\), and any compressed minimizer that lies in \(B(\mathbf{X}^*,r)\) preserves the basin of attraction of \(\mathbf{X}^*\) under Newton's method.

Notice that the stated theorems require \(U_n(\mathbf{X}^*)\) to be exactly zero. For computer/numerical calculations this assumption is almost never satisfied. To accommodate this condition, one can terminate the iteration with a comprehensive convergence test that ensures both sequences \(\{U_n(\mathbf{X}^{(k)})\}\) and \(\{\mathbf{X}^{(k)}\}\) are converging. This numerical issue is negligible when accurate minimizers are computed; all computed \(\mathbf{X}^*\) in our numerical tests are accurate to at least 10 decimal places.
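As one concrete form of such a test (an illustration under our own tolerances, not the paper's exact criterion), a Newton loop can require both the residual and the step to be small before accepting an iterate:

```python
import numpy as np

def newton(v, U, dU, tol=1e-12, maxit=100):
    """Newton iteration with a dual stopping test: accept v only when
    both the residual U(v) and the last step are small, so that the
    sequences {U(X^(k))} and {X^(k)} are both observed to converge.
    """
    for _ in range(maxit):
        step = np.linalg.solve(dU(v), U(v))   # solve U'(v) s = U(v)
        v = v - step
        if np.linalg.norm(U(v)) < tol and np.linalg.norm(step) < tol:
            return v
    raise RuntimeError("no convergence within maxit iterations")
```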

While different lossy compression techniques can be implemented depending on their intended use, our work focuses on providing a theoretical bound on the maximum perturbation that any lossy compression scheme can afford while preserving a basin of attraction. We use significant binary digit truncation/rounding as the underlying compression technique, since it is computationally cheap to implement and easy to quantify. Furthermore, the resulting relative perturbations are small and easy to control: rounding a real number to c significant binary digits (bits) produces a maximum relative perturbation of \(\displaystyle \frac{2^{-(c+1)}}{1+2^{-(c+1)}}\). For example, rounding to 4 significant bits has a maximum relative perturbation of \(3.03 \%\), and rounding to 8 bits at most \(0.195 \%\), a potentially significant reduction in memory from the standard 52 significant bits of IEEE double precision. We use \(\mathbf{X}^{(0),c}\) to denote \(\mathbf{X}^*\) compressed to c significant bits, which serves as the initial guess for Newton's method.
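As a concrete sketch (the helper names are our own, not from the paper), c-bit significant rounding and its worst-case relative perturbation can be implemented as:

```python
import math

def round_bits(x, c):
    """Round x to c significant binary digits."""
    if x == 0.0:
        return 0.0
    e = math.floor(math.log2(abs(x)))   # exponent of the leading bit
    s = 2.0 ** (e - c + 1)              # value of the last retained bit
    return round(x / s) * s

def max_rel_perturbation(c):
    """Worst-case relative error of c-bit rounding: 2^-(c+1)/(1+2^-(c+1))."""
    t = 2.0 ** -(c + 1)
    return t / (1.0 + t)

print(max_rel_perturbation(4))   # 0.0303... (3.03 %)
print(max_rel_perturbation(8))   # 0.00195... (about 0.195 %)
```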

To understand how the stated theory is used and how it may be applied at large scales, we offer a brief analysis of the 2-atom (1 free variable) LJ potential function \(V_2\). We apply Newton's method to the first-order conditions of (2.2) by solving \(0 = U_2(\mathbf{X}) = U_2(x_2) = -12x_2^{-13} + 6x_2^{-7}\) for \(x_2^*=2^{1/6}\). To apply Theorems 1 and 2, we must choose radii \(r_1\) and \(r_2\) small enough to satisfy the corresponding Lipschitz inequality. This step may not be straightforward, however, because both \(L_*\) and \(L_{**}\) depend on r. In this one-dimensional case, we can solve for these radii explicitly because we know that the maximum values of \(\vert U_2' \vert \) and \(\vert U_2'' \vert \) on the ball are attained at \(2^{1/6} - r\) for any \(0< r < 2^{1/6}\). We have the following:

$$\begin{aligned} \left| \frac{ -2184(2^{1/6} - r_1)^{-15} + 336(2^{1/6} - r_1)^{-9}}{156(2^{1/6})^{-14} - 42(2^{1/6})^{-8}} \right|< \frac{2}{3r_1}&\implies r_1 < 0.02432,\\ \left| \frac{156(2^{1/6} - r_2)^{-14} - 42(2^{1/6} - r_2)^{-8}}{r_2(156(2^{1/6})^{-14} - 42(2^{1/6})^{-8})} - \frac{1}{r_2} \right| \le \frac{2}{r_2}&\implies r_2 \le 0.06276. \end{aligned}$$

Therefore, given the radii needed for the different truncations, \(r_c = \vert x_2^{(0),c} - x_2^* \vert \), either the Rheinboldt or the Wang criterion guarantees that Newton's method converges to \(x_2^* = 2^{1/6}\) for compression down to 4 bits (\(r_c = 0.0025\)). Note that Newton's method succeeds beyond the theoretical guarantee for all compressed \(x_2^{(0),c}\) down to 1 bit, but it does fail for \(c = 0\), where \(x_2^{(0),c} = 2.0\) and the Newton iterates drift toward large \(x_2\), where configurations are nearly critical.
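A quick numerical check of this experiment (a sketch reusing `round_bits` from above) might read:

```python
import math

def round_bits(x, c):                        # as in the previous sketch
    e = math.floor(math.log2(abs(x)))
    s = 2.0 ** (e - c + 1)
    return round(x / s) * s

U2  = lambda x: -12 * x**-13 + 6 * x**-7     # first-order condition U_2
dU2 = lambda x: 156 * x**-14 - 42 * x**-8    # its derivative U_2'

x_star = 2 ** (1 / 6)
for c in [8, 4, 2, 1, 0]:
    x = round_bits(x_star, c)                # compressed initial guess
    for _ in range(100):                     # plain Newton iteration
        x -= U2(x) / dU2(x)
    print(c, abs(x - x_star) < 1e-10)        # True for c >= 1; False for c = 0,
                                             # where the iterates drift to large x
```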

In this case, choosing Theorem 1 or Theorem 2 makes no difference in the amount of binary digit compression. However, the ball of convergence provided by Theorem 2 can be three times larger, given the different prefactors in the radius conditions. More specifically, if the derivative \(U_n'\) is linear, then for any configurations \(\mathbf{X},\mathbf{Y}\) we have

$$\frac{\Vert U_n'(\mathbf{X})-U_n'(\mathbf{X}^*) \Vert }{\Vert \mathbf{X}-\mathbf{X}^* \Vert } = \frac{\Vert U_n'(\mathbf{X})-U_n'(\mathbf{Y}) \Vert }{\Vert \mathbf{X}-\mathbf{Y}\Vert }.$$

Provided that \(U_n'(\mathbf{X})\) is approximately linear near \(\mathbf{X}^*\), it follows that

$$\begin{aligned} \sup _{\mathbf{X}\in B(\mathbf{X}^*,r)} \frac{\Vert (U_n'(\mathbf{X}^*))^{-1}U_n'(\mathbf{X})-I \Vert }{\Vert \mathbf{X}-\mathbf{X}^* \Vert } \; \approx \sup _{\mathbf{X},\mathbf{Y}\in B(\mathbf{X}^*,r)} \frac{\Vert (U_n'(\mathbf{X}^*))^{-1}(U_n'(\mathbf{X})-U_n'(\mathbf{Y})) \Vert }{\Vert \mathbf{X}- \mathbf{Y}\Vert }. \end{aligned}$$

Thus \(r_2 \approx 3r_1\), suggesting that Wang's theorem can offer a ball of convergence up to 3 times the size of Rheinboldt's. Furthermore, the univariate calculation of \(L_{**}\) in (3.4) is more tractable than the bivariate calculation required in (3.3), in the sense that \(\displaystyle \mathbf{Y}= \underset{\mathbf{X}\in B(\mathbf{X}^*,r)}{{{\,\mathrm{arg sup}\,}}} \left\{ \frac{\Vert (U_n'(\mathbf{X}^*))^{-1}U_n'(\mathbf{X})-I \Vert }{\Vert \mathbf{X}-\mathbf{X}^* \Vert } \right\} \) can be described by a single configuration (or a small set of configurations), leading to a better estimate of \(L_{**}(n,r)\). For these reasons, we focus on applying Wang's theorem throughout the rest of the paper.

Given the goal of providing a radius \(r_2\) guaranteed to satisfy Theorem 2, so that Newton's method converges to \(\mathbf{X}^*\) for any initial guess \(\mathbf{X}^{(0),c}\in B(\mathbf{X}^*,r_2)\), we propose Algorithm 1 as our first contribution.

Algorithm 1.

Figure 2 illustrates the results of applying Algorithm 1 to \(U_6\) for various radii to find an acceptable convergence-ball radius. The curve for \(2/L_{**}\) comes from the approximation described in the forthcoming sections. Note that an acceptable radius can always be obtained after a single guess of r; but by following step 3 and refining to \(\hat{r}\) based on the results of step 1, continual improvement can be realized until the user decides to stop.
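Since Algorithm 1 itself appears above only as a figure, the following is a minimal sketch of the loop it describes, inferred from Fig. 2 and the surrounding text; `L_estimator` stands for a user-supplied routine such as the \(L_{**}^{est}\) of Sect. 4:

```python
def convergence_radius(L_estimator, r_guess, refinements=5):
    """Hedged reconstruction of Algorithm 1.  Any min(r, 2/L(r)) is a
    valid Theorem 2 radius, since a Lipschitz bound on B(X*, r) also
    holds on the smaller ball; refining r toward 2/L(r) seeks the
    crossover point visible in Fig. 2.
    """
    r, best = r_guess, 0.0
    for _ in range(refinements):
        L = L_estimator(r)                    # step 1: estimate L_**(r)
        best = max(best, min(r, 2.0 / L))     # step 2: accept a valid radius
        r = 2.0 / L                           # step 3: refined guess r-hat
    return best
```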

Fig. 2. Behavior of Algorithm 1 for \(U_6\). For any r, the result is the minimum of r and \(2/L_{**}(r)\), which for incremental r is shown by the black markers. The maximal r based on \(L_{**}^{est}\) is 0.0352.

4 Approximation of \(L_{**}\)

The 2-atom case allows us to solve explicitly for the radius of convergence because we know the location of \({{\,\mathrm{arg sup}\,}}\lbrace L_{**}(2,r) \rbrace \). As the system becomes larger (\(n\ge 3\)), however, finding an analytic solution becomes exceedingly difficult. The critical step is to find, for a given r, a constant \(L_{**}(n,r)\) satisfying (3.4). For any r less than the distance to the nearest critical configuration, we assume that the supremum defining the smallest such constant is attained on the boundary of the ball. The problem is then equivalent to finding the configuration \(\mathbf{X}\) that maximizes

$$\begin{aligned} L_{**}(n,r) = \max _{\mathbf{X}\in \partial B(\mathbf{X}^*,r)} \left\{ \frac{ \Vert (U_n'(\mathbf{X}^*))^{-1} U_n'(\mathbf{X}) - I \Vert }{r} \right\} , \end{aligned}$$
(4.6)

where \(\partial B\) denotes the boundary of the ball B.

Certain entries of \(U_n'(\mathbf{X})\) approach infinity as \(\mathbf{X}\) approaches a singular configuration \(\mathbf{X}^{(s)}\), in which two or more atoms coincide and \(V_n\) returns infinite energy. Therefore, singular configurations maximize \(\Vert U_n'(\mathbf{X}) - U_n'(\mathbf{X}^*)\Vert \). Furthermore, the shortest route to perturb a configuration into a singular one is to move the two closest atoms toward each other. These observations inspire an efficient approach to approximate \(L_{**}\), given in Algorithm 2; we denote the resulting estimate by \(L_{**}^{est}\).

Algorithm 2.
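As with Algorithm 1, the original pseudocode is given as a figure; the sketch below is our reading of the shortest-path estimate. It assumes a user-supplied Jacobian routine `jac_U` for \(U_n'\), the `unpack` helper from the Sect. 2 sketch, and that the closest pair consists of fully free atoms (which holds for the \(n \ge 6\) cases studied below, where the path is strictly linear):

```python
import numpy as np

def L_est(v_star, jac_U, n, r):
    """Sketch of Algorithm 2's shortest-path estimate of L_**(n, r):
    evaluate the quotient in (4.6) at the boundary point of B(X*, r)
    reached by moving the two closest atoms directly toward each other,
    i.e. along the shortest path to the nearest singular configuration.
    """
    X = unpack(v_star, n)                     # full coordinates of X*
    # locate the closest pair of atoms (0-indexed rows of X)
    i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda p: np.linalg.norm(X[p[0]] - X[p[1]]))
    d = X[j] - X[i]
    d /= np.linalg.norm(d)
    u = np.zeros_like(v_star)                 # direction in free variables
    u[3 * i - 6: 3 * i - 3] = d               # assumes i, j >= 3 (free atoms)
    u[3 * j - 6: 3 * j - 3] = -d
    v = v_star + r * u / np.linalg.norm(u)    # point on the boundary of B
    A = np.linalg.solve(jac_U(v_star), jac_U(v)) - np.eye(v_star.size)
    return np.linalg.norm(A, 2) / r           # quotient in (4.6)
```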

To validate Algorithm 2, we first approximate \(L_{**}\) by random sampling, a practical approach for approximating Lipschitz-type constants [22]. Specifically, given a minimizer \(\mathbf{X}^*\) and radius r, we generate \(K = 10^6\) vectors \(\{\mathbf{d}_k, k=1,\dots ,K\}\), where each component of \(\mathbf{d}_k\) is drawn uniformly at random from \([-1,1]\) and the vector is then normalized so that \(\Vert \mathbf{d}_k \Vert = 1\). Then we compute

$$\begin{aligned} \displaystyle L_{**}^{rand}(n,r) = \max _{k = 1 \dots K} \left\{ \frac{ \Vert (U_n'(\mathbf{X}^*))^{-1} U_n'(\mathbf{X}_k) - I \Vert }{r} \; : \; \genfrac{}{}{0.0pt}0{\mathbf{X}_k = \mathbf{X}^* + r \mathbf{d}_k,}{\Vert \mathbf{d}_k \Vert = 1} \right\} . \end{aligned}$$
(4.7)
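A direct sketch of this sampling estimate, with `jac_U` again an assumed user-supplied Jacobian routine, could read:

```python
import numpy as np

def L_rand(v_star, jac_U, r, K=10**6, seed=0):
    """Monte-Carlo lower estimate (4.7) of L_**: sample K uniformly random
    directions, place each on the sphere of radius r about X*, and keep
    the largest observed quotient."""
    rng = np.random.default_rng(seed)
    J_inv = np.linalg.inv(jac_U(v_star))
    I = np.eye(v_star.size)
    best = 0.0
    for _ in range(K):
        d = rng.uniform(-1.0, 1.0, v_star.size)
        d /= np.linalg.norm(d)                   # ||d_k|| = 1
        v = v_star + r * d                       # X_k = X* + r d_k
        best = max(best, np.linalg.norm(J_inv @ jac_U(v) - I, 2) / r)
    return best
```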

Figure 3 tracks the configurations \(\mathbf{X}^{(r)} = {{\,\mathrm{argmax}\,}}\left\{ L_{**}^{rand}(3,r) \right\} \) for increasing radii, ranging from 0.01 to 1.05 in increments of 0.02. Again, we enforce the semi-fixed positions for atoms 1–3 as described earlier; \(\mathbf{X}^*\) is the equilateral triangle labeled by the stars. Each \(\mathbf{X}^{(r)}\) configuration consists of a cyan marker (position of atom 3), a magenta marker (position of atom 2), and the fixed atom 1. As r increases, atom 3 of \(\mathbf{X}^{(r)}\) moves along the cyan path and atom 2 along the magenta path, where for both atoms a darker shade corresponds to a configuration with larger r. Eventually, the two atoms meet at (0.84, 0, 0) when \(r = 1.05\), which coincides with the singular configuration nearest to \(\mathbf{X}^*\). As hypothesized, the observed path of the \(\mathbf{X}^{(r)}\) follows the shortest path described in Algorithm 2.

Fig. 3. For \(0.01 \le r \le 1.05\), 53 configurations of \(\mathbf{X}^{(r)}\) that maximize \(L_{**}^{rand}(3,r)\). Each \(\mathbf{X}^{(r)}\) configuration consists of one cyan marker (atom 3), one magenta marker (atom 2), and atom 1. Darker markers indicate an \(\mathbf{X}^{(r)}\) from a greater r, while the star markers are the positions of the atoms in the optimal configuration. As r increases, the \(\mathbf{X}^{(r)}\) follow the shortest path toward the nearest singular configuration of \(\mathbf{X}^*\).

The tendency for the configurations \(\mathbf{X}^{(r)}\) to track along the shortest path continues for larger systems, as illustrated in Fig. 4 for \(U_4\) (left) and \(U_6\) (right). In each plot, 40 configurations of \(\mathbf{X}^{(r)} = {{\,\mathrm{argmax}\,}}\left\{ L_{**}^{rand}(n,r) \right\} \) are displayed, corresponding to \(0 \le r \le 0.8\). As before, each \(\mathbf{X}^{(r)}\) configuration consists of an atom of each color, as well as atom 1, while the star markers indicate the equilibrium positions of the atoms, which in these cases form a regular tetrahedron and a regular octahedron, respectively. For \(U_4\), the position of atom 2 remains near its \(\mathbf{X}^*\) position in all \(\mathbf{X}^{(r)}\), while atoms 3 and 4 perturb toward each other as r increases. For \(U_6\), atoms 2 and 3 remain near their respective \(\mathbf{X}^*\) coordinates in all \(\mathbf{X}^{(r)}\), while either the pair of atoms 4 and 5 or the pair of atoms 5 and 6 perturb toward each other as r increases. Specifically, for any r, the \(\mathbf{X}^{(r)}\) configuration either has atoms 5 and 6 perturb toward each other by approximately r/2 each while atoms 1 to 4 remain nearly stationary, or has atoms 4 and 5 move together by about r/2 each while the remaining atoms perturb very little. This alternating behavior occurs because the singular configuration nearest to \(\mathbf{X}^*\) is not unique: the 4–5 and 5–6 pair singularities are equidistant from \(\mathbf{X}^*\). Note that once the system is large enough (\(n \ge 6\)), the path followed by \(\mathbf{X}^{(r)}\) toward the nearest singular configuration is linear, since the atoms of the nearest pair each have three free variables.

Fig. 4. For \(n = 4\) (left) and \(n = 6\) (right), 40 configurations of \(\mathbf{X}^{(r)}\) that maximize \(L_{**}^{rand}(n,r)\), for \(0 \le r \le 0.8\). Atoms of the \(\mathbf{X}^{(r)}\) are coordinated by color tone. The perturbed configurations follow the shortest path toward the nearest singular configuration \(\mathbf{X}^{(s)}\), as indicated by the arrows. For \(U_6\) there are two \(\mathbf{X}^{(s)}\), so the \(\mathbf{X}^{(r)}\) follow both paths, alternating by chance.

Now we compare the two estimates of \(L_{**}\): from the proposed Algorithm 2 and from random sampling via (4.7). We focus on the cases \(n\ge 6\), whose shortest paths are strictly linear. Taking the global minimum configurations from [5] for each \(\mathbf{X}^*\), Fig. 5 compares the behavior of \(L_{**}^{rand}\) and \(L_{**}^{est}\) for \(U_6,U_8\), and \(U_{10}\), one case per column. The top row shows the cases \(r \le 0.8\), approximately the distance to the nearest singularity for these \(\mathbf{X}^*\); the bottom row zooms in to \(r \le 0.05\). The curve \(L = 2/r\) is drawn so that an estimate of the radius of convergence can be read off the plots at its intersection with the \(L_{**}\) curves. It is clear that \(L_{**}^{est}(n,r) \ge L_{**}^{rand}(n,r)\) for all test cases; since both estimates can only underestimate the true \(L_{**}\), \(L_{**}^{est}(n,r)\) is therefore the more accurate one. For brevity, the cases \(n = 7,9\) are not plotted here, but consistent behavior is observed at all radii.

Fig. 5. Comparison of \(L_{**}^{est}\) with \(L_{**}^{rand}\) for \(U_6,U_8,U_{10}\), showing that \(L_{**}^{est}\) is the better estimate of \(L_{**}\). The curve 2/r is included since \(L_{**} \le 2/r\) determines the radius of the ball of convergence.

Next, we compute the corresponding binary digit compression levels for systems of size \(n=6,\dots ,10\), and we test the convergence of the compressed minimizers under Newton's method. Table 1 catalogs these results. The first column shows the system size in number of atoms; the second shows the returned radius of the ball of convergence (to the nearest 0.002) after applying the proposed method; and the third shows the corresponding compression level in number of bits. The last column shows that, in practice, Newton's method still converges to the same \(\mathbf{X}^*\) at even higher compression levels than the theoretical guarantee.

Table 1. For \(n=6,\dots ,10\), the radii of convergence, maximal compression ability based on \(L_{**}^{est}\), and convergence results for the compressed minimizers \(\mathbf{X}^{(0),c}\) using Newton’s method. The theoretical \(c_t\) obtained by our method provides a guarantee for compression so that \(c \ge c_t\) implies that \(\mathbf{X}^{(0),c}\) converges to \(\mathbf{X}^*\) with Newton’s method.

5 Numerical Results in Higher Dimensions

In this section, we apply the proposed method to larger atomic systems to show its scalability. For the optimal configurations, we use the global minima tabulated in [5] and first published in [7, 11]. Because of the partially fixed positions of atoms 1, 2, and 3, we observe that some permutations of a minimizer \(\mathbf{X}^*\) (in terms of interchanged atom numbering) can cause the spectral radius of \((U_n'(\mathbf{X}^*))^{-1}\) to be multiple orders of magnitude larger than for others, resulting in significantly larger \(L_{**}(r)\). This is particularly true when atom 3 lies nearly on the x-axis. Therefore, we choose permutations of these minimizers for which the matrix \((U_n'(\mathbf{X}^*))^{-1}\) has a smaller spectral radius.

Fig. 6. Left: \(L_{**}^{est}(r)\) values for the \(U_{20}\) to \(U_{140}\) functions. The curve 2/r is included because \(L_{**} \le 2/r\) determines the radius of the ball of convergence. Right: distances to the nearest singularity \(\Vert \mathbf{X}^* - \mathbf{X}^{(s)} \Vert \) for each \(\mathbf{X}^*\). An increase in \(L_{**}(n,r)\) corresponds to a decrease in \(\Vert \mathbf{X}^* - \mathbf{X}^{(s)} \Vert \).

Table 2. Compression ability based on \(L_{**}^{est}\) and corresponding convergence results of Newton’s method for \(U_{20}\) through \(U_{140}\) functions.

For systems with \(n = 20, 40, 60, 80, 100, 120\), and 140 atoms, Fig. 6 (left) shows the shortest-path estimates \(L_{**}^{est}(n,r)\). For a fixed radius r, the values \(L_{**}^{est}(n,r)\) increase with n for \(n \le 80\) but plateau for \(80 \le n \le 140\). These changes in \(L_{**}^{est}(n,r)\) correspond to a decrease and leveling off of the distances to the nearest singularity \(\Vert \mathbf{X}^* - \mathbf{X}^{(s)} \Vert \) for each \(\mathbf{X}^*\), plotted in Fig. 6 (right). It follows from Hoare [6] that the smallest inter-atom distances in any n-atom structure cannot fall far below the equilibrium distance of \(2^{1/6}\), so these observed distances to singularity have approached a lower bound. Indeed, for the 1000-atom minimizer recorded in [23], the minimal distance to singularity is 0.733. Further, we have \(L_{**}^{est}(1000,0.014) = 135.5\), a corresponding maximum compression level \(c_t = 13\), and Newton's method succeeds for all \(\mathbf{X}^{(0),c}\) with \(c \ge 9\). Since both this minimal distance to singularity and \(L_{**}^{est}\) are consistent with the 80–140 atom cases, this indicates that our proposed method scales to much larger atomic systems.

Now we further emphasize the accuracy of \(L_{**}^{est}\) by comparing it with the random sampling approximation (4.7), again with \(K = 10^6\) samples. The comparison is done at the radius deduced to be optimal by Algorithm 1 with \(L_{**}^{est}\). As shown in Table 2, the shortest-path estimates are significantly higher than the random sampling estimates in all test cases. The corresponding compression levels deduced from \(L_{**}^{est}\) are reported in the fifth column. We apply Newton's method to \(\mathbf{X}^{(0),c}\) at all compression levels; the last column shows that Newton's method converges to the same \(\mathbf{X}^*\) even at compression levels exceeding the theoretical guarantee.

6 Conclusions

In this work, we propose a framework to guide the amount of compression that can be applied to an optimal atom configuration with the Lennard-Jones potential function, so that the compressed value converges to the same optimal configuration under Newton’s method. Furthermore, we develop a shortest-path estimate by exploring the topology of the Lennard-Jones function to provide the necessary information required by the framework. We show that the shortest-path estimate outperforms the estimate from random sampling. We also demonstrate the reliability and stability of the proposed framework for systems of varying size, so that the provided compression level guarantees the preservation of the basins of attraction.

Our proposed framework can also be applied to a wider range of applications. One potential application is the compression of large neural networks in deep learning. When a trained network is moved from a resource-rich setting to deployment settings with restrictive computational constraints such as memory and speed, a large-scale model with billions of trained weights and hyperparameters must be compressed. In this case, one can study the compression accuracy within the proposed framework so that a prescribed accuracy can be preserved from the perspective of optimization.

Some gaps between the theory and the numerical approximation remain to be studied and are beyond the scope of this work. For example, a rigorous proof of the optimality of the proposed shortest-path estimate is desirable in order to offer a truly definitive convergence guarantee. If such a result is unavailable, an error bound could be developed that measures the gap between the true \(L_{**}\) and the estimate \(L_{**}^{est}\), which is guaranteed to be only a lower bound for \(L_{**}\). A similar error analysis should investigate the effect of using approximations for the inverse Hessian \((U'(\mathbf{X}^*))^{-1}\) and the operator norm \(\Vert \cdot \Vert _2\) in the computation of \(L_{**}^{est}\), since in practice neither is likely to be computed exactly. Furthermore, one can extend the proposed framework to local convergence theory for quasi-Newton methods (e.g., [4]) to preserve the corresponding basins of attraction.