JCC 23935
This work is focused on the efficient evaluation of the Boys function located at the heart of Coulomb and exchange type electron integrals. Different evaluation strategies for individual orders and arguments of the Boys function are used to […]ciency: The first algorithm combines the work of Gill et al. (Int. J. Quantum Chem. 1991, 40, 745) and Kazuhiro Ishida (Int. J. Quantum Chem. 1996, 59, 209 and following work), amplifying the benefits of the two strategies. © 2015 Wiley Periodicals, Inc.
$$ F_n(0 \le x \le X) = \frac{\Gamma(n+0.5)}{2}\, e^{-x} \sum_{i=0}^{\infty} \frac{x^i}{\Gamma(n+i+1.5)} \tag{3} $$

$$ F_n(X < x) = \frac{\Gamma(n+0.5)}{2x^{n+0.5}} - \frac{e^{-x}}{2x} \sum_{i=1}^{\infty} \frac{\Gamma(n+0.5)}{\Gamma(n-i+1.5)}\, x^{-(i-1)} \tag{4} $$

where the capital $X$ denotes an empirical criterion for a "large $x$" value. For the combined evaluation scheme of eqs. (3) and (4), Cook[11] proposed the criterion for a "large $x$" to be 10. He reports that eq. (4) shows clear discontinuities for orders $n > 24$ at $x = 10$, but already for lower orders, this work found a small yet existent discontinuity in the transition from eq. (3) to eq. (4) (data not shown). In the course of this work, this scheme[11] is used as an analytical reference for several grid-based approaches.[3–5] Because of the observed discontinuities, the "small case" eq. (3) is used for the complete argument range for all orders to secure a rigorous evaluation, whereas the large case equation is skipped entirely.

The derivatives of the Boys function with respect to the argument $x$ follow a vertical recurrence relation, that is, they feature a consecutive dependence on the order $n$:[5,12,13]

$$ \frac{\partial F_n(x)}{\partial x} = -F_{n+1}(x) \tag{5} $$

Based on this, usually only the Boys function $F_n$ of highest order $n$ is computed, whereas all others are retrieved via downward recursion:[5,6,11–13]

$$ F_{n-1}(x) = \frac{2xF_n(x) + e^{-x}}{2(n-1)+1} \tag{6} $$

For small values of the argument $x$, the upward recursion is reported[5] to become unstable, as it involves the difference of two almost equal numbers:

$$ F_{n+1}(x) = \frac{(2n+1)F_n(x) - e^{-x}}{2x} \tag{7} $$

For large values of $x$, both recursions have the same reported[5,6,12] accuracy. Primorac[14] introduced a new expansion of the Boys function that is, however, not part of this study, as Primorac himself reported it to be inaccurate at higher orders. Other analytical recurrence relations have been derived by Guseinov and Mamedov:[6,12]

$$ F_n(x) = \frac{1}{x^k}\left[\prod_{j=1}^{k}\left(n-j+\tfrac{1}{2}\right) F_{n-k}(x) - \frac{1}{2}e^{-x}\sum_{i=1}^{k} x^{i-1}\prod_{j=1}^{k-i}\left(n-j+\tfrac{1}{2}\right)\right] \tag{8} $$

$$ F_{n-k}(x) = \frac{1}{\prod_{j=1}^{k}\left(n-j+\tfrac{1}{2}\right)}\left[x^k F_n(x) + \frac{1}{2}e^{-x}\sum_{i=1}^{k} x^{i-1}\prod_{j=1}^{k-i}\left(n-j+\tfrac{1}{2}\right)\right] \tag{9} $$

with $k$ being any integer denoting a recursive offset to $n$. Considering the apparently higher FLOP counts of these relations compared with eqs. (6) and (7), this work dropped them as well and focused on the latter.

The Boys Function Kernel Limit Expression

This section investigates the limit expressions with respect to the recursion schemes of eqs. (6) and (7), as well as the border expression of eq. (2). Values of the exponential function $e^{-x}$ become very small for larger values of $x$. Assuming a maximum deviation of $10^{-14}$ from the analytical value, as recommended by Helgaker et al.,[5] this implies that the exponential term only matters up to about $x \approx 30$. For $x \gtrsim 30$, it can be assumed that the contribution of the exponential expression can be neglected in the recursions. Further, simplifying the two schemes leads to two FLOP-reduced expressions for the upward and downward recursion of the Boys function for larger arguments $x$:

$$ F_{n-1}(x \gtrsim 30) \approx \frac{xF_n(x)}{n-0.5} \tag{10} $$

$$ F_{n+1}(x \gtrsim 30) \approx \frac{(n+0.5)F_n(x)}{x} \tag{11} $$

As a starting point for the upward recursion, $F_0(x)$ may be evaluated via its border expression [eq. (2)]:[12]

$$ F_0(x \gtrsim 30) \approx \frac{\sqrt{\pi}}{2\sqrt{x}} \tag{12} $$

Comparing eqs. (10) and (11) shows that both use two operations in each iteration (if the expressions $n \pm 0.5$ are stored as numbers). The evaluation of $F_0(x \gtrsim 30)$ according to eq. (12) can be achieved at the cost of a square root of the argument $x$ and a division, taking $\sqrt{\pi}/2$ as a precomputed constant. For the downward recursion, the starting value $F_n(x)$ of highest order would have to be computed at a considerably higher FLOP cost. This implies that the upward recursion is more efficient than the downward recursion for obtaining the limit expression of the Boys function. Equations (2), (7), and (11) were used to compute values of $F_n(x)$ for orders $n$ up to 82, investigating at which arguments $x$ these expressions match the analytical Boys function down to an accuracy of $10^{-14}$. These data are presented in Table 1. Several findings are of interest: In general, the previous assumption that an argument value of approximately 30 suffices for the limit expressions was substantiated for the angular moments commonly used in molecular electronic structure theory, but the limit argument grows to a value of $x \approx 44$ for $n = 82$. The three schemes are found to be equally tight for all orders, which renders the FLOP-reduced recursions [eqs. (7) and (11)] more efficient than the original expressions.

Grid-Based Evaluation of the Boys Function: Methods

As the computation of $F_n$ is a fundamental step in electron integral evaluation over Gaussian functions, it presents itself as one of the many bottlenecks linked to computation time and efficiency. Throughout applied numerical and computational science, such demands may be reduced using grid-based approaches,[15] in which case the effective solutions to the
kernel function are precomputed to a high level of accuracy at properly chosen grid point values, up to a certain argument cutoff. Cutoff and accuracy have to be chosen carefully so that the interpolation scheme applied afterwards is able to reproduce the accuracy demanded of the implementation. In the case of electron integral evaluation, the absolute difference between the interpolated and the numerically exact value has to be as low as possible to provide an exact basis for evaluation. For the evaluation of the Boys function, a precision of the order of $10^{-14}$ is recommended.[5] This section will discuss the previous work of Gill et al.[3] and Ishida[4] as well as two general schemes of grid point interpolation worth mentioning in this context: an $n$-term truncated Taylor series expansion[5] and a Newton-like interpolation scheme based on Neville's algorithm.[15,16] The later reformulation of Ishida[17] prefers accuracy over performance and is, therefore, not treated in this section.

Table 1. Values of the argument $x$ at and after which eqs. (2), (7), and (11) match the analytical reference value of $F_n(x)$ down to an accuracy of $\le 10^{-14}$.

| n | eq. (2) | eq. (7) | eq. (11) |
|---|---|---|---|
| 0 | 28.22 | 28.20 | 28.20 |
| 1 | 28.23 | 28.19 | 28.23 |
| 2 | 28.26 | 28.26 | 28.26 |
| 3 | 28.29 | 28.29 | 28.29 |
| 4 | 28.33 | 28.33 | 28.33 |
| 5 | 28.37 | 28.37 | 28.37 |
| 6 | 28.41 | 28.41 | 28.41 |
| 7 | 28.45 | 28.45 | 28.45 |
| 8 | 28.49 | 28.49 | 28.49 |
| 9 | 28.53 | 28.53 | 28.53 |
| 10 | 28.58 | 28.58 | 28.58 |
| 11 | 28.62 | 28.62 | 28.62 |
| 12 | 28.67 | 28.67 | 28.67 |
| 13 | 28.72 | 28.72 | 28.72 |
| 14 | 28.78 | 28.78 | 28.78 |
| 15 | 28.83 | 28.83 | 28.83 |
| 16 | 28.89 | 28.89 | 28.89 |
| 17 | 28.95 | 28.95 | 28.95 |
| 18 | 29.01 | 29.01 | 29.01 |
| 19 | 29.08 | 29.08 | 29.08 |
| 20 | 29.15 | 29.15 | 29.15 |
| 21 | 29.22 | 29.22 | 29.22 |
| ... | ... | ... | ... |
| 80 | 43.14 | 43.14 | 43.14 |
| 81 | 43.47 | 43.47 | 43.47 |
| 82 | 43.80 | 43.80 | 43.80 |

Gill et al.[3] relied on a modified cubic interpolation scheme using an $m$-term Chebyshev polynomial expansion. They defined a grid step $\Delta$ as a function of a rigorous interpolation error tolerance $\varepsilon$:

$$ \Delta = \left( \frac{2^m (m+1)!\,\varepsilon}{\max\left|f^{(m+1)}(x)\right|} \right)^{1/(m+1)} \tag{13} $$

Using this step, a set of grid points $X_j = (2j+1)\Delta$ is defined with indices $j = \mathrm{int}\left(\frac{x}{2\Delta}\right)$. Here, $f$ is the function to expand and $f^{(m)}$ denotes its $m$th derivative with respect to the argument. Considering eq. (5), the Boys function of given order $n$ and argument $x$ is then approximated in only six FLOPs:

$$ F_n(x) \approx f_0^n + \frac{x}{2\Delta}\left(f_1^n + \frac{x}{2\Delta}\left(f_2^n + \frac{x}{2\Delta} f_3^n\right)\right) \tag{14} $$

with $\frac{x}{2\Delta}$ and $f_i^n$ precomputed[3] for all orders $n$:

$$ f^{(m)} = \frac{\partial^m F_n(X_j)}{\partial x^m} = (-1)^m F_{n+m}(X_j) \tag{15} $$

$$ a_k^n = (2-\delta_{k0}) \left(\frac{\Delta}{2}\right)^{k} \sum_{m=0}^{\infty} \left(\frac{\Delta}{2}\right)^{2m} \frac{f^{(k+2m)}(X_j)}{m!\,(k+m)!} \tag{16} $$

$$ f_0^n = a_0^n - a_2^n + \left(a_1^n - 3a_3^n\right)\frac{X_j}{\Delta} + 2a_2^n\left(\frac{X_j}{\Delta}\right)^2 + 4a_3^n\left(\frac{X_j}{\Delta}\right)^3 \tag{17} $$

$$ f_1^n = 2a_1^n - 6a_3^n + 8a_2^n\frac{X_j}{\Delta} + 24a_3^n\left(\frac{X_j}{\Delta}\right)^2 \tag{18} $$

$$ f_2^n = 8a_2^n + 48a_3^n\frac{X_j}{\Delta} \tag{19} $$

$$ f_3^n = 32a_3^n \tag{20} $$

An analogous scheme was developed for the evaluation of the exponential function.[3]

Ishida[4] introduced similar expansions for the evaluation of both the Boys function and exponential expressions in general, competing with the previous work of Gill et al.:[3]

$$ F_n(z) \approx a_0^n - z\left(a_1^n - z\left(a_2^n - z\,a_3^n\right)\right) \tag{21} $$

$$ e^{-z} \approx b_0 - z\left(b_1 - z\left(b_2 - z\,b_3\right)\right) \tag{22} $$

with the $a_i$ and $b_i$ coefficients obtained via:

$$ a_0^n = F_n(z_j) + z_j F_{n+1}(z_j) + \frac{1}{2} z_j^2 F_{n+2}(z_j) + \frac{1}{6} z_j^3 F_{n+3}(z_j) \tag{23} $$

$$ a_1^n = F_{n+1}(z_j) + z_j F_{n+2}(z_j) + \frac{1}{2} z_j^2 F_{n+3}(z_j) \tag{24} $$

$$ a_2^n = \frac{1}{2} F_{n+2}(z_j) + \frac{1}{2} z_j F_{n+3}(z_j) \tag{25} $$

$$ a_3^n = \frac{1}{6} F_{n+3}(z_j) \tag{26} $$

$$ b_0 = e^{-z_j}\left(1 + z_j + \frac{1}{2}z_j^2 + \frac{1}{6}z_j^3\right) \tag{27} $$

$$ b_1 = e^{-z_j}\left(1 + z_j + \frac{1}{2}z_j^2\right) \tag{28} $$

$$ b_2 = \frac{1}{2} e^{-z_j}\left(1 + z_j\right) \tag{29} $$

$$ b_3 = \frac{1}{6} e^{-z_j} \tag{30} $$

where Ishida[4] used a grid step of $2\Delta = 5 \times 10^{-4}$ and the indices $j = \mathrm{int}\left(\frac{x}{2\Delta}\right)$, in accordance with Gill et al.[3] The major difference between the two schemes was found to be the construction of the cubic interpolation coefficients [eq. (15)ff and eq. (23)ff]. Both algorithms use six FLOPs and are equal in their efficiency. Ishida[4] reported his scheme to be superior to the Chebyshev expansion proposed by Gill et al.[3] in terms of overall FLOP counts, that is, taking into account all FLOPs necessary for the construction of $[0]^{(m)}$ integrals, as outlined in Table A.1 of the referenced work.
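The cubic scheme of eqs. (21)–(30) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation of Ishida[4] or of this work: the tabulated values $F_n(z_j)$ come from the series of eq. (3) followed by the downward recursion of eq. (6), the expansion point $z_j$ is assumed to be the interval midpoint $(2j+1)\Delta$, and all names (`boys_series`, `boys_downward`, `IshidaCubicGrid`) are hypothetical. Each lookup evaluates the nested form of eq. (21) or eq. (22) with three multiplications and three subtractions, which corresponds to the six FLOPs quoted above.

```python
import math


def boys_series(n: int, x: float) -> float:
    """Reference F_n(x) from the small-argument series of eq. (3).

    The ratio Gamma(n+0.5) / (2 Gamma(n+i+1.5)) is expanded into
    1 / ((2n+1)(2n+3)...(2n+2i+1)), so every term of the sum is positive.
    """
    term = 1.0 / (2 * n + 1)
    total = term
    i = 0
    while term > 1e-17 * total:
        i += 1
        term *= 2.0 * x / (2 * n + 2 * i + 1)
        total += term
    return math.exp(-x) * total


def boys_downward(n_top: int, x: float) -> list:
    """F_0..F_{n_top}(x): top order from eq. (3), all lower orders via eq. (6)."""
    f = [0.0] * (n_top + 1)
    f[n_top] = boys_series(n_top, x)
    ex = math.exp(-x)
    for n in range(n_top, 0, -1):
        f[n - 1] = (2.0 * x * f[n] + ex) / (2 * (n - 1) + 1)
    return f


class IshidaCubicGrid:
    """Cubic interpolation of F_n(z) and exp(-z) following eqs. (21)-(30).

    Assumption (not stated in the text): z_j is the midpoint (2j+1)*Delta
    of interval j = int(z / (2*Delta)); the table layout is hypothetical.
    """

    def __init__(self, n_max: int, z_max: float, two_delta: float = 5.0e-4):
        self.two_delta = two_delta
        self.a = []  # a[j][n] = (a0, a1, a2, a3), eqs. (23)-(26)
        self.b = []  # b[j] = (b0, b1, b2, b3), eqs. (27)-(30)
        for j in range(int(z_max / two_delta) + 1):
            zj = (j + 0.5) * two_delta
            f = boys_downward(n_max + 3, zj)  # F_{n+3} is needed for order n
            row = []
            for n in range(n_max + 1):
                f0, f1, f2, f3 = f[n:n + 4]
                row.append((
                    f0 + zj * f1 + zj**2 / 2 * f2 + zj**3 / 6 * f3,  # eq. (23)
                    f1 + zj * f2 + zj**2 / 2 * f3,                   # eq. (24)
                    (f2 + zj * f3) / 2,                              # eq. (25)
                    f3 / 6,                                          # eq. (26)
                ))
            self.a.append(row)
            e = math.exp(-zj)
            self.b.append((
                e * (1 + zj + zj**2 / 2 + zj**3 / 6),  # eq. (27)
                e * (1 + zj + zj**2 / 2),              # eq. (28)
                e * (1 + zj) / 2,                      # eq. (29)
                e / 6,                                 # eq. (30)
            ))

    def boys(self, n: int, z: float) -> float:
        a0, a1, a2, a3 = self.a[int(z / self.two_delta)][n]
        return a0 - z * (a1 - z * (a2 - z * a3))  # eq. (21): 3 mul + 3 sub

    def exp_neg(self, z: float) -> float:
        b0, b1, b2, b3 = self.b[int(z / self.two_delta)]
        return b0 - z * (b1 - z * (b2 - z * b3))  # eq. (22): 3 mul + 3 sub
```

For example, `g = IshidaCubicGrid(n_max=2, z_max=4.0)` tabulates about 8,000 intervals, and `g.boys(1, 1.7777)` should then agree with `boys_downward(2, 1.7777)[1]` far below $10^{-14}$, since the truncation error of a cubic Taylor polynomial over a half-width of $2.5 \times 10^{-4}$ is of the order of $10^{-16}$. Covering $x = [0, 30]$ with Ishida's step would require about 60,000 rows per order, which is the memory cost the grid-based schemes trade for their low FLOP count.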
$$ \cdots + \frac{1}{2}(x-x_{grid})^2 F_{n+2}(x_{grid}) - \frac{1}{6}(x-x_{grid})^3 F_{n+3}(x_{grid}) + \frac{1}{24}(x-x_{grid})^4 F_{n+4}(x_{grid}) - \frac{1}{120}(x-x_{grid})^5 F_{n+5}(x_{grid}) \tag{33} $$

Storing $(x - x_{grid})$ as an intermediate and optimizing the equation with respect to multiplications by this subexpression [i.e., similar to eqs. (14) and (21)] eventually results in a maximum of 14 FLOPs if all six terms are used. If, in addition, the fractions are tabulated as part of the grid data, this reduces to 10 FLOPs for 6 terms, 8 FLOPs for 5 terms, and so on. Even if only a five-term series is used and the fractions are stored as part of the grid data, the required eight FLOPs are still more than the six FLOPs reported for the previously discussed schemes.[3,4] A series of four terms or fewer appears to be the only competitive choice. Gill et al.[3] pointed out that although this scheme was found to be very accurate near the middle of the interpolation interval, it performs poorly for small grid steps at the endpoints. This will be investigated more closely in the following. It has to be considered that this scheme demands an overhead in the grid computation of $F_n(x)$, as the Taylor series uses up to $m$ more grid points, where $m$ is the order of the expansion. This means that for the rigorous evaluation of $F_n(x)$ with an $m$-term Taylor series, grid points up to $F_{n+m}(x)$ have to be present in the grid.

A different route may be taken via a Newton-like polynomial interpolation scheme based on Neville's algorithm,[15,16] which has been reported[15] to be the most efficient algorithm of this kind. Whereas the Taylor series uses a vertical interpolation between consecutive orders of the Boys function, the Neville interpolation is a horizontal recursion in the argument $x$ only. $F_n(x)$ is fitted by a Newton polynomial of a chosen degree $m$ that is part of a two-dimensional grid. The polynomial is built recursively, terminating at the final polynomial expression $p_{1,m}$ approximating $F_n(x)$:

Grid-Based Evaluation of the Boys Function: Construction

As mentioned in the previous section, the construction of the interpolation grid is crucial. A grid step has to be chosen that is small enough to reproduce the value of the Boys function in the full range of the evaluation, that is, for arguments up to the limit; conversely, the number of resulting grid points has to be small enough to be stored in conventional RAM. To validate a constructed grid in combination with an interpolation method of choice, this work uses a general validation scheme: The evaluation method for "small arguments" [eq. (3)] proposed by Cook,[11] as discussed previously, was used to construct testing grids in the argument range $x = [0, 30]$ (after which the Boys function limit was found to be rigorous), and these values are assumed by this work to be exact solutions. The interpolation method under test is then verified in the range $x = [0, 30]$ with a screening step in $x$ of $10^{-6}$.

Using the analytical expression of eq. (3), the proposed interpolation grids of Ishida[4] and Gill et al.[3] were constructed up to the argument $x = 30$, taking Ishida's proposal of a grid step of $2\Delta = 5 \times 10^{-4}$ for the sake of a direct comparison of both schemes. This comparison revealed that the overall error magnitudes are to be considered equal. The algorithm proposed by Gill et al.[3] shows slightly lower errors for certain orders but not for others. This appears to be quite random, without any obvious tendency.

The Taylor series of eq. (31) was used to select the best ratio of the number of Taylor series terms to the grid step size. A grid step of $5 \times 10^{-5}$ in combination with a three-term Taylor series expansion is sufficient to rigorously evaluate the Boys function values for all orders up to 82. This comes at the cost of a dense grid and five FLOPs per operation plus the grid step construction, which is one FLOP below the reported count of Gill et al.[3] Using a grid step of $5 \times 10^{-6}$, the deviations are found to be below the errors of the cubic interpolation of Gill

Figure 3. Flowcharts of the two methods of Gill et al.[3] and Ishida[4] (not Ishida[17]) for the computation of Boys functions and auxiliary integrals used in electron integral evaluation.
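The grid-based three-term Taylor expansion and the validation against eq. (3) described in the Construction section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code of this work: the grid values $F_k(x_g)$ are generated from eq. (3) and the downward recursion of eq. (6), the nearest grid point is taken as $x_g$, the factor $1/2$ of the quadratic term is stored in the table, and the names `TaylorGrid`, `boys_limit`, and `boys_all` are hypothetical. Above the tabulated range, the sketch switches to eq. (12) followed by the upward recursion of eq. (11).

```python
import math


def boys_series(n: int, x: float) -> float:
    """Reference F_n(x) from the small-argument series of eq. (3) (all terms positive)."""
    term = 1.0 / (2 * n + 1)
    total = term
    i = 0
    while term > 1e-17 * total:
        i += 1
        term *= 2.0 * x / (2 * n + 2 * i + 1)
        total += term
    return math.exp(-x) * total


class TaylorGrid:
    """Three-term Taylor expansion of F_n(x) around equidistant points x_g = j * delta.

    Hypothetical layout: rows[j][n] = (F_n, F_{n+1}, F_{n+2} / 2) at x_g,
    i.e., the fraction 1/2 is tabulated as part of the grid data.
    """

    def __init__(self, n_max: int, x_max: float, delta: float = 5.0e-5):
        self.n_max = n_max
        self.x_max = x_max
        self.delta = delta
        self.xg = []
        self.rows = []
        top = n_max + 2  # order n needs F_{n+2} on the grid
        for j in range(int(round(x_max / delta)) + 2):
            x = j * delta
            f = [0.0] * (top + 1)
            f[top] = boys_series(top, x)
            ex = math.exp(-x)
            for k in range(top, 0, -1):  # downward recursion, eq. (6)
                f[k - 1] = (2.0 * x * f[k] + ex) / (2 * k - 1)
            self.xg.append(x)
            self.rows.append([(f[n], f[n + 1], 0.5 * f[n + 2]) for n in range(n_max + 1)])

    def __call__(self, n: int, x: float) -> float:
        j = int(x / self.delta + 0.5)          # nearest grid point (assumption)
        fn, fn1, half_fn2 = self.rows[j][n]
        d = x - self.xg[j]                     # x_Delta = x - x_g: 1 FLOP
        return fn - d * (fn1 - d * half_fn2)   # 2 mul + 2 sub: 4 FLOPs


def boys_limit(n_max: int, x: float) -> list:
    """F_0..F_{n_max} for large x: eq. (12), then the upward recursion of eq. (11)."""
    f = [math.sqrt(math.pi) / 2.0 / math.sqrt(x)]  # sqrt(pi)/2 would be precomputed
    for n in range(n_max):
        f.append((n + 0.5) * f[n] / x)
    return f


def boys_all(grid: TaylorGrid, x: float) -> list:
    """Dispatch: grid interpolation below grid.x_max, limit expressions above.

    grid.x_max should be at least the Table 1 threshold of the highest order
    (about 30 for n <= 21); smaller values are only meant for quick tests.
    """
    if x >= grid.x_max:
        return boys_limit(grid.n_max, x)
    return [grid(n, x) for n in range(grid.n_max + 1)]
```

Counting only the interpolation, each order costs one subtraction for $x_\Delta$ plus two multiplications and two subtractions, which matches the five FLOPs stated above; the grid-index computation is not included in that count. The validation scheme of the text then amounts to screening `boys_all` against `boys_series` over a fine set of arguments. With $x_{max} = 30$ and $\Delta = 5 \times 10^{-5}$ as in the text, the table would hold about 600,000 rows, which this pure-Python sketch builds slowly; a small `x_max` is enough to exercise the logic.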
FLOPs, but as mentioned in the previous section, caution has to be taken when using the recursions directly to obtain either lower or higher orders of the Boys function. As only the maximum deviation from the previous recursion element is maintained, and not the maximum deviation from the analytical value, the source codes of both algorithms use empirical select case structures to get rid of offending intermediates. This results in individual FLOP counts for each special case. The exponential function used in both recursions is assembled either via the method of Ishida[4,17] or via the approach of Schraudolph[18] in the argument range where this method was shown to be accurate (see Fig. 1). Both algorithms use half of the exponential, which is either parametrized directly in the cubic interpolation coefficients (six FLOPs) or comes at the cost of an additional division after the Schraudolph[18] approach (three FLOPs). For the limit expression, Gill et al.[3] use an asymptotic series in case the argument is higher than a critical cutoff. This corresponds to the border expression of eq. (2) also used here.

FLOP Reduction in a Second Algorithm

Using the three-term Taylor series with a grid step of $\Delta = 5.0 \times 10^{-5}$ results in a similar, yet FLOP-reduced algorithm (see Fig. 4), at the cost of an additional step after the grid index determination, as the difference $x_\Delta = x - x_g$ is used in the Taylor series. This additional FLOP is regained, however, when computing $F_0$ in only four FLOPs. This saving of four FLOPs with respect to the other methods is repeated for the evaluation of $F_1$. Overall, this is three FLOPs more efficient, at the cost of a larger grid and an associated higher memory demand. This work found that the FLOP counts in the methods of Gill et al.[3] and Ishida[4] are independent of the used grid size but dependent on the maximum order. This is not the case for the Taylor series used by this algorithm. Here, the FLOP counts directly depend on the used grid size, down to a minimum of 3. The two special cases $L = 0$ and $L = 1$ are again treated individually. For the others, the algorithm continues with a select case structure similar to that of algorithm I, slightly modified at some points to maintain accuracy.

Figure 4. Flowcharts of the two algorithms derived by this work for the computation of Boys functions. "1S" denotes that a square root is required. Total FLOP counts for distinct orders are listed in Table 3, compared with the method of Gill et al.[3]

Comparing Both Algorithms

Table 3 displays the individual FLOP counts of the original algorithm of Gill et al.[3] and of both algorithms derived in this work (I/II) for the angular moment set $L = [0, 21]$, commonly used in today's quantum chemical codes, for the argument $x = 0$. As the first scheme is a combination of the methods of Gill et al.[3] and Ishida,[4] it displays the same FLOP count. The major improvement here lies in a faster construction of the grid via the coefficients provided by Ishida[4,17] and an approximation to the exponential function via the method of Schraudolph[18] for arguments higher than a cutoff. The second algorithm uses a grid-based three-term Taylor series expansion of the Boys function and is found to be superior by one FLOP for all orders, except for $L = 1$, in which case three FLOPs are saved. This corresponds to two-electron repulsion integrals over three s-type orbitals and one p-type orbital. The limit expression is the same for all methods. A benchmark in terms of total FLOP counts is displayed in Table 4. For pure s-type integrals ($L = 0$), the saving is about 14%, and for sp-type integrals ($L = 1$), it is about 19%. These savings arise from the special treatment of the first two moments. As both algorithms require the same $[3L + 1]$ FLOPs for the incremental recursion scheme, the FLOP counts become almost equal for higher moments if the first two moments are not treated explicitly. The total saving, therefore, reduces to only about 0.7% for
$L = [0, 21]$ (see Table 3). The average FLOP saving of the second algorithm with respect to the first is about 2% for $L > 1$. The absolute accuracy, that is, a maximum deviation of $10^{-14}$ from the analytical solution, is maintained for all moments and arguments.

Table 3. FLOP count for the construction of all $\{F_{m \le L}(x)\}$ with the original algorithm of Gill et al.[3] and the algorithms proposed by this work, for angular moments commonly used in today's quantum chemical codes.

| L | Gill et al.[3] | I | II | Limit |
|---|---|---|---|---|
| 0 | 7 | 7 | 6 | 1 |
| 1 | 13 | 13 | 10 | 3 |
| 2 | 19 | 19 | 18 | 5 |
| 3 | 22 | 22 | 21 | 7 |
| 4 | 25 | 25 | 24 | 9 |
| 5 | 28 | 28 | 27 | 11 |
| 6 | 31 | 31 | 30 | 13 |
| 7 | 34 | 34 | 33 | 15 |
| 8 | 37 | 37 | 36 | 17 |
| 9 | 40 | 40 | 39 | 19 |
| 10 | 43 | 43 | 42 | 21 |
| 11 | 46 | 46 | 45 | 23 |
| 12 | 49 | 49 | 48 | 25 |
| 13 | 52 | 52 | 51 | 27 |
| 14 | 55 | 55 | 54 | 29 |
| 15 | 58 | 58 | 57 | 31 |
| 16 | 61 | 61 | 60 | 33 |
| 17 | 64 | 64 | 63 | 35 |
| 18 | 67 | 67 | 66 | 37 |
| 19 | 70 | 70 | 69 | 39 |
| 20 | 73 | 73 | 72 | 41 |
| 21 | 76 | 76 | 75 | 43 |

Table 4. Total FLOP counts of the two proposed algorithms, validated via an equidistant grid of $n$ points in the argument range $x = [0, 50]$ and sampled over the full angular moment ranges $L = [a, b]$.

| L range | n | FLOPs I | FLOPs II | Savings |
|---|---|---|---|---|
| [0,0] | 5,000,000 | 19,754,000 | 16,932,000 | 14.29% |
| [0,1] | 5,000,000 | 45,407,000 | 36,938,000 | 18.65% |
| [0,5] | 5,000,000 | 122,696,008 | 119,859,007 | 2.31% |
| [0,9] | 5,000,000 | 191,412,004 | 188,559,003 | 1.49% |
| [0,13] | 5,000,000 | 260,000,000 | 257,128,000 | 1.10% |
| [0,17] | 5,000,000 | 328,420,000 | 325,525,000 | 0.88% |
| [0,21] | 5,000,000 | 396,624,000 | 393,702,000 | 0.74% |
| [0,82] | 25,000 | 6,882,166 | 6,746,448 | 1.97% |
| [0,82] | 50,000 | 13,764,341 | 13,492,898 | 1.97% |
| [0,82] | 250,000 | 68,821,741 | 67,464,498 | 1.97% |
| [0,82] | 500,000 | 137,643,491 | 134,928,998 | 1.97% |
| [0,82] | 2,500,000 | 688,217,491 | 674,644,998 | 1.97% |
| [0,82] | 5,000,000 | 1,376,434,991 | 1,349,289,998 | 1.97% |

Conclusions

The goal of this work was the construction of an efficient and accurate scheme for the evaluation of the Boys function kernel integral. The works of Gill et al.[3] and Ishida[4,17] have been compared with two methods of polynomial interpolation in terms of both accuracy and efficiency and were found to be the most efficient types of evaluation. Both methods rely on a Chebyshev approximation of the Boys function, where Ishida[4,17] provided a set of coefficients that can be constructed more efficiently than the coefficients of Gill et al.[3] After the grid construction, the scheme of Ishida[4,17] continues along the method of Gill et al.[3] Combining the two methods comes intuitively and was realized in the algorithm of this work, treating several orders and argument ranges individually.

To achieve additional FLOP savings with respect to the already highly efficient method of Chebyshev approximation, it felt necessary to aim for a completely different approach. A grid-based Taylor series expansion of the Boys function was found to be an adequate alternative, coming at the cost of an associated overhead in the grid size and the employment of additional grid arrays. The original concerns of Gill et al.[3] about possible memory issues, however, are no longer a concern nowadays, given the constantly increasing capabilities of computer hardware. The major savings of this strategy arise from the evaluation of $F_1(x)$ integrals, which save three FLOPs each with respect to the original method of Gill et al.[3] (see Table 3). For all other orders, at least one FLOP is saved, but these small savings accumulate. Direct benchmarks of both algorithms display an average FLOP saving of about 2% with respect to our implementation of the Chebyshev approximation for orders $L > 1$. As the first two moments $L = 0$ and $L = [0, 1]$ are treated explicitly, FLOP savings of up to 19% are possible if the Taylor series is used for these orders. This corresponds to $(ss|ss)$ and $(ps|ss)$ type two-electron repulsion integrals. As these types of integrals form the basic auxiliary integrals in recursion schemes, the knowledge gained by our study is hopefully useful for future work.

Keywords: integrals • Boys functions • quantum chemistry

[1] S. F. Boys, Proc. R. Soc. A 1950, 200, 542.
[2] T. P. Hamilton, H. F. Schaefer, Chem. Phys. 1991, 150, 163.
[3] P. M. W. Gill, B. G. Johnson, J. A. Pople, Int. J. Quantum Chem. 1991, 40, 745.
[4] K. Ishida, Int. J. Quantum Chem. 1996, 59, 209.
[5] T. Helgaker, P. Jørgensen, J. Olsen, Molecular Electronic-Structure Theory; Wiley: Chichester, England, 2004. ISBN 0-471-96755-6.
[6] I. I. Guseinov, B. A. Mamedov, J. Math. Chem. 2006, 40, 179.
[7] I. Shavitt, Methods in Computational Physics, Vol. 2; Academic: New York, 1963; p. 1.
[8] V. R. Saunders, Computational Techniques in Quantum Chemistry and Molecular Physics; Reidel: Dordrecht, 1975; p. 347.
[9] V. R. Saunders, In Methods in Computational Molecular Physics; G. H. F. Diercksen, S. Wilson, Eds.; D. Reidel Publishing Company: Dordrecht, Holland, 1983.
[10] M. Kara, A. Nalçaci, T. Özdogan, Int. J. Phys. Sci. 2010, 5, 1939.
[11] D. B. Cook, Handbook of Computational Quantum Chemistry; Dover Publications: Mineola, New York, 2005. ISBN 0-486-44307-8.
[12] B. A. Mamedov, J. Math. Chem. 2004, 36, 301.
[13] L. E. McMurchie, E. R. Davidson, J. Comput. Phys. 1978, 26, 218.
[14] M. Primorac, Int. J. Quantum Chem. 1997, 68, 305.
[15] G. Em Karniadakis, R. M. Kirby II, Parallel Scientific Computing in C++ and MPI; Cambridge University Press: Cambridge, New York, 2008. ISBN 978-0-521-52080-5.
[16] W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, Numerical Recipes: The Art of Scientific Computing, 3rd ed.; Cambridge University Press: Cambridge, New York, 2007. ISBN 978-0-521-88068-8.
[17] K. Ishida, J. Chem. Phys. 2000, 113, 7818.
[18] N. N. Schraudolph, Neural Comput. 1999, 11, 853.

How to cite this article: A. K. H. Weiss, C. Ochsenfeld, J. Comput. Chem. 2015, 36, 1390–1398. DOI: 10.1002/jcc.23935

Received: 4 October 2014; Revised: 7 March 2015; Accepted: 12 March 2015; Published online on 13 May 2015