JCC 23935

FULL PAPER WWW.C-CHEM.ORG
A Rigorous and Optimized Strategy for the Evaluation of the Boys Function Kernel in Molecular Electronic Structure Theory
Alexander K. H. Weiss and Christian Ochsenfeld*
This work is focused on the efficient evaluation of the Boys function located at the heart of Coulomb and exchange type electron integrals. Different evaluation strategies for individual orders and arguments of the Boys function are used to achieve a minimal number of floating-point operations. Based on previous work of other groups, two similar algorithms are derived that are compared based on both accuracy and efficiency: The first algorithm combines the work of Gill et al. (Int. J. Quantum Chem. 1991, 40, 745) and Kazuhiro Ishida (Int. J. Quantum Chem. 1996, 59, 209 and following work), amplifying the benefits of the two strategies. © 2015 Wiley Periodicals, Inc.

DOI: 10.1002/jcc.23935
A. K. H. Weiss, C. Ochsenfeld
Department of Chemistry (Theoretical Chemistry), LMU Munich, Butenandtstr. 7(C), D-81377 Munich, Germany
E-mail: christian.ochsenfeld@cup.uni-muenchen.de
Contract grant sponsor: Alexander von Humboldt Foundation, Germany
© 2015 Wiley Periodicals, Inc.
Journal of Computational Chemistry 2015, 36, 1390–1398 WWW.CHEMISTRYVIEWS.COM

Introduction

Perhaps the most fundamental kernel routine for the efficient evaluation of Coulomb and exchange type electron integrals in molecular electronic structure theory is the Boys kernel.[1] Its fast evaluation is an important factor for the total integral evaluation time and, as Hamilton and Schaefer pointed out already in 1991,[2] the limiting factor in the formation of the auxiliary s-type integrals used. Based on previous work of other groups, mainly work of Gill et al.,[3] Ishida,[4] and common methods discussed by Helgaker et al.,[5] this work develops an optimized grid-based evaluation of the kernel function for orders up to 82 and any argument. The moment of 82 corresponds to the evaluation of the second derivative of a two-electron repulsion integral over four z-type orbitals ($L = 20$), that is, the highest angular moment yet defined for quantum chemistry.

To derive a strategy for the rigorous evaluation of the Boys function for arbitrary arguments and orders, the mathematical nature and behavior of this function have to be investigated closely. For this reason, the first part of this work is split into individual sections, each of which addresses one distinct aspect to be considered in the overall picture. The second part of this work then gathers all the information gained in the first part to construct the two proposed algorithms. In the scope of this work, we consider the function argument to be given and do not treat its evaluation. Ishida[4] in particular proposed a highly efficient scheme for this initial step.

After a general introduction, the limit expression of the Boys function is outlined, followed by notes on grid-based evaluation strategies and recurrence relations. The individual methods are compared, and an optimized evaluation strategy is derived.

To prevent possible misunderstandings concerning the terms “relative” and “absolute accuracy,” it has to be stressed that this work aims at a floating point operations (FLOPs) reduced evaluation of the Boys function while maintaining the final self consistent field (SCF) energy down to pico-Hartree accuracy. This corresponds to a maximum deviation of $10^{-14}$ to the analytical value, as recommended by Helgaker et al.[5] We, therefore, use the terms “relative” and “absolute accuracy” to address absolute maximum deviations with respect to any relative reference and to the “real” analytical value, respectively.

The Boys Function Kernel Integral

The Boys integral (Boys[1] 1950) is related to the complete and incomplete Euler Gamma functions $\Gamma$/$\gamma$[5,6]:

$$F_n(x) = \int_0^1 t^{2n} e^{-x t^2}\,dt = \frac{\Gamma(n+0.5) - \Gamma(n+0.5, x)}{2x^{n+0.5}} = \frac{\gamma(n+0.5, x)}{2x^{n+0.5}} \tag{1}$$

with given order $n$ and argument $x$. Shavitt[7] and Saunders[8,9] discussed the analytical evaluation of this integral and Kara et al.[10] briefly discussed its efficient calculation in terms of partial integration. The integral is bordered by[3,5]:

$$F_n(x) \le \frac{(2n-1)!!}{2^{n+1}} \sqrt{\frac{\pi}{x^{2n+1}}} \tag{2}$$

This expression [eq. (2)] corresponds to eq. (20) in Gill et al.[3] and their asymptotic solution.

A numerical scheme for the evaluation of the Boys function $F_n(x)$ for orders $n$ and arguments $x$ was provided by Cook[11]:

$$F_n(0 \le x \le X) = \frac{e^{-x}}{2} \sum_{i}^{\infty} \frac{\Gamma(n+0.5)\, x^{i}}{\Gamma(n+i+1.5)} \tag{3}$$

$$F_n(X < x) = \frac{\Gamma(n+0.5)}{2x^{n+0.5}} - \frac{e^{-x}}{2x} \sum_{i}^{\infty} \frac{\Gamma(n+0.5)\, x^{-i}}{\Gamma(n-i+1.5)} \tag{4}$$

where the capital $X$ denotes an empirical criterion for a “large x” value. For the combined evaluation scheme of eqs. (3) and (4), Cook[11] proposed the criterion for a “large x” to be 10. He reports that eq. (4) shows clear discontinuities for orders $n > 24$ at $x \approx 10$, but already for lower orders, this work found a small yet existent discontinuity in the transition from eqs. (3) to (4) (data not shown). In the course of this work, this scheme[11] is used as an analytical reference to several methods of grid-based approaches.[3–5] Because of the observed discontinuities, the “small case” eq. (3) is used for the complete argument range for all orders to secure a rigorous evaluation, whereas the large case equation is skipped entirely.

The derivatives of the Boys function with respect to the argument $x$ follow a vertical recurrence relation, that is, feature a consecutive dependence in the order $n$[5,12,13]:

$$\frac{\partial F_n(x)}{\partial x} = -F_{n+1}(x) \tag{5}$$

Based on this, usually only the Boys function $F_n$ of highest order $n$ is computed, whereas all others are retrieved via downward recursion[5,6,11–13]:

$$F_{n-1}(x) = \frac{2x F_n(x) + e^{-x}}{2(n-1)+1} \tag{6}$$

For small values of the argument $x$, the upward recursion is reported[5] to become unstable as it involves the difference of two almost equal numbers:

$$F_{n+1}(x) = \frac{(2n+1) F_n(x) - e^{-x}}{2x} \tag{7}$$

For large values of $x$, both recursions have the same reported[5,6,12] accuracy. Primorac[14] introduced a new expansion of the Boys function that is, however, not part of this study as it was reported by Primorac himself to be inaccurate at higher orders. Other analytical recurrence relations have been derived by Guseinov and Mamedov[6,12]:

$$F_n(x) = \frac{1}{x^k}\left[ F_{n-k}(x) \prod_{j=1}^{k}\left(n-j+\tfrac{1}{2}\right) - \tfrac{1}{2} e^{-x} \sum_{i=1}^{k} x^{i-1} \prod_{j=1}^{k-i}\left(n-j+\tfrac{1}{2}\right) \right] \tag{8}$$

$$F_{n-k}(x) = \frac{1}{\prod_{j=1}^{k}\left(n-j+\tfrac{1}{2}\right)}\left[ x^k F_n(x) + \tfrac{1}{2} e^{-x} \sum_{i=1}^{k} x^{i-1} \prod_{j=1}^{k-i}\left(n-j+\tfrac{1}{2}\right) \right] \tag{9}$$

with $k$ being any integer denoting a recursive offset to $n$; the products collect the half-integer factors accumulated over $k$ steps of eqs. (6) and (7), and an empty product equals 1. Considering the apparent higher FLOP counts of these relations compared with eqs. (6) and (7), this work dropped them as well and focused on the latter.

The Boys Function Kernel Limit Expression

This section investigates the limit expressions with respect to the recursion schemes of eqs. (6) and (7), as well as the border expression of eq. (2). Values of the exponential function $e^{-x}$ become very small for larger values of $x$. Assuming a maximum deviation of $10^{-14}$ to the analytical value, as recommended by Helgaker et al.,[5] this implies a maximum for $x \approx 30$. For $x \gtrsim 30$, it can be assumed that the contribution of the exponential expression can be neglected in the recursions. Further simplifying the two schemes leads to the two FLOP reduced expressions for upward and downward recursion of the Boys function for larger arguments $x$:

$$F_{n-1}(x \gtrsim 30) \approx \frac{x F_n(x)}{n-0.5} \tag{10}$$

$$F_{n+1}(x \gtrsim 30) \approx \frac{(n+0.5) F_n(x)}{x} \tag{11}$$

As a starting point to the upward recursion, $F_0(x)$ may be evaluated via its border expression [eq. (2)][12]:

$$F_0(x \gtrsim 30) \approx \frac{\sqrt{\pi}}{2\sqrt{x}} \tag{12}$$

Comparing eqs. (10) and (11) shows that both use two operations in each iteration (if the expressions $n \pm 0.5$ are stored as numbers). The evaluation of $F_0(x \gtrsim 30)$ according to eq. (12) can be achieved at the cost of a square root of the argument $x$ and a division, taking $\sqrt{\pi}/2$ as a precomputed constant. For the downward recursion, $F_n(x)$ would have to be computed at a definitely higher FLOP cost. This implies that the upward recursion for obtaining the limit expression of the Boys function is more efficient than the downward recursion. Equations (2), (7), and (11) were used to compute values for $F_n(x)$ for orders $n$ up to 82, investigating at which arguments $x$ these expressions match the analytical Boys function down to an accuracy of $10^{-14}$. This data is presented in Table 1. Several findings are of interest: In general, the previous assumption that an argument value of approximately 30 matches the limit expressions was substantiated for angular moments commonly used in molecular electronic structure theory, but the limit argument converges toward a value of $x \approx 44$ for $n = 82$. The three schemes are found to be equally tight for all orders, which renders the FLOP reduced recursions [eqs. (7) and (11)] more efficient than the original expressions.

Grid-Based Evaluation of the Boys Function: Methods

As the computation of $F_n$ is a fundamental step in electron integral evaluation over Gaussian functions, it presents itself as one of the many bottlenecks linked to computation time and efficiency. Throughout applied numerical and computational science, such demands may be reduced using grid-based approaches,[15] in which case the effective solutions to the
kernel function are precomputed to a high level of accuracy with respect to properly chosen grid point values, up to a certain argument cutoff. Cutoff and accuracy have to be chosen carefully so that the interpolation scheme applied afterwards is able to reproduce the desired accuracy demanded of the implementation. In case of electron integral evaluation, the absolute difference between the interpolated and numerically exact value has to be as low as possible to provide an exact basis for evaluation. For the evaluation of the Boys function, a precision of the order $10^{-14}$ is recommended.[5] This section will discuss the previous work of Gill et al.[3] and Ishida[4] as well as two general schemes of grid point interpolation to mention in this context: an n-term terminated Taylor series expansion[5] and a Newton-like interpolation scheme, based on Neville’s algorithm.[15,16] The later reformulation of Ishida[17] prefers accuracy over performance, and is, therefore, not treated in this section.

Table 1. Values of the argument x at and after which the eqs. (2), (7), and (11) match the analytical reference value of $F_n(x)$ down to an accuracy of $\le 10^{-14}$.

n     eq. (2)   eq. (7)   eq. (11)
0     28.22     28.20     28.20
1     28.23     28.19     28.23
2     28.26     28.26     28.26
3     28.29     28.29     28.29
4     28.33     28.33     28.33
5     28.37     28.37     28.37
6     28.41     28.41     28.41
7     28.45     28.45     28.45
8     28.49     28.49     28.49
9     28.53     28.53     28.53
10    28.58     28.58     28.58
11    28.62     28.62     28.62
12    28.67     28.67     28.67
13    28.72     28.72     28.72
14    28.78     28.78     28.78
15    28.83     28.83     28.83
16    28.89     28.89     28.89
17    28.95     28.95     28.95
18    29.01     29.01     29.01
19    29.08     29.08     29.08
20    29.15     29.15     29.15
21    29.22     29.22     29.22
...   ...       ...       ...
80    43.14     43.14     43.14
81    43.47     43.47     43.47
82    43.80     43.80     43.80

Gill et al.[3] relied on a modified cubic interpolation scheme using an m-term Chebyshev polynomial expansion. They defined a grid step $\Delta$ as a function of a rigorous interpolation error tolerance $\epsilon$:

$$\Delta = \left[ \frac{2^{m} (m+1)!\, \epsilon}{\max |f^{m+1}(x)|} \right]^{\frac{1}{m+1}} \tag{13}$$

Using this step, a set of grid points $X_j = (2j+1)\Delta$ is defined with indices $j = \mathrm{int}\left(\frac{x}{2\Delta}\right)$, respectively. $f$ is the function to expand and $f^m$ denotes the $m$th derivative with respect to the argument. Considering eq. (5), the Boys function of given order $n$ and argument $x$ is then approximated in only six FLOPs:

$$F_n(x) \approx f_0^n + \frac{x}{2\Delta}\left[ f_1^n + \frac{x}{2\Delta}\left[ f_2^n + \frac{x}{2\Delta} f_3^n \right] \right] \tag{14}$$

with $\frac{x}{2\Delta}$ and $f_i$ precomputed[3] for all orders $n$:

$$f^m = \frac{\partial^m F_n(X_j)}{\partial x^m} = (-)^m F_{n+m}(X_j) \tag{15}$$

$$a_k^n = (2-\delta_{k0}) \left(\frac{\Delta}{2}\right)^{k} \sum_{m=0}^{\infty} \left(\frac{\Delta}{2}\right)^{2m} \frac{f^{k+2m}(X_j)}{m!\,(k+m)!} \tag{16}$$

$$f_0^n = a_0^n - a_2^n + \left[a_1^n - 3a_3^n\right]\frac{X_j}{\Delta} + 2a_2^n \left(\frac{X_j}{\Delta}\right)^{2} + 4a_3^n \left(\frac{X_j}{\Delta}\right)^{3} \tag{17}$$

$$f_1^n = 2a_1^n - 6a_3^n + 8a_2^n \frac{X_j}{\Delta} + 24a_3^n \left(\frac{X_j}{\Delta}\right)^{2} \tag{18}$$

$$f_2^n = 8a_2^n + 48a_3^n \frac{X_j}{\Delta} \tag{19}$$

$$f_3^n = 32a_3^n \tag{20}$$

An analogous scheme was developed for the evaluation of the exponential function.[3]

Ishida[4] introduced similar expansions for both the evaluation of the Boys function and exponential expressions in general, competing with the previous work of Gill et al.[3]:

$$F_n(z) \approx a_0^n - z\left(a_1^n - z\left(a_2^n - z a_3^n\right)\right) \tag{21}$$

$$e^{-z} \approx b_0 - z\left(b_1 - z\left(b_2 - z b_3\right)\right) \tag{22}$$

with the $a_i$ and $b_i$ coefficients obtained via:

$$a_0^n = F_n(z_j) + z_j F_{n+1}(z_j) + \tfrac{1}{2} z_j^2 F_{n+2}(z_j) + \tfrac{1}{6} z_j^3 F_{n+3}(z_j) \tag{23}$$

$$a_1^n = F_{n+1}(z_j) + z_j F_{n+2}(z_j) + \tfrac{1}{2} z_j^2 F_{n+3}(z_j) \tag{24}$$

$$a_2^n = \tfrac{1}{2} F_{n+2}(z_j) + \tfrac{1}{2} z_j F_{n+3}(z_j) \tag{25}$$

$$a_3^n = \tfrac{1}{6} F_{n+3}(z_j) \tag{26}$$

$$b_0 = e^{-z_j}\left(1 + z_j + \tfrac{1}{2} z_j^2 + \tfrac{1}{6} z_j^3\right) \tag{27}$$

$$b_1 = e^{-z_j}\left(1 + z_j + \tfrac{1}{2} z_j^2\right) \tag{28}$$

$$b_2 = \tfrac{1}{2} e^{-z_j}\left(1 + z_j\right) \tag{29}$$

$$b_3 = \tfrac{1}{6} e^{-z_j} \tag{30}$$

where Ishida[4] used a grid step of $2\Delta = 5 \times 10^{-4}$ and the indices $j = \mathrm{int}\left(\frac{x}{2\Delta}\right)$, in accordance with Gill et al.[3] The major difference in the two schemes was found to be the construction of the cubic interpolation coefficients [eq. (15)ff and eq. (23)ff]. Both algorithms use six FLOPs and are equal in their efficiency. Ishida[4] reported his scheme to be superior to the Chebyshev expansion proposed by Gill et al.[3] in terms of overall FLOP counts, that is, taking into account all FLOP counts necessary for the construction of $[0]^{(m)}$ integrals, as outlined in Table A.1 of the referenced work.
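
The cubic scheme of eqs. (21)–(30) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`boys_series`, `cubic_coefficients`, `boys_cubic`, `exp_cubic`) are hypothetical, the reference values come from the series of eq. (3) under the assumption that the sum starts at i = 0, the leading term of eq. (23) is read as $F_n(z_j)$, and the coefficients are recomputed on the fly where a production code would look them up in a precomputed grid.

```python
import math

TWO_DELTA = 5.0e-4  # grid step 2*Delta, the value quoted from Ishida in the text


def boys_series(n, x, tol=1e-17, max_terms=1000):
    """Reference F_n(x) from the series of eq. (3), assuming the sum starts at i = 0."""
    term = 1.0 / (n + 0.5)            # Gamma(n+0.5) / Gamma(n+1.5)
    total = term
    for i in range(max_terms):
        term *= x / (n + i + 1.5)     # ratio of consecutive series terms
        total += term
        if term <= tol * total:       # remaining terms are negligible
            break
    return 0.5 * math.exp(-x) * total


def grid_point(j):
    """Interval midpoint z_j = (2j+1)*Delta."""
    return (2 * j + 1) * 0.5 * TWO_DELTA


def cubic_coefficients(n, j):
    """Coefficients a_0..a_3 of eqs. (23)-(26) for order n at grid point z_j."""
    zj = grid_point(j)
    f0, f1, f2, f3 = (boys_series(n + m, zj) for m in range(4))
    a0 = f0 + zj * f1 + 0.5 * zj**2 * f2 + zj**3 * f3 / 6.0
    a1 = f1 + zj * f2 + 0.5 * zj**2 * f3
    a2 = 0.5 * f2 + 0.5 * zj * f3
    a3 = f3 / 6.0
    return a0, a1, a2, a3


def boys_cubic(n, x):
    """Eq. (21): Horner form, three multiplications and three subtractions."""
    j = int(x / TWO_DELTA)
    a0, a1, a2, a3 = cubic_coefficients(n, j)  # a table lookup in practice
    return a0 - x * (a1 - x * (a2 - x * a3))


def exp_cubic(x):
    """Eq. (22) with the b coefficients of eqs. (27)-(30)."""
    zj = grid_point(int(x / TWO_DELTA))
    e = math.exp(-zj)
    b0 = e * (1.0 + zj + 0.5 * zj**2 + zj**3 / 6.0)
    b1 = e * (1.0 + zj + 0.5 * zj**2)
    b2 = 0.5 * e * (1.0 + zj)
    b3 = e / 6.0
    return b0 - x * (b1 - x * (b2 - x * b3))


if __name__ == "__main__":
    for n, x in [(0, 0.0003), (2, 1.2345), (5, 7.5), (10, 20.0)]:
        print(n, x, abs(boys_cubic(n, x) - boys_series(n, x)))
```

The coefficients are simply the third-order Taylor expansion around $z_j$ regrouped in powers of the argument itself, which is why no subtraction $x - z_j$ appears in the evaluation step and the count stays at six FLOPs once the coefficients are tabulated.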

An intuitive way to expand any continuous function around a stationary point $x_0$ is to use a Taylor series expansion[5]:

$$f(x) = \sum_{n=0}^{\infty} \frac{(x-x_0)^n}{n!} \frac{\partial^n f(x_0)}{\partial x_0^n} \tag{31}$$

The Taylor series expansion of the Boys function of order $n$ and argument $x$ may be computed down to a controllable accuracy, using a sum-term threshold criterion with respect to a grid point expansion[5]:

$$F_n(x) = \sum_{k=0}^{\infty} \frac{F_{n+k}(x-\Delta x)\,(-\Delta x)^k}{k!} \tag{32}$$

with $\Delta x = x - x_{grid}$, where $x_{grid}$ is the nearest point to $x$ in the grid. Alternatively, a strictly term-terminated sum expression may be used instead. For example, a Taylor series expansion of $F_n(x)$ around a grid point $x_{grid}$ with six explicit terms is given as:

$$\begin{aligned} F_n(x) \approx{}& F_n(x_{grid}) - (x-x_{grid}) F_{n+1}(x_{grid}) + \tfrac{1}{2}(x-x_{grid})^2 F_{n+2}(x_{grid}) - \tfrac{1}{6}(x-x_{grid})^3 F_{n+3}(x_{grid}) \\ &+ \tfrac{1}{24}(x-x_{grid})^4 F_{n+4}(x_{grid}) - \tfrac{1}{120}(x-x_{grid})^5 F_{n+5}(x_{grid}) \end{aligned} \tag{33}$$

Storing $(x-x_{grid})$ as intermediate and optimizing the equation with respect to multiplications by this subexpression [i.e., similar to eqs. (14) and (21)] eventually results in a maximum of 14 FLOPs, if all of the six terms are used. If, in addition, the fractions are tabulated as part of the grid data, this reduces to 10 FLOPs for 6 terms, 8 FLOPs for 5 terms, and so on. Even if only a five term series is used and the fractions are stored as part of the grid data, the demanded eight FLOPs are still more than the six FLOPs reported by the previously discussed schemes.[3,4] A series of four terms and less appears to be the only competitive choice. Gill et al.[3] pointed out that although this scheme was found to be very accurate near the middle of the interpolation interval, it performs badly for small grid steps at the endpoints. This will be investigated more closely in the following. It has to be considered that this scheme demands an overhead in the grid computation of $F_n(x)$, as the Taylor series uses up to $m$ more grid points, where $m$ is the order of the expansion. This means that for the rigorous evaluation of the value $F_n(x)$ with an m-term Taylor series, grid points up to $F_{n+m}(x)$ have to be present in the grid.

A different route may be taken via a Newton-like polynomial interpolation scheme based on Neville’s algorithm,[15,16] which has been reported[15] to be the most efficient algorithm of this kind. Whereas the Taylor series uses a vertical interpolation between consecutive orders of the Boys function, the Neville interpolation is a horizontal recursion in the argument $x$ only. $F_n(x)$ is fitted by a Newton polynomial of a chosen degree $m$ that is part of a two dimensional grid. The polynomial is built recursively, terminating at the final polynomial expression $p_{1,m}$ approximating $F_n(x)$:

$$\begin{aligned} F_n(x) &\approx p_{1,m} \\ k &= \mathrm{int}(x / x_{step}) \\ \{x_i\} &= \{x_{k-\mathrm{floor}\left(\frac{m-1}{2}\right)}, \ldots, x_k, \ldots, x_{k+\mathrm{ceil}\left(\frac{m-1}{2}\right)}\} \\ p_{i,i}(x) &= y_i = F_n(x_i) \\ p_{i,j}(x) &= \frac{(x_j - x)\, p_{i,j-1}(x) - (x_i - x)\, p_{i+1,j}(x)}{x_j - x_i} \end{aligned} \tag{34}$$

where $x_k$ is the $k$th grid point in $x$ and $x_i$ ($i = 1, \ldots, m$) denotes the set of grid points in the range $\frac{m-1}{2}$ before and after $x_k$ (properly using integer arithmetic to support even degrees). At the beginning and end of the interpolation, this set has to be adapted to $\{x_i\}_{k=0} = \{x_0, \ldots, x_{m-1}\}$ and $\{x_i\}_{k=max} = \{x_{max-m-1}, \ldots, x_{max}\}$, respectively. Storing $(x_{i/j} - x)$ as intermediate expressions, this scheme uses five FLOPs in each iteration where $i$ and $j$ differ, and zero FLOPs (i.e., assignments) for each step where $i$ equals $j$. The FLOP counts are found to be 7 for degree 2, 18 for degree 3, 34 for degree 4, and so on. This already renders the Neville interpolation less efficient than all previously discussed schemes; nevertheless, this method will also be investigated more closely in the following section.

Grid-Based Evaluation of the Boys Function: Construction

As mentioned in the previous section, the construction of the interpolation grid is crucial. A grid step has to be chosen that is small enough to reproduce the value of the Boys function in the full range of the evaluation, that is, arguments up to the limit; conversely, the number of arising grid points has to be small enough to be stored in conventional RAM. To validate a constructed grid in combination with an interpolation method of choice, this work uses a general validation scheme: The evaluation method for “small arguments” [eq. (3)] proposed by Cook,[11] as discussed in the previous section, was used to construct testing grids in the argument range $x = [0, 30]$ (after which the Boys function limit was found to be rigorous) and these values are assumed by this work to be exact solutions. The interpolation method to be affirmed is chosen and its performance is verified in the range $x = [0, 30]$ with a screening step in $x$ of $10^{-6}$.

Using the analytical expression of eq. (3), the proposed interpolation grids of Ishida[4] and Gill et al.[3] were constructed up to the argument $x = 30$, taking Ishida’s proposal of a grid step of $2\Delta = 5 \times 10^{-4}$ for the sake of a direct comparison of both schemes. A direct comparison revealed that the overall error magnitude is to be considered equal. The algorithm proposed by Gill et al.[3] shows slightly lower errors for certain orders but not for others. This appears to be quite random, not having any obvious tendency.

The Taylor series of eq. (31) was used to select the best ratio of number of Taylor series terms and grid step size. A grid step of $5 \times 10^{-5}$ in combination with a three-term Taylor series expansion is sufficient to rigorously evaluate the Boys function values for all orders up to 82. This comes at the cost of a dense grid and five FLOPs per evaluation plus the grid step construction, which is one FLOP below the reported count of Gill et al.[3] Using a grid step of $5 \times 10^{-6}$, the deviations are found to be below the errors of the cubic interpolation of Gill
et al.[3] and Ishida[4] for selected orders. This is similar for the grid step of $5 \times 10^{-5}$ (data not shown). It was suggested before that the FLOP count of the Taylor series may be further decreased if multiplicative factors are stored as part of grid data. This would increase the memory demand of the grid, but further reduce the FLOP count for the three-term series from 5 to 4. Similar observations were made by Ishida.[17]

Neville’s algorithm[15,16] of eq. (34) was validated as well via the defined scheme. Close comparison to the Taylor grid classifies the latter to be superior in terms of FLOP counts, but inferior in terms of accuracy. The lowest degrees for which the proposed accuracy of $10^{-14}$ could be reached for all orders up to 82, using a grid step of $5 \times 10^{-6}$, were found to be much higher than for the Taylor series.

Choosing Proper Recurrence Schemes

The limit expression of the Boys function has already been discussed, as was the evaluation of individual values of $F_n(x)$ for orders $n = [0, 82]$ and arguments $x$ lower than the limit argument. It is now mandatory to select a rigorous and optimized scheme for the recursive computation, either upward or downward as given in eqs. (6–9). It has already been derived in a previous section that for the limit expression, the upward recursion scheme is always superior because of the lower FLOP cost for the evaluation of $F_0(x)$ in comparison to any other $F_n(x)$; however, it will be investigated which of the two recursion schemes is superior with respect to accuracy for distinct combinations of argument ranges and orders.

Assuming that the Boys function of either the lowest or highest order has been computed accordingly, values for all other orders at the same argument can be obtained via the defined recursions. These schemes use the exponential function $\exp(-x)$ and it is mandatory for any efficient recursion of the Boys integral to use a highly FLOP reduced evaluation of this function. This was already pointed out by Gill et al.,[3] proposing to use a cubic interpolation similar to their evaluation of the Boys function [eq. (14)], adapted by Ishida[4] by replacing the coefficients of the cubic interpolation [eq. (27)]. Both attempts use a grid-based interpolation, as discussed in the previous sections. A completely different approach was taken by Schraudolph[18] in the context of neural network computations: By manipulation of the components of the standard (IEEE-754) floating-point representation, he proposed a fast and compact approximation of the exponential function. This can be achieved at a cost of three or even two FLOPs if the empirical factor EXPC[18] is preselected and the two intermediate numbers in the EXP define-pragma in Figure 2 of the referenced work[18] are stored as one. Figure 1 displays that the methods of Gill et al.[3] and Ishida[4,17] appear to be most accurate but the approximation of Schraudolph[18] is found to be the most efficient. The question arises if the scheme of Schraudolph[18] is inaccurate over the full argument range or only in certain parts. What cannot be seen directly from this figure, because of its resolution, is that the Schraudolph[18] method reaches the recommended accuracy of down to $10^{-14}$ at $x \approx 14.39$, after which the deviation further decreases. This implies that the Schraudolph[18] approximation becomes quite accurate in the higher argument range. Because of the overall good accuracy of the Ishida[4,17] interpolation, we will proceed by combining the methods of Ishida[4,17] and Schraudolph.[18] The algorithm proposed by Ishida[4,17] for the evaluation of the exponential function will be used for arguments below a certain order dependent cutoff and the method of Schraudolph[18] for arguments above. For arguments $x > x_{limit}(n)$, both algorithms will use the respective limit expressions of the recursions.

Figure 1. Relative errors of a grid-based evaluation of the exponential function: a) Gill et al.,[3] b) Ishida,[4,17] and c) Schraudolph.[18] Grids for a) and b) have been constructed with $\Delta = 2.5 \times 10^{-4}$, as proposed by Ishida[4,17] for a direct comparison. The method of Schraudolph[18] is an approximation and grid free.

Concerning the recurrence relations of eqs. (6) and (7), there are two major questions of interest to be answered in the course of this section: As it was reported[5] that the upward recursion is numerically unstable for small arguments, it is of interest which effective values of $x$ are behind this empirical definition. Second, the question arises if there is any significant loss of accuracy with respect to an order-stride used in the recursions: For example, what exactly is the error made if $F_0(x)$ is computed from $F_{82}(x)$ via downward recursion directly? Is it more accurate in general to split the recursions into batches of $m$: Computing $F_0$ from $F_m$, $F_{m+1}$ from $F_{2m}$, and so on? If the latter is the case, what is a proper value for the stride $m$ and does this differ in between the two recursions? Using numerically exact Boys function values, Figure 2 depicts the deviation of the evaluation of $F_0$ and $F_1$ via both upward and downward recursion of eqs. (6) and (7) with respect to the analytical reference in the argument range $x = [0, 30]$: Both methods display a small maximum deviation clearly below $10^{-14}$. The deviation for smaller arguments is indeed higher for the upward than for the downward recursion, which reflects the reported numerical instability[5] discussed before. The evaluation of $F_1$ from $F_0$ via upward recursion, however, has an acceptable deviation for higher arguments than $x \approx 2.4$.

Generalizing this validation with $m = [1, 82]$, either $F_0$ is evaluated via downward recursion from $F_m$ or $F_m$ is evaluated from $F_0$ via upward recursion. It is found that for the downward recursion, it does not matter at all from which order $m$ $F_0$ is derived, as all recursions show a deviation lower than $10^{-16}$. This suggests that there is no need for any strides, if the recursion scheme is
properly chosen for a given argument $x$, and the order has no influence on this. At and after a transition point in the argument $T$, an empirical transition point obtained for all orders individually (Table 2), the upward recursion becomes sufficiently accurate with respect to the demanded threshold. Apparently, the key to an efficient evaluation is a combination of both methods in individual argument ranges: downward recursion for $x \le T$ and upward recursion for $T < x$. According to these findings, all $T(n)$ are $\lesssim 30$ (see Table 2) and the upward recursion may safely be used for the values beyond the argument limit at a low FLOP cost.

Figure 2. Relative deviations to the analytical reference values of the recursive evaluation of $F_0(x)$ and $F_1(x)$: a) $F_0(x)$ from $F_1(x)$ via downward recursion [eq. (6)] b) $F_1(x)$ from $F_0(x)$ via upward recursion [eq. (7)]. The ordinate in b) is cut at $6 \times 10^{-16}$ for a better comparison to a). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Table 2. Transition values for the argument x = T (two digits) at and after which the upward recursion of eq. (7) becomes sufficiently accurate (i.e., absolute errors $< 10^{-16}$ with respect to the numerical reference).

n    T       n    T       n    T       n    T
1    2.35    22   8.59    43   16.22   64   24.17
2    2.35    23   8.88    44   16.69   65   24.56
3    2.35    24   9.31    45   17.14   66   24.85
4    2.35    25   9.84    46   17.49   67   25.23
5    2.38    26   9.99    47   17.92   68   25.63
6    2.82    27   10.40   48   18.28   69   25.96
7    3.05    28   10.71   49   18.61   70   26.43
8    3.72    29   11.15   50   18.85   71   26.78
9    3.76    30   11.54   51   19.18   72   27.09
10   4.01    31   11.88   52   19.68   73   27.56
11   4.61    32   12.37   53   19.94   74   27.89
12   4.92    33   12.69   54   20.53   75   28.27
13   5.15    34   13.21   55   20.73   76   28.71
14   5.63    35   13.42   56   21.25   77   29.00
15   6.02    36   13.81   57   21.54   78   29.38
16   6.33    37   14.25   58   21.89   79   29.84
17   6.83    38   14.65   59   22.29   80   30.22
18   7.15    39   14.97   60   22.72   81   30.56
19   7.43    40   15.31   61   23.07   82   30.85
20   7.94    41   15.84   62   23.44
21   8.27    42   15.96   63   23.90

It has to be considered that Boys function values are related to respective order multipole–multipole interactions of charge distributions. Because of this, low orders of the Boys function have to be computed at a higher accuracy than function values of higher orders. Both recursion schemes maintain the relative accuracy, that is, the maximum deviation to the initial value, while the absolute accuracy might be lost. If an approximated Boys function value is put in as initial value, the set of final values may show a deviation higher than the demanded threshold with respect to the analytical expression.

Gathering the Pieces: Optimized Evaluation Strategy

To construct an optimized evaluation strategy for the rigorous evaluation of the Boys function kernel, it is not sufficient to aim for a low FLOP count, but also for a low maximum error with respect to the analytical expression. Grid-based evaluation is found to be superior to any other evaluation scheme as it provides a small and, which is most important, by definition constant maximum FLOP count for all orders $n$ and arguments $x$. This count may be reduced by a careful per-element screening, dropping negligibly small summation terms. This work proposes to merge the existing schemes, using the most effective and accurate for individual argument ranges and orders. Eventually, this results in two multiple case implementations, one focused on a rigorous evaluation of the Boys function kernel, the second optimized for an efficient evaluation (i.e., low FLOP counts) while maintaining an adequate accuracy.

Combining the Methods of Gill et al. and Ishida

The first algorithm of this work combines the exponential approximation of Schraudolph[18] with the benefits of the methods of Gill et al.[3] and Ishida[4] into an overall scheme. The left hand side of Figure 3 displays how the $[0]^{(m)}$ integrals are assembled in the cubic interpolation scheme proposed by Gill et al.[3] The original method of Ishida[4] is depicted on the right hand side of the same figure. It has to be noted here that he later reformulated this algorithm (Ishida[17]) as a Taylor series expansion. His later formulation, however, is focused on accuracy, rather than performance, so this work (and Figure 3 for that matter) treats the original. As pointed out by Gill et al.,[3] there is no need to compute $F_1$ via the recursion, as three FLOPs can be saved by a direct evaluation.

It is assumed that the grid is created on-the-fly during program execution and is not stored on disk. In such a case, it is beneficial to construct the grid most efficiently, which is via the cubic interpolation coefficients proposed by Ishida,[4] as they can be assembled in fewer FLOPs than the coefficients proposed by Gill et al.[3] [eqs. (23) vs. (15)]. After the grid is assembled successfully with Ishida’s proposal of a grid step $\Delta = 2.5 \times 10^{-4}$, this algorithm assembles the Boys function similar to the proposal of Gill et al.[3]: The two special cases $L = 0$ and $L = 1$ are treated individually. Higher orders are evaluated via either downward or upward recursion. Gill et al.[3] used the downward recursion for all orders, whereas this work switches between them, as for each order, there is a certain transition point at and after which the upward recursion is found to be more accurate than the downward recursion (see Table 2 for these values). Both recursions come at a formal cost of 3L
Figure 3. Flowcharts of the two methods of Gill et al.[3] and Ishida[4] (not Ishida[17]) for the computation of Boys functions and auxiliary integrals used in
electron integral evaluation.
FLOPs, but as mentioned in the previous section, caution has to be taken if using the recursions directly to obtain either lower or higher orders of the Boys function. As only the maximum deviation to the previous recursion element is maintained and not the maximum deviation to the analytical value, the source codes of both algorithms use empirical select case structures to get rid of offending intermediates. This results in individual FLOP counts for each special case. The exponential function used in both recursions is either assembled via the method of Ishida[4,17] or via the approach of Schraudolph[18] in the argument range where this method was shown to be accurate (see Fig. 1). Both algorithms use half of the exponential, which is either parametrized directly in the cubic interpolation coefficients (six FLOPs) or comes at the cost of an additional division after the Schraudolph[18] approach (three FLOPs). For the limit expression, Gill et al.[3] use an asymptotic series in case the argument is higher than a critical cutoff. This corresponds to the border expression of (2) also used here.

FLOP Reduction in a Second Algorithm

Using the three-term Taylor series with a grid step of $\Delta = 5.0 \times 10^{-5}$ results in a similar, yet FLOP-reduced algorithm (see Fig. 4), at the cost of an additional step after the grid index determination, as the difference $x_\Delta = x - x_g$ is used in the Taylor series. This additional FLOP is regained, however, when computing $F_0$ in only four FLOPs. This saving of four FLOPs with respect to the other methods is repeated for the evaluation of $F_1$, respectively. By now, this is three FLOPs more efficient at the cost of a larger grid and an associated higher memory demand. It was found by this work that the FLOP counts in the methods of Gill et al.[3] and Ishida[4] are independent of the used grid size, but dependent on the maximum order. This is not the case for the Taylor series used by this algorithm. Here, the FLOP counts directly depend on the used grid size down to a minimum of 3. The two special cases $L = 0$ and $L = 1$ are again treated individually. For others the algorithm continues with a similar select case structure as algorithm I that is slightly modified at some points to maintain accuracy.

Comparing Both Algorithms

Table 3 displays the individual FLOP counts of the original algorithm of Gill et al.,[3] and both algorithms derived in this work (I/II) for the angular moment set $L = [0, 21]$, commonly used in today's quantum chemical code, for the argument $x = 0$. As the first scheme is a combination of the methods of Gill et al.[3] and Ishida,[4] it obviously displays the same FLOP count. The major improvement here lies in a faster construction of the grid via the coefficients provided by Ishida[4,17] and an approximation to the exponential function via the method of Schraudolph[18] for arguments higher than a cutoff. The second algorithm uses a grid-based three-term Taylor series expansion of the Boys function, and is found to be superior by one FLOP for all orders, except for $L = 1$ in which case three FLOPs are saved. This corresponds to two-electron repulsion integrals over three s-type and one p-type orbital. The limit expression is the same for all methods. A benchmark in terms of total FLOP counts is displayed in Table 4. For pure s-type integrals ($L = 0$), the saving is at about 14% and for sp-type integrals ($L = 1$), it is about 19%. These savings arise from the special treatment of the first two moments. As both algorithms require the same $[3L + 1]$ FLOPs for the incremental recursion scheme, it is obvious that the FLOP counts become almost equal for higher moments, if the two first moments are not treated explicitly. The total saving, therefore, reduces to only about 0.7% for

Figure 4. Flowcharts of the two algorithms derived by this work for the computation of Boys functions. “1S” denotes that a square root is required. Total
FLOP counts for distinct orders are listed in Table 3, compared with the method of Gill et al.[3]
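The Taylor-series flowchart of Figure 4 can be illustrated numerically with a short sketch. The Python code below is only an illustration under stated assumptions, not the authors' implementation. It assumes a grid that stores $F_n$, $F_{n+1}$ and $F_{n+2}$ at every grid point, uses the standard identity $dF_n/dx = -F_{n+1}$ for the three-term expansion, and fills the lower orders with the usual downward recursion. The function names (`boys_reference`, `build_grid`, `boys_taylor`, `boys_all_orders`) and the grid layout are hypothetical, the exponential is taken from `math.exp` rather than from the schemes of Ishida or Schraudolph, and neither the select case structure nor the FLOP counts of the paper are reproduced.

```python
import math

STEP = 5.0e-5  # grid spacing Delta quoted in the text


def boys_reference(n, x, tol=1e-17):
    """Reference F_n(x) from the convergent series
    F_n(x) = exp(-x) * sum_k (2x)^k / ((2n+1)(2n+3)...(2n+2k+1))."""
    term = 1.0 / (2 * n + 1)
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= 2.0 * x / (2 * n + 2 * k + 1)
        total += term
    return math.exp(-x) * total


def build_grid(n_max, x_max, step=STEP):
    """Tabulate F_0 .. F_{n_max+2} on an equidistant grid (assumed layout)."""
    points = int(round(x_max / step)) + 1
    return [[boys_reference(n, g * step) for n in range(n_max + 3)]
            for g in range(points)]


def boys_taylor(n, x, grid, step=STEP):
    """Three-term Taylor expansion of F_n around the nearest grid point x_g."""
    g = int(x / step + 0.5)      # nearest grid index
    dx = x - g * step            # x_Delta = x - x_g
    f0, f1, f2 = grid[g][n], grid[g][n + 1], grid[g][n + 2]
    # F_n(x) ~ F_n(x_g) - dx * F_{n+1}(x_g) + dx^2 / 2 * F_{n+2}(x_g)
    return f0 - dx * (f1 - 0.5 * dx * f2)


def boys_all_orders(n_max, x, grid, step=STEP):
    """F_0 .. F_{n_max}: Taylor step for the top order, then the downward
    recursion F_n = (2 x F_{n+1} + exp(-x)) / (2n + 1)."""
    values = [0.0] * (n_max + 1)
    values[n_max] = boys_taylor(n_max, x, grid, step)
    ex = math.exp(-x)
    for n in range(n_max - 1, -1, -1):
        values[n] = (2.0 * x * values[n + 1] + ex) / (2 * n + 1)
    return values


if __name__ == "__main__":
    grid = build_grid(2, 0.5)    # small argument range to keep the demo fast
    x = 0.123456
    worst = max(abs(v - boys_reference(n, x))
                for n, v in enumerate(boys_all_orders(2, x, grid)))
    print("largest deviation from the series reference:", worst)
```

With the nearest grid point chosen, $|x_\Delta|$ is at most half the grid step, so the first neglected Taylor term is of the order $|x_\Delta|^3/6$ and stays well below the $10^{-14}$ target for this step size. The price is the dense grid, which is the memory overhead mentioned in the text.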
$L = [0, 21]$ (see Table 3). The average FLOP savings of the second algorithm with respect to the first is at about 2% for $L > 1$. The absolute accuracy, that is, a maximum deviation of $10^{-14}$ to the analytical solution, is maintained for all moments and arguments.

Table 3. FLOP count for the construction of all $\{F_{m \le L}(x)\}$ with the original algorithm of Gill et al.[3] and the algorithms proposed by this work, for angular moments commonly used in today's quantum chemical code.

L     Gill et al.[3]    I     II    Limit
0           7           7      6      1
1          13          13     10      3
2          19          19     18      5
3          22          22     21      7
4          25          25     24      9
5          28          28     27     11
6          31          31     30     13
7          34          34     33     15
8          37          37     36     17
9          40          40     39     19
10         43          43     42     21
11         46          46     45     23
12         49          49     48     25
13         52          52     51     27
14         55          55     54     29
15         58          58     57     31
16         61          61     60     33
17         64          64     63     35
18         67          67     66     37
19         70          70     69     39
20         73          73     72     41
21         76          76     75     43

Table 4. Total FLOP counts of the two proposed algorithms, validated via an equidistant grid of n points in the argument range $x = [0, 50]$, and sampled over the full angular moment ranges $L = [a, b]$.

L         n            FLOPs I          FLOPs II         Savings
[0,0]     5,000,000    19,754,000       16,932,000       14.29%
[0,1]     5,000,000    45,407,000       36,938,000       18.65%
[0,5]     5,000,000    122,696,008      119,859,007      2.31%
[0,9]     5,000,000    191,412,004      188,559,003      1.49%
[0,13]    5,000,000    260,000,000      257,128,000      1.10%
[0,17]    5,000,000    328,420,000      325,525,000      0.88%
[0,21]    5,000,000    396,624,000      393,702,000      0.74%
[0,82]    25,000       6,882,166        6,746,448        1.97%
[0,82]    50,000       13,764,341       13,492,898       1.97%
[0,82]    250,000      68,821,741       67,464,498       1.97%
[0,82]    500,000      137,643,491      134,928,998      1.97%
[0,82]    2,500,000    688,217,491      674,644,998      1.97%
[0,82]    5,000,000    1,376,434,991    1,349,289,998    1.97%

Conclusions

The goal of this work was the construction of an efficient and accurate scheme for the evaluation of the Boys function kernel integral. The works of Gill et al.[3] and Ishida[4,17] have been compared with two methods of polynomial interpolation in terms of both accuracy and efficiency and were found to be the most efficient types of evaluation. Both methods rely on a Chebyshev approximation of the Boys function, where Ishida[4,17] provided a set of coefficients that can be constructed more efficiently than the coefficients of Gill et al.[3] After the grid construction, the scheme of Ishida[4,17] continues along the method of Gill et al.[3] Combining the two methods comes intuitively and was realized in the algorithm of this work, treating several orders and argument ranges individually.

To achieve additional FLOP savings with respect to the already highly efficient method of Chebyshev approximation, it felt necessary to aim for a completely different approach. A grid-based Taylor series expansion of the Boys function was
found to be an adequate alternative, coming at the cost of an associated overhead in the grid size and the employment of additional grid arrays. The original concerns of Gill et al.[3] about possible memory issues, however, are no longer an issue given the constantly increasing capabilities of computer hardware. The major savings of this strategy arise from the evaluation of $F_1(x)$ integrals that save three FLOPs each with respect to the original method of Gill et al.[3] (see Table 3). For all other methods, at least one FLOP is saved, but these small savings accumulate. Direct benchmarks of both algorithms display an average FLOP saving of about 2% with respect to our implementation of the Chebyshev approximation for orders $L > 1$. As the first two moments $L = 0$ and $L = [0, 1]$ are treated explicitly, FLOP savings up to 19% are possible, if using the Taylor series for these orders. This corresponds to $(ss|ss)$ and $(ps|ss)$ type two-electron repulsion integrals. As these types of integrals form the basic auxiliary integrals in recursion schemes, the knowledge gained by our study is hopefully useful for future work.

Keywords: integrals • Boys functions • quantum chemistry

How to cite this article: A. K. H. Weiss, C. Ochsenfeld, J. Comput. Chem. 2015, 36, 1390–1398. DOI: 10.1002/jcc.23935

[1] S. F. Boys, Proc. R. Soc. A 1950, 200, 542.
[2] T. P. Hamilton, H. F. Schaefer, Chem. Phys. 1991, 150, 163.
[3] P. M. W. Gill, B. G. Johnson, J. A. Pople, Int. J. Quantum Chem. 1991, 40, 745.
[4] K. Ishida, Int. J. Quantum Chem. 1996, 59, 209.
[5] T. Helgaker, P. Jørgensen, J. Olsen, Molecular Electronic-Structure Theory; Wiley: Chichester, England, 2004. ISBN 0-471-96755-6.
[6] I. I. Guseinov, B. A. Mamedov, J. Math. Chem. 2006, 40, 179.
[7] I. Shavitt, Methods in Computational Physics, Vol. 2; Academic: New York, 1963; p. 1.
[8] V. R. Saunders, Computational Techniques in Quantum Chemistry and Molecular Physics; Reidel: Dordrecht, 1975; p. 347.
[9] V. R. Saunders, In Methods in Computational Molecular Physics; G. H. F. Diercksen, S. Wilson, Eds.; D. Reidel Publishing Company: Dordrecht, Holland, 1983.
[10] M. Kara, A. Nalçaci, T. Özdogan, Int. J. Phys. Sci. 2010, 5, 1939.
[11] D. B. Cook, Handbook of Computational Quantum Chemistry; Dover Publications: Mineola, New York, 2005. ISBN 0-486-44307-8.
[12] B. A. Mamedov, J. Math. Chem. 2004, 36, 301.
[13] L. E. McMurchie, E. R. Davidson, J. Comput. Phys. 1978, 26, 218.
[14] M. Primorac, Int. J. Quantum Chem. 1997, 68, 305.
[15] G. Em Karniadakis, R. M. Kirby, II, Parallel Scientific Computing in C++ and MPI; Cambridge University Press: Cambridge, New York, 2008. ISBN 978-0-521-52080-5.
[16] W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, In Numerical Recipes: The Art of Scientific Computing, 3rd ed.; Cambridge University Press: Cambridge, New York, 2007. ISBN 978-0-521-88068-8.
[17] K. Ishida, J. Chem. Phys. 2000, 113, 7818.
[18] N. N. Schraudolph, Neural Comput. 1999, 11, 853.

Received: 4 October 2014
Revised: 7 March 2015
Accepted: 12 March 2015
Published online on 13 May 2015

Copyright of Journal of Computational Chemistry is the property of John Wiley & Sons, Inc.
and its content may not be copied or emailed to multiple sites or posted to a listserv without
the copyright holder's express written permission. However, users may print, download, or
email articles for individual use.
