Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Towards an order- N DFT method

1998, Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta)

Theor Chem Acc (1998) 99:391±403 DOI 10.1007/s002149800m26 Regular article Towards an order-N DFT method C. Fonseca Guerra1, J.G. Snijders2,1, G. te Velde1, E.J. Baerends1 1 Afdeling Theoretische Chemie, Scheikundig Laboratorium der Vrije Universiteit, De Boelelaan 1083, 1081 HV Amsterdam, The Netherlands 2 Laboratorium voor Chemische Fysica, Rijksuniversiteit Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands Received: 30 March 1998 / Accepted: 6 July 1998 / Published online: 15 September 1998 Abstract. One of the most important steps in a KohnSham (KS) type density functional theory calculation is the construction of the matrix of the KS operator (the ``Fock'' matrix). It is desirable to develop an algorithm for this step that scales linearly with system size. We discuss attempts to achieve linear scaling for the calculation of the matrix elements of the exchangecorrelation and Coulomb potentials within a particular implementation (the Amsterdam density functional, ADF, code) of the KS method. In the ADF scheme the matrix elements are completely determined by 3D numerical integration, the value of the potentials in each grid point being determined with the help of an auxiliary function representation of the electronic density. Nearly linear scaling for building the total Fock matrix is demonstrated for systems of intermediate size (in the order of 1000 atoms). For larger systems further development is desirable for the treatment of the Coulomb potential. Key words: Linear scaling ± Density functional theory ± ADF program electronic Coulomb potential is particular to the ADF code. It is the purpose of this contribution to discuss the way in which linear scaling can be achieved in setting up the Fock matrix in this method. In this introduction we will ®rst brie¯y consider the scaling of various steps in the ADF method and compare this with alternative density functional theory (DFT) methods. In order to achieve linear scaling we need to de®ne a ®nite spatial region over which a basis function can be considered to be nonzero. A de®nition of the spatial cut-o€ for the basis functions that takes into account the e€ect of the cut-o€ on the matrix elements to be constructed will be discussed in Sect. 2. Section 3 treats the linearization of the density ®tting, Sect. 4 the linearization of the evaluation of the exchange-correlation and Coulomb potentials in the grid points, Sect. 5 the building of the Fock matrix and Sects. 6 and 7 o€er results and conclusions respectively. The matrix elements of the exchange-correlation (XC) potential and the Coulomb potential are both evaluated by numerical integration, using P integration points Glm ˆ hvl jV^xc ‡ V^Coul jvm i  1 Introduction There is currently much interest in the development of O(N) methods (linear scaling of computation time with system size) for both Hartree-Fock (HF) and KohnSham (KS) calculations. In contrast to most other current schemes for KS calculations, the Amsterdam density functional code [1] (ADF) uses 3D numerical integration for the evaluation of the ``Fock'' matrix elements throughout (in spite of some historical and logical inconsistency, we denote the matrix of the KS one-electron operator simply as Fock matrix). The use of 3D numerical integration also for the elements of the Correspondence to: E.J. Baerends ‡ VCoul rk ††vm rk †wk P X kˆ1 vl rk † Vxc rk † 1:1† Using 3D numerical integration in this way has the advantage that it is possible to use almost any type of basis function, although there may be some restrictions arising from the required evaluation of VCoul rk † at the grid points (the Poisson solver may or may not use the basis functions). STOs have been chosen as a convenient and physically well-motivated basis. Apart from enabling the one-electron wavefunctions and the total density to have the correct behaviour in the tail and at the nucleus, they are also ecient since their number can typically be a factor of 3 lower than the number of Gaussians. Another choice that has been made is numerical atomic orbitals (NAOs, also called single-site orbitals, SSOs) [2±6] With N basis functions, there is an NP step in the calculation for the evaluation of the basis 392 functions at all points, assuming for the time being that no cut-o€s are applied. The value of the XC potential at each point simply follows, with the current approximate functionals, from the density and its gradient at that point. The density can be calculated either through the density matrix, yielding an N2P step, or in terms of the occupied orbitals, yielding an NeNP step [7]. Here Ne is the number of electrons. Of course not every size parameter is equal, in particular for large molecules one usually has P  N 0 > N > Ne 1:2† where N¢ is the number of auxiliary functions to be discussed below. However, each of these parameters scales linearly with system size, so the sum of their exponents can be used roughly as the scaling order and the density evaluation appears to be an O(N3) step. In fact, a set of atom-centred auxiliary functions (density ®tting basis) was introduced [1], which allowed the density to be written as a linear expansion in N¢ functions, XX q r† ! q~ r† ˆ ai fi r† 1:3† A i2A making evaluation of the density and of Vxc an N¢P, i.e. an O(N2) step. The determination of the ®t coecients ai was also implemented as an O(N2) step (for more about the ®tting procedure, see Sect. 3). The Coulomb potential at each point is calculated from the functions f c, obtained as Coulomb integrals from the ®t functions f, Z fi r† c fi rk † ˆ dr jr ÿ rk j Z q~ r† ~ dr VCoul rk † ! V rk † ˆ jr ÿ rk j XX ai fic rk † 1:4† ˆ A i2A This makes the evaluation of the Coulomb potential at the grid points also an N¢P [O(N2)] step, though with a much larger prefactor than the density and Vxc evaluation. Finally, setting up the G matrix according to Eq. (1.1), when the values of the potentials at all points are available, as well as the values of the basis functions at all points, is an N2P step, i.e. O(N3), as is the subsequent diagonalization. The scalings of the computation steps in the original ADF implementation [1] are summarized below: Fit coe€. Potentials in determi- grid points nation Functions in grid points Fock matrix Diagonalset-up ization O(N2) NP N2P aN¢P+bN¢P O(N3) For many systems, in particular systems incorporating transition metal atoms (transition-metal complexes or clusters with a limited number of metal atoms) which were the prime application targets, the dominant steps were the ®t coecient determination and the Fock matrix set-up, so the scaling was typically between N2 and N3. This constituted a signi®cant improvement over the then N4 scaling of ab initio HF calculations. Before discussing how the various steps in the calculations can be scaled down to O(N) by appropriate cut-o€ procedures, we comment brie¯y on the di€erence between the ADF scheme and the now widespread use of auxiliary functions in Gaussian-based DFT codes. Sambe and Felton [8] adopted the density ®tting procedure of ADF in order to scale down the computation of the Coulomb operator matrix elements, but tried to avoid 3D numerical integration as much as possible (which, however, is not completely possible). They noted that the density ®tting could also be used to scale down the calculation of the matrix elements of the Coulomb operator J^ ˆ VCoul r† from O(N4) to O(N3) when using analytical integral evaluation for the 1/r12 integrals: X Z 1 ai vl r1 †vm r1 † fi r2 † dr1 dr2 Jlm ˆ hvl jVCoul jvm i ˆ r 12 i 1:5† This, of course, is not tied in any way to DFT, but is applicable in HF calculations as well. The requirement of ecient analytical integral evaluation implied, however, that STOs had to be abandoned and Gaussian basis sets had to be used instead. This approach is rather ecient if the Coulomb operator alone is considered: the N2N¢ process of Sambe and Felton, which does not have a high prefactor since the Coulomb integrals of Eq. (1.5) are easy with Gaussian functions, compares favourably to the N¢P+N2P process of ADF (N¢P for the potential evaluation on the grid; N2P for the matrix set-up), since P is typically high. However, since the matrix elements of the exchange-correlation potential are already evaluated by numerical integration (not by Sambe and Felton, but in many DFT codes today), the added cost of including the Coulomb potential in Eq. (1.1) is only the N¢P evaluation of VCoul rk † on the grid. This is nevertheless an important part of the calculation, since the prefactor is relatively high (the evaluation of f c(rk) is expensive). It is also dicult to scale down to O(N) due to the long-range nature of the Coulomb potential. The use of auxiliary function sets for the Coulomb problem, sometimes denoted the RI-J technique, is currently receiving renewed interest, cf. Kendall and FruÈchtland [9] and Eichkorn et al. [10]. In this paper we will address the reduction of the various steps in the Fock-matrix set-up to linear scaling. It is to be noted that a variety of methods are currently being explored to achieve linear scaling for the matrix elements of the Coulomb potential [11±24]. There are other steps in the calculation which we do not discuss in this paper but which have already received considerable attention elsewhere, such as the scaling of the point generation itself [25] and, of course, the diagonalization [26±35]. 2 De®nition of cut-o€s for matrix elements For each function it is natural to de®ne a radius outside which it is negligible, and to determine the cut-o€ radius of each atom as the maximum of the radii of its 393 functions. In the literature [7, 25] the function radius kl has usually been determined from the function values by the condition jvl r†j < s for jrj > kl . This condition does not take into account the fact that cutting at a given function value is not quite the same for weakly decaying functions (small exponent) as for sharply decaying functions. A (much) larger contribution to, for instance, the normalization integral would be neglected in this way for a (very) weakly decaying function. In order to have approximately the same relative error in matrix elements of both di€use and contracted functions we determine the function radius according to a somewhat di€erent algorithm, without of course any change in the underlying idea. The relative weight of the radial part of a basis or ®t function beyond a certain cut-o€ radius r0 is calculated as the ratio between the integral of the tail of the function beyond the cut-o€ point r0 and the total radial integral, R1 2 R 1 n‡1 ÿar r e dr r0 r v r† dr R ˆ 0 w v; r0 † ˆ 1 2 2:1† n ‡ 1†!=an‡2 r v r† dr 0 Here n is the main quantum number, i.e. the radial part of v is rnÿ1 eÿar . Since we wish to have a very ecient algorithm, in view of the frequent tests on these weights, we avoid the time-consuming integration over the partial region r0 ; 1† in the numerator by using a simple exponential function to approximate the tail of the function r2 v, g r† ˆ Ceÿbr 2:2† The parameters b and C are determined by d 2 2:3† r vjrˆr0 dr At large cut-o€ points the function g simulates the tail of r2 v almost perfectly. Equations (1.1) and (1.3) lead to the following expression for the relative weight of the tail g r0 † ˆ r02 v r0 † and g0 r0 † ˆ w v; r0 † ˆ an‡2 r0n‡1 eÿar0 n ‡ 1†! a ÿ n ‡ 1†=r0 † 2:4† If we had used the normalization integral rather than the direct radial integral over (the radial part of) the function in Eq. (2.1), approximately the square of the present expression would have resulted (apart from a factor of the order of magnitude 1) for the contribution of the region beyond r ˆ r0 to the normalization integral. Therefore the typical choice for the weight of 0.1% (w ˆ 10)3) implies neglecting a contribution 10)6 to the normalization integral. The weight of the neglected tails, from which the cut-o€ radii r0 (denoted kl) are determined, is of course chosen depending on the desired maximum ``cut-o€ error''. The latter has to be in keeping with the precision of the numerical integration. An atomic radius can be determined as kA ˆ max kl †. All l2A matrix elements of a pair AB can be excluded if RAB > kA ‡ kB . For a given atom A we have a neighbourhood of atoms B which are such (radius kB larger than RAB ÿ kA ) that the pairs AB have to be treated. This will always be a small number and for large molecules will not increase with system size. Parts of the code with an outer and inner loop over the atoms [O(N2)] transform into a single loop over the atoms with, for each atom, a small inner loop over the neighbouring atoms [O(N)]. As an additional re®nement, for an atom pair AB that has to be considered, the function pairs may be checked and excluded if RAB > kl ‡ km . This will not change the O(N) scaling but will improve the prefactor. For matrix elements that have to be evaluated according to the chosen criterion one can still restrict the number of sample points in the numerical integration by using essentially the same criterion to neglect distant points (see Sect. 4). 3 Linearization of the density ®tting The linearization of the density ®tting is trivial once an important point is recognized concerning the implementation of the density ®tting in the ADF program. To make this clear we brie¯y review the density ®tting as introduced in [1]. The P®t coecients for a density q r†; q r† ! q~ r† ˆ ai f r† are determined from a least-squares ®tting procedure, minimizing the deviation D between true and ®tted density subject to the condition that charge is conserved Z Z Z Dˆ q ÿ q~†2 ds; q~ ds ˆ q ds ˆ N 3:1† Introducing the Lagrangian multiplier 2k one obtains the equation Sa ˆ t ‡ kn 3:2† where a is a column vector with the ®t coecients, S is the matrix of ®t function overlap integrals, n is a R column with the integrals ni ˆ fiRdr and t is a column vector collecting the overlaps ti ˆ q r†fi r† dr. Solving for the Lagrange multiplier gives explicitly a ˆ Sÿ1 t ‡ kSÿ1 n ˆ Sÿ1 t ‡ N ÿ ny Sÿ1 t ÿ1 S n ny Sÿ1 n 3:3† Dunlap et al. [36] have changed the deviation to be minimized from the least-squares deviation in the density to the least-squares deviation in the Coulomb self repulsion of q ÿ q~. This allows the electronic Coulomb energy, evaluated from the ®tted density, to approach the true Coulomb energy from above (``variational ®tting for the Coulomb term in the energy''). Carrying out the ®tting process as outlined here for the whole density at once will be an O N 03 † process in view of the linear system of Eq. (3.2). Considering possibilities to make the ®tting scale more favourable, we ®rst note that apart from the charge conservation condition showing up in the second (Lagrange multiplier) term of the equation for a, the determination of the ®t coecients is a linear process in the ®tted object, the density P q r†: it is possible to write q as a sum of parts q ˆ i qi say) and determine the coecients as the sum 394 of the coecients arising from ®tting the parts separately, provided all functions are used for each qi term. It is natural to express the charge density q r† as the sum of one- and two-centre charge distributions qAB : X XXX X qˆ Plm vl vm ˆ Plm vl vm ˆ qAB 3:4† l;m AB l2A m2B AB The use of all ®t functions for the ®t of each qAB term implies that apart from functions at A and B, also functions at neighbouring centres, C say, would have to be used. The ADF implementation [1], however, carries out the ®tting for pair densities qAB separately and restricts the ®tting of qAB to ®t functions on A and B only, making the ®tting an O N 2 † process. This policy does not seem to have been followed by either Sambe and Felton [8] or Dunlap et al. [36] or any of the more recent implementations of the use of auxiliary functions, except for the recent work of Gallant and St-Amant [37], who also partition the total density into a sum of subsystem densities. We brie¯y review the arguments in favour of atomic pair ®tting: 1. The process becomes O N 2 †. Moreover, the ®tting of each pair density qAB with functions on A and B only implies that the dimension of each linear system, Eq. (3.3) remains modest and is independent of system size. 2. The possible bene®t in ®tting qAB that could be derived from functions on other atoms is heavily system dependent. Since the presence of atoms C close to qAB is accidental, the quality of the ®t could vary from one system to another depending on the accidental surrounding of a pair AB by other atoms. A reliable ®t basis will have to be able to ®t a density qAB accurately, with approximately the same precision, under all circumstances, also in cases where other atoms are remote or not present at all, for instance in the diatomic molecule AB. 3. It is not possible to take the accidental presence of atoms C into account when systematically developing sets of ®t functions. For ®t functions on A this is possible to a certain extent for directly bonded atoms B, since the variation in the number and type of chemical bonds formed by an atom A is limited (nearest neighbour distances for instance fall in a certain range). Fit function sets for an atom A can therefore be generated that can cope with the presence of any ``normal'' twocentre bond with an atom B. The presence of ``third parties'' C is not certain and can therefore not be exploited in a generation scheme. It is possible that the ®t sets generated for reliable pair ®ts are actually larger than would be necessary in some particular case ± the ADF ®t sets are indeed relatively large ± but this is not a problem since the linear system, Eq. (3.3), always remains small. 4. The pair ®tting scheme can be trivially scaled down to O(N) (see below). Given a pair-wise ®tting scheme, linear scaling can simply be achieved by including only atom pairs for which the sum of the radii, determined as described in Sect. 2, is larger than the distance between the atoms. Furthermore, a smaller prefactor can be obtained by eliminating the pairs of basis functions for which the sum of the radii is smaller than the distance between their atoms. We note that the pair-wise ®tting has also proven valuable during the parallelization of the ADF program. As long as the number of nodes is smaller than the number of atom pairs, distribution of the one- and two-centre densities over the nodes of a parallel machine, with proper account of their computational ``weights'' to achieve load balancing, yields a perfectly scaling parallel implementation [38]. In the present case of course, only the pairs that are not excluded are to be distributed. 4 Linearization of manipulations involving grid points: Coulomb and XC potential evaluation The calculation of the XC potential will already be linear when we apply cut-o€s to the evaluation of the values of ®t functions at the grid points (and derivatives) to obtain the density and XC potential. Since it is the value of a ®t function at the grid point that enters the density evaluation, cut-o€s can be based on a threshold for the function value: fi r† is considered negligible outside a radius r0 at which the radial part of fi is equal to a given small threshold, fi r0 †=Zlm ^r0 † ˆ s (Zlm is a real spherical harmonic). We use the weight criterion, Eq. (1.4), since the density is actually used for the XC potential, which enters the spatial integrals making up the Fock matrix elements. The XC potential evaluation is much cheaper than the evaluation of the Coulomb potential, but in our scheme the calculation of the two potentials is closely related. For the distance cut-o€s in the Coulomb potential we will follow the same strategy as implemented in the ADF-BAND code for in®nite periodic systems [5]. The Coulomb potential in a certain integration point can be written as a sum over the atoms: X V C rk † ˆ VCA rk † with VCA rk † A ˆ X Z ai i2A fi r0 † dr0 jrk ÿ r0 j 4:1† Applying the expansion of jr ÿ r0 j in spherical harmonics and using the exponential form of the STO ®t functions gives us the following spherical harmonics expansion for the Coulomb potential from each atom: VCA rk † ˆ A Ilm RkA † ˆ mˆl XX l X 4p ^ kA †I A RkA † Zlm R lm 2l ‡ 1 mˆÿl d l; li †d m; mi †ai I ni ; li ; ai ; RkA † 4:2† i2A Here RkA is the distance vector from nucleus A to the A point rk, and Ilm is the radial part of the (lm) term in the spherical harmonics expansion. The function I is obtained from incomplete Gamma functions. Going back to the algorithms used to evaluate incomplete Gamma functions, it is apparent that this function can be written as the sum of a multipolar and an exponentially decaying part: 395 I ni ; li ; ai ; RkA † ˆ 1 RkA † li ‡1 ni ‡ li ‡ 1†! ai †ni ‡li ‡2 4:3† ‡ eÿai RkA J ni ; li ; ai ; RkA † The function J consists of a power series in RkA, with ni as highest power. This is just one factor RkA more than in the value of the STO ®t function itself, which of course has a simple rni ÿ1 eÿai r radial behaviour. For the sake of clarity one cut-o€ radius for both the exponential part of the Coulomb potential and the function value for the density evaluation is used, where the rni eÿai r radial behaviour of the former is to be taken into account when determining the cut-o€ threshold. The multipolar part of the Coulomb potential is longrange, which hampers achieving linear scaling. It is adA vantageous to ®rst compute the ``strength'' Mlm of an atomic multipolar term of nucleus A as A ˆ Mlm X d l; li †d m; mi †ai i2A ni ‡ l ‡ 1†! ai †ni ‡l‡2 4:4† so that the multipolar term becomes Qlm RkA † ˆ A Mlm RkA †l‡1 They determine the set of relevant basis functions for each grid point with an ecient tree algorithm. However, ADF uses (as do most DFT implementations, see TURBOMOLE [39] and GAUSSIAN 94 [40]) blocks of points for eciency reasons, so that vectorization in inner loops over grid points can be used. Moreover, as explained in Sect. 2, we do not base the cut-o€ decisions on a radius for each basis function that is determined by a ®xed threshold for the function value. We wish to base the cut-o€ decisions on an estimate of the percentage of the matrix elements that will be neglected. Therefore the set of functions that has to be taken into account for a given block of points B is based on whether or not all the points of the block are located in the region of space beyond the radius r0, which according to Sect. 2 contributes less than a given percentage [w of Eq. (2.4)] to the radial integral over the whole space. We de®ne the tail function T(vl; rk) for function vl with respect to a point rk as the weight in the sense of Sect. 2 of the radial integral of vl beyond a radius which is the distance from the centre of the function to the point rk, or T vAl ; rk † ˆ w vl ; jrk ÿ RA j† 4:5† The multipolar strength can be calculated once for all atoms (at a certain SCF cycle), and then the possibility to neglect higher-order multipoles can, for a particular A as well block of grid points, be based on the strength Mlm as the distances RkA. For the size of systems we have been considering, it has never been possible to neglect also the l ˆ 0 terms, not even from the most distant atoms, so there has always remained an O(N2) term in the Coulomb potential evaluation. It has been possible to neglect higher multipole terms, in particular above l ˆ 1. We will investigate below to what extent the O(N2) scaling of the l ˆ 0 potential terms, which only require a small fraction of the total time in the calculations without cut-o€s, a€ects the overall scaling in the systems considered (of the order of a several hundreds of atoms). Obviously, the structure of the whole scheme is suitable for fast multipole techniques [11±13, 23], as are now being applied in Gaussian-based codes [5, 11±15, 17]. Nevertheless, it appears that such techniques would only pay o€, and are actually only needed in the present scheme, when the size of systems becomes one order of magnitude larger. 5 Linearization of manipulations involving grid points: function evaluation and Fock matrix set-up PeÂrez-Jorda and Yang [7] have developed an O(N) method for the evaluation of the density at the grid points, and of matrices such as the overlap matrix, based on the construction of the sets S(rk) of basis functions that have nonnegligible values at a given grid point rk (we ignore here the partition function pa that also plays a role in the development of [7]):  5:1† S rk † : vl vl rk †j > s 5:2† In case a block of points B is speci®ed (T(vl; B)), the radius is to be taken as the distance from the centre of the function to the nearest point of the block. The use of the tail function allows a more even-handed treatment of di€use and contracted functions than the uniform cutting of basis functions at a given value. We now de®ne the set of functions S(B) that belongs to a block of points B as follows: vl (on atom A) belongs to S(B) if there exists a vm (on atom C) such that both the following conditions are satis®ed: a) RAC < kl ‡ km ; b) T vl ; B†  T vm ; B† > threshold 5:3† Atoms A and C are chosen from the set of atoms that possess functions with a tail T(vl; B) over the block larger than a given threshold. The condition represented in Eq. (5.3a) then means that the function vl is suciently close to some other function vm to have a non-negligible matrix element with it. The second condition checks if the contribution of the block B to the numerical integration that makes up the lm matrix element would indeed be non-negligible. Stratman et al. [25] have pointed out that the points in the block should be localized in space in order to limit the set of relevant basis functions. As a matter of fact, the Voronoy polyhedron scheme of point generation in ADF already generates points in spatial cells (small spheres around an atom, or wedges of the atom's Voronoy cell) [41, 42]. As pointed out in [7, 25] the creation of the sets S(B) of basis (and ®t) functions belonging to block B with just the condition expressed in Eq. (5.1) is an O(N2) step, the number of checks required being in principle (number of points) ´ (number of functions). In our case, where pair tests are being done for each block, cf. Eq. (5.3), it is in principle an O(N3) step. However, with present target sizes of molecules of up to 1000 atoms, it is still very 396 cheap. It is organized eciently in a tree structure by ®rst, for a given block of points, excluding (most) atoms on account of their distance from the block, and considering only pairs lm belonging to the (small) set of atoms ``belonging'' to the block. Once the sets of functions S(B) belonging to the blocks B have been determined, the calculation of the function values is evidently O(N). The calculation of matrices of operators like the identity (overlap matrix) or VN(r), VCoul(r), Vxc(r) is now also automatically linear, if the values of the potentials in the points are known (see previous section), since for each block of points B a limited number of basis functions has to be processed, which is independent of system size. Let us denote the total operator P value by W(r) (operators like the kinetic energy T^ ˆ i ÿ 12 r2 i† can be treated in a completely analogous way), then for the matrix of the ^ and also for the density, if it is calculated operator X, from the density matrix, we have the following double loop over the basis functions: fConstruct contribution of block B to matrix elements; construct values of density in points of Bg For l 2 S B† for all rk 2 B : nl rk † ˆ X rk †vl rk † For m 2 S B† if RAC < kl ‡ km then if T vl ; B†  T vm ; B† > threshold then fadd contribution of B to Xlm g X v m r k †  n l rk † Xlm ˆ Xlm ‡ operator matrix† rk 2B fcalculate density in points of Bg for all rk 2 B : q rk † ˆ q rk † ‡ Plm vl rk †vm rk † code including the cut-o€s. As a test case we chose alkanes with the following geometry: the carbon atoms lie in one plane and form a fully extended zig-zag chain; the hydrogen atoms are below and above the plane. As a second test case we took hypothetical two-dimensional systems, namely hexagonal graphitic sheets, which are terminated by nitrogens. Figure 1 shows the largest one. The ®tting of our timing curves to a power function is done with the Levenberg-Marquardt algorithm [43]. For the investigation of the computational eciency in a three-dimensional system we have chosen the Pt(P(Ph)3)3CO complex. 6.1 Density ®t Figures 2 and 3 display the scaling behaviour of the two routines involved in the ®tting of the density. Prior to the SCF cycles for each pair of atoms the ®rst routine (FITINT) calculates the elements of the matrices S and n, and the individual integrals tlm;i ˆ hvAl vBm fiA or B i, which are contracted on each cycle with the density matrix elements Plm to generate the t vector [cf. Eqs. (3.2) and (3.3)], and writes these integrals to ®le (timing in Fig. 2). On every SCF cycle the second routine (RHOFIH) reads the matrix elements from ®le and performs the ®tting of the present density by solving Eq. (3.3) for each pair of atoms (timing in Fig. 3). Both ®gures show the scaling with and without the application of cut-o€s to the basis functions. The use of cut-o€s implies that we are only ®tting atom-pair densities of ``neighbouring'' atoms. The calculation of the ®t integrals, which is dominated by the integrals tlm;i ˆ hvAl vBm fiA or B i, meets our expectations in that Fig. 2 shows a decrease of an almost quadratic scaling to an almost linear scaling. The small deviations from quadratic and linear scaling respectively are due to density† End m End l We have included here the density evaluation from the density matrix to show that it follows essentially the same route, although in our implementation the density is directly evaluated from the ®t functions during the selfconsistent ®eld (SCF). Note that the use of the tail function T vl ; B† does not change the scaling, which would also become linear if only tests on a cut-o€ radius were performed. However, it allows an additional re®nement by which the contracted or di€use nature of basis functions, which is important for the contribution of block B to the matrix elements, is properly accounted for. 6 Results and discussion This section contains an overview of the e€ect of the implementation of cut-o€s in various stages of the calculation of the (electronic potentials part of the) Fock matrix, including graphs with timings for the separate routines of the original code of ADF and the adapted Fig. 1. The largest planar aromatic system used for the timing results 397 Fig. 2. The scaling behaviour of the routine (FITINT) that calculates and writes the ®t integrals to ®le, with and without the use of cut-o€s, for a series of alkane chains with the indicated total numbers of atoms Fig. 3. For the alkane chains, the scaling behaviour of the routine (RHOFIH) that reads the ®t integrals from ®le and solves the least-squares equation for the ®t coecients special circumstances. The scaling lower than 2.0 in the old code arises from atom pairs with large distance being cheaper with the current integral routines. The small deviation from linear scaling of the calculation with cuto€s may be due to I/O e€ects perturbing the timing, since we could determine this to be also the origin of the scaling of the routine RHOFIH (Fig. 3) not being as predicted. The scaling of RHOFIH without cut-o€s , Eq. (2.4), as well as with cut-o€s, Eq. (1.6), is higher than predicted. Analysis of this routine showed that the discrepancy between the actual scaling and the predicted one is caused by an I/O bottleneck. The routine is not computationally intensive but spends the largest part of its time in I/O manipulations, viz. the reading of the ®t integrals (compare the absolute times in Figs. 2 and 3). For the smaller molecules the ®le with the ®t integrals will ®t into memory; however, for the larger molecules this becomes impossible, as the size of this ®le scales linearly with the number of atoms. At certain molecular sizes this causes a tremendous growth of paging which perturbs the timings. The same will happen in the FITINT routine (Fig. 2) and may become visible for the smaller computation time in the calculation with cuto€s. 6.2 The Coulomb potential In Fig. 4 the scaling of the calculation of the Coulomb and the XC potential is shown with and without cuto€s. The reduction in the scaling power is rather small, although even this implies a reduction in computation time by a factor of 10 for the 600-atom system. We examine these timings more closely to understand the 398 small decrease in the scaling power. In Fig. 5 the di€erent components of the calculation of the Coulomb and XC potential are presented. First, the density in the grid points q (rk) is evaluated as written in Eq. (1.3). Due to the cut-o€s for each grid point only ®t functions in its neighbourhood are used. This results in a perfectly linear scaling of this step. The evaluation of the XC potential is also linear, because it only manipulates the density in the grid points. With V sr Coulomb we denote the calculation of the short-range (exponential) part of the Coulomb potential, Eq. (4.3). For this component of the Coulomb potential the same cut-o€ radii are applied as for the density. Therefore this component of the potential also scales linearly. For the evaluation A of the multipolar parts QAlm RkA † ˆ Mlm = RkA †l‡1 of the Coulomb potential, 1/RkA has to be calculated for each grid point rk with respect to all atoms in the system. Fig. 4. For the alkane chains, the scaling behaviour of the routine that calculates the Coulomb and the XC potential in the grid points Fig. 5. Timing and scaling behaviour for the various steps in the calculation of the Coulomb and the XC potentials in the grid points when cut-o€s are used (alkane chains). V sr Coulomb is the short range (exponential) part of the Coulomb potential; VlrCoul is the long range (multipolar) part A few computational operations are needed to deter2 2 2 2 mine q 1/R kA: ®rst, (RkA) from xkA ‡ ykA ‡ zkA , then RkA  from R2kA † and ®nally 1/RkA. We show explicitly just how much time the simple evaluation of the 1/RkA values takes by adding the computation time to that of VXC and V sr Coulom . This step of course scales quadratically, and since it takes an amount of time that is comparable to the previous steps, its addition changes the scaling from linear to about 1.5 for systems up to approximately 1000 atoms. The relatively large computation time of this simple step can be attributed to the evaluation of the square root being extremely expensive. The last curve shows the scaling for the complete evaluation of the Coulomb and the XC potential. Comparing the last two curves we see that once the 1/RkA values are known, only little time is needed to 399 evaluate the full multipolar contributions of Eq. (4.5). Since the evaluation of the long-range part of the Coulomb potential is a quadratically scaling step in our present implementation, its computation time would ultimately start to dominate. For the size of systems we are interested in, the evaluation of the long-range part, although algorithmically very simple, takes an amount of time that is comparable to (even somewhat larger than) the sum of the exchange-correlation potential and the short-range part of the Coulomb potential. Since several of the other computation steps can be scaled down very successfully (see the comparison for total timings below), the computation time for the long-range Coulomb potential changes from a modest fraction of the time without cut-o€s to a signi®cant fraction after application of cut-o€s. The computational expense is Fig. 6. The scaling behaviour (alkane chains) of the routine that calculates the values of the basis functions in the grid points, with and without the use of cut-o€s Fig. 7. The scaling behaviour (alkane chains) for the ®nal setup of the Fock matrix, when potentials and basis function values are known, with and without the use of cut-o€s still acceptable, but it is one of the prime targets for future e€orts towards computational speed-up. 6.3 The Fock matrix Besides the potential in the integration points we need the values of the basis functions in the integration points to set up the Fock matrix. Figure 6 shows the two curves, with and without cut-o€s, for the evaluation of v(rk). The original code of the routine scales quadratically due to the loop over the functions and the loop over the integration points. The adapted code scales perfectly linearly as expected. Having all ingredients available (function and potential values in the grid points) the ®nal set-up of the Fock matrix is shown in Fig. 7. A double loop over the 400 atoms and a loop over the integration points in the original code give a cubic scaling. The adapted code shows exactly linear scaling with system size. Dramatic improvement of the computational eciency results. 6.4 Aromatic systems The applicability of our method to two-dimensional systems is demonstrated with the results of the aromatic systems. We do not show the individual routines but the timing is shown for the sum of the two routines involved in the density ®t (Fig. 8) and the calculation of the total Fock matrix (Fig. 9), including the calculation of function values, of the potentials, and of the matrix elements of the Fock matrix with both the kinetic energy and the electronic potentials as operators. In Fig. 8 the scaling behaviour with cut-o€s is dominated by the Fig. 8. The combined scaling behaviour in the planar aromatic systems of the two routines (FITINT and RHOFIH) that are involved in the ®tting of the density, with and without the use of cut-o€s (one call of each routine) Fig. 9. The combined scaling behaviour of all routines that are used to set-up the Fock matrix: the calculation of the Coulomb and the XC potential in the grid points, the evaluation of the basis functions in the grid points, the initialization of the Fock matrix with the kinetic energy part, and the ®nal set-up of the Fock matrix. The timings are shown with and without the use of cut-o€s evaluation of the ®t integrals. The aromatic systems are not yet large enough to enter the size range where the cut-o€s cause the number of pairs to be handled to increase linearly (note that in one dimension even the largest aromatic system has the size of only a C35 alkane chain). The number of pairs still increases like Eq. (1.3), explaining the observed scaling. The poorer scaling of RHOFIH, Eq. (1.7), is not visible due to the computation time being more than an order of magnitude less than that of FITINT. In Fig. 9 the scaling achieved in the calculation of the Fock matrix with cut-o€s applied is an average over the various components of the calculation. The quadratic scaling of the long-range Coulomb part ± still relatively cheap at this system size ± is not yet important, the 1.3 scaling has its origin in the increase of the number of pairs, as in the case of the ®t integrals. 401 6.5 Relative error Our objective throughout this project has been to improve the scaling with system size, while retaining sucient precision in the results. It is natural to gauge the required precision against the precision to which the numerical integration is typically set. A key quantity is the bond energy, i.e. the energy of the molecule minus the sum of the energies of the atoms. With increasing system size, the relative error in this quantity is required to stay constant (approximately), implying a constant absolute error in the bonding energy per bond, which of course also stays roughly constant irrespective of system size. The numerical integration has the required property of yielding a constant relative error independent of system size; its magnitude being determined of course by the setting of the accuracy parameter for the numerical integration. The cut-o€ parameters should be set in such a way that if the numerical integration a€ords a bonding energy with a relative error of only 0.1% say, the calculation done with the cut-o€ radii should also return bonding energies with a relative error not larger than 0.1%. The setting of the cut-o€ parameters has been tested for the following molecules as test cases: C50H102, C100H202 and Pt(P(Ph)3)3CO. Each cut-o€ parameter was investigated separately to prevent spurious results from accidental cancellation of errors and also with a high numerical precision (relative error less than 10)4%) to avoid numerical noise. Our goal has been to obtain a setting for each cut-o€ parameter that gives a relative error in the bonding energy smaller than 0.1%. At the same time we have tried to achieve stability in the computational process in the sense that not just the ®nal bonding energy is stable to the required precision but also the SCF cycling proceeds in the same way, so that the same number of SCF cycles is needed to reach convergence. The latter requirement resulted for some of the cut-o€s in values that lead to a relative error in the bonding energy much less than 0.1%. The cut-o€ used in the ®tting of the density is determined by the radius of a basis function as calculated from the weight parameter w for neglecting the tail of the basis functions given in Eq. (2.4). The radii are used to decide which pairs of basis functions have negligible overlap. The neglecting of basis function pairs vlvm in the ®tting of the density leads to loss of charge (the contribution PlmSlm is neglected) and to a relative error in the bonding energy. A good choice for the cut-o€ criterion w turned out to be 0.05%. This gives a radius of 5.8 AÊ for the carbon atom. The loss of charge is almost negligible: less than 10)7%. Although this is a small charge de®cit, it turned out to be important to rescale the ®t coecients so the exact charge is retained. This rescaling of the number of electrons in the ®t density proved to be particularly bene®cial for stability in the (number of) SCF cycles. The relative error in the bonding energy due to this cut-o€ in the density ®t is very small: less than 10)6%. The calculation of the Coulomb and XC potential in the grid points uses two cut-o€ criteria. One is for the cut-o€ of the tails of the ®t functions, which is used for both the density evaluation and the short-range Coulomb potential. The weight function w of Eq. (2.4) is used as a criterion to determine the function radius. The other cut-o€ is for the multipolar part of the Coulomb potential, Eq. (4.5). The in¯uence of the two cut-o€ parameters on the relative error of the bonding energy and the convergence of the SCF was investigated independently. The cut-o€ criterion of the ®t functions was set at 0.005%, which implies a maximum radius of carbon of 7.4 AÊ. The relative error in the bond energy was about 0.01% for Pt(P(Ph)3)3CO, but for the alkanes it was much smaller: about 10)5%. The cut-o€ for the multipolar part was set to 10)5 a.u., i.e. a multipole term A Mlm = RkA †l‡1 is evaluated in all the points of a block B if A Mlm = ‰RkA †l‡1 in the point rk of the block nearest to atom A is larger than 10)5 a.u. In this case the error in the energy was less than 10)5%. For the building of the Fock matrix two cut-o€ criteria are used: (1) the threshold that determines if two basis functions have a suciently large overlap and therefore a Fock matrix element, analogous to the (independent) threshold used to decide which basis function pairs can be neglected in the density ®t; (2) the tail-function indicating the weight of a basis function over a block of points, which determines if the points of a block have to be used for the numerical integration of a Fock matrix element, Eq. (5.3). We will refer to the former cut-o€ as the overlap threshold. Again the two cut-o€ criteria are investigated separately. We investigated the error for the building of the Fock matrix by comparing the matrix of the unit operator, calculated numerically with either one of the cut-o€s applied, to the analytically calculated ``exact'' overlap matrix. The radii of the basis functions for the overlap threshold were determined with w [Eq. (2.4)] set to 0.05%. The maximum radius of the carbon atom in the building of the Fock matrix then equals 7.4 AÊ. The maximum absolute error in any overlap matrix element is less than 10)4 for this cut-o€. To determine if a block B has to be used for hs jvm i matrix element of the Fock matrix, the a hvl j^ threshold of Eq. (5.3b) was set to 10)4, which also resulted in a maximum absolute error of 10)4. We have ®nally ascertained that with the above mentioned setting of the cut-o€ criteria the relative error in the bond energy does not increase with the system size. Figure 10 displays the relative error of the bond energy with a precision of 10)6 (10)4%) in the numerical integrations of the Fock matrix elements, as well as with a numerical precision of 10)3. A numerical precision of 10)6 assures that we do not have errors due to numerical integration and cut-o€s that accidentally cancel each other. The results with numerical precision of 10)3 are displayed because this precision was used for the timings. We see that the cut-o€ criteria satisfy our requirement of a relative error less than 0.1% and of a stable relative error, independent of system size. 7 Summary In Tables 1±3 we present some overall timings for the old and new codes. As a genuinely three-dimensional 402 Fig. 10. The relative error in the bond energy (total energy minus sum of atomic energies) for the alkanes. The cases with a numerical integration precision of 10)3 and 10)6 are shown Table 1. Timing results (times in seconds) for Pt(P(Ph)3)3CO Calculation of ®t integrals Calculation of kinetic energy matrix Calculation of VCoulomb and VXC Calculation of v(rk) Set-up of Fock matrix Determination of ®t coecients Diagonalization Old New Speed-up 2892.1 4089.5 2582.7 759.9 1.1 5.4 926.1 162.3 2292.4 144.7 0.8 266.3 90.5 416.1 95.5 3.5 1.8 5.5 1.5 Table 2. Timing results (times in seconds) for C35H72 Calculation of ®t integrals Calculation of kinetic energy matrix Calculation of VCoulomb and VXC Calculation of v(rk) Set-up of Fock matrix Determination of ®t coecients Diagonalization Old New Speed-up 420.5 343.2 179.6 28.7 2.3 12.0 138.8 24.7 170.4 9.4 8.0 31.3 7.3 11.5 3.6 4.4 3.4 14.8 2.6 Table 3. Timing results (times in seconds) for C200H402 Calculation of ®t integrals Calculation of kinetic energy matrix Calculation of VCoulomb and VXC Calculation of v(rk) Set-up of Fock matrix Determination of ®t coecients Diagonalization Old New Speed-up 10342.2 62777.0 1280.8 205.0 8.1 306.2 3602.8 765.5 35277.0 459.0 53.5 352.9 44.5 74.8 39.6 10.2 17.2 471.6 11.6 system Pt(P(Ph)3)3CO is included, where three P(Ph)3 ligands are directly coordinated to the Pt atom, as is the CO ligand. The number of atoms (105) is equal to that in the chain-like alkane C35H72. The 602 atom alkane C200H402 is more typical for the size of system for which the present development is intended. The numbers of STO basis functions are 685, 459 and 2604 for these three systems, respectively. In the old situation, the calculation of the kinetic energy matrix, which is presented separately since it is carried out once, prior to the SCF cycles, and the setting up of the Fock matrix of the electronic potentials, are the most time-consuming parts of the calculation for Pt(P(Ph)3)3CO and in particular for C200H402. They are also the steps for which (by far) the largest speed-up factors are achieved. Although the total speed-up is already quite signi®cant for the 105-atom systems, the real bene®t of the better scaling is, of course, most apparent in the largest system. In all systems the relative weights of computational burden shift from the matrix evaluation to the calculation of the ®t integrals. This is notably the case for the Pt complex, which is related to the large number of basis functions on the Pt atom. It should be realized, however, that the calculation of the ®t integrals is only executed once, as is the calculation of the kinetic energy matrix, whereas the other steps are repeated on each cycle (in the so-called direct-SCF mode). Incidentally we note that the diagonalization times are still fairly modest and that the diagonalization is not yet a bottleneck, in spite of its N3 scaling. We conclude that signi®cant advances have been made towards an O(N) DFT method. The calculation of the ®t integrals is still relatively time-consuming but has been scaled down to linear scaling. The calculation of the Coulomb and XC potentials is, in the alkane chains, 4±6 times less expensive, but since its scaling has not been reduced to linear, it will eventually dominate and should be the next target for algorithmic development. Acknowledgements. The authors wish to thank the National Computing Facilities (NCF) of the Netherlands Foundation for Scienti®c Research (NWO) for the granting of computer time. 403 References 1. Baerends EJ, Ellis DE, Ros P (1973) Chem Phys 2: 41 2. Ellis DE, Adachi H, Averill FW (1976) Surf Sci 58: 497 3. RoseÂn A, Ellis DE, Adachi H, Averill FW (1976) J Chem Phys 65: 3629 4. Delley B, Ellis DE (1982) J Chem Phys 76: 1949 5. te Velde G, Baerends EJ (1991) Phys Rev B 44: 7888 6. Delley B (1990) J Chem Phys 92: 508 7. PeÂrez-Jorda JM, Yang W (1995) Chem Phys Lett 241: 469 8. Sambe H, Felton RH (1975) J Chem Phys 62: 1122 9. Kendall RA, FruÈchtl HA (1997) Theor Chim Acta 97: 158 10. Eichkorn K, Weigend F, Treutler O, Ahlrichs R (1997) Theor Chim Acta 97: 119 11. Rohklin V (1985) J Comput Phys 60: 187 12. Greengard L, Rokhlin V (1987) J Comput Phys 73: 325 13. Greengard L, Rohklin V (1991) Commun Pure Appl Math 44: 419 14. White CA, Johnson BG, Gill PMW, Head-Gordon M (1994) Chem Phys Lett 230: 8 15. Strain MC, Scuseria GE, Frisch MJ (1996) Science 51: 271 16. Kudin KN, Scuseria GE (1998) Chem Phys Lett 283: 61 17. Kutteh R, ApraÁ E, Nichols J (1995) Chem Phys Lett 238: 173 18. Adamson RD, Dombroski JP, Gill PMW (1996) Chem Phys Lett 254: 329 19. Dombroski JP, Taylor SW, Gill PMW (1996) J Phys Chem 100: 6272 20. Gill PMW (1997) Chem Phys Lett 270: 193 21. Goedecker S, Ivanov OV (1998) Sol Stat Comm 105: 665 22. PeÂrez-Jorda JM, Yang W (1997) J Chem Phys 107: 1218 23. Challacombe M, Scghwegler E, AlmloÈf J (1996) J Chem Phys 104: 4685 24. Challacombe M, Schwegler E (1997) J Chem Phys 106: 5526 25. Stratman RE, Scuseria GE, Frisch MJ (1996) Chem Phys Lett 257: 213 View publication stats 26. Li XP, Nunes RW, Vanderbilt D (1993) Phys Rev B 47: 1089 27. Galli G, Parrinello M (1992) Phys Rev Lett 69: 3547 28. HernaÂndez E, Gillan MJ (1995) Phys Rev B 51: 10157 29. HernaÂndez E, Gillan MJ, Goringe CM (1996) Phys Rev B 53: 7147 30. OrdejoÂn P, Drabold DA, Martin RM, Grumbach MP (1995) Phys Rev B 51: 1456 31. SaÂnchez-Portal D, OrdejoÂn P, Artacho E, Soler JM (1997) Int J Quantum Chem 65: 453 32. Stewart JP (1996) Int J Quantum Chem 58: 133 33. Xu CH, Scuseria GE (1996) Chem Phys Lett 262: 219 34. Millam JM, Scuseria GE (1997) J Chem Phys 106: 5569 35. Daniels AD, Millam JM, Scuseria GE (1997) J Chem Phys 107: 425 36. Dunlap BI, Connolly JWD, Sabin JR (1979) J Chem Phys 71: 3396 37. Gallant RT, St-Amant A (1996) Chem Phys Lett 256: 569 38. Fonseca Guerra C, Visser O, Snijders JG, te Velde G, Baerends EJ (1995) In: Clementi E, Corongiu G (eds) Methods and techniques for computational chemistry. STEF, Cagliari, p 305 39. Treutler O, Ahlrichs R (1995) J Chem Phys 102: 346 40. Frisch MJ, Trucks GW, Schlegel HB, Gill PMW, Johnson BG, Robb MA, Cheeseman JR, Keith T, Petersson GA, Montgomery JA, Raghavachari K, Al-Laham MA, Zakrzewski VG, Ortiz JV, Foresman JB, Cioslowski J, Stefanov BB, Nanayakkara A, Challacombe M, Peng CY, Ayala PY, Chen W, Wong MW, Andres JL, Replogle ES, Gomperts G, Martin RL, Fox DJ, Binkley JS, Defrees DJ, Baker J, Stewart JP, Head-Gordon M, Gonzalez C, Pople JA (1995) GAUSSIAN 94, Gaussian, Pittsburgh, Pa 41. Boerrigter PM, te Velde G, Baerends EJ (1988) Int J Quantum Chem 33: 87 42. te Velde G, Baerends EJ (1992) J Comput Phys 99: 84 43. Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1986) Numerical recipes. Cambridge University Press, Cambridge