3DReact: Geometric deep learning for chemical reactions
Abstract
Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction datasets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS and Proparg-21-TS datasets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different datasets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.
keywords:
machine learning, equivariant neural networks, geometric deep learning, activation energies, chemical reactions
Additional affiliations: National Center for Competence in Research – Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland; Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
1 Introduction
Physics-inspired representations that take as input the three-dimensional structure1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 (as well as, in some cases, electronic structure14, 15, 16, 17) of molecules and transform it into a fixed-length vector, while respecting known physical laws, have a rich history in molecular property prediction.2, 8, 4, 6, 7, 13, 18, 19, 20, 5, 1, 21, 3, 9, 10, 22, 23, 24, 25, 12, 26, 27, 28, 29, 30 Common desiderata31, 32, 33, 34 for high-performing representations are (i) smoothness, (ii) encoding of the appropriate symmetries to permutations, rotations and translations,24, 35 (iii) completeness and (iv) additivity to allow for extrapolation to larger systems. Such fingerprints,2, 4, 3, 24, 6, 7, 8, 5, 11, 12, 13 being rooted in fundamental principles, are designed to be property-independent: a single representation can be constructed for a molecule to predict any quantum-chemical target. This is analogous to the molecular Hamiltonian, which specifies the energy and all other properties of a system as a function of atoms’ types and positions in three-dimensional space (assuming the molecules are charge neutral and singlets). These representations are typically used in combination with kernel models due to their data efficiency, ability to deal with high-dimensional feature vectors, and interpretability of the similarity kernel.2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 31, 32, 33 Early works showed that combining such representations2, 4, 6, 8, 36 with simple feed-forward neural networks instead of kernel models did not necessarily lead to better performance.37, 38
More recently, end-to-end neural networks have been proposed that learn the representation as part of the (supervised) training process,39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61 based on similar principles to the aforementioned physics-inspired representations: they take as input a three-dimensional structure, as well as in some cases charge and spin information.46, 51, 52, 53 The network may be invariant or equivariant to rotations and translations of the input molecules. The former is typically achieved by operating on distances between atoms,39, 40, 42 and the latter by operating on relative position vectors and angular information processed by rotationally-equivariant convolutional layers.46, 49, 50, 62, 59, 45, 43, 41, 44, 48, 54, 55, 56, 57, 58 Equivariant models are naturally suited to predict vectorial44, 49, 62, 43, 59, 48, 45 or higher order tensorial59, 52, 54, 55, 63 properties. They have also been demonstrated to exhibit improved data efficiency and generalization capabilities compared to their invariant counterparts on predictions of scalar properties,43 albeit at a higher computational cost. Nevertheless, given an expressive enough architecture (i.e., using higher-order messages41, 62, 64, 65, 56, 66, 67 and/or enough convolutional layers43, 52, 56), invariant models are sufficient for many property prediction tasks.56
Despite these advances for molecular property prediction, the prediction of computed reaction properties (principally, reaction barriers68, 36, 38, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82) is still in its infancy.83 Machine learning approaches span from utilizing simple two-dimensional fingerprints of reaction components84, 85 (reactants and products) to physical-organic descriptors86, 87, 76, 88, 89, 75, 90, 91, 92, 93, 94, 95, 80, 96, 97, 98, 82, or electronic structure-inspired features99, to transformer models100, 101 adapted for regression,102 and 2D graph-based approaches71, 103, 70, 81. The latter, particularly the ChemProp model,71, 103 are often best-in-class in predicting reaction properties.103 It has been shown38 that these models achieve their impressive performance by exploiting atom-mapping information,104, 105, 106, 107 which provides information analogous to the reaction mechanism.
Another category of reaction fingerprints arises from discretization of physically-inspired functions2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13 constructed using a cheap estimate of the transition state (TS) structure73 or rather the structures of the reaction components.36, 69, 74 The SLATMd representation69, 36 in particular has been shown38 to yield accurate predictions of reaction barriers, particularly for datasets108, 69 relying on subtle changes in the geometry of reactants and/or products. End-to-end models based on three-dimensional structures of reactants and products have also recently emerged.99, 72, 109 In a different vein, several works110, 111, 112, 113, 114, 115 aim to directly predict the TS structure, which together with the reactant structure gives the reaction barrier. These approaches lie outside the scope of the property-prediction focus here.
Due to the diversity of challenges posed by different reaction datasets, neither atom-mapping-based models nor 3D-geometry-based models achieve consistently better performance on reaction property prediction tasks.38 To date, no model has been proposed that can incorporate both chemical (atom-maps) and physical (geometry) priors. To address this gap, we introduce 3DReact, a geometric deep learning model that encodes both the three-dimensional structures of reactants and products as well as atom-mapping information or proxies thereof to predict properties of chemical reactions (showcased here for activation energies).
We demonstrate the performance of 3DReact on three datasets of reaction barriers: GDB7-22-TS,116 Cyclo-23-TS,117 and Proparg-21-TS.108, 69 As discussed in previous works,38 these datasets present a myriad of challenges for ML models, from the dependence on chemical information116 to the distinction of subtle changes in configurations.108, 69 We show that, compared to state-of-the-art models for reaction property prediction,103, 36 3DReact offers accurate and reliable performance across different datasets as well as atom-mapping regimes, reduced dependence on the quality of three-dimensional geometries, and stable extrapolation behavior.
2 Architecture
3DReact is built from rotation- and translation-equivariant convolutional networks over point clouds as implemented in e3nn.118 Specifically, we use the tensor field network architecture47 for molecular components as in Corso et al.119 While the architecture is equivariant by default, it can easily be made invariant (vide infra). The geometries of molecules constituting reactants and products of each reaction are passed through separate channels, detailed in Section 2.1. They are then combined to eventually predict a reaction property, such as the activation energy, as detailed in Section 2.2. The overall architecture is summarized in Figure 1.
2.1 Symmetry-adapted molecular channels
A molecule is represented as a distance-based graph where nodes describe atoms and edges describe bonds. Instead of explicitly using connectivity information, the “bonds” of each atom are formed with all neighboring atoms within a radial cutoff. Initial scalar bond (edge) features between pairs of atoms, as well as spherical harmonics filters, are computed from internal coordinates, as detailed in the Supporting Information. The atom (node) features are initialized with cheminformatics descriptors computed with RDKit.120 These include atomic number, chirality tag (unspecified, tetrahedral, or other, including octahedral, square planar, allene-type), number of directly-bonded neighbors, number of rings, implicit valence, formal charge, number of attached hydrogens, number of unpaired electrons, hybridization, aromaticity, and presence in rings of specified sizes. This choice is inspired by EquiBind121 and DiffDock119 and is in line with the improved features72 used for 2D-based methods.
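The cutoff-based graph construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `radius_graph`, the cutoff value and the toy coordinates are hypothetical and are not taken from the 3DReact code or datasets.

```python
import math

def radius_graph(coords, cutoff):
    """Return directed edges (i, j, distance) for all atom pairs closer than `cutoff`."""
    edges = []
    for i, ri in enumerate(coords):
        for j, rj in enumerate(coords):
            if i == j:
                continue  # no self-loops
            d = math.dist(ri, rj)
            if d < cutoff:
                edges.append((i, j, d))
    return edges

# Toy three-atom geometry in angstrom (illustrative values only).
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
edges = radius_graph(coords, cutoff=1.2)
for i, j, d in edges:
    print(f"edge {i} -> {j}: {d:.3f}")
```

With this cutoff the two outer atoms are linked to the central atom but not to each other, so the edges depend only on distances and not on any bond table.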
The initial node and edge features pass through embedding layers, and the node embeddings are then updated by a series of equivariant convolutional layers. Each layer is a fully-connected weighted tensor product, as defined in e3nn.118 The equivariant operations performed by the network are described in the Supporting Information, together with further mathematical details of the molecular channels. The network with equivariant molecular components as described is referred to as EquiReact, whereas its invariant counterpart InReact uses only the scalar (degree-zero) spherical harmonics to construct the convolution filters. The output of the molecular channels is a local molecular representation, i.e., one feature vector per atom. Depending on the sum_mode hyperparameter, it is constructed either from the node features (node mode) or from both node and edge features (both mode).
2.2 Combining molecules for reactions
Once atom-wise molecular representations are learned for reactant and product molecules, they must be combined to form a reaction representation.
For certain datasets, atom-mapping information is available, which correlates individual atoms in reactant molecules to individual atoms in product molecules according to the reaction mechanism. In this setting, the reactant and product representations are re-ordered such that the local representation vectors correspond to the same atom in reactants and products. Depending on the combine_mode hyperparameter, either a difference is taken between products’ and reactants’ atom representations, or they are summed, averaged, or passed through a multilayer perceptron (MLP). Thus, the local reaction representation consists of one vector per atom, reflecting how the environment of that atom changes in the reaction. We will refer to this variant of the model, which uses atom-mapping information, as 3DReactM. While the current model is unable to treat unbalanced reactions (where there are additional atoms on the left- or right-hand side of the reaction equation), its modification in the spirit of ChemProp71, 103 is straightforward.
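A minimal sketch of the atom-mapped combination step may help fix ideas. It assumes per-atom feature vectors stored as plain lists and a mapping given as a list of product indices; the mode names `"diff"`, `"sum"` and `"mean"` are illustrative labels rather than the option strings of the actual implementation, and the MLP option is omitted because it requires learned weights.

```python
def combine_mapped(x_reactants, x_products, mapping, combine_mode="diff"):
    """Combine per-atom reactant and product vectors into per-atom reaction vectors.

    mapping[i] is the index of the product atom matched to reactant atom i
    (hypothetical convention chosen for this sketch).
    """
    combined = []
    for i, xr in enumerate(x_reactants):
        xp = x_products[mapping[i]]  # re-order products to follow the reactant indexing
        if combine_mode == "diff":
            vec = [p - r for r, p in zip(xr, xp)]
        elif combine_mode == "sum":
            vec = [p + r for r, p in zip(xr, xp)]
        elif combine_mode == "mean":
            vec = [(p + r) / 2 for r, p in zip(xr, xp)]
        else:
            raise ValueError(f"unknown combine_mode: {combine_mode}")
        combined.append(vec)
    return combined

# Two atoms with two-dimensional features; reactant atom 0 maps to product atom 1.
x_r = [[1.0, 0.0], [0.0, 2.0]]
x_p = [[0.5, 2.5], [1.5, 0.0]]
local_reaction = combine_mapped(x_r, x_p, mapping=[1, 0], combine_mode="diff")
print(local_reaction)
```

The output keeps one vector per atom, which is what allows the subsequent layers to reason about how each atomic environment changes.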
With the reaction representation at hand, predictions are made in the so-called vector or energy modes. In vector mode, the atomic vectors constituting the reaction representation are initially passed through an MLP to introduce nonlinearity and then summed up to form a global reaction representation vector. The target is then learned from this vector using an MLP. This model pipeline is illustrated in Figure 2a. In energy mode, on the other hand, the local reaction representations are used to learn atomic contributions to the target (Figure 2b). While performing worse in general, in some cases this mode yields the best predictions (see Section 3.2.1).
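The difference between the two readout modes is only the order of pooling and prediction, which the following sketch makes explicit. The functions `atom_mlp` and `readout_head` are fixed stand-ins for learned networks, introduced here purely for illustration; they are not part of 3DReact.

```python
import math

def atom_mlp(vec):
    """Stand-in for a learned per-atom MLP: an elementwise tanh nonlinearity."""
    return [math.tanh(v) for v in vec]

def readout_head(vec):
    """Stand-in for a learned MLP mapping a vector to a scalar: a sum of squares."""
    return sum(v * v for v in vec)

def predict_vector_mode(reaction_atoms):
    # Transform each atom, sum atoms into one global vector, then predict once.
    transformed = [atom_mlp(v) for v in reaction_atoms]
    global_vec = [sum(column) for column in zip(*transformed)]
    return readout_head(global_vec)

def predict_energy_mode(reaction_atoms):
    # Predict one scalar contribution per atom, then sum the contributions.
    return sum(readout_head(atom_mlp(v)) for v in reaction_atoms)

atoms = [[0.5, -1.0], [1.5, 0.25]]
print(predict_vector_mode(atoms), predict_energy_mode(atoms))
```

Both readouts are invariant to the ordering of atoms because the only cross-atom operation is a sum, yet they generally give different values once the head is nonlinear.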
Atom-mapping provides static information, analogous to a reaction mechanism, to link atoms in reactants to atoms in products. While highly informative, and thought to be critical to the performance of 2D-graph-based models,71, 103, 72, 70, 81 accurate atom-maps are not available for all reaction datasets.38, 104, 105 To circumvent the need for atom-mapping, but mimic its role in exchanging information between reactants and products, other approaches dynamically (i.e., in a learnable fashion) exchange information between molecular representations. For example, RXNMapper107 is a neural network that learns atom-mappings within the larger self-supervised task of predicting the randomly masked parts in a reaction sequence, using one head of a multi-head transformer architecture. EquiBind,121 a neural network that predicts the rotation and translation of a ligand to a protein, contains a cross-attention module between ligand and receptor. The latter inspires our surrogate for atom-mapping: 3DReactX also uses cross-attention between reactants and products to link their atom indices (see the Supporting Information). The re-ordered representations of reactants and products are combined as for the case of atom-mapped reactions (Figures 2a and 2b). We note that other algorithms could also have been used to exchange information between reactants and products, for example in the form of message passing or equivariant attention.57, 122
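To illustrate how cross-attention can act as a soft surrogate for an atom map, the sketch below implements generic scaled dot-product attention between reactant and product atom vectors. It is not the 3DReactX module itself: the learned query, key and value projections are omitted, and all names and numbers are assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """For each query atom, return attention weights over the other side's atoms
    and the corresponding weighted average of their feature vectors."""
    dim = len(queries[0])
    weights, aligned = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys_values]
        w = softmax(scores)
        weights.append(w)
        aligned.append([sum(wj * k[d] for wj, k in zip(w, keys_values)) for d in range(dim)])
    return weights, aligned

reactant_atoms = [[4.0, 0.0], [0.0, 4.0]]
product_atoms = [[0.0, 5.0], [5.0, 0.0]]
weights, aligned = cross_attention(reactant_atoms, product_atoms)
soft_map = [w.index(max(w)) for w in weights]  # most-attended product atom per reactant atom
print(soft_map)
```

Each row of `weights` is a probability distribution over product atoms, so the hard atom map used by 3DReactM is replaced by a differentiable soft assignment.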
3DReact also has a simple “no mapping” variant, called 3DReactS, which relies neither on atom-mapping nor on a surrogate cross-attention module. In vector mode (Figure 2c), the atomic components of the reactant and product molecular representations are summed up to obtain one global vector for the reactants and one for the products. These are then combined, according to the combine_mode parameter, to form a reaction vector which is used to learn the target with an MLP. In energy mode (Figure 2d), individual atomic representations are used to learn their contributions to the quasi-molecular energies of reactants and products, which are later combined (according to the combine_mode parameter) to predict the target. In most cases, this simpler model outperforms 3DReactX (vide infra).
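The mapping-free vector mode reduces to pooling each side before combining, as in the following sketch. Function names and the mode labels `"diff"`, `"sum"` and `"mean"` are hypothetical, and the final MLP is left out.

```python
def pool_atoms(atom_vectors):
    """Sum per-atom vectors into a single global vector."""
    return [sum(column) for column in zip(*atom_vectors)]

def reaction_vector_unmapped(x_reactants, x_products, combine_mode="diff"):
    """Pool reactants and products separately, then combine the two global vectors."""
    g_r = pool_atoms(x_reactants)
    g_p = pool_atoms(x_products)
    if combine_mode == "diff":
        return [p - r for r, p in zip(g_r, g_p)]
    if combine_mode == "sum":
        return [p + r for r, p in zip(g_r, g_p)]
    if combine_mode == "mean":
        return [(p + r) / 2 for r, p in zip(g_r, g_p)]
    raise ValueError(f"unknown combine_mode: {combine_mode}")

x_r = [[1.0, 0.0], [0.0, 2.0]]
x_p = [[0.5, 2.5], [1.5, 0.0]]
rxn = reaction_vector_unmapped(x_r, x_p)
rxn_shuffled = reaction_vector_unmapped(x_r, x_p[::-1])  # product atoms in another order
print(rxn, rxn_shuffled)
```

Because pooling happens before the two sides meet, the result cannot depend on any correspondence between reactant and product atoms.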
3 Results and Discussion
The performance of 3DReact is reported for three diverse datasets (GDB7-22-TS,116 Cyclo-23-TS117 and Proparg-21-TS108, 69) using both random and extrapolative splits. For details on the datasets, refer to Section 5.1. For details on the extrapolation splits, see Section 5.2.
Models are run in three atom-mapping regimes: (i) with high-quality maps (“True”) derived from the TS structures or heuristic rules;117, 123, 116, 71, 106 (ii) with atom-maps obtained using the open-source RXNMapper107 (“RXNMapper”); and (iii) without any atom-mapping information at all (“None”). As discussed in recent work,124, 38 previously developed graph-based models for reaction property prediction71, 72, 70, 96, 97 including ChemProp71, 103 reported prediction errors only in the “True” atom-mapping regime. The “RXNMapper” regime is important for cases where the reaction mechanism is not known and atom-mapping using heuristic rules is impossible. The “None” regime is critical for all chemistry that falls outside the realm of organic chemistry captured in the patents125 that RXNMapper107 is trained on.
The atom-mapping-based model 3DReactM is used in the “True” and “RXNMapper” regimes. In the “None” regime, 3DReactX and 3DReactS were tested. 3DReactS consistently outperformed 3DReactX, so we include only 3DReactS and refer the reader to the Supporting Information for their comparison.
3.1 Equivariance vs. invariance
Table 1 compares the relative performance of the invariant (InReact) and the equivariant (EquiReact) implementations of 3DReact with the learning curves of the two models presented in Figure 3. Previous studies43, 56 demonstrated that the equivariant models showed superior extrapolation capabilities on predictions of energies and forces, as well as steeper and shifted learning curves in force prediction tasks. Instead, we find that InReact and EquiReact are practically indistinguishable for the present chemical reaction tasks.
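Table 1: Comparison of InReact and EquiReact across datasets, atom-mapping regimes and split types (numerical entries are missing from this copy of the text).
Dataset (property, units) | Atom-mapping regime | InReact | EquiReact
Random splits | | |
GDB7-22-TS (activation barrier, kcal/mol) | True | |
GDB7-22-TS (activation barrier, kcal/mol) | RXNMapper | |
GDB7-22-TS (activation barrier, kcal/mol) | None | |
Cyclo-23-TS (activation barrier, kcal/mol) | True | |
Cyclo-23-TS (activation barrier, kcal/mol) | RXNMapper | |
Cyclo-23-TS (activation barrier, kcal/mol) | None | |
Proparg-21-TS (activation barrier, kcal/mol) | True | |
Proparg-21-TS (activation barrier, kcal/mol) | None | |
Scaffold splits | | |
GDB7-22-TS (activation barrier, kcal/mol) | True | |
GDB7-22-TS (activation barrier, kcal/mol) | RXNMapper | |
GDB7-22-TS (activation barrier, kcal/mol) | None | |
Cyclo-23-TS (activation barrier, kcal/mol) | True | |
Cyclo-23-TS (activation barrier, kcal/mol) | RXNMapper | |
Cyclo-23-TS (activation barrier, kcal/mol) | None | |
Proparg-21-TS (activation barrier, kcal/mol) | True | |
Proparg-21-TS (activation barrier, kcal/mol) | None | |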
We find that the datasets studied herein do not benefit from the inclusion of equivariant features for molecules. Yet, Figure 4 illustrates that a hypothetical reaction involving conversion between homometric structures of He4,126 which is mostly characterized by angle changes, clearly benefits from equivariant molecular features. In the reactant (Figure 4a), all atoms are identical and lead to the same learned representation. In the product (Figure 4b), only atoms B2 and B3 have identical environments, different from A1–4. Atoms B2–3 have the same distances to the three neighbors as A1–4. Thus, InReact, which uses only interatomic distances, yields very close representations for these atoms (Figure 4c). Still, in each convolutional layer, atoms B2 and B3 receive information from B1 and B4, and as the number of convolutional layers increases, the difference in the representations of B2–3 and A1–4 becomes more apparent (Figure 4d). However, with a smaller radial cutoff, atoms B1–B3 and A1–4 become indistinguishable for any number of layers (Figure 4e). On the other hand, EquiReact, which uses explicit angular information from the spherical harmonics filters, clearly distinguishes all non-equivalent atoms in both cases (Figure 4f,g).
While this is a toy example, it illustrates that transformations consisting of changes in angles rather than in bond lengths are better described using EquiReact. In general, the currently available reaction datasets do not pose sufficient challenge to allow distinguishing InReact and EquiReact. For the datasets studied in this work, InReact is sufficient and is the model variation used throughout as 3DReact.
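The role of the radial cutoff in this argument can be reproduced with an even simpler construction than the He4 case. The sketch below uses a hypothetical triatomic in a linear and a bent arrangement (not the structures of Figure 4) and compares the per-atom lists of neighbor distances, which are the only geometric inputs available to a purely distance-based model.

```python
import math

def neighbor_distances(coords, cutoff):
    """Per-atom sorted list of distances to all neighbors within `cutoff`."""
    result = []
    for i, ri in enumerate(coords):
        ds = [math.dist(ri, rj) for j, rj in enumerate(coords) if j != i]
        result.append(sorted(round(d, 6) for d in ds if d < cutoff))
    return result

def angle_at_center(coords):
    """Angle (degrees) at the middle atom of a three-atom geometry."""
    a, c, b = coords
    u = [x - y for x, y in zip(a, c)]
    v = [x - y for x, y in zip(b, c)]
    cos = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

# Same bond lengths, different angle (illustrative coordinates).
linear_geom = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
bent_geom = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

same_short = neighbor_distances(linear_geom, 1.2) == neighbor_distances(bent_geom, 1.2)
same_long = neighbor_distances(linear_geom, 2.5) == neighbor_distances(bent_geom, 2.5)
print(same_short, same_long, angle_at_center(linear_geom), angle_at_center(bent_geom))
```

With the short cutoff the two distance graphs are identical, so no number of distance-only message-passing layers can separate the geometries. A larger cutoff or explicit directional information, as carried by higher-degree spherical harmonics, resolves the difference.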
3.2 Benchmark studies
3DReact is compared to the previously best baseline models:38 ChemProp,71, 103 a graph neural network that uses atom-mapped SMILES to construct a condensed graph of reaction (CGR), and the 3D-structure-based SLATM8 fingerprint adapted to reactions by taking the difference between product and reactant fingerprints (SLATMd),36 combined with kernel ridge regression (KRR) models (SLATMd+KRR).
Note that both 3DReact and ChemProp are run without explicit H atoms, for two reasons. First, hydrogen atoms are not always mapped in the “True” and “RXNMapper” regimes, since they are usually implicit in SMILES strings. Second, including H atoms in the models brings no consistent improvement (see the Supporting Information). SLATMd, built directly from molecular coordinates without using SMILES strings, does however incorporate H atoms by default. For further discussion, refer to the Supporting Information.
3.2.1 Random splits
Performance as measured in mean absolute errors (MAEs) is reported in Table 2 for random splits of each dataset, demonstrating the models’ interpolative capabilities. For the equivalent results with root mean squared errors (RMSEs), consult the Supporting Information.
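For reference, the two error metrics mentioned here can be computed as follows. The barrier values in the example are made up for illustration and do not come from any of the datasets.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large outliers more strongly than the MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical reference and predicted barriers in kcal/mol.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 30.0, 44.0]
print(mae(reference, predicted), rmse(reference, predicted))
```

The RMSE is never smaller than the MAE, and the gap between the two grows when a few predictions carry most of the error.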
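Table 2: MAEs of ChemProp, SLATMd+KRR and 3DReact for random splits (numerical entries are missing from this copy of the text; “—” marks regimes in which a model is not applicable).
Dataset (property, units) | Atom-mapping regime | ChemProp | SLATMd+KRR | 3DReact
GDB7-22-TS (activation barrier, kcal/mol) | True | | — |
GDB7-22-TS (activation barrier, kcal/mol) | RXNMapper | | — |
GDB7-22-TS (activation barrier, kcal/mol) | None | | |
Cyclo-23-TS (activation barrier, kcal/mol) | True | | — |
Cyclo-23-TS (activation barrier, kcal/mol) | RXNMapper | | — |
Cyclo-23-TS (activation barrier, kcal/mol) | None | | |
Proparg-21-TS (activation barrier, kcal/mol) | True | | — |
Proparg-21-TS (activation barrier, kcal/mol) | None | | |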
The GDB7-22-TS dataset is distinct from the other two in that it includes variations in the reaction class (and mechanism), thereby showing a greater dependence on the existence and quality of atom-mapping information in the models. It has already been observed38 for ChemProp that there is a stark hierarchy in the predictions from the “True” to “RXNMapper” to “None” regimes.
In the “True” regime, 3DReact does not improve predictive capabilities over the ChemProp model for the GDB7-22-TS set. This points to the importance of the chemical diversity in this dataset, where knowledge of the reaction mechanism (in the form of atom-maps) is sufficient information to predict the reaction barriers without information about the geometries of reactants and products. However, as previously discussed,38 “True” maps are an unrealistic scenario for most datasets. Moving to the “RXNMapper” regime, 3DReact and ChemProp already agree within standard deviations. This highlights that for practical-quality maps, 3DReact is amongst the best models for this dataset. In the “None” regime, 3DReact outperforms ChemProp (Table 2).
SLATMd+KRR results in similar performance to 3DReact for the GDB7-22-TS set. The SLATMd representation also constructs features from 3D coordinates of the reactants and products using invariant functions, and is therefore more fundamentally similar to 3DReact than ChemProp. Nevertheless, since 3DReact allows for the inclusion of atom-mapping information, predictions are improved in the mapped regimes compared to SLATMd+KRR, which operates in the “None” regime only.
In summary, for the chemically diverse GDB7-22-TS set, SLATMd allows for good performance in the “None” regime, and ChemProp in the “True” and “RXNMapper” regimes. Since 3DReact can incorporate both atom-mapping information and 3D structure information, it achieves robust performance in all three regimes (Table 2).
The Cyclo-23-TS117 dataset contains a single reaction class and has been previously illustrated38 to show less dependence on the quality of atom-mapping than the GDB7-22-TS. For this set, 3DReact outperforms or matches the other models in all three regimes. This illustrates that a model based purely on geometry information of reactants and products, without any chemical information in the form of atom-mapping or surrogates thereof, can allow for accurate reaction property prediction. It is worth noting that atom-mapping does not improve predictions at all, i.e. there is no improvement from “None” to “RXNMapper” to “True”, even for the ChemProp model. This points to the different nature of this dataset compared to the GDB7-22-TS.
The best model is obtained with 3DReactS in the energy mode (Figure 2d). As outlined in Section 2.2, in energy mode an energy contribution is learned for reactants’ and products’ atoms separately. In the original publication,117 Stuyver et al. illustrate that the activation barriers correlate linearly with the reaction energies. Since the reaction energy is the difference between products’ and reactants’ energies, the energy mode is the best choice for a model learning the reaction energy and, in the case of this dataset, for the activation barrier too, due to its linear correlation with the reaction energy.
Compared to SLATMd+KRR, 3DReact in the “None” regime results in lower prediction errors for this set, illustrating that despite both models using similar information, an end-to-end model can allow for improved predictions.
The Proparg-21-TS108, 69 is a small dataset by neural network standards (753 points) and therefore constitutes a challenge for the data efficiency of our model. Like the Cyclo-23-TS set, it consists of a single reaction class, i.e. enantioselective propargylation of benzaldehyde. Since the enantioselectivity is related to the barrier through an exponential relationship, it is critical to predict the barrier accurately.69 The “RXNMapper” regime is not available since RXNMapper cannot atom-map the reaction SMILES of this set.
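The sensitivity implied by this exponential relationship can be made concrete with a short calculation. The sketch assumes kinetic control and transition-state theory, so that the ratio of the two enantiomeric pathways is exp(ΔΔE‡/RT) and the enantiomeric excess is tanh(ΔΔE‡/2RT); the temperature of 298.15 K is an assumption, not a value from the source.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def enantiomeric_excess(barrier_difference, temperature=298.15):
    """Enantiomeric excess (between 0 and 1) from the difference in activation
    barriers (kcal/mol) of the pathways leading to the two enantiomers."""
    return math.tanh(barrier_difference / (2.0 * R_KCAL * temperature))

for ddE in (0.5, 1.0, 2.0):
    print(f"barrier difference {ddE:.1f} kcal/mol -> ee = {100 * enantiomeric_excess(ddE):.0f}%")
```

Under these assumptions, shifting a barrier difference by 1 kcal/mol moves the predicted enantiomeric excess by tens of percentage points, which is why small errors in the barrier matter for this dataset.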
In the other regimes, 3D-structure-based models lead to the best results, outperforming ChemProp by a large margin. Proparg-21-TS is particularly hard for 2D-based models38 since it contains molecules of different stereochemistry but the same SMILES strings. As for the other single-reaction-class dataset, models do not benefit from being provided the “obvious” chemical information: including true atom-maps does not decrease the error. Competing only in the “None” regime, 3DReact does not allow for a performance improvement compared to SLATMd+KRR. Given the small size of the dataset, it is already a demonstration of data efficiency that the deep-learning model matches the prediction errors of the kernel model. Unlike for NequIP,43 however, the data efficiency here is not due to the equivariant molecular components (Section 3.1).
The three datasets illustrate the benefits of the flexibility of 3DReact: depending on the datasets’ particular challenges, the model exploits the available information to yield the best-performing model in almost all cases. Since the model settings (such as vector or energy mode choice) are specified as hyperparameters, the optimized version of 3DReact can emerge with minimal user intervention.
3.2.2 Extrapolative splits
Figure 5 illustrates model performance for extrapolative splits (based on scaffolds, molecular size of reactants/products, and barrier magnitude, detailed in Section 5.2). These different types of extrapolative splits are necessarily more difficult than random splits, as demonstrated by higher MAEs in Figure 5. The relative performance of the models is largely maintained in the three different extrapolation regimes compared to the interpolation regime presented in Table 2.
Bemis–Murcko scaffold127 splitting clusters molecules (reactants for GDB7-22-TS and Proparg-21-TS, products for Cyclo-23-TS) based on ring systems. Test molecules may therefore appear “novel” from the point of view of the reaction graph, but will still feature distances and angles close to what the model has seen during training. Similarly for size-based splits, since there is no correlation between reactant/product size and reaction barriers, using distance information allows for stable predictions on extrapolation. Property-based splits are more challenging than the other two. For the Cyclo-23-TS and Proparg-21-TS sets, 3DReact still offers respectable errors, lower than those of the other models. For the GDB7-22-TS set, however, all models result in unreasonably high MAEs (Figure 5). This points to the particular challenges of the GDB7-22-TS set and suggests an avenue for further developments of ML models for extrapolative tasks.81
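Size- and property-based extrapolation splits share the same logic: rank the reactions by a key and hold out one end of the ranking. The sketch below is a generic version under stated assumptions; the held-out fraction, the choice to hold out the largest values, and the toy records are all hypothetical and do not reproduce the procedure of Section 5.2.

```python
def extrapolation_split(reactions, key, test_fraction=0.2):
    """Hold out the reactions with the largest `key` values as the test set."""
    ranked = sorted(reactions, key=key)
    n_test = max(1, round(len(ranked) * test_fraction))
    return ranked[:-n_test], ranked[-n_test:]

# Toy records: heavy-atom count of the reactant and barrier in kcal/mol (made-up numbers).
raw = [(5, 62.0), (7, 35.5), (6, 80.1), (4, 48.3), (7, 91.7),
       (3, 55.0), (6, 40.2), (5, 73.9), (4, 66.4), (7, 29.8)]
reactions = [{"id": i, "n_atoms": n, "barrier": b} for i, (n, b) in enumerate(raw)]

train_size, test_size = extrapolation_split(reactions, key=lambda r: r["n_atoms"])
train_prop, test_prop = extrapolation_split(reactions, key=lambda r: r["barrier"])
print([r["n_atoms"] for r in test_size], [r["barrier"] for r in test_prop])
```

In the property-based case every test barrier lies outside the range seen in training, which is consistent with this split being the hardest of the three.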
Again in contrast to previous works that suggested equivariant models might be better at extrapolation tasks,43, 56 here we find that 3DReact offers stable extrapolation performance (particularly for size- and scaffold-based splits), but not necessarily improved extrapolation behavior compared to 2D-graph based models. This points to the different challenges in reaction property prediction. Nevertheless, Figure 5 illustrates that 3DReact is a consistently robust model for the three datasets when moving from interpolation to extrapolation regimes.
3.3 Model behavior
Since the GDB7-22-TS set has the largest chemical diversity amongst the datasets explored, studying 3DReact and baseline models SLATMd and ChemProp on this dataset best captures the different chemical interpretation provided by these models.
Figure 6 compares the (latent) representations of 3DReact “True”, ChemProp “True” and SLATMd using t-SNE128 maps. In the upper panel, we find that the quality of the correlation between the representations and the target property is aligned with the relative performance of the models (Table 2). ChemProp and 3DReact show a smooth transition of the target property, whereas the map of SLATMd does not have a clear structure. The lower panel shows the correlation of the representations with the five most common reaction types defined by bond breaking and formation (see Section 5.4). ChemProp, as a chemically-inspired model, illustrates clear clusters in the reaction type. While SLATMd is a geometry-based model, the binning structure used to create the representation8, 36 results in a clear correlation with the reaction types, since e.g. the pairwise bins naturally cluster features such as C–H bond formation or breaking. 3DReact shows the least distinct “chemical” clustering, due to the interplay of geometry and mapping information exploited in the representation.
Figure 7 shows the error distribution of predictions belonging to the same reaction classes for 3DReact “True”. 3DReact performs universally well across the different reaction types, with consistently low errors and relatively small error spread. The reactions for which the model has higher mean errors and spread (H–H,C–H,C–H (green)) correspond to those involving C–H and H–H features. Since the model is trained without explicit H nodes in the graph, features associated with X–H bonds are included implicitly in the model. Capturing H–H bond changes will be the most challenging as these will be the least explicitly described, occurring only as initial features for neighboring nodes. Since C is the most frequently occurring element and appears in many different configurations, capturing all the C–H features is more challenging than capturing, for example, the O–H features, which will be more similar to one another. The equivalent plot for the model trained with explicit H nodes is shown in the Supporting Information, illustrating that the error spread reduces for the reaction types involving C–H and H–H features. Note that 3DReact without explicit Hs still leads to performance comparable to the variant with explicit Hs (see the Supporting Information).
3.4 Geometry quality
In order to illustrate that 3DReact does not require high-quality molecular structures to be used in an out-of-sample scenario, we train and test a model using lower-quality GFN2-xTB129 (xTB) geometries to predict higher-level barriers (CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP for GDB7-22-TS, B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP for Cyclo-23-TS and B97D/TZV(2p,2d) for Proparg-21-TS). The results are illustrated in Figure 8 for the three datasets with DFT and xTB geometries, and compared to the SLATMd+KRR model in the same settings. Across the three datasets, 3DReact is less sensitive to the geometry quality than the pre-designed representation SLATMd combined with KRR.
For the GDB7-22-TS set, there is a negligible difference in model performance moving from DFT to xTB geometries. The xTB geometries are a good proxy for the DFT ones here, since this set consists of small, charge-neutral organic molecules, which are largely well-described by semi-empirical methods. For the Cyclo-23-TS set, while the molecules are still organic, they are larger than those in the GDB7-22-TS set, and there is a greater divergence between the GFN2-xTB and DFT geometries, resulting in a larger deterioration with these structures. An analysis in the Supporting Information demonstrates that when using the model trained on xTB geometries, barrier predictions for molecules with poorer geometries (i.e., higher RMSD of xTB vs. DFT geometries) are not necessarily worse than those on molecules with better geometries. Instead, there is a consistent decline in model performance when training with xTB geometries and predicting DFT barriers.
The Proparg-21-TS set is the most complex of the three for GFN2-xTB, since these systems with charged organosilicon compounds differ considerably from those used to parameterize semi-empirical methods or force fields. As described in Section 5.4, unlike for the other datasets, where we generate an initial structure from SMILES using force fields, this is not possible for this set, and we instead generate xTB geometries from the DFT ones. While this is not a feasible geometry generation pipeline for out-of-sample predictions, it still demonstrates how different methods perform with high- and low-quality geometries. Here, we see that 3DReact is less sensitive than SLATMd+KRR, and the variant trained with lower-quality geometries still offers competitive errors ( kcal/mol for the “None” model).
4 Conclusions
The accurate and reliable prediction of reaction barriers across diverse sets of chemical reactions remains an open challenge in computational chemistry. We contribute to this domain by introducing 3DReact, a geometric deep learning model constructed from the 3D coordinates of reactants and products. We show that the invariant model (vs. the equivariant version) is already sufficient for currently available reaction datasets. The existing models ChemProp and SLATMd+KRR exhibit impressive performance for atom-mapped, chemically diverse datasets and stereochemistry-sensitive datasets, respectively. 3DReact offers a hybrid model that can optionally incorporate mapping information alongside geometries, enabling robust performance across different dataset types and atom-mapping regimes. 3DReact is also less sensitive to the training geometry quality (i.e., xTB vs. DFT level) than SLATMd+KRR. Predictions remain stable when moving to either molecular size- or scaffold-based splits. Altogether, 3DReact presents a flexible framework for accurate prediction of activation barriers across chemical reaction datasets. Despite the proposed developments, challenges remain for ML predictions of energy barriers, particularly in integrating them within experimental settings. This work is a step toward their reliable application.
5 Methods
5.1 Datasets
We test 3DReact on three datasets of reaction barriers previously used to benchmark reaction representations.38 The term “reaction barrier”, used interchangeably with “activation energy” and “activation barrier”, is the difference between the energy of the optimized TS and that of the optimized reactants. Note that some datasets provide purely electronic energies (labelled ), while others provide Gibbs free energies (labelled ). In all datasets, optimized three-dimensional structures of reactants and products are provided, which are used to train models and make predictions. The activation barrier is not a direct function of these structures, but using the TS structure to make predictions would remove the advantage of the ML models over direct computation of the TS. Thus, we use an implicit interpolation of the reactants’ and products’ structures as a proxy for the TS, as in previous works.36, 69, 38
The GDB7-22-TS 116 dataset consists of close to diverse organic reactions automatically constructed from the GDB7 dataset130, 131, 132 using the growing string method133 along with corresponding energy barriers () computed at the CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP level. The dataset provides atom-mapped SMILES, with “True” maps derived from the transition state. For reactions out of , one of the products’ SMILES represents a molecule different from the xyz structure. These reactions were therefore excluded from the dataset, leading to a modified GDB7-22-TS set used here.
While there are no pre-defined classes for all the reactions in the GDB7-20-TS123 or GDB7-22-TS 116 sets, Grambow et al. 70 split the dataset into reactions undergoing certain bond changes: for example, the most common type was breaking of a C–H bond (C–H) and a C–C bond (C–C) in the reactants and formation of a C–H bond (C–H) in the products, giving the reaction type signature C–H,C–C,C–H. Here, we extract similar reaction types by comparing the connectivity matrices from atom-mapped reaction SMILES of reactants and products (ignoring bond orders). The most abundant reaction types in the dataset are C–H,C–C,C–H (1667 reactions), H–N,C–H (633), C–H,C–H (619), H–O,C–H,C–O (599) and H–H,C–H,C–H (517).
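The connectivity comparison described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in this work: the bond lists (pairs of atom-map numbers) and the element lookup are hypothetical inputs that would in practice be derived from the atom-mapped reaction SMILES, and the function name `bond_changes` is chosen here for illustration only.

```python
def bond_changes(reactant_bonds, product_bonds, elements):
    """Return sorted labels of the bonds broken and formed in a reaction.

    reactant_bonds / product_bonds: iterables of (i, j) atom-map number pairs.
    elements: dict mapping atom-map number -> element symbol.
    Bond orders are ignored: only connectivity is compared.
    """
    def normalize(bonds):
        # frozenset makes (i, j) and (j, i) the same bond
        return {frozenset(bond) for bond in bonds}

    def label(bond):
        a, b = sorted(elements[i] for i in bond)
        return f"{a}-{b}"

    reactant_set = normalize(reactant_bonds)
    product_set = normalize(product_bonds)
    broken = sorted(label(b) for b in reactant_set - product_set)
    formed = sorted(label(b) for b in product_set - reactant_set)
    return broken, formed


# Toy example (hypothetical): an H atom migrates from C1 to O4.
elements = {1: "C", 2: "C", 3: "H", 4: "O"}
broken, formed = bond_changes(
    reactant_bonds=[(1, 2), (1, 3), (2, 4)],
    product_bonds=[(2, 1), (3, 4), (2, 4)],
    elements=elements,
)
print(broken, formed)
```

Because atom-map numbers identify the same atom on both sides of the reaction, the broken and formed bonds reduce to two set differences; grouping reactions by the resulting pair of label lists gives reaction types of the kind listed above.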
The original Cyclo-23-TS 117 dataset encompasses profiles for cycloaddition reactions with activation free energies () computed at the B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP level in water using the SMD continuum solvation model. The dataset provides atom-mapped SMILES with “True” maps for heavy atoms derived from either the transition state structure or heuristic rules. For the regime with explicit hydrogen atoms, we atom-mapped the xyz files by matching the reactants, given in two separate files, to the provided transition state structure, which closely resembles the two reactants and has the same atom order as in the product. This was done with a labelled graph matching algorithm as implemented in NetworkX.134, 135 The algorithm is unaware of chirality, double-bond stereochemistry or conformations, and thus may produce atom-mappings that are not exactly correct. We also found that in four reactions, the product SMILES and xyz files depict different species; thus, the set was reduced to reactions.
The Proparg-21-TS dataset108, 69 contains 753 structures of intermediates before and after the enantioselective transition state of benzaldehyde propargylation, with activation energies () computed at the B97D/TZV(2p,2d) level. SMILES strings (“fragment-based” SMILES) and “True” atom-maps are not provided with the original dataset; these are taken from Ref. 38.
RXNMapper107-mapped versions of GDB7-22-TS and Cyclo-23-TS were obtained with the Python package rxnmapper (version 0.3.0), using the default settings. The Proparg-21-TS set cannot be mapped, because the underlying libraries cannot process its SMILES strings.38 Since RXNMapper sorts molecules in case of multiple reactants and/or products, which would complicate SMILES–xyz matching (see Section 5.3 below), we used a locally modified version that does not change the molecule order (the patch file is provided in the project repository at https://github.com/lcmd-epfl/EquiReact/tree/9d78892fe/data-curation/rxnmapper).
5.2 Data splits
For each dataset and splitting type, identical data splits were used for all the models compared. In each case, ten different splits are constructed with different random seeds.
Three different types of extrapolation split were used: scaffold-, molecular size- and property-based. Scaffold splitting136, 137 clusters molecules based on their 2D backbones (such as Bemis–Murcko scaffolds127) and ensures that the clusters (scaffolds) belonging to the training, validation, and test sets do not overlap. Size-based splitting organizes the splits such that the reactions of the smallest molecules are in the training set and the reactions of the largest molecules are in validation and test. With property-based splits, one trains on reactions with higher barriers and predicts on reactions with lower barriers. This choice of splits reflects the relevant out-of-sample cases: larger molecules are more expensive to compute, and reactions with smaller barriers are desirable. Size- and property-based splits can also be organized in reverse order, where larger molecules are in the train set and smaller in test, or reactions with lower barrier in train and higher barrier in test.
For molecular size- and scaffold-based splits, the initial data shuffling affects the composition of the datasets. The non-zero standard deviations for property-based splits with neural networks arise from different organization of the datapoints into batches.
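The size- and property-based splits can be sketched as a shuffle followed by a sort and a cut. The snippet below is an illustrative sketch under stated assumptions, not the splitting code used in this work: the 80/10/10 fractions, the record fields `n_atoms` and `barrier`, and the function name `sorted_split` are all hypothetical.

```python
import random


def sorted_split(items, key, fractions=(0.8, 0.1, 0.1), reverse=False, seed=0):
    """Extrapolation split: seeded shuffle, sort by `key`, cut into train/val/test.

    The fractions are an assumption made for this sketch.
    """
    items = list(items)
    random.Random(seed).shuffle(items)      # decides the order among ties
    items.sort(key=key, reverse=reverse)    # stable sort keeps the shuffled tie order
    n_train = int(fractions[0] * len(items))
    n_val = int(fractions[1] * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical reaction records: molecular size and barrier only.
reactions = [{"n_atoms": n, "barrier": b} for n, b in
             [(3, 10.0), (5, 40.0), (4, 20.0), (7, 80.0), (6, 60.0),
              (2, 5.0), (8, 90.0), (9, 95.0), (1, 1.0), (10, 99.0)]]

# Size-based: smallest molecules in train, largest in validation and test.
size_split = sorted_split(reactions, key=lambda r: r["n_atoms"])
# Property-based: train on higher barriers, predict on lower barriers.
prop_split = sorted_split(reactions, key=lambda r: r["barrier"], reverse=True)
```

In this sketch the seed only changes the composition of the subsets when several reactions share the same sort key. Many molecules share the same size, so this is common for size-based splits, while ties are rare for real-valued barriers.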
5.3 Matching SMILES strings to xyz geometries
3DReact makes use of both the graph structure of a molecule (as provided in the SMILES string) and the three-dimensional structure (in the xyz). The atoms in the graph are associated with the atomic coordinates provided in the xyz file. Thanks to the way the GDB7-22-TS dataset116 was generated, the atomic coordinates can be easily matched to SMILES, which in turn allows reactants to be atom-mapped to products. However, we also tested RXNMapper-mapped SMILES, which do not respect the same constraints. Therefore, for consistency, we use a SMILES–xyz matching procedure detailed below.
We constructed molecular graphs from xyz using covalent radii and matched them to RDKit120 molecular graphs obtained from SMILES with a labelled graph matching algorithm as implemented in NetworkX.134, 135 This procedure is, however, unaware of chirality and double-bond stereochemistry; thus, some of the matches might be incorrect. Still, it provides a flexible method that can be applied to any dataset consisting of SMILES strings and xyz files.
The same procedure was applied to the Cyclo-23-TS dataset in the few cases when the canonical SMILES have a different atom ordering than xyz.
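The two steps of the matching procedure (building a graph from coordinates, then finding an element-preserving graph isomorphism) can be illustrated with a self-contained sketch. It is not the implementation used in this work, which relies on RDKit and NetworkX. Here a simple backtracking search stands in for the NetworkX matcher, the SMILES-side graph is written by hand, and the bonding threshold of 1.3 times the sum of covalent radii is an assumption made for illustration.

```python
import math

# Approximate covalent radii in angstrom (illustrative subset).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def graph_from_xyz(symbols, coords, scale=1.3):
    """Bond two atoms if their distance is below scale * (r_i + r_j)."""
    n = len(symbols)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            cutoff = scale * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
            if math.dist(coords[i], coords[j]) < cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def match_graphs(labels_a, adj_a, labels_b, adj_b):
    """Backtracking search for an element-preserving isomorphism a -> b.

    Returns a dict {index in a: index in b}, or None if no match exists.
    Like the procedure in the text, it ignores chirality and stereochemistry.
    """
    n = len(labels_a)
    if n != len(labels_b):
        return None
    mapping, used = {}, set()

    def extend(i):
        if i == n:
            return True
        for j in range(n):
            if j in used or labels_a[i] != labels_b[j] or len(adj_a[i]) != len(adj_b[j]):
                continue
            # Every already-mapped atom must keep its (non-)bond to atom i.
            if all((k in adj_a[i]) == (mapping[k] in adj_b[j]) for k in mapping):
                mapping[i] = j
                used.add(j)
                if extend(i + 1):
                    return True
                del mapping[i]
                used.remove(j)
        return False

    return dict(mapping) if extend(0) else None


# HCN: xyz atom order N, C, H; hand-written SMILES-side graph in order H, C, N.
xyz_symbols = ["N", "C", "H"]
xyz_coords = [(0.0, 0.0, 1.16), (0.0, 0.0, 0.0), (0.0, 0.0, -1.06)]
xyz_adj = graph_from_xyz(xyz_symbols, xyz_coords)
smiles_labels = ["H", "C", "N"]
smiles_adj = {0: {1}, 1: {0, 2}, 2: {1}}
print(match_graphs(smiles_labels, smiles_adj, xyz_symbols, xyz_adj))
```

The returned dictionary assigns to each SMILES atom the index of its coordinates in the xyz file. For symmetric molecules several valid matches exist and the search returns the first one found, which is one reason why purely graph-based matching cannot resolve stereochemistry.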
5.4 xTB geometry generation
For the GDB7-22-TS and Cyclo-23-TS datasets, the starting structures were generated from SMILES using the distance-geometry embedding implemented in RDKit120 with the srETKDGv3 settings.138 Ten conformations were produced per molecule, which were then energy-ranked with the MMFF94 implementation139 in RDKit, falling back to UFF in case of missing parameters. The lowest-energy conformer was retained. For the Proparg-21-TS set, the original B97D/TZV(2p,2d) geometries were used as a starting point: the stereochemical and conformational diversity of this set cannot be completely encoded with SMILES, so MMFF94 fails to generate an initial geometry from SMILES.
For all the sets, the starting structures were optimized at the GFN2-xTB semi-empirical level of theory129 at the “loose” convergence level for a maximum of 1000 iterations using xTB140 version 6.2 RC2. For reactions of the GDB7-22-TS set and reactions of the Cyclo-23-TS set, at least one of the participating molecules either could not converge to any reasonable configuration or converged to a structure not matching the SMILES. These reactions were excluded from the geometry quality tests (Section 3.4).
5.5 Model training
3DReact was trained using the Adam optimizer 141 with initial learning rate and weight decay parameters as hyperparameters. The learning rate was reduced by 40% after epochs of no improvement in the validation MAE, as in Ref. 121. Models were trained for max. epochs, using early stopping after epochs of no improvement. The model with the best validation score was then used to make predictions on the test set.
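The interplay between the learning-rate reduction and early stopping can be sketched by replaying a list of validation errors. This is a simplified stand-in for the actual training loop: the patience values, the initial learning rate, and the function name `run_schedule` are hypothetical, and only the factor of 0.6 (a 40% reduction) is taken from the text.

```python
def run_schedule(val_maes, lr=1e-3, lr_patience=2, stop_patience=4, factor=0.6):
    """Replay per-epoch validation MAEs through plateau-based LR decay and early stopping.

    Returns (best_epoch, final_lr, epochs_run). All defaults except `factor`
    are hypothetical values chosen for this sketch.
    """
    best, best_epoch = float("inf"), -1
    since_best = 0        # epochs since the best validation MAE (early stopping)
    since_lr_change = 0   # non-improving epochs since the last LR reduction
    for epoch, mae in enumerate(val_maes):
        if mae < best:
            best, best_epoch = mae, epoch
            since_best = since_lr_change = 0
            continue
        since_best += 1
        since_lr_change += 1
        if since_lr_change >= lr_patience:
            lr *= factor              # reduce the learning rate by 40%
            since_lr_change = 0
        if since_best >= stop_patience:
            return best_epoch, lr, epoch + 1
    return best_epoch, lr, len(val_maes)


# Hypothetical validation curve: the best epoch is epoch 1, training stops early.
best_epoch, final_lr, epochs_run = run_schedule([5.0, 4.0, 4.2, 4.1, 4.3, 4.5, 3.0])
print(best_epoch, final_lr, epochs_run)
```

The epoch returned as `best_epoch` corresponds to the checkpoint that would be used for test-set predictions, in line with selecting the model with the best validation score.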
The optimal model hyperparameters were searched within the following values: learning rate ; weight decay parameter ; node and edge features embedding size ; hidden space size ; number of edge features ; number of convolutional layers ; radial cutoff ; maximum number of atom neighbors ; dropout probability ; sum_mode [node, both]; combine_mode [mlp, diff, mean, sum]; graph_mode [energy, vector].
The hyperparameter search was done for the equivariant model EquiReactS (without attention or mapping) using Bayesian search as implemented in Weights & Biases.142 Hydrogen atoms were excluded from the graphs. Sweeps were run for 128 epochs for the GDB7-22-TS and Proparg-21-TS sets, and for 256 epochs for the Cyclo-23-TS set on the first random split. The parameters resulting in the best validation error, summarized in Table LABEL:S-tab:model-params, were used for all the other model settings.
5.6 Baseline models
The ChemProp model103 is based on a CGR built from atom-mapped SMILES strings of reactants and products, which is then passed through the directed message-passing neural network chemprop137, 71, 103 (version 1.5.0). The hyperparameters are taken from Ref. 38.
Molecular SLATM vectors were generated using the qml Python package143 before being combined to form the reaction version SLATMd. SLATMd is used with kernel ridge regression (KRR) models. The kernel functions and widths, and regularization parameters, were optimized on the first of the ten random splits, in line with how the hyperparameters were optimized for 3DReact. Unlike for 3DReact, the hyperparameters for DFT and xTB geometries were optimized separately.
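The structure of such a baseline can be sketched in plain Python. The example makes several assumptions that are not stated in this section: the reaction vector is taken as the difference between the summed product and reactant vectors, a Gaussian kernel is used (the kernel function is itself a tuned hyperparameter in this work), and the toy one-dimensional vectors stand in for real SLATM representations, which would come from the qml package.

```python
import math


def difference_vector(reactant_vecs, product_vecs):
    """Sum of product vectors minus sum of reactant vectors (assumed combination rule)."""
    dim = len(product_vecs[0])
    return [sum(v[k] for v in product_vecs) - sum(v[k] for v in reactant_vecs)
            for k in range(dim)]


def gaussian_kernel(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def krr_fit(X, y, sigma, lam):
    """Regression coefficients alpha = (K + lam * I)^-1 y."""
    K = [[gaussian_kernel(a, b, sigma) for b in X] for a in X]
    for i in range(len(X)):
        K[i][i] += lam
    return solve(K, y)


def krr_predict(X_train, alpha, x, sigma):
    return sum(a * gaussian_kernel(xt, x, sigma) for a, xt in zip(alpha, X_train))


# Toy data: three "reactions" with 1D difference vectors and made-up barriers.
X = [[0.0], [1.0], [2.0]]
y = [1.0, 3.0, 2.0]
alpha = krr_fit(X, y, sigma=0.5, lam=1e-10)
print(krr_predict(X, alpha, [1.0], sigma=0.5))
```

The kernel width `sigma` and the regularization `lam` play the role of the hyperparameters that, according to the text, were optimized separately for the DFT and xTB geometries.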
Data and Software Availability statement
The code is available as a GitHub repository at https://github.com/lcmd-epfl/EquiReact. The versions of the datasets used, as well as any processing applied to them, can be found in the same repository. The unprocessed results are available in the same repository as well as at https://wandb.ai/equireact.
Supplementary Information is provided in the freely available file equireact_si.pdf, detailing the architecture of the molecular channels (Section LABEL:S-sec:molecular_channels), the 3DReact hyperparameters (Section LABEL:S-sec:model-params), the RMSE analogue of Table 2 (Section LABEL:S-sec:rmse), the discussion of the model with a cross-attention surrogate for atom-mapping (Section LABEL:S-sec:cross), extrapolation studies (Section LABEL:S-sec:extrapolation), some illustrative correlation plots for the GDB7-22-TS set (Section LABEL:S-sec:gdb_outliers_corr), the model performance with and without explicit hydrogen atoms (Section LABEL:S-sec:hydrogens), and the geometry sensitivity analysis for the Cyclo-23-TS set (Section LABEL:S-sec:geom_cyclo).
Author Information
Author contributions
P.v.G., K.R.B., and C.B. conceptualized the project. 3DReact and support codes were written and run by K.R.B. and P.v.G., with design suggestions from C.B., V.R.S., and R.L. Results were analyzed by P.v.G., K.R.B., V.R.S., R.L., and C.B. xTB computations were run by R.L. The original draft was written by P.v.G. and K.R.B. with reviews and edits from all authors. C.C. and A.K. provided supervision and acquired funding.
Conflict of interest
The authors have no conflicts to disclose.
Acknowledgments
The authors thank Liam Marsh and Yannick Calvino Alonso for helpful discussions and comments on the text. P.v.G., C.B., V.R.S., R.L., A.K., and C.C. acknowledge the National Centre of Competence in Research (NCCR) “Sustainable chemical process through catalysis (Catalysis)”, grant number 180544, of the Swiss National Science Foundation (SNSF) for financial support. K.R.B. and C.C. were supported by the European Research Council (grant number 817977) and by the National Centre of Competence in Research (NCCR) “Materials’ Revolution: Computational Design and Discovery of Novel Materials (MARVEL)”, grant number 205602, of the Swiss National Science Foundation.
References
- Behler and Parrinello 2007 Behler, J.; Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 2007, 98, 146401.
- Rupp et al. 2012 Rupp, M.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 2012, 108, 058301.
- Bartók et al. 2013 Bartók, A. P.; Kondor, R.; Csányi, G. On representing chemical environments. Phys. Rev. B 2013, 87, 184115.
- Hansen et al. 2015 Hansen, K.; Biegler, F.; Ramakrishnan, R.; Pronobis, W.; von Lilienfeld, O. A.; Müller, K.-R.; Tkatchenko, A. Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space. J. Phys. Chem. Lett. 2015, 6, 2326–2331.
- Huo and Rupp 2017 Huo, H.; Rupp, M. Unified representation for machine learning of molecules and crystals. arXiv preprint 2017, arXiv:1704.06439.
- Faber et al. 2018 Faber, F. A.; Christensen, A. S.; Huang, B.; von Lilienfeld, O. A. Alchemical and structural distribution based representation for universal quantum machine learning. J. Chem. Phys. 2018, 148, 241717.
- Christensen et al. 2020 Christensen, A. S.; Bratholm, L. A.; Faber, F. A.; von Lilienfeld, O. A. FCHL revisited: Faster and more accurate quantum machine learning. J. Chem. Phys. 2020, 152, 044107.
- Huang and von Lilienfeld 2020 Huang, B.; von Lilienfeld, O. A. Quantum machine learning using atom-in-molecule-based fragments selected on the fly. Nat. Chem. 2020, 12, 945–951.
- Drautz 2019 Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys. Rev. B 2019, 99, 014104.
- Dusson et al. 2022 Dusson, G.; Bachmayr, M.; Csányi, G.; Drautz, R.; Etter, S.; van der Oord, C.; Ortner, C. Atomic cluster expansion: Completeness, efficiency and stability. J. Comput. Phys. 2022, 454, 110946.
- Grisafi and Ceriotti 2019 Grisafi, A.; Ceriotti, M. Incorporating long-range physics in atomic-scale machine learning. J. Chem. Phys. 2019, 151, 204105.
- Grisafi et al. 2021 Grisafi, A.; Nigam, J.; Ceriotti, M. Multi-scale approach for the prediction of atomic scale properties. Chem. Sci. 2021, 12, 2078–2090.
- Nigam et al. 2020 Nigam, J.; Pozdnyakov, S.; Ceriotti, M. Recursive evaluation and iterative contraction of N-body equivariant features. J. Chem. Phys. 2020, 153, 121101.
- Fabrizio et al. 2022 Fabrizio, A.; Briling, K. R.; Corminboeuf, C. SPAHM: the Spectrum of Approximated Hamiltonian Matrices representations. Digital Discovery 2022, 1, 286–294.
- Briling et al. 2024 Briling, K. R.; Calvino Alonso, Y.; Fabrizio, A.; Corminboeuf, C. SPAHM(a,b): Encoding the density information from guess Hamiltonian in quantum machine learning representations. J. Chem. Theory Comput. 2024, 20, 1108–1117.
- Karandashev and von Lilienfeld 2022 Karandashev, K.; von Lilienfeld, O. A. An orbital-based representation for accurate quantum machine learning. J. Chem. Phys. 2022, 156, 114101.
- Llenga and Gryn’ova 2023 Llenga, S.; Gryn’ova, G. Matrix of orthogonalized atomic orbital coefficients representation for radicals and ions. J. Chem. Phys. 2023, 158, 214116.
- Li et al. 2015 Li, Z.; Kermode, J. R.; De Vita, A. Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. Phys. Rev. Lett. 2015, 114, 096405.
- Chmiela et al. 2017 Chmiela, S.; Tkatchenko, A.; Sauceda, H. E.; Poltavsky, I.; Schütt, K. T.; Müller, K.-R. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 2017, 3, e1603015.
- Chmiela et al. 2018 Chmiela, S.; Sauceda, H. E.; Müller, K.-R.; Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 2018, 9, 3887.
- Behler 2017 Behler, J. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew. Chem. Int. Ed. 2017, 56, 12828–12840.
- Smith et al. 2018 Smith, J. S.; Nebgen, B.; Lubbers, N.; Isayev, O.; Roitberg, A. E. Less is more: Sampling chemical space with active learning. J. Chem. Phys. 2018, 148, 241733.
- Bereau et al. 2015 Bereau, T.; Andrienko, D.; von Lilienfeld, O. A. Transferable atomic multipole machine learning models for small organic molecules. J. Chem. Theory Comput. 2015, 11, 3225–3233.
- Grisafi et al. 2018 Grisafi, A.; Wilkins, D. M.; Csányi, G.; Ceriotti, M. Symmetry-adapted machine learning for tensorial properties of atomistic systems. Phys. Rev. Lett. 2018, 120, 036002.
- Wilkins et al. 2019 Wilkins, D. M.; Grisafi, A.; Yang, Y.; Lao, K. U.; DiStasio Jr, R. A.; Ceriotti, M. Accurate molecular polarizabilities with coupled cluster theory and machine learning. Proc. Natl. Acad. Sci. U.S.A. 2019, 116, 3401–3406.
- Montavon et al. 2013 Montavon, G.; Rupp, M.; Gobre, V.; Vazquez-Mayagoitia, A.; Hansen, K.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. Machine learning of molecular electronic properties in chemical compound space. New J. Phys. 2013, 15, 095003.
- Mazouin et al. 2022 Mazouin, B.; Schöpfer, A. A.; von Lilienfeld, O. A. Selected machine learning of HOMO–LUMO gaps with improved data-efficiency. Mater. Adv. 2022, 3, 8306–8316.
- Brockherde et al. 2017 Brockherde, F.; Vogt, L.; Li, L.; Tuckerman, M. E.; Burke, K.; Müller, K.-R. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 2017, 8, 872.
- Grisafi et al. 2019 Grisafi, A.; Fabrizio, A.; Meyer, B.; Wilkins, D. M.; Corminboeuf, C.; Ceriotti, M. Transferable machine-learning model of the electron density. ACS Cent. Sci. 2019, 5, 57–64.
- Fabrizio et al. 2019 Fabrizio, A.; Grisafi, A.; Meyer, B.; Ceriotti, M.; Corminboeuf, C. Electron density learning of non-covalent systems. Chem. Sci. 2019, 10, 9424–9432.
- Musil et al. 2021 Musil, F.; Grisafi, A.; Bartók, A. P.; Ortner, C.; Csányi, G.; Ceriotti, M. Physics-inspired structural representations for molecules and materials. Chem. Rev. 2021, 121, 9759–9815.
- Langer et al. 2022 Langer, M. F.; Goessmann, A.; Rupp, M. Representations of molecules and materials for interpolation of quantum-mechanical simulations via machine learning. npj Comput. Mater. 2022, 8, 41.
- Huang and von Lilienfeld 2021 Huang, B.; von Lilienfeld, O. A. Ab initio machine learning in chemical compound space. Chem. Rev. 2021, 121, 10001–10036.
- Kulik et al. 2022 Kulik, H. J.; Hammerschmidt, T.; Schmidt, J.; Botti, S.; Marques, M. A. L.; Boley, M.; Scheffler, M.; Todorović, M.; Rinke, P.; Oses, C. et al. Roadmap on machine learning in electronic structure. Electron. Struct. 2022, 4, 023004.
- Glielmo et al. 2017 Glielmo, A.; Sollich, P.; De Vita, A. Accurate interatomic force fields via machine learning with covariant kernels. Phys. Rev. B 2017, 95, 214302.
- van Gerwen et al. 2022 van Gerwen, P.; Fabrizio, A.; Wodrich, M. D.; Corminboeuf, C. Physics-based representations for machine learning properties of chemical reactions. Mach. Learn.: Sci. Technol. 2022, 3, 045005.
- Faber et al. 2017 Faber, F. A.; Hutchison, L.; Huang, B.; Gilmer, J.; Schoenholz, S. S.; Dahl, G. E.; Vinyals, O.; Kearnes, S.; Riley, P. F.; von Lilienfeld, O. A. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 2017, 13, 5255–5264.
- van Gerwen et al. 2024 van Gerwen, P.; Briling, K. R.; Calvino Alonso, Y.; Franke, M.; Corminboeuf, C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. Digital Discovery 2024, 3, 932–943.
- Schütt et al. 2017 Schütt, K.; Kindermans, P.-J.; Sauceda Felix, H. E.; Chmiela, S.; Tkatchenko, A.; Müller, K.-R. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. Adv. Neural Inf. Process. Syst. 2017, 30, 991–1001.
- Unke and Meuwly 2019 Unke, O. T.; Meuwly, M. PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 2019, 15, 3678–3693.
- Gasteiger et al. 2020 Gasteiger, J.; Groß, J.; Günnemann, S. Directional message passing for molecular graphs. arXiv preprint 2020, arXiv:2003.03123.
- Gilmer et al. 2017 Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; Dahl, G. E. Neural message passing for quantum chemistry. Proceedings of the 34th International Conference on Machine Learning. 2017; pp 1263–1272.
- Batzner et al. 2022 Batzner, S.; Musaelian, A.; Sun, L.; Geiger, M.; Mailoa, J. P.; Kornbluth, M.; Molinari, N.; Smidt, T. E.; Kozinsky, B. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 2022, 13, 2453.
- Gasteiger et al. 2021 Gasteiger, J.; Becker, F.; Günnemann, S. GemNet: Universal directional graph neural networks for molecules. Adv. Neural Inf. Process. Syst. 2021, 34, 6790–6802.
- Haghighatlari et al. 2022 Haghighatlari, M.; Li, J.; Guan, X.; Zhang, O.; Das, A.; Stein, C. J.; Heidar-Zadeh, F.; Liu, M.; Head-Gordon, M.; Bertels, L. et al. NewtonNet: A Newtonian message passing network for deep learning of interatomic potentials and forces. Digital Discovery 2022, 1, 333–343.
- Qiao et al. 2020 Qiao, Z.; Welborn, M.; Anandkumar, A.; Manby, F. R.; Miller, T. F. OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 2020, 153, 124111.
- Thomas et al. 2018 Thomas, N.; Smidt, T.; Kearnes, S.; Yang, L.; Li, L.; Kohlhoff, K.; Riley, P. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint 2018, arXiv:1802.08219.
- Townshend et al. 2020 Townshend, R. J.; Townshend, B.; Eismann, S.; Dror, R. O. Geometric prediction: Moving beyond scalars. arXiv preprint 2020, arXiv:2006.14163.
- Anderson et al. 2019 Anderson, B.; Hy, T. S.; Kondor, R. Cormorant: Covariant molecular neural networks. Adv. Neural Inf. Process. Syst. 2019, 32, 14537–14546.
- Satorras et al. 2021 Satorras, V. G.; Hoogeboom, E.; Welling, M. E(n) equivariant graph neural networks. Proceedings of the 38th International Conference on Machine Learning. 2021; pp 9323–9332.
- Christensen et al. 2021 Christensen, A. S.; Sirumalla, S. K.; Qiao, Z.; O’Connor, M. B.; Smith, D. G.; Ding, F.; Bygrave, P. J.; Anandkumar, A.; Welborn, M.; Manby, F. R. et al. OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J. Chem. Phys. 2021, 155, 204103.
- Schütt et al. 2021 Schütt, K.; Unke, O.; Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. Proceedings of the 38th International Conference on Machine Learning. 2021; pp 9377–9388.
- Unke et al. 2021 Unke, O. T.; Chmiela, S.; Gastegger, M.; Schütt, K. T.; Sauceda, H. E.; Müller, K.-R. SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nat. Commun. 2021, 12, 7273.
- Zhang et al. 2020 Zhang, Y.; Ye, S.; Zhang, J.; Hu, C.; Jiang, J.; Jiang, B. Efficient and accurate simulations of vibrational and electronic spectra with symmetry-preserving neural network models for tensorial properties. J. Phys. Chem. B 2020, 124, 7284–7290.
- Nguyen and Lunghi 2022 Nguyen, V. H. A.; Lunghi, A. Predicting tensorial molecular properties with equivariant machine learning models. Phys. Rev. B 2022, 105, 165131.
- Batatia et al. 2022 Batatia, I.; Kovacs, D. P.; Simm, G.; Ortner, C.; Csanyi, G. MACE: Higher order equivariant message passing neural networks for fast and accurate force fields. Adv. Neural Inf. Process. Syst. 2022, 35, 11423–11436.
- Liao and Smidt 2022 Liao, Y.-L.; Smidt, T. Equiformer: Equivariant graph attention transformer for 3D atomistic graphs. arXiv preprint 2022, arXiv:2206.11990.
- Fuchs et al. 2020 Fuchs, F.; Worrall, D.; Fischer, V.; Welling, M. SE(3)-Transformers: 3D roto-translation equivariant attention networks. Adv. Neural Inf. Process. Syst. 2020, 33, 1970–1981.
- Simeon and De Fabritiis 2023 Simeon, G.; De Fabritiis, G. TensorNet: Cartesian tensor representations for efficient learning of molecular potentials. Adv. Neural Inf. Process. Syst. 2023, 36, 37334–37353.
- Corso et al. 2024 Corso, G.; Stark, H.; Jegelka, S.; Jaakkola, T.; Barzilay, R. Graph neural networks. Nat. Rev. Methods Primers 2024, 4, 17.
- Duval et al. 2023 Duval, A.; Mathis, S. V.; Joshi, C. K.; Schmidt, V.; Miret, S.; Malliaros, F. D.; Cohen, T.; Liò, P.; Bengio, Y.; Bronstein, M. A hitchhiker’s guide to geometric GNNs for 3D atomic systems. arXiv preprint 2023, arXiv:2312.07511.
- Musaelian et al. 2023 Musaelian, A.; Batzner, S.; Johansson, A.; Sun, L.; Owen, C. J.; Kornbluth, M.; Kozinsky, B. Learning local equivariant representations for large-scale atomistic dynamics. Nat. Commun. 2023, 14, 579.
- Wen et al. 2024 Wen, M.; Horton, M. K.; Munro, J. M.; Huck, P.; Persson, K. A. An equivariant graph neural network for the elasticity tensors of all seven crystal systems. Digital Discovery 2024, 3, 869–882.
- Batatia et al. 2022 Batatia, I.; Batzner, S.; Kovács, D. P.; Musaelian, A.; Simm, G. N. C.; Drautz, R.; Ortner, C.; Kozinsky, B.; Csányi, G. The design space of E(3)-equivariant atom-centered interatomic potentials. arXiv preprint 2022, arXiv:2205.06643.
- Liu et al. 2022 Liu, Y.; Wang, L.; Liu, M.; Zhang, X.; Oztekin, B.; Ji, S. Spherical message passing for 3D graph networks. arXiv preprint 2022, arXiv:2102.05013.
- Kondor 2018 Kondor, R. N-body networks: A covariant hierarchical neural network architecture for learning atomic potentials. arXiv preprint 2018, arXiv:1803.01588.
- Bochkarev et al. 2022 Bochkarev, A.; Lysogorskiy, Y.; Ortner, C.; Csányi, G.; Drautz, R. Multilayer atomic cluster expansion for semilocal interactions. Phys. Rev. Res. 2022, 4, L042019.
- Lewis-Atwell et al. 2022 Lewis-Atwell, T.; Townsend, P. A.; Grayson, M. N. Machine learning activation energies of chemical reactions. WIREs Comput. Mol. Sci. 2022, 12, e1593.
- Gallarati et al. 2021 Gallarati, S.; Fabregat, R.; Laplaza, R.; Bhattacharjee, S.; Wodrich, M. D.; Corminboeuf, C. Reaction-based machine learning representations for predicting the enantioselectivity of organocatalysts. Chem. Sci. 2021, 12, 6879–6889.
- Grambow et al. 2020 Grambow, C. A.; Pattanaik, L.; Green, W. H. Deep learning of activation energies. J. Phys. Chem. Lett. 2020, 11, 2992–2997.
- Heid and Green 2022 Heid, E.; Green, W. H. Machine learning of reaction properties via learned representations of the condensed graph of reaction. J. Chem. Inf. Model. 2022, 62, 2101–2110.
- Spiekermann et al. 2022 Spiekermann, K. A.; Pattanaik, L.; Green, W. H. Fast predictions of reaction barrier heights: Toward coupled-cluster accuracy. J. Phys. Chem. A 2022, 126, 3976–3986.
- Zhao et al. 2023 Zhao, Q.; Anstine, D. M.; Isayev, O.; Savoie, B. M. Δ² machine learning for reaction property prediction. Chem. Sci. 2023, 14, 13392–13401.
- Heinen et al. 2021 Heinen, S.; von Rudorff, G. F.; von Lilienfeld, O. A. Toward the design of chemical reactions: Machine learning barriers of competing mechanisms in reactant space. J. Chem. Phys. 2021, 155, 064105.
- Singh et al. 2019 Singh, A. R.; Rohr, B. A.; Gauthier, J. A.; Nørskov, J. K. Predicting chemical reaction barriers with a machine learning model. Catal. Lett. 2019, 149, 2347–2354.
- Choi et al. 2018 Choi, S.; Kim, Y.; Kim, J. W.; Kim, Z.; Kim, W. Y. Feasibility of activation energy prediction of gas-phase reactions by machine learning. Chem. Eur. J. 2018, 24, 12354–12358.
- Farrar and Grayson 2022 Farrar, E. H. E.; Grayson, M. N. Machine learning and semi-empirical calculations: A synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction. Chem. Sci. 2022, 13, 7594–7603.
- Friederich et al. 2020 Friederich, P.; dos Passos Gomes, G.; Bin, R. D.; Aspuru-Guzik, A.; Balcells, D. Machine learning dihydrogen activation in the chemical space surrounding Vaska’s complex. Chem. Sci. 2020, 11, 4584–4601.
- Migliaro and Cundari 2020 Migliaro, I.; Cundari, T. R. Density functional study of methane activation by frustrated Lewis pairs with group 13 trihalides and group 15 pentahalides and a machine learning analysis of their barrier heights. J. Chem. Inf. Model. 2020, 60, 4958–4966.
- Lewis-Atwell et al. 2023 Lewis-Atwell, T.; Beechey, D.; Şimşek, Ö.; Grayson, M. N. Reformulating reactivity design for data-efficient machine learning. ACS Catal. 2023, 13, 13506–13515.
- Vadaddi et al. 2024 Vadaddi, S. M.; Zhao, Q.; Savoie, B. M. Graph to activation energy models easily reach irreducible errors but show limited transferability. J. Phys. Chem. A 2024, 128, 2543–2555.
- Ramos et al. 2024 Ramos, J. E. A.; Neeser, R. M. M.; Stuyver, T. Repurposing quantum chemical descriptor datasets for on-the-fly generation of informative reaction representations: Application to hydrogen atom transfer reactions. Digital Discovery 2024, 3, 919–931.
- Schwaller et al. 2022 Schwaller, P.; Vaucher, A. C.; Laplaza, R.; Bunne, C.; Krause, A.; Corminboeuf, C.; Laino, T. Machine intelligence for chemical reaction space. WIREs Comput. Mol. Sci. 2022, 12, e1604.
- Rogers and Hahn 2010 Rogers, D.; Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754.
- Probst et al. 2022 Probst, D.; Schwaller, P.; Reymond, J.-L. Reaction classification and yield prediction using the differential reaction fingerprint DRFP. Digital Discovery 2022, 1, 91–97.
- Ahneman et al. 2018 Ahneman, D. T.; Estrada, J. G.; Lin, S.; Dreher, S. D.; Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 2018, 360, 186–190.
- Żurański et al. 2021 Żurański, A. M.; Martinez Alvarado, J. I.; Shields, B. J.; Doyle, A. G. Predicting reaction yields via supervised learning. Acc. Chem. Res. 2021, 54, 1856–1865.
- Zahrt et al. 2019 Zahrt, A. F.; Henle, J. J.; Rose, B. T.; Wang, Y.; Darrow, W. T.; Denmark, S. E. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 2019, 363, eaau5631.
- Jorner et al. 2021 Jorner, K.; Brinck, T.; Norrby, P.-O.; Buttar, D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem. Sci. 2021, 12, 1163–1175.
- Reid and Sigman 2019 Reid, J. P.; Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 2019, 571, 343–348.
- Gensch et al. 2022 Gensch, T.; dos Passos Gomes, G.; Friederich, P.; Peters, E.; Gaudin, T.; Pollice, R.; Jorner, K.; Nigam, A.; Lindner-D’Addario, M.; Sigman, M. S. et al. A comprehensive discovery platform for organophosphorus ligands for catalysis. J. Am. Chem. Soc. 2022, 144, 1205–1217.
- Santiago et al. 2018 Santiago, C. B.; Guo, J.-Y.; Sigman, M. S. Predictive and mechanistic multivariate linear regression models for reaction development. Chem. Sci. 2018, 9, 2398–2412.
- Jorner 2023 Jorner, K. Putting chemical knowledge to work in machine learning for reactivity. Chimia 2023, 77, 22.
- Gallegos et al. 2021 Gallegos, L. C.; Luchini, G.; St. John, P. C.; Kim, S.; Paton, R. S. Importance of engineered and learned molecular representations in predicting organic reactivity, selectivity, and chemical properties. Acc. Chem. Res. 2021, 54, 827–836.
- Williams et al. 2021 Williams, W. L.; Zeng, L.; Gensch, T.; Sigman, M. S.; Doyle, A. G.; Anslyn, E. V. The evolution of data-driven modeling in organic chemistry. ACS Cent. Sci. 2021, 7, 1622–1637.
- Stuyver and Coley 2023 Stuyver, T.; Coley, C. W. Machine learning-guided computational screening of new candidate reactions with high bioorthogonal click potential. Chem. Eur. J. 2023, 29, e202300387.
- Stuyver and Coley 2022 Stuyver, T.; Coley, C. W. Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability, and explainability. J. Chem. Phys. 2022, 156, 084104.
- Vargas et al. 2024 Vargas, S.; Gee, W.; Alexandrova, A. High-throughput quantum theory of atoms in molecules (QTAIM) for geometric deep learning of molecular and reaction properties. Digital Discovery 2024, 3, 987–998.
- Vijay et al. 2024 Vijay, S.; Venetos, M. C.; Spotte-Smith, E. W. C.; Kaplan, A. D.; Wen, M.; Persson, K. A. CoeffNet: Predicting activation barriers through a chemically-interpretable, equivariant and physically constrained graph neural network. Chem. Sci. 2024, 15, 2923–2936.
- Devlin et al. 2018 Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint 2018, arXiv:1810.04805.
- Schwaller et al. 2019 Schwaller, P.; Laino, T.; Gaudin, T.; Bolgar, P.; Hunter, C. A.; Bekas, C.; Lee, A. A. Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 2019, 5, 1572–1583.
- Schwaller et al. 2021 Schwaller, P.; Vaucher, A. C.; Laino, T.; Reymond, J.-L. Prediction of chemical reaction yields using deep learning. Mach. Learn.: Sci. Technol. 2021, 2, 015016.
- Heid et al. 2024 Heid, E.; Greenman, K. P.; Chung, Y.; Li, S.-C.; Graff, D. E.; Vermeire, F. H.; Wu, H.; Green, W. H.; McGill, C. J. Chemprop: A machine learning package for chemical property prediction. J. Chem. Inf. Model. 2024, 64, 9–17.
- Chen et al. 2013 Chen, W. L.; Chen, D. Z.; Taylor, K. T. Automatic reaction mapping and reaction center detection. WIREs Comput. Mol. Sci. 2013, 3, 560–593.
- Preciat Gonzalez et al. 2017 Preciat Gonzalez, G. A.; El Assal, L. R.; Noronha, A.; Thiele, I.; Haraldsdóttir, H. S.; Fleming, R. M. Comparative evaluation of atom mapping algorithms for balanced metabolic reactions: Application to Recon 3D. J. Cheminform. 2017, 9, 39.
- Jaworski et al. 2019 Jaworski, W.; Szymkuć, S.; Mikulak-Klucznik, B.; Piecuch, K.; Klucznik, T.; Kaźmierowski, M.; Rydzewski, J.; Gambin, A.; Grzybowski, B. A. Automatic mapping of atoms across both simple and complex chemical reactions. Nat. Commun. 2019, 10, 1434.
- Schwaller et al. 2021 Schwaller, P.; Hoover, B.; Reymond, J.-L.; Strobelt, H.; Laino, T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Sci. Adv. 2021, 7, eabe4166.
- Doney et al. 2016 Doney, A. C.; Rooks, B. J.; Lu, T.; Wheeler, S. E. Design of organocatalysts for asymmetric propargylations through computational screening. ACS Catal. 2016, 6, 7948–7955.
- Nehil-Puleo et al. 2024 Nehil-Puleo, K.; Quach, C. D.; Craven, N. C.; McCabe, C.; Cummings, P. T. E(n) equivariant graph neural network for learning interactional properties of molecules. J. Phys. Chem. B 2024, 128, 1108–1117.
- Duan et al. 2023 Duan, C.; Du, Y.; Jia, H.; Kulik, H. J. Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model. arXiv preprint 2023, arXiv:2304.06174.
- Zhang et al. 2021 Zhang, J.; Lei, Y.-K.; Zhang, Z.; Han, X.; Li, M.; Yang, L.; Yang, Y. I.; Gao, Y. Q. Deep reinforcement learning of transition states. Phys. Chem. Chem. Phys. 2021, 23, 6888–6895.
- Pattanaik et al. 2020 Pattanaik, L.; Ingraham, J. B.; Grambow, C. A.; Green, W. H. Generating transition states of isomerization reactions with deep learning. Phys. Chem. Chem. Phys. 2020, 22, 23618–23626.
- Makoś et al. 2021 Makoś, M. Z.; Verma, N.; Larson, E. C.; Freindorf, M.; Kraka, E. Generative adversarial networks for transition state geometry prediction. J. Chem. Phys. 2021, 155, 024116.
- Kim et al. 2024 Kim, S.; Woo, J.; Kim, W. Y. Diffusion-based generative AI for exploring transition states from 2D molecular graphs. Nat. Commun. 2024, 15, 341.
- Choi 2023 Choi, S. Prediction of transition state structures of gas-phase chemical reactions via machine learning. Nat. Commun. 2023, 14, 1168.
- Spiekermann et al. 2022 Spiekermann, K.; Pattanaik, L.; Green, W. H. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci. Data 2022, 9, 417.
- Stuyver et al. 2023 Stuyver, T.; Jorner, K.; Coley, C. W. Reaction profiles for quantum chemistry-computed cycloaddition reactions. Sci. Data 2023, 10, 66.
- Geiger et al. 2022 Geiger, M.; Smidt, T.; M., A.; Miller, B. K.; Boomsma, W.; Dice, B.; Lapchevskyi, K.; Weiler, M.; Tyszkiewicz, M.; Uhrin, M. et al. e3nn/e3nn: 2022-12-12. 2022; https://zenodo.org/records/7430260.
- Corso et al. 2023 Corso, G.; Stärk, H.; Jing, B.; Barzilay, R.; Jaakkola, T. DiffDock: Diffusion steps, twists, and turns for molecular docking. arXiv preprint 2023, arXiv:2210.01776.
- Landrum et al. 2023 Landrum, G.; Tosco, P.; Kelley, B.; Ric; Sriniker; Cosgrove, D.; Gedeck; Vianello, R.; NadineSchneider; Kawashima, E. et al. rdkit/rdkit: 2023_03_1 (Q1 2023) release. 2023; https://zenodo.org/record/7880616.
- Stärk et al. 2022 Stärk, H.; Ganea, O.; Pattanaik, L.; Barzilay, R.; Jaakkola, T. EquiBind: Geometric deep learning for drug binding structure prediction. Proceedings of the 39th International Conference on Machine Learning. 2022; pp 20503–20521.
- Ganea et al. 2022 Ganea, O.-E.; Huang, X.; Bunne, C.; Bian, Y.; Barzilay, R.; Jaakkola, T.; Krause, A. Independent SE(3)-equivariant models for end-to-end rigid protein docking. arXiv preprint 2022, arXiv:2111.07786.
- Grambow et al. 2020 Grambow, C.; Pattanaik, L.; Green, W. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Sci. Data 2020, 7, 137.
- van Gerwen et al. 2023 van Gerwen, P.; Wodrich, M. D.; Laplaza, R.; Corminboeuf, C. Reply to Comment on ‘Physics-based representations for machine learning properties of chemical reactions’. Mach. Learn.: Sci. Technol. 2023, 4, 048002.
- Lowe 2012 Lowe, D. M. Extraction of chemical structures and reactions from the literature. Ph.D. thesis, University of Cambridge, 2012.
- von Lilienfeld 2013 von Lilienfeld, O. A. First principles view on chemical compound space: Gaining rigorous atomistic control of molecular properties. Int. J. Quantum Chem. 2013, 113, 1676–1689.
- Bemis and Murcko 1996 Bemis, G. W.; Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887–2893.
- van der Maaten and Hinton 2008 van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- Bannwarth et al. 2019 Bannwarth, C.; Ehlert, S.; Grimme, S. GFN2-xTB–An accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 2019, 15, 1652–1671.
- Blum and Reymond 2009 Blum, L. C.; Reymond, J.-L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733.
- Reymond 2015 Reymond, J.-L. The chemical space project. Acc. Chem. Res. 2015, 48, 722–730.
- Ramakrishnan et al. 2014 Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 2014, 1, 140022.
- Zimmerman 2015 Zimmerman, P. M. Single-ended transition state finding with the growing string method. J. Comput. Chem. 2015, 36, 601–611.
- Cordella et al. 2001 Cordella, L. P.; Foggia, P.; Sansone, C.; Vento, M. An improved algorithm for matching large graphs. 3rd IAPR-TC15 workshop on graph-based representations in pattern recognition. 2001; pp 149–159.
- Hagberg et al. 2008 Hagberg, A. A.; Schult, D. A.; Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference. 2008; pp 11–15.
- Wu et al. 2018 Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; Pande, V. MoleculeNet: A benchmark for molecular machine learning. Chem. Sci. 2018, 9, 513–530.
- Yang et al. 2019 Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 2019, 59, 3370–3388.
- Riniker and Landrum 2015 Riniker, S.; Landrum, G. A. Better informed distance geometry: Using what we know to improve conformation generation. J. Chem. Inf. Model. 2015, 55, 2562–2574.
- Tosco et al. 2014 Tosco, P.; Stiefl, N.; Landrum, G. Bringing the MMFF force field to the RDKit: Implementation and validation. J. Cheminform. 2014, 6, 37.
- Atkinson et al. 2019 Atkinson, P.; Bannwarth, C.; Bohle, F.; Brandenburg, G.; Caldeweyher, E.; Checinski, M.; Dohm, S.; Ehlert, S.; Ehrlich, S.; Gerasimov, I. et al. Semiempirical Extended Tight-Binding Program Package. https://github.com/grimme-lab/xtb, 2019.
- Kingma and Ba 2014 Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint 2014, arXiv:1412.6980.
- Biewald 2020 Biewald, L. Experiment Tracking with Weights and Biases. 2020; https://www.wandb.com/, Software available from wandb.com.
- Christensen et al. 2017 Christensen, A. S.; Faber, F.; Huang, B.; Bratholm, L.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. QML: A Python toolkit for quantum machine learning. https://github.com/qmlcode/qml, 2017.