^†^†thanks: These authors contributed equally

Accurate Crystal Structure Prediction of New 2D Hybrid Organic Inorganic Perovskites

Nima Karimitari Department of Chemistry and Biochemistry, University of South Carolina, South Carolina 29208, United States William J. Baldwin wjb48@cam.ac.uk Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK

{}^{,\hskip 2.0pt\ast}

Evan W. Muller UES, Inc., Beavercreek, Ohio 45432, United States Zachary J. L. Bare Department of Chemistry and Biochemistry, University of South Carolina, South Carolina 29208, United States W. Joshua Kennedy Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson AFB, Dayton, Ohio 45433, United States Gábor Csányi Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK Christopher Sutton cs113@mailbox.sc.edu Department of Chemistry and Biochemistry, University of South Carolina, South Carolina 29208, United States

(March 11, 2024)

Abstract

Low dimensional hybrid organic-inorganic perovskites (HOIPs) represent a promising class of electronically active materials for both light absorption and emission. The design space of HOIPs is extremely large, since a diverse space of organic cations can be combined with different inorganic frameworks. This immense design space allows for tunable electronic and mechanical properties, but also necessitates the development of new tools for in silico high throughput analysis of candidate materials. In this work, we present an accurate, efficient, transferable and widely applicable machine learning interatomic potential (MLIP) for predicting the structure of new 2D HOIPs. Using the MACE architecture, an MLIP is trained on 86 diverse experimentally reported HOIP materials. The MLIP is tested on 73 unseen perovskite compositions (that were previously reported experimentally), and achieves chemical accuracy with respect to the reference electronic structure method. Our model is then combined with a simple random structure search algorithm to predict the structure of new HOIPs given only the proposed composition as input. Success is demonstrated by correctly and reliably recovering the crystal structure of a set of experimentally known 2D perovskites. Such a random structure search is impossible with ab initio methods due to the associated computational cost, but is relatively inexpensive with the MLIP. Finally, the procedure is used to predict the structure formed by a new organic cation with no previously known corresponding perovskite. Laboratory synthesis of the new hybrid perovskite confirms the accuracy of our prediction using the combined MLIP and structure-search algorithm. This capability will enable the efficient and accurate screening of thousands of combinations of organic cations and inorganic layers for further investigation.

Hybrid organic-inorganic perovskites (HOIPs) belong to a broad category of materials, generally represented by the chemical formula \ceABX3. The \ceB-site and \ceX-site ions form a network of corner-sharing \ceBX6 octahedra. Although the \ceA-site can be a large inorganic cation, such as caesium, using an organic cation has proved extremely successful, resulting in the development of state of the art solution-processed optoelectronic materials [1]. Provided the organic cation is small, the typical perovskite structure is retained. For larger cations, however, the network of corner sharing octahedra is disrupted, leading to ‘low dimensional’ structures such as one-dimensional chains or two-dimensional sheets of octahedra (see Fig. 1b).

Two-dimensional HOIPs are formed when the organic cations separate the inorganic layers in the (100), (110) or (111) direction, giving the modified general formula \ceA’_mA_n-1B_nX_3n+1. The constants $n$ and $m$ determine the number of connected inorganic layers and the charge of the organic cation. They are further categorized into two main types: Dion–Jacobson (DJ) [2] with $m=1$ (one sheet of interlayer cations with +2 charge) and Ruddlesden–Popper (RP) [3, 4] with $m=2$ (two sheets of cations with +1 charge) [5]. Two dimensional HOIPs have the advantages of enhanced stability under ambient conditions and structural tunability. This makes them promising candidates for applications in photoluminescence (PL), photovoltaics, photodetection, and light emmitting diodes (LEDs) [6, 7, 8, 9].

Due to the breadth of the design space of 2D (as well as 1D and 0D) perovskites, in silico property screening is desirable. However, in order to calculate properties with ab initio electronic structure methods, one first needs to know the crystal structure. A similar task has been tackled in the field of organic crystal structure prediction (CSP): typically, CSP methods involve generating many hundreds or thousands of candidate structures, and selecting the lowest energy structures using an empirical force field [10]. For general inorganic crystals, the related Random Structure Search (RSS) method has been successful for unit cells of up to a couple of dozen atoms, wherein candidate structures are generated and the geometry is subsequently relaxed to the nearest local minima in the potential energy landscape [11].

A particular difficulty in the case of 2D HOIPs is that they can have extremely large unit cells containing up to 1000 atoms. Furthermore, they are structurally complex (see Fig. 1) with the organic molecules having many potentially quite flexible degrees of freedom. Direct Density Functional Theory (DFT) geometry relaxations or molecular dynamics simulations are therefore extremely expensive, while empirical force fields which are accurate across the desired range of chemical interactions do not exist presently.

An alternative to doing ab initio calculations is to use machine-learned interatomic potentials (MLIPs) [12, 13, 14, 15]. MLIPs can be trained to predict the potential energy of a configuration of atoms directly from the atomic coordinates, allowing for simulations of hundreds of thousands of atoms at DFT accuracy [16]. Many MLIP architectures have been developed in recent years. Key developments in this area have been the focus on atom-centered energy contributions enabling linear scaling models, the incorporation of physical symmetries into model architectures [17, 18, 19] and efficient construction of many-body representations of atomic environments [20, 21, 22]. Furthermore, the introduction of graph models to MLIP development has lead to greatly improved accuracy and transferability [23, 24, 25]. MLIPs have already been used to perform structure prediction for large scale screening tasks, for example in a computational study searching for novel stable inorganic materials [26].

In this work, the MACE [27] message passing architecture was used to build a transferable MLIP for HOIPs. MACE is a graph tensor network which constructs many-body equivariant messages at each node (nodes correspond to atoms in this case) via the atomic cluster expansion [21], which are then passed onto neighbouring nodes. The architecture has been shown to be accurate, efficient and transferable [28], and has recently been used to create a state of the art ML organic force field[29] and a “foundation model” for materials chemistry[30]. The model in this work is fitted to data collected from several publicly available databases of experimentally reported HOIPs. Starting from structures reported in these databases, an extensive training dataset was generated by running an active learning protocol based on molecular dynamics. Collected configurations were labelled with DFT calculations. The final model achieves excellent accuracy across an independent set of perovskites with unseen compositions taken from the same sources.

To use the model effectively, we present a simple random structure search procedure designed for 2D HOIPs and we show that the trained MLIP accurately captures the complex potential energy landscape encountered during a random structure search task. Furthermore, the combination of the structure searching algorithm and the MACE model is an accurate and efficient structure prediction tool. This is shown by ‘re-discovering’ the ground state structure of a set of experimentally reported HOIPs not seen by the model during fitting, given only the most basic information of the perovskite - the identity of the organic cation and the composition of the inorganic layer.

Finally, we predict the crystal structure of a previously unknown 2D HOIP. We then synthesize the material in the laboratory, and verify that the structure agrees with our prediction. The process reveals a large number of competing low energy minima, with subtly different orientations and stacking patterns of the organic cation. Due to the high degree of similarity between these structures, an accurate and efficient search tool offers many insights beyond just prediction of the ground state structure.

I Dataset Construction

A dataset was compiled from three sources: The 2D perovskites database of the laboratory of new materials for solar energy (NMSE) [31], the Cambridge Structural Database [32], and a recent research article by Tremblay et al. reporting numerous 2D HOIP structures [33].

The occurrence of different chemical elements and structural features in these sources was quite non-uniform. Several simplifying restrictions have therefore been placed on the scope of our model. Firstly, the set of chemical elements considered for the inorganic layer was restricted to only include \cePb, I, Br and \ceCl. As a result, the resulting MLIP can be applied to only Pb-based perovskites, with X = I, Br, or Cl. Furthermore, we restricted the composition of the organic cation to include only C, H, N and O. These restrictions were imposed due to the occurrence of different chemical elements in the available 2D HOIP datasets: of the structures we collected, more than 80% were lead-based, and the majority contained only C, H, and N elements in the organic cation. Applying these filters resulted in an initial dataset of 159 experimentally reported structures. Fig. 1a presents some key statistics of this dataset including the number of atoms in the unit cell and organic cation, as well as a breakdown of the elements present at the X-site, number of inorganic layers, organic cation charge and whether the organic cation contains oxygen. Four representative example structures are shown in Fig. 1b to illustrate the diversity of the perovskites that are included.

Refer to caption — Figure 1: (a): Key properties of the perovskites in the compiled dataset. Note that some structures have multiple organic cations, but the upper right histogram shows only the size of the largest cation in each structure. (b): Examples of 2D HOIP structures in the dataset.

II Model Development and Performance

II.1 Active Learning for Dataset Expansion

In the following, composition will refer to a perovskite as determined by the chemical formula and the experimentally reported unit cell, while configuration will refer to a specific non-equilibrium set of atomic positions, for which one could compute a reference energy using DFT. The dataset described above serves as a starting point for fitting a MLIP. In practice, however, fitting accurate and stable models requires a database with many non-equilibrium configurations for each target composition or phase. One popular method for database construction is to sample configurations from molecular dynamics trajectories. In this study, a different approach is taken in which a database of reference configurations is grown iteratively in an active learning procedure[34, 35].

Before beginning the active learning procedure, the dataset of experimentally reported compositions was first divided in two, by randomly sampling 86 perovskites to form the core of the training set. The remaining test set compositions will be used to assess the transferability of the final model to new unseen perovskites.

The key principle of active learning is to use a model which can estimate the uncertainty of its own prediction on a given configuration. If this estimate is reliable, one can search for configurations on which the model is uncertain, and add only these configurations to the dataset. Several methods exist for constructing MLIPs with an in-built measure of prediction uncertainty. For MACE, the uncertainty estimate can be obtained as follows: given a dataset, one fits several independent models with the same hyperparameters, but with a different random initialisation of weights. These models are then referred to as a committee, and in this case, we use only 3 models to form the committee. On a new configuration, the disagreement between the committee members can be treated as an uncertainty estimate. As will be shown below, this comparatively inexpensive procedure leads to remarkably useful uncertainty estimates.

With this method for assessing the uncertainty of a model, the active learning procedure is as follows:

1.

Given an initial dataset of configurations, calculate reference energies and atomic forces using DFT. Fit a committee of 3 MACE models on this dataset.
2.

For each composition in the training set, run an MD simulation starting from the experimentally reported structure and using the average of the force predictions of the committee members to propagate the dynamics. At each time step, test the uncertainty of the potential by calculating the disagreement in the prediction of the atomic forces between the committee members.
3.

If the relative force uncertainty of any atom, defined as the standard deviation of the committee force predictions divided by the mean of the forces, is larger than a specific threshold (in our study this threshold is to 0.2, see also section X.2) the MD simulation is terminated. DFT energy and forces are calculated for the configuration for which the uncertainty exceeded the threshold, and added to the training set. If the uncertainty does not exceed the threshold within 10 ps, terminate the MD and do not collect any new configurations.
4.

Refit the committee of models with the expanded dataset. It is expected that the configurations where the models previously disagreed are now well described with low uncertainty.
5.

Repeat steps 2-5 until no new configurations are collected for any of the compositions.

In each cycle of active learning, we ran committee MD for each of the 86 compositions in the training set. Additional configurations are therefore collected at a rate of 86 per cycle if the uncertainty for all compositions exceed the threshold. However, as the dataset grows, many compositions quickly become well described and do not trigger new DFT calculations, resulting in few new training configurations per cycle. For this reason, in later cycles step 3 was repeated 10 times for each training set composition before retraining the model. The final potential is fitted once all the unique perovskite compositions in the training set are stable, meaning that in 10 independent MD simulations lasting 10 ps each, the 0.2 relative force uncertainty threshold is not exceeded. Additional details on the active learning procedure are given in section X.2

Key to this method is the reliability of the uncertainty measure. An example of the evolution of the uncertainty measure during a committee MD simulation is shown in Fig. 2. To assess the uncertainty measure, all configurations in the trajectory were evaluated with DFT, and the actual force error made by the model at each time step is also shown. Two models from different stages in the active learning procedure are shown - an ‘unstable’ model from an early point in the active learning process, and the the final model.

For the unstable potential, the uncertainty exceeds the threshold (the solid red line) at multiple instances, and eventually increases to 1.0 implying total uncertainty in force predictions. By contrast, the final potential has both a consistently lower uncertainty and a lower force error. The key result shown in Fig. 2 is that the difference in force error between the final model and unstable model is clearly reflected in the estimated uncertainty. Also important is that the spikes in the force error of the unstable model closely correlate with the spikes of the relative force uncertainty.

In general, the highest uncertainty occurs for atomic configurations that are less represented in the training set. In particular, there are 61 unique organic cations in the 86 compositions of the training dataset, while there are just 7 types of inorganic layer. Therefore, the highest force uncertainty typically occurred on the organic cations.

II.2 Final Model Performance

In total, 18 cycles of active learning were performed. The final training dataset contains a total of 2457 configurations. To test the final potential, MD simulations of 73 unseen test set compositions were ran for 10 ps and samples were taken every 1 ps. The energy and force predictions for all the training and test samples are shown in Fig. 3. The RMSE of training (test) dataset for energy and forces are 0.76 (1.84) meV/atom and 10.7 (31.7) meV/Å, respectively. In addition, the errors categorized based on the halide atoms are shown in Table. 1. An energy error of 1 kcal/mol (typically called “chemical accuracy”) corresponds to 43.4 meV per formula unit.

Table 1: Energy and force errors for seen and unseen configurations categorized based on the halides

	Seen Compositions		Unseen Compositions
	Energy (meV/atom)	Forces (meV/Å)	Energy (meV/atom)	Forces (meV/Å)
Cl	0.86	9.25	1.86	30.4
I	0.74	10.96	1.34	29.88
Br	0.78	10.39	2.12	48.53
Total	0.76	10.71	1.84	31.67

III Relaxation of Experimentally Reported Structures

Experimentally reported structures are typically close to the global minima of the potential energy surface. For the trained MACE model to be useful for structure searching, it must relax these structures to the same local minima as would be obtained by a DFT geometry relaxation. To assess whether this is the case, we considered 137 perovskite compositions in the dataset that have less than 200 atoms, with 58 from the training set and 79 from the test set. For all of these compositions, the experimentally reported structure was relaxed independently with DFT and MACE calculators, until the forces were less than 10 meV/Å.

One way to quantify the difference between the MACE and DFT relaxed structures is to measure the root mean square displacement (RMSD) of the atoms between the two structures. The distribution of RMSD for all 137 compositions is shown in Fig. 4a. For the majority of the samples, the RMSD is less than 0.1 Å. Several outliers are present with larger RMSDs on the order of 0.3-0.5 Å. These outliers generally correspond to cases in which long, flexible organic molecules move slightly with respect to each other.

The independently obtained DFT and MLIP relaxed structures can also be compared using the total radial distribution function (RDF), which contains information about the bond lengths, intermolecular distances and organic-inorganic distances in the structure. A comparison between the RDFs of a MACE and DFT relaxed structure is shown in Fig. 4b. For $r<3$ Å, which mostly corresponds to the intramolecular bond distances, the differences between MACE and DFT are negligible. For $r>3$ Å, which contains both the intermolecular distances and inorganic bonds, some differences are apparent, however the structures relaxed with MACE and DFT share many of the larger features.

To quantify the difference between the RDFs of MACE and DFT relaxed structures, we used the first Wasserstein distance (the earth mover’s distance, EMD) between these two distributions, which calculates the least amount work required to change one distribution to the other [36]. A histogram of the Wasserstein distances for 63 randomly selected compositions in the train and test sets is shown in Fig. 4c. One can see that the final MLIP performs similarly for both training and test sets using this metric.

IV High Throughput Structure Prediction for New HOIPs

Calculating properties of known perovskite structures with ab initio methods is expensive, but not impossible. On the other hand, high throughput structure prediction for many new compositions is potentially an infeasibly expensive task, particularly for structures with large unit cells. This is because crystal structure prediction protocols typically involve a very large number of either geometry relaxations or single point evaluations to predict the structure of just one chemical composition.

In particular, organic crystal structure prediction involves first generating many (thousands) of candidate crystal structures by enumerating over key variables, such as space groups, and employing heuristics. Single point evaluations with empirical force fields are used to select good candidates, based on lowest potential energy[10].

Ab initio random structure search (AIRSS), is another approach that has been explored [11], particularly for inorganic crystals. In this approach, crystal structures are determined by first guessing random positions of atoms within the unit cell, followed by geometry relaxations with DFT. Again, the lowest energy structure is chosen as the most probable structure. AIRSS has been employed successfully to find ground state structures of materials, molecules and features such as defects [11]. This process is powerful, but limited to small unit cells due to the poor scaling of DFT.

In the following we introduce a simple structure search procedure inspired by these ideas, which is appropriate for 2D HOIPs.

IV.1 A Random Structure Search Procedure for 2D HOIPs

Our proposed structure searching workflow is summarised as follows: For a given organic cation and inorganic layer, generate a fixed number of candidate structures, which cover the space of feasible molecular and atomic arrangements. The geometry of all structures is then relaxed to a local minimum using the MLIP and the lowest potential energy structure is declared as the most probable crystal structure. The process for generating random candidate structures is key, and a scheme was designed based on several simple heuristics. The steps are summarised as follows and shown visually in Fig. 5:

1.

The starting information is the identity of the organic cation, the choice of halide, and the desired size of the unit cell. The size is determined by the number of organic/inorganic layers, and the number of octahedra per layer in the unit cell.
2.

For the given composition, construct the 3D geometry of the organic cation (enumerating or sampling conformers if necessary). Also construct the untilted, strain-free inorganic layer from lead and the chosen halide. This determines the periodicity of the system in the in-plane directions.
3.

Identify ‘reference points’ on the cation and the inorganic layer. On the cation, reference points are defined as formally charged atoms or salient atoms. On the inorganic layer, the reference points are chosen to be the midpoints between protruding halides, as shown in Fig. 5.
4.

Based on the charge of the organic cation, determine the number of cations per layer required for charge neutrality. For each organic cation in the unit cell, randomly generate a set of reflections and rotations to apply to the organic cation. Subsequently, place the transformed cations onto the inorganic layer by pairing reference points on the two geometries.
5.

Check for any intersections between cations, or intersections of cations with the inorganic layer. Discard samples for which these components intersect one another.
6.

Fix the lattice constant in the out-of-plane direction to remove most of the vacuum region from the cell, including some amount of shear of the unit cell. If more than one inorganic/organic layer per unit cell is desired, repeat the above procedure and stack the resulting geometries.

This process gives structures which sample the configuration space well but which can contain high energy features, such as regions of vacuum or atoms at energetically unfavourable separations. Crucially, the configurations are sufficiently sensible that geometry relaxation leads to reasonable structures.

A python package was written to implement this algorithm which is available at https://github.com/WillBaldwin0/LDHP-builder. The algorithm is specific to 2D corner-sharing HOIPs, since it relies on heuristics when placing molecules onto the inorganic layer. In practice, it was found that these heuristics perform remarkably well. Further details are provided in section X.4.

IV.2 Validation of the Model on Randomly Generated Structures

For an MLIP to be useful for the structure prediction task, the model must be accurate for the randomly generated structures and must not exhibit many spurious local minima. Crucially, it should reliably relax the structures to nearby DFT minima.

To demonstrate the accuracy of the model and structure searching method, we present the results of the process applied to a known 2D perovskite in Fig. 6. Specifically, we take the the perovskite formed by \cePbI6 octahedra in the inorganic layer and the organic cation \ceNH3+[C]6NH3+. The (geometry relaxed) experimentally reported structure is shown on the right of Fig. 6(c).

Given the composition, 100 random structures were generated using the random generation procedure. Three examples of such structures are shown in Fig. 6b. To simplify this demonstration, only the correct conformer of the organic cation was used to generate samples. Subsequently, these 100 structures were relaxed using the MACE model. Since the initial samples are relatively high in energy - often containing a considerable amount of vacuum or non-physical molecular arrangements - these geometry relaxations require several hundreds or even thousands of steps. Fig. 6(a) shows the distribution of energy of the resulting structures, ordered by increasing energy, relative to that of the experimentally reported structure. Also shown is the energy of the relaxed samples after re-evaluation with DFT.

Several important features can be noted. Firstly, due to the nature of the long organic cation, which can stack in a variety of ways, the relaxation process reveals many local minima in the potential energy landscape. These appear as plateaus in the energy plot (bottom panel of Fig. 6(a)). After re-calculation of these structures with DFT, we see that the MLIP energy landscape is broadly correct in that these minima are correctly ordered with respect to DFT. The absolute energy error is also very low, being around 1 meV/atom which is roughly the accuracy of the model. Furthermore, the top of panel of Fig. 6(a) shows the root mean square force, according to DFT, of the MLIP identified minima. For all but the highest energy configurations, the DFT forces are less than 10 meV/Å, suggesting these are close to DFT minima.

The lowest energy structures identified by this procedure (the first five blue marks in Fig. 6a) have energy equal to that of the experimentally reported structure. This suggests that the process has indeed re-discovered the the experimentally reported structure. This was confirmed by examining the five lowest energy relaxed structures. Up to rotations, reflections and cell reductions, these structures are identical and match the experimentally reported structure as shown in Fig. 6(c).

IV.3 Structure Prediction Performance across the Dataset

We now demonstrate the usefulness of this procedure across a wider variety of 2D perovskites. The method described above has been applied to 13 structures in the dataset. Fig. 7 summarises the results of this process. The lower rows identify the perovskite structure, via the halide in the inorganic layer and the organic cation. The upper panel shows the distribution of energies of the relaxed structures, following the random generation and relaxation process, with respect to that of the experimentally known structures. For this demonstration, 200 random structures were generated for each halide/cation combination. Only 200 samples were required, since all but the last two structures in figure 7 have unit cells containing only 2 organic cations.

Fig. 7 also highlights which structures were present in the training set of the MLIP model. For the left-most structures, samples of these perovskites acquired from molecular dynamics during the active learning process are present in the training set of our model. For the next set of structures, the organic cation is present somewhere in the dataset, but the combination of cation and inorganic layer is not present. For the four right-most structures, the organic cation is not present in the dataset.

In all but three cases the identified structures with lowest energy correspond to the energy of the experimental structure. Subsequent comparison showed that these structures did indeed match the experimentally reported version. Therefore, the combination of a simple random structure searching scheme with the developed MLIP can successfully identify the ground state structure of these complex systems.

In the three cases for which the lowest energy structure does not match the experimentally reported structure, one structure search failed to find any structures with energy as low as that of the experimental structure within the 200 searches (the lowest energy found was about 2 meV/atom higher than the energy of the experimental structure). In the other cases, we confirmed that the procedure found the experimental minima as well as a lower energy structure. Subsequent evaluation with DFT revealed that these structures were also assigned a lower energy than the experimental structure by DFT.

IV.4 Prediction and Synthesis of new 2D Hybrid Perovskites

Finally, the structure search process was performed for a new organic cation with no previously known corresponding perovskite. Specifically, the combination of cis-1,3-cyclohexanediamine with a Pb–I inorganic layer was studied. This molecule is not present in our dataset, but consists of chemical groups which are well represented.

The structure searching procedure was conducted with a unit cell containing 8 copies of the organic cation, across two layers. In total, 6000 samples were generated, with the large number being required due to the large number of molecules present in the unit cell. The perovskite was synthesised via slow hydrothermal growth and the resulting structure was determined, at 200 K, using a diffractometer as described in section X.5.

Figures 8a and 8b show the resulting lowest energy structure (denoted as “minimum 0”), as well as the 5 next lowest energy structures that were predicted. As shown in the figure, the energy differences between the lowest lying minima are extremely small, with the 5 next best minima being only 0.5 meV/atom higher in energy the ground state. This energy difference is smaller than the likely error in our model, as well as the error of DFT due to finite k-point sampling.

Several interesting points can be made about these results: firstly, the lowest lying minimum found by the structure search process agrees with the experimentally measured structure. Since our model is fitted to DFT data which does not perfectly match reality, differences are unavoidable in quantities such as equilibrium bond lengths, where the PBE functional makes an error. However, we can confirm that we predict the right structure by performing a geometry relaxation, with our model, of the experimentally reported structure. This resulted in exactly minimum 0, and the relaxation trajectory only involved only minor changes in bond length, as shown in the supplementary information.

The 5 next lowest energy minima all involve similar orientations of the organic cation, but differ in the set of reflections applied to the cations or the out of plane stacking vector. It is interesting to examine how easily one could differentiate between these structures using different experimental techniques. This was done by measuring the powder x-ray diffraction pattern (pXRD) of the synthesised perovksite, and comparing to the computed pattern of the lowest energy minima, as shown in Fig. 8c. We compare the experimental result to that predicted from the experimentally reported structure, as well as minima 0, 2 and 5. The differences between the predicted XRD of the experimentally reported structure and minimum 0 (orange and green in Fig. 8c) come only from the aforementioned small differences in bond lengths. Interestingly, the spectra of the three numbered minima are almost indistinguishable; it would be extremely difficult to robustly differentiate these structures from the pXRD alone.

Furthermore, the small differences in molecular stacking lead to different optical properties. For example, out of the six structures in Fig. 8, only minimum 0 is centrosymmetric. This means that it will not exhibit circular dichroism which is necessary for certain applications of 2D perovskites. When targeting certain properties, a full picture of the landscape of low energy minimum is clearly important. Our structure searching method offers a window into this landscape, which could be used to choose experimental methods, or gain confidence in conclusions.

Further predictions were made for 4 other organic cations which had no previously known perovskite structures. These are discussed in the supplementary information.

IV.5 Scalability and Computational Cost

Performing the above process requires geometry relaxation of many large crystal structures, starting from high energy configurations. Typically, several hundred relaxations are required with hundreds to thousands of dynamics steps for each relaxation.

In the structure searching process for the 13 structures in Fig. 7, the average unit cell size was 78 atoms and 200 samples were generated for each system. The entire set of calculations used to produce Fig. 7 was performed in only 20 hours on a single A100 GPU. This suggests that wide searches can be performed using modest computational resources. By comparison, a single DFT relaxation of one sample of the randomly generated structures shown in Fig. 6 (similar in size and complexity to Fig 7 structures), performed on two nodes (256 cores) of AMD EPYC 7742, can take more than one day. Furthermore, the cubic scaling of the DFT calculation makes the same task for much larger systems infeasible.

One can also see the computational advantage of using our model in the structure search for the newly synthesised perovskite (section IV.4), which has 8 molecules and 232 atoms in the unit cell. For each sample of the 6000 generated structures, relaxation took between 2000 to 4000 steps, leading to a total computational cost of 240 GPU hours. We estimate that performing the same relaxations with DFT would require approximately 1.2 million CPU node-hours. In this case the speedup corresponds to a factor of $10^{4}$ . Note that the absolute times of course depend on the type of GPU and CPU hardware making a direct comparison not straightforward, but in terms of costing computational resources, an A100 GPU hour is approximately comparable to a node-hour with 128 CPU cores, and hence is the basis for the figures given above.

V Extrapolation to Underrepresented Organics

We have demonstrated good performance of the trained MLIP both in terms of single point accuracy and in relaxing randomly generated HOIPs structures to global minima. However, it is the case that many of the organic molecules in the test set are structurally and chemically similar to the organics in the training set. Here we demonstrate an example of organic cation in our test set, cyclopropanaminium (shown in Fig. 9), for which the model performs poorly and suggest an efficient way retraining the model to improve the predictions.

For the original MLIP, which has not been trained on any examples of cyclopropanaminium, committee MD simulations immediately exceed the prescribed uncertainty threshold, indicating an uncertainty of the model in predicting forces. On samples that are taken from this MD, the model makes a large error with respect to DFT with a force RMSE of 228.7 meV/Å.

One approach would be to add some of these uncertain high-error samples to the training set and use the AL cycle to improve the potential for that specific organic. This is not possible when no experimental structure is known, since an initial structure is needed for running the active learning. As shown in the supplementary information Section II, we tested several approaches in which DFT calculations of only the isolated organic molecule were added to the training set, but these failed to improve the accuracy of the model to an acceptable level.

Another way of approaching this problem is to use the structure prediction algorithm to generate HOIP structures with the new organic cation. One can relax these candidates with the model, take samples from the relaxation trajectories, and add them to the training set. As shown in Fig. 9, when the model is trained with 200 distinct samples from relaxation trajectories of cyclopropanaminium lead iodide, the resulting force RMSE is 95.3 meV/Å, a slight improvement over the original model. The meager improvement may be because many of the predicted relaxed structures are very similar, in terms of bond distances and orientation of the organic and inorganic components. A successful approach is to combine the structure prediction tool with MD simulations. Instead of taking 200 samples from the relaxations trajectories, we take only the 10 most stable structures predicted by the structure prediction algorithm, run short MD simulations (5 ps) and take samples uniformly every 1 ps from the MD trajectories. Note that we do not terminate the MD simulations based on uncertainty. Using this to add new data, the error in forces drops to 14.6 meV/Å, within the range of previously seen compositions in the training set. This is achieved with only a total of 50 new samples, and in one cycle of retraining.

This approach works because the original MLIP predicts physically reasonable structures, despite the large error in forces with respect to DFT. In particular, the model relaxes the randomly generated configurations to sensible structures, in terms of the organic-inorganic stacking pattern. Similarly, MD simulations, while they may exhibit high committee uncertainty in forces, do not lead to unrealistic structures (no bond-breaking or coalescence of atoms).

VI Conclusions

We have presented an efficient, accurate and general machine learning force field (MLIP) using the MACE architecture for lead based 2D HOIPs involving organic cations containing carbon, hydrogen, nitrogen and oxygen. Our model performs well on single point energy and force predictions on samples taken from molecular dynamics simulations and can extrapolate to unseen organic cations.

Furthermore, the model is appropriate for performing high throughput screening of this class of materials. A simple random structure search procedure has been presented which, when paired with the MACE model, is able to rediscover the experimentally reported structure for a number of 2D perovskites in our database. The model is demonstrably accurate during this process, correctly reproducing the complex energy landscape, as shown by exploring specific examples with DFT. The computational cost of our structure generation process and model is small enough that this procedure can be applied at scale.

Finally, our method was validated by synthesising a new perovskite composition. Besides predicting the correct structure, the model revealed a delicate landscape of low-lying energy minima, which in its self could be a useful investigative capability.

VII Data Availablility Statement

The committee of MACE models trained on the full training set is available in a zenodo repository 10.5281/zenodo.10729400. The full train and test sets are also available as a python pandas dataframe. The experimentally determined newly synthesized structure, as well as the five predicted lowest energy structures found by our process, are also available. The random structure generation algorithm was implemented in a python package which can be found at https://github.com/WillBaldwin0/LDHP-builder.

VIII Acknowledgements

WB, CS, and CG thank the AFRL for partial funding of this project through grant FA8655-21-1-7010. CS and NK gratefully acknowledge the University of South Carolina for the support provided through start-up funds and support from the U.S. Department of Energy, Office of Science, Basic Energy Sciences (DE-SC0022247). This work utilized computational resources from the ARCHER2 UK National Supercomputing Service (http://www.archer2.ac.uk) which is funded by EPSRC via the membership of the UK Car-Parrinello Consortium, the Cambridge Service for Data Driven Discovery (CSD3), and University of South Carolina Hyperion HPC cluster. The authors thank Volker Blum for discussions about the utility of structure searching for these materials.

IX Conflict of Interest

CG is director of Symmetric Group LLP, which markets some MACE models commercially. The other authors declare that they have no conflicts of interest.

X Methods

X.1 MACE Machine Learning Interatomic Potentials

This work has utilised the MACE framework for constructing machine learned interatomic potentials [37]. MACE is a recently developed equivariant message passing tensor network which offers state of the art accuracy. The MACE architecture has been described and evaluated in detail previously [27, 28, 29, 30]. Therefore, the following description simply discusses some key aspects of the model design.

A MACE model predicts the total energy of a system as a sum of atom centered contributions. The environment around an atom is described by the atomic number and relative positions of neighbouring atoms, up to some fixed cutoff: $\mathcal{N}(i)=\{z_{j},\mathbf{r}_{ij}\}_{j|r_{ij}<r_{cut}}$ . The MACE architecture utilises ideas from the atomic cluster expansion to efficiently construct atom centered features based on the local environment. These atom centered features are many-body, in that they depend simultaneously on atomic numbers and positions of several neighbours in a non-trivial way. These features are iteratively updated, and the final energetic contribution from each atom is expressed as a learnable function of these features.

The specifications of the MACE models used in this work are, in the nomenclature of reference [29], given in table 2.

Table 2: Specifications of MACE models used in this study

number of chemical channels	128
maximum equivariance order L	1
single layer cutoff radius (Å)	5
number of layers	2

X.2 Molecular-Dynamics and Geometry Relaxations with MACE potentials

All molecular dynamics (MD) simulations were carried using the atomic simulation environment (ASE) package [38] in the NPT ensemble at 300 K and 1 atmosphere. A Nosé–Hoover thermostat [39, 40] was used. During active learning, MD simulations were propagated using the average prediction of 3 committee members. The relative force uncertainty $f_{rel}^{i}$ is defined as

f_{rel}^{i}=\frac{\sigma_{i}}{\left|\bar{F}_{i}\right|+\epsilon},

(1)

where $\sigma_{i}$ and $\bar{F}_{i}$ denote the standard deviation and mean of forces over the committee members on atom i. $\epsilon$ is a regularizer to avoid diverging ratios for small forces. At each MD step the atom with the greatest $f_{rel}^{i}$ is selected, and this value is compared against the predefined threshold of 0.2. If this uncertainty indicator exceeds the threshold, the simulation is terminated. The regularizer $\epsilon$ for all the simulations was set to 0.2 eV/Å.

All geometry relaxations have been done using preconditioned LBFGS as the optimiser[41]. During relaxations, both cell sizes and atomic positions are allowed to change and the relaxed cell is achieved when the maximum force on each atom is less than 1 meV/Å.

X.3 Electronic Structure Calculations

All the electronic structure calculations for either relaxation or single point calculations are performed using Vienna Ab initio Simulations Package (VASP) [42, 43] with the PBE [44] for the exchange-correlation functional and the projector augmented-wave (PAW) pseudopotentials [45, 46]. Dispersion energy-corrections are applied using D3 approximation [47]. A $\Gamma$ -centered Monkhorst-Pack [48] k-point grids are used to sample the Brillouin zones, with a density of 1000 k-points per number of atoms, with divisions along each reciprocal lattice vector proportional to its length, as implemented in pymatgen [49]. The electronic wave functions were expanded in a plane wave basis set with an energy cutoff of 600 eV.

X.4 An Algorithm for Random Structure Generation of 2D HOIPs

A random structure generation algorithm has been developed to demonstrate the usefulness of the MACE model. Our algorithm is not intended to be completely general and relies on several simplifications. Future developments could utilise methods from organic crystal structure prediction for more generality. The code used in this project is available as a python package at https://github.com/WillBaldwin0/LDHP-builder.

The procedure is as follows: The inorganic layer is first generated from lead and the chosen halide as a monolayer of regular lead–hailde octahedra. The lead–halide bond length is chosen to be the average of such bonds across our training set.

For each organic cation to be placed into the unit cell, the following process is performed. We assume that the molecule joins to the layer in a certain way: Salient points on the molecule are defined as the heavy atoms on the ‘extremities’. In practice this is done by first finding the moment of inertia tensor of the molecule and interpreting the eigenvectors as a local coordinate basis for the molecule. For molecules which are longest in a certain direction, the eigenvector with the smallest eigenvalue is generally directed along this direction. The extremities of the molecule are defined as the heavy atoms which have the largest relative distance between one another when projected onto this vector. One of these heavy atoms serves as a reference atom which is placed onto a given point on the inorganic layer. The orientation of the cation is then determined by first applying a random rotation about the selected atom, and subsequently applying up to two reflections in planes normal to the lattice vectors of the inorganic monolayer.

After repeating the above for each molecule in the unit cell, the molecules are checked for intersections. Assuming there are no intersections, the out of plane lattice constant is fixed to remove as much vacuum from the cell as possible. The result of this procedure is a monolayer with a set of organic cations at a random orientation on the layer. See Fig. 6b for example structures from this procedure.

X.5 Synthesis and Characterization of Perovskite Materials for Verification of Modeled Results

The perovskite \ce1,3-(cis)-cyclohexanediamine-PbI4 was synthesized in order to compare the observed crystal structure with the results obtained via the computational methods described previously. Crystals of the perovskite were obtained through slow hydrothermal growth by dissolving equimolar amounts of the amine and lead (II) iodide in concentrated hydriodic acid in a sealed pressure vessel at 150 ${}^{\circ}$ C and cooled at a rate of 5 ${}^{\circ}$ C per hour, resulting in the formation of mm-scale orange crystalline chunks. Residual hydriodic acid was removed by washing with methylene chloride and diethyl ether, followed by drying under vacuum for several days. The crystal pieces are highly stable to ambient atmosphere and demonstrate no signs of decomposition over weeks of storage.

The crystal structure was determined using a Rigaku XtaLAB Synergy diffractometer. Crystal samples were mounted in oil on a ring-loop and placed in a cryo N ${}_{2}$ stream at 200 K. CrysAlis Pro was used to screen and collect diffraction patterns using Mo K ${}_{\alpha}$ ( $\lambda=0.71073$ Å). A full sphere of diffraction data was collected, and multiscan empirical absorption correction was applied. The maximum resolution that was achieved was $\Theta$ = 31.00 ${}^{\circ}$ (0.69 Å). The structures were solved with the ShelXT (Sheldrick, 2016) structure solution program using the Intrinsic Phasing solution method and by using Olex2 (Dolomanov et al., 2009) as the graphical interface. The model was refined with version 2016/6 of ShelXL 2016/6 (Sheldrick, 2015) using Least Squares minimisation. The crystal structure was determined with minimal guidance beyond initial atomic assignment and the resulting solved structure featured a low R value indicating that the solved structure aligned well with the atomic positions observed in the diffraction pattern.

Powder XRD was measured on a Rigaku SmartLab as an additional point of comparison between both the predicted and experimental crystal structures to assess the presence of any additional crystal phases at room temperature that may contribute to different structural behavior. Samples were prepared from the as-grown \ceABX4 perovskite crystals by grinding in a mortar and pestle to ensure uniform distribution of powder particle size and orientation. All measurements were performed at room temperature under ambient atmosphere. The $\theta/2\theta$ spectra of the perovskite powders were then compared to predicted patterns generated from either the experimental or as-modeled crystal structures.

Simulations of pXRD were performed in the VESTA structure visualization software package [50].

References

[1] D. Weber, “Ch3nh3pbx3, ein pb(ii)-system mit kubischer perowskitstruktur / ch3nh3pbx3, a pb(ii)-system with cubic perovskite structure,” Zeitschrift für Naturforschung B, vol. 33, no. 12, pp. 1443–1445, 1978. [Online]. Available: https://doi.org/10.1515/znb-1978-1214
[2] L. Mao, W. Ke, L. Pedesseau, Y. Wu, C. Katan, J. Even, M. R. Wasielewski, C. C. Stoumpos, and M. G. Kanatzidis, “Hybrid dion–jacobson 2d lead iodide perovskites,” Journal of the American Chemical Society, vol. 140, no. 10, pp. 3775–3783, 2018, pMID: 29465246. [Online]. Available: https://doi.org/10.1021/jacs.8b00542
[3] C. C. Stoumpos, D. H. Cao, D. J. Clark, J. Young, J. M. Rondinelli, J. I. Jang, J. T. Hupp, and M. G. Kanatzidis, “Ruddlesden–popper hybrid lead iodide perovskite 2d homologous semiconductors,” Chemistry of Materials, vol. 28, no. 8, pp. 2852–2867, 2016. [Online]. Available: https://doi.org/10.1021/acs.chemmater.6b00847
[4] C. C. Stoumpos, C. M. M. Soe, H. Tsai, W. Nie, J.-C. Blancon, D. H. Cao, F. Liu, B. Traoré, C. Katan, J. Even, A. D. Mohite, and M. G. Kanatzidis, “High members of the 2d ruddlesden-popper halide perovskites: Synthesis, optical properties, and solar cells of (ch3(ch2)3nh3)2(ch3nh3)4pb5i16,” Chem, vol. 2, no. 3, pp. 427–440, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2451929417300736
[5] L. Mao, C. C. Stoumpos, and M. G. Kanatzidis, “Two-dimensional hybrid halide perovskites: Principles and promises,” Journal of the American Chemical Society, vol. 141, no. 3, pp. 1171–1190, 2019, pMID: 30399319. [Online]. Available: https://doi.org/10.1021/jacs.8b10851
[6] I. C. Smith, E. T. Hoke, D. Solis-Ibarra, M. D. McGehee, and H. I. Karunadasa, “A layered hybrid perovskite solar-cell absorber with enhanced moisture stability,” Angewandte Chemie International Edition, vol. 53, no. 42, pp. 11 232–11 235, 2014. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/anie.201406466
[7] D. H. Cao, C. C. Stoumpos, O. K. Farha, J. T. Hupp, and M. G. Kanatzidis, “2d homologous perovskites as light-absorbing materials for solar cell applications,” Journal of the American Chemical Society, vol. 137, no. 24, pp. 7843–7850, 2015, pMID: 26020457. [Online]. Available: https://doi.org/10.1021/jacs.5b03796
[8] G. P. P. Pun, R. Batra, R. Ramprasad, and Y. Mishin, “Physically informed artificial neural networks for atomistic modeling of materials,” Nature Communications, vol. 10, no. 1, p. 2339, may 2019. [Online]. Available: https://doi.org/10.1038/s41467-019-10343-5
[9] E. R. Dohner, E. T. Hoke, and H. I. Karunadasa, “Self-assembly of broadband white-light emitters,” Journal of the American Chemical Society, vol. 136, no. 5, pp. 1718–1721, 2014, pMID: 24422494. [Online]. Available: https://doi.org/10.1021/ja411045r
[10] G. M. Day, “Current approaches to predicting molecular organic crystal structures,” Crystallography Reviews, vol. 17, no. 1, pp. 3–52, 2011. [Online]. Available: https://doi.org/10.1080/0889311X.2010.517526
[11] C. J. Pickard and R. J. Needs, “Ab initio random structure searching,” Journal of Physics: Condensed Matter, vol. 23, no. 5, p. 053201, jan 2011. [Online]. Available: https://dx.doi.org/10.1088/0953-8984/23/5/053201
[12] A. P. Bartók, S. De, C. Poelking, N. Bernstein, J. R. Kermode, G. Csányi, and M. Ceriotti, “Machine learning unifies the modeling of materials and molecules,” Science Advances, vol. 3, no. 12, p. e1701816, 2017. [Online]. Available: https://www.science.org/doi/abs/10.1126/sciadv.1701816
[13] F. Musil, A. Grisafi, A. P. Bartók, C. Ortner, G. Csányi, and M. Ceriotti, “Physics-inspired structural representations for molecules and materials,” Chemical Reviews, vol. 121, pp. 9759–9815, 2021.
[14] K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, and A. Walsh, “Machine learning for molecular and materials science,” Nature, vol. 559, no. 7715, pp. 547–555, Jul 2018. [Online]. Available: https://doi.org/10.1038/s41586-018-0337-2
[15] V. L. Deringer, A. P. Bartók, N. Bernstein, D. M. Wilkins, M. Ceriotti, and G. Csányi, “Gaussian process regression for materials and molecules,” Chemical Reviews, vol. 121, no. 16, pp. 10 073–10 141, 2021.
[16] V. L. Deringer, N. Bernstein, G. Csányi, C. B. Mahmoud, M. Ceriotti, M. Wilson, D. A. Drabold, and S. R. Elliott, “Origins of structural and electronic transitions in disordered silicon,” Nature, vol. 589, pp. 59–64, 1 2021.
[17] J. Behler and M. Parrinello, “Generalized neural-network representation of high-dimensional potential-energy surfaces,” Physical Review Letters, vol. 98, no. 14, 4 2007.
[18] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, “Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons,” Physical Review Letters, vol. 104, no. 13, 4 2010.
[19] A. P. Bartók, R. Kondor, and G. Csányi, “On representing chemical environments,” Phys. Rev. B, vol. 87, p. 184115, May 2013. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevB.87.184115
[20] A. V. Shapeev, “Moment tensor potentials: A class of systematically improvable interatomic potentials,” Multiscale Modeling Simulation, vol. 14, no. 3, pp. 1153–1173, 2016. [Online]. Available: https://doi.org/10.1137/15M1054183
[21] R. Drautz, “Atomic cluster expansion for accurate and transferable interatomic potentials,” Phys. Rev. B, vol. 99, p. 014104, Jan 2019. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevB.99.014104
[22] ——, “Atomic cluster expansion of scalar, vectorial, and tensorial properties including magnetism and charge transfer,” Phys. Rev. B, vol. 102, p. 024104, Jul 2020. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevB.102.024104
[23] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” 2017. [Online]. Available: https://arxiv.org/abs/1704.01212
[24] S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, “E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials,” Nature Communications, vol. 13, no. 1, p. 2453, 2022.
[25] I. Batatia, S. Batzner, D. P. Kovács, A. Musaelian, G. N. C. Simm, R. Drautz, C. Ortner, B. Kozinsky, and G. Csanyi, “The design space of e(3)-equivariant atom-centered interatomic potentials,” 5 2022. [Online]. Available: http://arxiv.org/abs/2205.06643
[26] A. Merchant, S. Batzner, S. S. Schoenholz, M. Aykol, G. Cheon, and E. D. Cubuk, “Scaling deep learning for materials discovery,” Nature, vol. 624, no. 7990, pp. 80–85, Dec 2023. [Online]. Available: https://doi.org/10.1038/s41586-023-06735-9
[27] I. Batatia, D. P. Kovacs, G. Simm, C. Ortner, and G. Csányi, “Mace: Higher order equivariant message passing neural networks for fast and accurate force fields,” Advances in Neural Information Processing Systems, vol. 35, pp. 11 423–11 436, 2022.
[28] D. P. Kovács, I. Batatia, E. S. Arany, and G. Csányi, “Evaluation of the MACE force field architecture: From medicinal chemistry to materials science,” The Journal of Chemical Physics, vol. 159, no. 4, p. 044118, 07 2023. [Online]. Available: https://doi.org/10.1063/5.0155322
[29] D. P. Kovács, J. H. Moore, N. J. Browning, I. Batatia, J. T. Horton, V. Kapil, W. C. Witt, I.-B. Magdău, D. J. Cole, and G. Csányi, “Mace-off23: Transferable machine learning force fields for organic molecules,” 2023. [Online]. Available: https://arxiv.org/abs/2312.15211
[30] I. Batatia, P. Benner, Y. Chiang, A. M. Elena, D. P. Kovács, J. Riebesell, X. R. Advincula, M. Asta, M. Avaylon, W. J. Baldwin, F. Berger, N. Bernstein, A. Bhowmik, S. M. Blau, V. Cărare, J. P. Darby, S. De, F. D. Pia, V. L. Deringer, R. Elijošius, Z. El-Machachi, F. Falcioni, E. Fako, A. C. Ferrari, A. Genreith-Schriever, J. George, R. E. A. Goodall, C. P. Grey, P. Grigorev, S. Han, W. Handley, H. H. Heenen, K. Hermansson, C. Holm, J. Jaafar, S. Hofmann, K. S. Jakob, H. Jung, V. Kapil, A. D. Kaplan, N. Karimitari, J. R. Kermode, N. Kroupa, J. Kullgren, M. C. Kuner, D. Kuryla, G. Liepuoniute, J. T. Margraf, I.-B. Magdău, A. Michaelides, J. H. Moore, A. A. Naik, S. P. Niblett, S. W. Norwood, N. O’Neill, C. Ortner, K. A. Persson, K. Reuter, A. S. Rosen, L. L. Schaaf, C. Schran, B. X. Shi, E. Sivonxay, T. K. Stenczel, V. Svahn, C. Sutton, T. D. Swinburne, J. Tilly, C. van der Oord, E. Varga-Umbrich, T. Vegge, M. Vondrák, Y. Wang, W. C. Witt, F. Zills, and G. Csányi, “A foundation model for atomistic materials chemistry,” 2024. [Online]. Available: https://arxiv.org/abs/2401.00096
[31] “2d perovskites database - the laboratory of new materials for solar energetics.” [Online]. Available: http://pdb.nmse-lab.ru/
[32] C. R. Groom, I. J. Bruno, M. P. Lightfoot, and S. C. Ward, “The Cambridge Structural Database,” Acta Crystallographica Section B, vol. 72, no. 2, pp. 171–179, Apr 2016. [Online]. Available: https://doi.org/10.1107/S2052520616003954
[33] M.-H. Tremblay, A. Boyington, S. Rigin, J. Jiang, J. Bacsa, K. Al Kurdi, V. N. Khrustalev, R. Pachter, T. V. Timofeeva, N. Jui et al., “Hybrid organic lead iodides: Role of organic cation structure in obtaining 1d chains of face-sharing octahedra vs 2d perovskites,” Chemistry of Materials, vol. 34, no. 3, pp. 935–946, 2022.
[34] A. M. Tokita and J. Behler, “How to train a neural network potential,” The Journal of Chemical Physics, vol. 159, no. 12, p. 121501, 09 2023. [Online]. Available: https://doi.org/10.1063/5.0160326
[35] C. van der Oord, M. Sachs, D. P. Kovács, C. Ortner, and G. Csányi, “Hyperactive learning for data-driven interatomic potentials,” npj Computational Materials, vol. 9, no. 1, p. 168, Sep 2023. [Online]. Available: https://doi.org/10.1038/s41524-023-01104-6
[36] Y. Rubner, C. Tomasi, and L. J. Guibas, “The Earth Mover’s Distance as a Metric for Image Retrieval,” International Journal of Computer Vision, vol. 40, no. 2, pp. 99–121, 2000. [Online]. Available: https://doi.org/10.1023/A:1026543900054
[37] I. Batatia, D. P. Kovacs, G. Simm, C. Ortner, and G. Csanyi, “Mace: Higher order equivariant message passing neural networks for fast and accurate force fields,” in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 11 423–11 436. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2022/file/4a36c3c51af11ed9f34615b81edb5bbc-Paper-Conference.pdf
[38] A. H. Larsen, J. J. Mortensen, J. Blomqvist, I. E. Castelli, R. Christensen, M. Dułak, J. Friis, M. N. Groves, B. Hammer, C. Hargus, E. D. Hermes, P. C. Jennings, P. B. Jensen, J. Kermode, J. R. Kitchin, E. L. Kolsbjerg, J. Kubal, K. Kaasbjerg, S. Lysgaard, J. B. Maronsson, T. Maxson, T. Olsen, L. Pastewka, A. Peterson, C. Rostgaard, J. Schiøtz, O. Schütt, M. Strange, K. S. Thygesen, T. Vegge, L. Vilhelmsen, M. Walter, Z. Zeng, and K. W. Jacobsen, “The atomic simulation environment—a python library for working with atoms,” Journal of Physics: Condensed Matter, vol. 29, no. 27, p. 273002, jun 2017. [Online]. Available: https://dx.doi.org/10.1088/1361-648X/aa680e
[39] S. Nosé, “A unified formulation of the constant temperature molecular dynamics methods,” The Journal of Chemical Physics, vol. 81, no. 1, pp. 511–519, 07 1984. [Online]. Available: https://doi.org/10.1063/1.447334
[40] W. G. Hoover, “Canonical dynamics: Equilibrium phase-space distributions,” Phys. Rev. A, vol. 31, pp. 1695–1697, Mar 1985. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.31.1695
[41] D. Packwood, J. Kermode, L. Mones, N. Bernstein, J. Woolley, N. Gould, C. Ortner, and G. Csányi, “A universal preconditioner for simulating condensed phase materials,” The Journal of Chemical Physics, vol. 144, no. 16, p. 164109, 04 2016. [Online]. Available: https://doi.org/10.1063/1.4947024
[42] G. Kresse and J. Furthmüller, “Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set,” Computational Materials Science, vol. 6, no. 1, pp. 15–50, 1996.
[43] G. Kresse and J. Furthmüller, “Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set,” Phys. Rev. B, vol. 54, pp. 11 169–11 186, Oct 1996. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevB.54.11169
[44] J. P. Perdew, K. Burke, and M. Ernzerhof, “Generalized Gradient Approximation Made Simple,” Physical Review Letters, vol. 77, pp. 3865–3868, Oct 1996. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.77.3865
[45] P. E. Blöchl, “Projector augmented-wave method,” Physical Review B, vol. 50, no. 24, p. 17953, 1994.
[46] G. Kresse and D. Joubert, “From ultrasoft pseudopotentials to the projector augmented-wave method,” Physical Review B, vol. 59, no. 3, p. 1758, 1999.
[47] S. Grimme, J. Antony, S. Ehrlich, and H. Krieg, “A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu,” The Journal of Chemical Physics, vol. 132, no. 15, p. 154104, 04 2010. [Online]. Available: https://doi.org/10.1063/1.3382344
[48] H. J. Monkhorst and J. D. Pack, “Special points for brillouin-zone integrations,” Phys. Rev. B, vol. 13, pp. 5188–5192, Jun 1976. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevB.13.5188
[49] S. P. Ong, W. D. Richards, A. Jain, G. Hautier, M. Kocher, S. Cholia, D. Gunter, V. L. Chevrier, K. A. Persson, and G. Ceder, “Python materials genomics (pymatgen): A robust, open-source python library for materials analysis,” Computational Materials Science, vol. 68, pp. 314–319, 2013. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0927025612006295
[50] K. Momma and F. Izumi, “VESTA3 for three-dimensional visualization of crystal, volumetric and morphology data,” Journal of Applied Crystallography, vol. 44, no. 6, pp. 1272–1276, Dec 2011. [Online]. Available: https://doi.org/10.1107/S0021889811038970