
Crystal-LSBO: Automated Design of De Novo Crystals with Latent Space Bayesian Optimization

Onur Boyar
Nagoya University, RIKEN
boyar.onur.nagoyaml@gmail.com

Yanheng Gu
Nagoya University
gu.yanheng.nagoyaml@gmail.com

Yuji Tanaka
Nagoya University
tanaka.yuji.nagoyaml@gmail.com

Shunsuke Tonogai
DENSO CORP.
shunsuke.tonogai.j3y@jp.denso.com

Tomoya Itakura
DENSO CORP.
tomoya.itakura.j8w@jp.denso.com

Ichiro Takeuchi (corresponding author)
Nagoya University, RIKEN
takeuchi.ichiro.n6@f.mail.nagoya-u.ac.jp
Abstract

Generative modeling of crystal structures is significantly challenged by the complexity of input data, which constrains the ability of these models to explore and discover novel crystals. This complexity often confines de novo design methodologies to merely small perturbations of known crystals and hampers the effective application of advanced optimization techniques. One such technique, Latent Space Bayesian Optimization (LSBO), has demonstrated promising results in uncovering novel objects across various domains, especially when combined with Variational Autoencoders (VAEs). Recognizing LSBO's potential and the critical need for innovative crystal discovery, we introduce Crystal-LSBO, a de novo design framework for crystals specifically tailored to enhance explorability within LSBO frameworks. Crystal-LSBO employs multiple VAEs, each dedicated to a distinct aspect of crystal structure (lattice, coordinates, and chemical elements), orchestrated by an integrative model that synthesizes these components into a cohesive output. This setup not only streamlines the learning process but also produces explorable latent spaces, since the learning task of each model is less complex, enabling LSBO approaches to operate effectively. Our study pioneers the use of LSBO for de novo crystal design, demonstrating its efficacy through optimization tasks focused mainly on formation energy values. Our results highlight the effectiveness of our methodology, offering a new perspective for de novo crystal discovery.

1 Introduction

The discovery and design of novel crystals are crucial for advancements in a range of scientific and industrial fields. The ability to engineer crystals with specific, desired properties holds the potential to influence future technological developments significantly. Therefore, establishing a framework to automate the discovery of de novo crystals is vital for various industries. A promising approach to achieving this goal involves the application of machine learning, particularly through generative modeling.

The field of generative models for crystal design has emerged as a focal point for researchers from diverse disciplines. Recent literature highlights various efforts employing techniques such as diffusion models [24], generative adversarial networks (GANs) [13], autoencoders (AEs) [17], and variational autoencoders (VAEs) [20]. Each of these methods addresses the challenge of capturing the complex nature of crystals, which encompasses factors like lattice structures, types of elements, and their coordinates. This complexity creates an expansive search space for designing new crystals with desired properties. Consequently, a generative model capable of learning lower-dimensional representations of input instances proves invaluable: such a model can significantly reduce the search space, enabling more efficient exploration for target crystals with specific features. Among the aforementioned methods, VAEs stand out for their ability to map complex crystal structures into a manageable, lower-dimensional latent space and generate new instances from it, yielding an efficient search space for de novo crystal design. Thus, our study employs a VAE-based approach, leveraging VAEs' potential and their effective integration with Bayesian Optimization (BO) techniques, referred to as Latent Space Bayesian Optimization (LSBO) [7]. LSBO operates in the latent space of a VAE to balance exploration and exploitation efficiently, with demonstrated success in the discovery of novel instances in diverse fields. However, straightforward applications of LSBO struggle in the exploration phase due to the complexity of crystal structures, typically focusing only on exploitation, i.e., generating valid outputs only within the immediate vicinity of latent representations of known crystals. Given the considerable time and costs involved in crystal design, there is an urgent need for a sample-efficient methodology that enables extensive exploration of de novo crystals.

To this end, we introduce Crystal-LSBO, a framework to design de novo crystals that integrates a specialized set of VAEs, explicitly tailored to facilitate LSBO methodologies. Crystal-LSBO is designed to simplify the learning of crystal structures by employing multiple VAEs, each responsible for distinct components of the crystal structure: lattices, coordinates, and elements. After each VAE captures the latent representations of different crystal components, these are then combined by an integrative VAE into a unified and comprehensive material latent space, which forms the search space for LSBO algorithms. This structured approach not only streamlines the learning process but also significantly improves our ability to navigate and explore latent spaces—crucial for the discovery of novel and optimal crystal designs via LSBO in a sample-efficient manner.

To showcase the effectiveness of the Crystal-LSBO framework, we applied it to design de novo crystals with optimal formation energy values, a property that is related to the overall stability and functionality of materials. Results show the effectiveness of Crystal-LSBO, paving the way for new advancements in material science. Our contributions are listed as follows:

  1. We introduce Crystal-LSBO, a unified framework using specialized VAE models for de novo discovery of crystals.

  2. We demonstrate the explorability of Crystal-LSBO's latent space, showcasing its ability to generate valid crystals across a broad region.

  3. We apply LSBO to crystal design through the Crystal-LSBO framework, validating its effectiveness through de novo design experiments focusing on formation energy values.

2 Related Works

In crystal generative modeling, the intricate nature of crystal structure representations often requires employing complex generative models or using multiple models. For instance, Hoffmann et al. [11] employed a VAE paired with a U-Net to learn from 3D density maps, and Court et al. [6] proposed a Conditional-VAE and U-Net pairing. Another approach, explored by Noh et al. [17], utilizes two AEs and a VAE, where embeddings of lattice parameters and chemical elements are learned by separate AEs and used as input to the VAE that forms the generative model. The literature also offers diverse methods for representing crystal structure data. Chiba et al. [5] introduced neural structure fields for use in AE-based architectures. Ren et al. [19] developed an invertible representation combining real and reciprocal space features for VAE training. Building on this, Ren et al. [20] refined the method by incorporating Fourier-transformed crystal properties (FTCP) into the VAE, referred to as FTCP-VAE. Diverging from VAEs, GANs have also found application in crystal generation tasks, with examples such as [13, 25]. In [24], a diffusion-model-based crystal generation and optimization method is proposed, and Xie et al. [23] employed a diffusion model in conjunction with a VAE. Although the generative models and representation techniques differ among these studies, the common strategy for generating new crystals is the same: sampling from the close neighborhood of known crystals in the latent space. This is mainly because sampling from a wide range across the latent space produces invalid crystals.

LSBO has garnered significant attention, particularly in the field of organic molecule design, where it was first introduced through the seminal work by Gomez-Bombarelli et al. [7]. They represented organic molecules as sequences and developed a VAE to explore the latent space for de novo design. Subsequently, numerous studies have focused on method development and practical applications to improve and extend the LSBO framework [9, 21, 16, 8, 1]. The primary reason LSBO is particularly advanced in organic molecular design is that organic molecules can be represented as sequences using approaches such as SMILES [22] and SELFIES [15]. Unfortunately, for our target area of crystal design, there is no simple representation like SMILES, making it difficult to apply LSBO directly. To our knowledge, no existing studies have successfully applied LSBO to crystal design. In this study, we develop a method for LSBO in crystal design by constructing latent spaces for each component of a crystal, namely the lattice, coordinates, and elements, and then developing a unified latent space that integrates these components, within which LSBO is conducted.

3 Preliminaries and Problem Setup

3.1 VAEs

A VAE [14] consists of an encoder $f_{\phi}^{\text{enc}}:\mathcal{X}\to\mathcal{Z}$ and a decoder $f_{\theta}^{\text{dec}}:\mathcal{Z}\to\mathcal{X}$, where $\mathcal{X}$ represents the input space and $\mathcal{Z}$ the latent space. The encoder maps an input $\bm{x}$ to a latent representation $\bm{z}$, while the decoder reconstructs the input from the latent space.

The encoder models the conditional probability $q_{\phi}(\bm{z}|\bm{x})$ as an approximation of the true posterior $p_{\theta}(\bm{z}|\bm{x})$, and the decoder models $p_{\theta}(\bm{x}|\bm{z})$. The VAE is trained by optimizing the following objective function:

$$\mathcal{L}^{\text{VAE}}(\theta,\phi;\bm{x})=\mathbb{E}_{q_{\phi}(\bm{z}|\bm{x})}\left[\log p_{\theta}(\bm{x}|\bm{z})\right]-\beta\,D_{\text{KL}}\left(q_{\phi}(\bm{z}|\bm{x})\,\|\,p(\bm{z})\right), \tag{1}$$

where $D_{\text{KL}}$ denotes the Kullback-Leibler divergence, $\beta$ balances the reconstruction and regularization terms [10], and $p(\bm{z})$ is typically set as a standard multivariate normal distribution $\mathcal{N}(\bm{0},\bm{I})$.
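A minimal PyTorch sketch of the $\beta$-VAE objective in Eq. (1) may help make the two terms concrete; the reduction choices and Gaussian reconstruction assumption are ours, not prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Negative ELBO: reconstruction term plus beta-weighted KL to N(0, I)."""
    # A Gaussian reconstruction likelihood reduces to an MSE term (up to constants).
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I).
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + beta * kl
```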

3.2 LSBO

In LSBO, we start with numerous unlabeled instances $\{\bm{x}_i\}_{i\in\mathcal{U}}$ and a smaller set of labeled instances $\{(\bm{x}_i,y_i)\}_{i\in\mathcal{L}}$, where $\bm{x}_i\in\mathcal{X}$ represents inputs like crystal structures, and $y_i\in\mathcal{Y}\subseteq\mathbb{R}$ are their labels, such as formation energy. The sets of indices of the unlabeled and labeled instances are denoted as $\mathcal{U}$ and $\mathcal{L}$, respectively. BO optimizes a costly black-box (BB) function $f^{\text{BB}}:\mathcal{X}\to\mathcal{Y}$. The goal is to maximize $f^{\text{BB}}$ with minimal evaluations by employing a Gaussian Process (GP) as a surrogate model to predict the function's behavior across $\mathcal{X}$. Effective BO relies on the surrogate model to guide the selection of candidate inputs $\bm{x}$ that might yield values exceeding $\max_{i\in\mathcal{L}} y_i$. However, creating an effective GP surrogate model in high-dimensional input spaces like those for crystal structures is challenging. LSBO addresses this by using a VAE to reduce dimensionality, training the VAE on the unlabeled set $\mathcal{U}$ and fitting the GP model to the latent space $\mathcal{Z}$. This simplifies the surrogate model fitting because $\mathcal{Z}$ is of lower dimensionality than $\mathcal{X}$. During LSBO iterations, the acquisition function (AF) applied to the GP model's predictions selects new points in $\mathcal{Z}$ to evaluate. The selected latent variable $\bm{z}_{i'}=\operatorname{argmax}_{\bm{z}\in\mathcal{Z}} f^{\text{AF}}(\bm{z})$ is then decoded into a new input instance $\bm{x}_{i'}=f_{\theta}^{\text{dec}}(\bm{z}_{i'})$. This new instance is evaluated by $f^{\text{BB}}$, and the results update the labeled set $\mathcal{L}$ and refine the GP model. Optionally, retraining the VAE with the updated data can integrate new findings into the model. This cycle repeats until achieving optimal results or exhausting resources. In contexts like de novo crystal design, LSBO aims to discover crystal structures with optimal properties, efficiently navigating the reduced dimensionality of latent spaces.
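The sketch below illustrates one LSBO iteration under stated assumptions: `decode` and `f_bb` are placeholders for a trained decoder and the black-box evaluator, and a simple UCB rule over random candidates stands in for a full acquisition optimizer.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def lsbo_step(Z_labeled, y_labeled, decode, f_bb, n_candidates=4096, bound=3.0):
    # Fit the GP surrogate on latent representations of the labeled set.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(Z_labeled, y_labeled)

    # Maximize a simple acquisition (UCB here, for brevity) over random candidates.
    Z_cand = np.random.uniform(-bound, bound, size=(n_candidates, Z_labeled.shape[1]))
    mu, sigma = gp.predict(Z_cand, return_std=True)
    z_next = Z_cand[np.argmax(mu + 2.0 * sigma)]

    x_next = decode(z_next)   # map the latent point back to the input space
    y_next = f_bb(x_next)     # costly black-box evaluation
    return z_next, x_next, y_next
```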

3.3 Property Optimization

Our objective is to generate a crystal that optimizes a specific property $\mathcal{P}$ of the crystal structure, which is determined by the BB function $f^{\text{BB}}$. The crystal exists within the input space $\mathcal{X}$, leading us to define our optimization challenge as finding $\bm{x}^*\in\mathcal{X}$ that maximizes $f^{\text{BB}}(\bm{x})$, expressed as:

$$\bm{x}^*=\arg\max_{\bm{x}\in\mathcal{X}} f^{\text{BB}}(\bm{x}). \tag{2}$$

However, due to the high dimensionality of $\mathcal{X}$ and the costly evaluation of $f^{\text{BB}}$, direct optimization in $\mathcal{X}$ is impractical. Therefore, we instead perform optimization in the latent space $\mathcal{Z}$, utilizing BO to navigate this space efficiently. The optimization problem in the latent space is therefore formulated as:

$$\bm{z}^*=\arg\max_{\bm{z}\in\mathcal{Z}} g(\bm{z}), \tag{3}$$

where $g(\bm{z})=f^{\text{BB}}(f^{\text{dec}}(\bm{z}))$ is the composition of the objective function with the decoder, mapping a latent space point back to the input space to evaluate its property $\mathcal{P}$ via the BB function.
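For clarity, the composed objective of Eq. (3) is just a wrapper around the decoder and the evaluator; `decode` and `f_bb` are the same assumed placeholders as above.

```python
def g(z, decode, f_bb):
    """Latent-space objective g(z) = f_BB(f_dec(z)) from Eq. (3)."""
    x = decode(z)    # latent point -> candidate crystal structure
    return f_bb(x)   # costly property evaluation, e.g., formation energy
```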

Figure 1: UMAP plots of the latent spaces for invalid and valid generations. Panel A details the FTCP-VAE model's latent space, which mostly leads to invalid generations, with only a 49% validity rate. Panel B displays the Combined-VAE's material latent space within the Crystal-LSBO framework, highlighting its ability to generate valid crystals even from sparse regions, with a high validity rate of 93%.

3.4 Latent Space Explorability

Extensive exploration of the latent space and the ability to generate valid crystal structures throughout this space are crucial for successful de novo crystal design. In the machine learning for materials science literature, the term "validity" is defined in various ways. In this study, a crystal structure is considered valid if it can be converted into a Crystallographic Information File (CIF), the standard format for crystal structures in materials science [26]. A generated structure fails to be converted into a CIF if the lattice does not form a valid 3-dimensional structure, the coordinates are not within the correct range, or the structure does not contain any elements.
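A minimal sketch of one way to implement this CIF-convertibility check with pymatgen follows; the helper name and failure handling are our assumptions, while the failure modes mirror the criteria above.

```python
from pymatgen.core import Lattice, Structure
from pymatgen.io.cif import CifWriter

def is_valid_crystal(lengths, angles, species, frac_coords):
    """Return True if the generated components form a CIF-convertible structure."""
    try:
        lattice = Lattice.from_parameters(*lengths, *angles)  # must form a valid 3D cell
        structure = Structure(lattice, species, frac_coords)  # species list must be non-empty
        CifWriter(structure).write_file("candidate.cif")      # conversion to CIF format
        return True
    except Exception:
        return False
```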

LSBO's success hinges on its ability to explore the latent space, relying on the decoder's capability to generate valid crystal structures from any latent point $\bm{z}\in\mathcal{Z}$. Therefore, the ability to generate valid materials from a broad region of the latent space is critical. Despite the wide range of use cases of VAEs in material generation tasks, existing models predominantly adhere to local search strategies, largely due to an underdeveloped capacity for wide-ranging exploration within their latent spaces, or due to combining VAEs with non-generative models such as AEs and U-Nets [17, 11, 6]. Among these studies, to our knowledge, FTCP-VAE stands out as the sole VAE-based method that does not integrate the VAE with non-VAE architectures, aligning it with the focus of our research. Hence, we are particularly interested in the explorability of its latent space and its generative capabilities. To make this evaluation, we trained an FTCP-VAE model (using the code provided by the FTCP-VAE authors) and performed 1000 generations using latent vectors drawn from the standard normal distribution $\mathcal{N}(\bm{0},\bm{I})$. Figure 1(A) shows a 2-dimensional UMAP plot of the model's distribution of valid and invalid generations in the latent space, highlighting challenges in consistently producing valid crystal structures, with only 49% of the generated crystals meeting validity requirements. While utilizing a standalone VAE model to process crystal structure data, the FTCP-VAE evidently struggles with the inherent complexity of such data, leading to limited generative and de novo design performance. In contrast, Fig. 1(B) shows the latent space of the proposed Combined-VAE model (see §5 for more details), which yields far fewer invalid generations than the FTCP-VAE model in (A).

3.5 Latent Consistent-Aware LSBO

In order to enhance exploration in the latent space of LSBO, we adopt Latent Consistent-Aware LSBO (LCA-LSBO) [1], recently proposed in the field of organic molecule generation. The VAE retraining discussed above involves updating the VAE's training dataset with new instances from LSBO queries and periodically retraining the VAE with this updated dataset. However, due to the expensive nature of BB function evaluations, it is unrealistic to expect enough new instances to bring a meaningful update to the VAE. LCA-LSBO uses label-free data augmentations in the latent space to address this challenge. Synthetic latent variables $\hat{\bm{z}}$ are generated from a probability distribution. These augmentations focus on regions of interest in the latent space that decode into instances with high property values, or areas identified as promising by the AF of LSBO. Augmented variables are decoded, re-encoded, and discrepancies are penalized during VAE retraining. Namely, given an augmented latent variable $\hat{\bm{z}}$, LCA-LSBO adds the term $\|\hat{\bm{z}}-f^{\text{enc}}_{\phi}(f^{\text{dec}}_{\theta}(\hat{\bm{z}}))\|^{2}$ to the objective function of the VAE during retraining. This can be considered as the reconstruction of augmented latent variables at targeted regions, and it enables the effective incorporation of data augmentations into the retraining process, addressing the problem of limited new data for retraining. Centered on the region of interest, these augmentations enable LSBO to more effectively explore and exploit promising regions in the latent space. The label-free nature of these augmentations and their targeted selection accelerate de novo discovery and enhance sample efficiency. For more detail on LCA-LSBO, see appendix §A.1 and [1].
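A minimal PyTorch sketch of the latent-consistency penalty follows: augmented latents sampled near a region of interest are decoded, re-encoded, and the round-trip discrepancy is penalized. The `encoder`/`decoder` modules and the assumption that the encoder returns the posterior mean are ours.

```python
import torch

def lca_penalty(encoder, decoder, z_star, sigma_ref=0.1, n_aug=32):
    """||z_hat - enc(dec(z_hat))||^2, averaged over augmented latents z_hat."""
    # Sample augmentations around the region of interest: z_hat ~ N(z_star, sigma^2 I).
    z_hat = z_star + sigma_ref * torch.randn(n_aug, z_star.shape[-1])
    z_cycle = encoder(decoder(z_hat))  # decode, then re-encode
    return ((z_hat - z_cycle) ** 2).sum(dim=-1).mean()
```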

4 Proposed Method

In this section, we present the proposed Crystal-LSBO framework. We detail the VAE components of the Crystal-LSBO framework in §4.1, and we detail our overall LSBO approaches in §4.2. Lastly, we detail the structure relaxation process for the generated crystals.

Figure 2: The VAE model architecture in the Crystal-LSBO framework operates as follows: Step A categorizes input crystals into Lattice, Coordinate, and Element parts. Step B trains separate VAEs for these parts, obtaining their latent representations. Step C merges these representations into a unified latent space through the Combined-VAE.

4.1 VAE Components of the Crystal-LSBO Framework

To address the problem of latent space explorability discussed in §3.4, we adopted a multi-model approach, in which each specialized VAE model focuses on a distinct aspect of the crystal structure to simplify the learning process. We used space group 1 representations of crystals for their simplicity and comprehensiveness, as all crystals can be represented in this form (details regarding the selection of space group 1 and its implications are provided in §A.2). With this representation, we dissect the crystal information obtained from the CIF formats of crystals into three key components: lattice parameters (angles $[\alpha,\beta,\gamma]$ and cell lengths $[a,b,c]$), coordinates of the elements in crystals, and a one-hot encoding matrix for the elements alongside their occupancy information. The corresponding VAEs, referred to as Lattice-VAE (encoder $f^{\text{enc}}_{\text{Latt-VAE}_{\phi}}$, decoder $f^{\text{dec}}_{\text{Latt-VAE}_{\theta}}$), Coordinate-VAE ($f^{\text{enc}}_{\text{Coord-VAE}_{\phi}}$, $f^{\text{dec}}_{\text{Coord-VAE}_{\theta}}$), and Element-VAE ($f^{\text{enc}}_{\text{Elem-VAE}_{\phi}}$, $f^{\text{dec}}_{\text{Elem-VAE}_{\theta}}$), are trained on these components. Subsequently, the latent representations obtained from these VAEs are used to train another model, referred to as Combined-VAE ($f^{\text{enc}}_{\text{Combined-VAE}_{\phi}}$, $f^{\text{dec}}_{\text{Combined-VAE}_{\theta}}$), which synthesizes the diverse input components into a cohesive material latent space. Each VAE is co-trained with a neural network-based property predictor (PP) model to organize the distribution of the crystals in the latent space by their property values $\bm{y}$. The model architecture is shown in Fig. 2. The objective function for each VAE in the Crystal-LSBO framework is defined as:

$$\mathcal{L}^{\text{VAE}}_{\text{Crystal-LSBO}}(\theta,\phi;\bm{x})=\mathbb{E}_{q_{\phi}(\bm{z}|\bm{x})}\left[\log p_{\theta}(\bm{x}|\bm{z})\right]-\beta\,D_{\text{KL}}\left(q_{\phi}(\bm{z}|\bm{x})\,\|\,p(\bm{z})\right)-w\,\|\bm{y}-\text{PP}_{\psi}(\bm{z})\|^{2}, \tag{4}$$

where $w$ is the weight assigned to the property predictor term. (Information on the model architectures of the VAEs and the PP, along with training details, is provided in §A.3.)
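A minimal PyTorch sketch of the per-VAE objective in Eq. (4) follows; `pp` is an assumed small MLP predicting the property $\bm{y}$ from the latent code, and the default $\beta$ and $w$ values are merely illustrative (they match the configuration selected later in §5.1).

```python
import torch
import torch.nn.functional as F

def crystal_lsbo_vae_loss(x, x_recon, mu, log_var, z, y, pp, beta=0.01, w=5.0):
    recon = F.mse_loss(x_recon, x, reduction="sum")                  # reconstruction
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())   # KL to N(0, I)
    prop = F.mse_loss(pp(z), y, reduction="sum")                     # property predictor term
    return recon + beta * kl + w * prop
```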

Figure 3: In the Crystal-LSBO framework, crystal generation unfolds as follows: First, latent variables 𝒛^^𝒛\hat{\bm{z}}over^ start_ARG bold_italic_z end_ARG are sampled from the material latent space. The Combined-VAE’s decoder then produces specific latent representations for each VAE. Next, these representations are used by the Lattice, Coordinate, and Element-VAE decoders to generate the respective crystal components. The final structure is assembled from these components. During LSBO, this generation process is guided by the AF of BO.

4.2 Optimization with LSBO in the Crystal-LSBO Framework

In this subsection, we describe the LSBO algorithms considered in the Crystal-LSBO framework.

4.2.1 Crystal-Standard-LSBO

We start by detailing how the standard LSBO algorithm, described in §3.2, is incorporated into our Crystal-LSBO framework; we refer to this as Crystal-Standard-LSBO. In this setup, the latent space created by the Combined-VAE establishes the search domain. Within this domain, a GP model navigates the landscape, with the sampling process guided by the AF. When the AF is maximized at a point $\bm{z}^*$, the Combined-VAE's decoder transforms it into an output matrix holding the latent variables for the Lattice, Coordinate, and Element parts. These latent variables $\bm{z}^*_{\text{Lattice}}, \bm{z}^*_{\text{Coordinate}}, \bm{z}^*_{\text{Element}}$ are then input into their respective VAE decoders to generate the corresponding components of the crystal structure, as demonstrated in Fig. 3. The subsequent step involves assembling these components into a complete crystal structure, which is then assessed using the BB function. Following this evaluation, the GP model is updated with the new BB function value and $\bm{z}^*$, and the AF uses the updated GP to find the next query point. The Crystal-Standard-LSBO approach is outlined in Algorithm 1.

Algorithm 1 Crystal-Standard-LSBO
1: Input: Trained Crystal-LSBO VAEs, labeled instances $\mathcal{L}$, BB function $f^{\text{BB}}$, experiment count $J$
2: for $t=1$ to $J$ do
3:     Fit GP model using $\{(f_{\phi}^{\text{enc}}(\bm{x}_i), y_i)\}_{i\in\mathcal{L}}$
4:     Find $\bm{z}^*\leftarrow\arg\max_{\bm{z}\in\mathcal{Z}}\hat{f}^{\text{AF}}(\bm{z})$
5:     Separate $f^{\text{dec}}_{\text{Combined-VAE}_{\theta}}(\bm{z}^*)$ into $\bm{z}^*_{\text{Lattice}}, \bm{z}^*_{\text{Coordinate}}, \bm{z}^*_{\text{Element}}$
6:     $\bm{x}^*_{\text{Lattice}}\leftarrow f^{\text{dec}}_{\text{Latt-VAE}_{\theta}}(\bm{z}^*_{\text{Lattice}})$
7:     $\bm{x}^*_{\text{Coordinate}}\leftarrow f^{\text{dec}}_{\text{Coord-VAE}_{\theta}}(\bm{z}^*_{\text{Coordinate}})$
8:     $\bm{x}^*_{\text{Element}}\leftarrow f^{\text{dec}}_{\text{Elem-VAE}_{\theta}}(\bm{z}^*_{\text{Element}})$
9:     Form the crystal by combining $\hat{\bm{x}}^* = [\bm{x}^*_{\text{Lattice}} \cup \bm{x}^*_{\text{Coordinate}} \cup \bm{x}^*_{\text{Element}}]$
10:    Evaluate label: $y^* = f^{\text{BB}}(\hat{\bm{x}}^*)$
11:    Update $\mathcal{L}\leftarrow\mathcal{L}\cup\{(\hat{\bm{x}}^*, y^*)\}$
12: end for
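The sketch below illustrates steps 5-8 of Algorithm 1, splitting the combined latent code and decoding each component. The decoder handles and the latent split sizes (3/16/16, matching the dimensions used later in §5.1) are assumptions for illustration.

```python
import torch

def decode_crystal(z_star, combined_dec, latt_dec, coord_dec, elem_dec):
    parts = combined_dec(z_star)                       # component latents from Combined-VAE
    z_latt, z_coord, z_elem = torch.split(parts, [3, 16, 16], dim=-1)
    lattice = latt_dec(z_latt)                         # angles and cell lengths
    coords = coord_dec(z_coord)                        # fractional coordinates
    elements = elem_dec(z_elem)                        # one-hot elements + occupancy
    return lattice, coords, elements                   # assembled into a CIF downstream
```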

4.2.2 Crystal-LCA-LSBO

As discussed in §3.5, LCA-LSBO employs label-free data augmentations within the latent space to facilitate rapid updates to the model after each retraining. In this section, we explain the implementation of LCA-LSBO within our Crystal-LSBO framework, referred to as Crystal-LCA-LSBO. Unlike in Crystal-Standard-LSBO, in Crystal-LCA-LSBO each of the four VAE models undergoes periodic retraining, guided by the identification of regions of interest based on the target property values of the crystals. Specifically, when a query point results in the generation of a crystal whose target property value exceeds a predefined threshold $\tau$, the latent variables used to generate this crystal ($\bm{z}^*_{\text{Lattice}}$ in the Lattice-VAE, $\bm{z}^*_{\text{Coordinate}}$ in the Coordinate-VAE, and $\bm{z}^*_{\text{Element}}$ in the Element-VAE) are designated as regions of interest in the latent spaces of their respective models. The next steps involve simultaneous retraining of the Lattice-VAE, Coordinate-VAE, and Element-VAE: synthetic latent variables are generated in the neighborhood of the regions of interest, and retraining proceeds with the penalization term for augmented latent variable reconstructions, as detailed in §3.5 and §A.1. Specifically, we consider a normal distribution $p^{\text{ref}}\sim\mathcal{N}(\bm{\mu}^{\text{ref}},\sigma^{\text{ref}})$, centered around latent variables that exceed the property value threshold, and use latent variables randomly drawn from this distribution as augmented latent variables (the distributions in the latent spaces of the Lattice, Coordinate, and Element VAEs are referred to as $p^{\text{ref}}_{\text{Lattice}}$, $p^{\text{ref}}_{\text{Coordinate}}$, and $p^{\text{ref}}_{\text{Element}}$, respectively). Next, the Combined-VAE is also retrained, using the latent representations of the training instances obtained from the retrained individual VAEs. After the Combined-VAE is retrained, the process continues by finding the next query point via the AF and evaluating it. The Crystal-LCA-LSBO procedure is given in Algorithm 2.

Algorithm 2 Crystal-LCA-LSBO
1: Input: Trained Crystal-LSBO VAEs, unlabeled instances $\mathcal{U}$, labeled instances $\mathcal{L}$, AF $f^{\text{AF}}(\bm{z})$, BB function $f^{\text{BB}}$, experiment count $J$, $\sigma^{\text{ref}}$, sample size of $\hat{\bm{z}}$: $N^*$, region-of-interest threshold $\tau$
2: for $j=1,\dots,J$ do
3:     Fit GP model using $\{(f_{\phi}^{\text{enc}}(\bm{x}_i), y_i)\}_{i\in\mathcal{L}}$
4:     Find $\bm{z}^*\leftarrow\arg\max_{\bm{z}\in\mathcal{Z}}\hat{f}^{\text{AF}}(\bm{z})$
5:     Separate $f^{\text{dec}}_{\text{Combined-VAE}_{\theta}}(\bm{z}^*)$ into $\bm{z}^*_{\text{Lattice}}, \bm{z}^*_{\text{Coordinate}}, \bm{z}^*_{\text{Element}}$
6:     $\bm{x}^*_{\text{Lattice}}\leftarrow f^{\text{dec}}_{\text{Latt-VAE}_{\theta}}(\bm{z}^*_{\text{Lattice}})$
7:     $\bm{x}^*_{\text{Coordinate}}\leftarrow f^{\text{dec}}_{\text{Coord-VAE}_{\theta}}(\bm{z}^*_{\text{Coordinate}})$
8:     $\bm{x}^*_{\text{Element}}\leftarrow f^{\text{dec}}_{\text{Elem-VAE}_{\theta}}(\bm{z}^*_{\text{Element}})$
9:     Form the crystal by combining $\hat{\bm{x}}^* = [\bm{x}^*_{\text{Lattice}} \cup \bm{x}^*_{\text{Coordinate}} \cup \bm{x}^*_{\text{Element}}]$
10:    Evaluate label: $y^* = f^{\text{BB}}(\hat{\bm{x}}^*)$
11:    Update $\mathcal{L}\leftarrow\mathcal{L}\cup\{(\hat{\bm{x}}^*, y^*)\}$
12:    if $|y^*| > |\tau|$ then
13:        Set $\bm{\mu}^{\text{ref}}_{\text{Lattice}}\leftarrow\bm{z}^*_{\text{Lattice}}$ and obtain $\{\hat{\bm{z}}^i_{\text{Lattice}}\}_{i=1}^{N^*}\sim p^{\text{ref}}_{\text{Lattice}}$
14:        Set $\bm{\mu}^{\text{ref}}_{\text{Coordinate}}\leftarrow\bm{z}^*_{\text{Coordinate}}$ and obtain $\{\hat{\bm{z}}^i_{\text{Coordinate}}\}_{i=1}^{N^*}\sim p^{\text{ref}}_{\text{Coordinate}}$
15:        Set $\bm{\mu}^{\text{ref}}_{\text{Element}}\leftarrow\bm{z}^*_{\text{Element}}$ and obtain $\{\hat{\bm{z}}^i_{\text{Element}}\}_{i=1}^{N^*}\sim p^{\text{ref}}_{\text{Element}}$
16:        Retrain Lattice-VAE with $\{(\mathcal{L}\cup\mathcal{U})_{\text{Lattice}}\}$ and $\{\hat{\bm{z}}^i_{\text{Lattice}}\}$
17:        Retrain Coordinate-VAE with $\{(\mathcal{L}\cup\mathcal{U})_{\text{Coordinate}}\}$ and $\{\hat{\bm{z}}^i_{\text{Coordinate}}\}$
18:        Retrain Element-VAE with $\{(\mathcal{L}\cup\mathcal{U})_{\text{Element}}\}$ and $\{\hat{\bm{z}}^i_{\text{Element}}\}$
19:        $\bm{z}_{\text{Lattice}}\leftarrow f^{\text{enc}}_{\text{Latt-VAE}_{\phi}}(\{(\mathcal{L}\cup\mathcal{U})\}_{\text{Lattice}})$
20:        $\bm{z}_{\text{Coordinate}}\leftarrow f^{\text{enc}}_{\text{Coord-VAE}_{\phi}}(\{(\mathcal{L}\cup\mathcal{U})\}_{\text{Coordinate}})$
21:        $\bm{z}_{\text{Element}}\leftarrow f^{\text{enc}}_{\text{Elem-VAE}_{\phi}}(\{(\mathcal{L}\cup\mathcal{U})\}_{\text{Element}})$
22:        Retrain Combined-VAE with $\bm{z}_{\text{Lattice}}, \bm{z}_{\text{Coordinate}}, \bm{z}_{\text{Element}}$
23:    end if
24: end for
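A minimal sketch of the augmentation step in lines 12-15 of Algorithm 2 follows: when the new label clears the threshold $\tau$, $N^*$ synthetic latents are drawn around each component latent $\bm{z}^*$ from $p^{\text{ref}}=\mathcal{N}(\bm{\mu}^{\text{ref}},\sigma^{\text{ref}})$. Names and defaults are illustrative.

```python
import torch

def sample_augmentations(y_star, tau, z_parts, sigma_ref=0.1, n_star=32):
    """Return per-component augmented latents, or None if tau is not exceeded."""
    if abs(y_star) <= abs(tau):
        return None
    return {
        name: z + sigma_ref * torch.randn(n_star, z.shape[-1])  # {z_hat^i} ~ p_ref
        for name, z in z_parts.items()  # keys: "Lattice", "Coordinate", "Element"
    }
```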

Post-Processing with M3GNet Structural Relaxation: After completing the Crystal-LSBO algorithms, we use M3GNet [3], a popular machine learning model for structural relaxation. M3GNet refines crystals with desired properties identified by LSBO. However, success in structural relaxation is not guaranteed; issues like non-convergence or elemental overlaps can cause failures. Thus, the success rate of relaxation offers insights into the generative model’s ability to produce stable crystals.
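The sketch below shows one way this post-processing step might look with the m3gnet package; the `Relaxer` usage follows the package's documented interface, but exact API details may vary by version, and the failure handling is our assumption.

```python
from m3gnet.models import Relaxer
from pymatgen.core import Structure

def relax_crystal(cif_path):
    """Relax a generated structure; returns the relaxed structure or None on failure."""
    structure = Structure.from_file(cif_path)
    try:
        result = Relaxer().relax(structure)   # M3GNet-based structural relaxation
        return result["final_structure"]
    except Exception:
        return None                           # e.g., non-convergence or element overlaps
```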

5 Experiments

In our study, we used a dataset of ternary crystals with primitive cell representations sourced from the Materials Project [12], focusing on structures with no more than 40 sites and an energy above hull of less than 0.08 eV/atom. Band gap and formation energy values of the crystals serve as target outputs for the property prediction models in each VAE. For LSBO experiments, we employed a GP with an RBF kernel featuring Automatic Relevance Determination to dynamically adjust the lengthscale for each latent dimension within the search space of $[-3,3]$. Expected Improvement was utilized as the AF.
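A minimal sketch of this BO setup (GP with ARD-RBF kernel, Expected Improvement, bounds $[-3,3]$) using BoTorch/GPyTorch follows; it mirrors the description above rather than the authors' exact code, and the optimizer settings are illustrative.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.kernels import ScaleKernel, RBFKernel
from gpytorch.mlls import ExactMarginalLogLikelihood

def next_query(Z, y, dim):
    """Fit a GP on latent points Z (n x dim) and labels y (n x 1), then maximize EI."""
    kernel = ScaleKernel(RBFKernel(ard_num_dims=dim))   # one lengthscale per latent dim
    gp = SingleTaskGP(Z, y, covar_module=kernel)
    fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
    ei = ExpectedImprovement(gp, best_f=y.max())
    bounds = torch.tensor([[-3.0] * dim, [3.0] * dim])
    z_next, _ = optimize_acqf(ei, bounds=bounds, q=1, num_restarts=8, raw_samples=256)
    return z_next.squeeze(0)
```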

5.1 Optimizing Electronegativity for Model Selection

It is difficult to objectively determine the optimal latent dimensions and hyperparameters for LSBO due to inherent trade-offs: lower latent dimensions increase VAE reconstruction errors, while higher latent dimensions make BO more challenging. To address this, we utilize a surrogate optimization problem based on electronegativity, a property that influences a crystal's stability and reactivity and can be efficiently evaluated using the pymatgen library [18]. We then conduct LSBO experiments using VAEs trained with different configurations. It is important to note that achieving optimal results on the surrogate problem does not necessarily guarantee similar outcomes on the actual target problem; this approach, however, assists in objectively identifying feasible latent dimensions and hyperparameters. For simplicity, we set the Lattice-VAE model to use a 3-dimensional latent space, while the Element-VAE and Coordinate-VAE models utilized 16-dimensional latent spaces with fixed $\beta$ and $w$ values. The Combined-VAE, incorporating the learned latent representations from these models, is trained across latent dimensions of $\{16, 20, 24\}$ with hyperparameters $\beta$ and $w$ each taking values in $\{0.01, 0.1, 1, 5\}$, creating 48 model variations. The GP is trained on the 100 instances with the highest electronegativity values from our dataset, and LSBO experiments for each Combined-VAE model are conducted using the Crystal-Standard-LSBO method outlined in Algorithm 1, across 100 iterations and 10 different seeds.
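A minimal sketch of the surrogate objective follows: scoring a generated crystal by electronegativity with pymatgen. Using the composition-averaged value is our assumed choice of a convenient, cheap proxy; the paper's exact scoring may differ.

```python
from pymatgen.core import Structure

def electronegativity_score(cif_path):
    """Average electronegativity of the structure's composition (cheap to evaluate)."""
    structure = Structure.from_file(cif_path)
    return structure.composition.average_electroneg
```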

Our selection criteria favored models with the lowest latent dimension among the top-performing options. In Fig. 4(A), we showcase the top-performing Combined-VAEs across 16-, 20-, and 24-dimensional latent spaces, displaying the average of the results from different seeds together with the standard errors (higher is better). We found that the best-performing 16-dimensional model lagged in LSBO tasks, while the 20- and 24-dimensional models exhibited similar efficacy; both generated crystals with higher electronegativity values than the highest value among known crystals in our dataset within 100 iterations. Following our selection criteria, we chose the Combined-VAE with a 20-dimensional latent space, trained with $\beta=0.01$ and $w=5$.

Validity of Generations: In §3.4, we discussed the problem of latent space explorability, pointed out that existing models suffer from invalid generations, and provided an example using the FTCP-VAE model, which is trained on the same dataset as ours. Using the selected model, we conducted a similar analysis and provide the results in Fig. 1(B). In this setting, 1000 latent vectors are sampled from $\mathcal{N}(\bm{0},\bm{I})$, and crystals are generated as demonstrated in Fig. 3. Among the 1000 generations, 930 were successfully converted into the CIF format, yielding a 93% validity rate. This indicates a clear improvement over FTCP-VAE, showcasing the model's capacity to generate valid instances from a wide range of the latent space.

Figure 4: Panel A showcases the outcomes of using the Crystal-Standard-LSBO method for the LSBO task focused on designing de novo crystals with enhanced electronegativity values. The 20-dimensional latent space model emerged as optimal. Panel B focuses on the optimization of predicted formation energies, comparing the performance of Crystal-Standard-LSBO, Crystal-LCA-LSBO, Random Search, FTCP-VAE Random Search, and FTCP-VAE Local Search. Crystal-LCA-LSBO significantly outperformed the others, while Crystal-Standard-LSBO also showed effective results.

5.2 Optimizing Formation Energy

This section focuses on our main goal: designing crystals with optimal formation energy values. Due to the high resource demands of exact formation energy calculations, we employed a proxy model as our BB function for estimating the formation energies of generated crystals during LSBO. We used an XGBoost [4] model trained on our crystal structure data and their formation energies, achieving a test Mean Squared Error (MSE) of 0.04 in predicting formation energies. Using the model selected in §5.1, we implemented the proposed Crystal-Standard-LSBO and Crystal-LCA-LSBO algorithms (details on hyperparameter selection for Crystal-LCA-LSBO are provided in §A.4). The GP for each LSBO experiment was trained on the top 100 instances with the lowest predicted formation energies. To benchmark the effectiveness of the LSBO methods, we implemented a random search strategy using both our VAEs and the FTCP-VAE model, maintaining the same search bounds as in the LSBO experiments. Additionally, we adopted the local search methodology proposed by the FTCP-VAE authors, which samples from the vicinity of the latent representation of the crystal with the best predicted formation energy. Each method was evaluated over 1,000 iterations across 10 different seeds.
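
A minimal sketch of this proxy model follows; the featurization `X` is a placeholder, as the paper does not detail how crystal structures are converted to feature vectors for the regressor.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Placeholder features/targets; in practice X encodes the crystal structures
# and y holds their formation energies.
X, y = np.random.rand(5000, 128), np.random.randn(5000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

proxy = XGBRegressor(n_estimators=500, learning_rate=0.05)
proxy.fit(X_tr, y_tr)
print("test MSE:", mean_squared_error(y_te, proxy.predict(X_te)))

def black_box(features: np.ndarray) -> float:
    """Predicted formation energy, used as the BB function during LSBO."""
    return float(proxy.predict(features.reshape(1, -1))[0])
```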

Figure 4(B) displays the averaged results and standard errors across different seeds, where lower values indicate better performance. Both the Crystal-Standard-LSBO and Crystal-LCA-LSBO methods exceeded the initial benchmarks by identifying crystals with reduced predicted formation energies consistently across all seeds. In contrast, the FTCP-VAE Local Search approach only marginally surpassed the initial benchmarks in 2 of the 10 seeds, failing to deliver consistent improvements. Neither random search strategy yielded improvements over the benchmarks, although random search in our model's latent space performed comparatively better than in FTCP-VAE's. These results highlight the effectiveness of our Crystal-LSBO methodology in efficiently exploring the latent space to pinpoint crystals that meet our desired criteria. Of the two proposed LSBO approaches, Crystal-LCA-LSBO showed superior capability in locating crystals with even lower predicted formation energies. Table 1 lists the three lowest predicted formation energies obtained by each method, consolidating results from all seeds; the leading values achieved by Crystal-LCA-LSBO surpass those of the other methods, underscoring its enhanced performance.

As detailed in §4, we apply structural relaxation to the structures generated through LSBO. From our Crystal-LCA-LSBO experiments, we identified 38 crystals with predicted formation energies lower than the initial benchmarks. These crystal structures were then processed with the M3GNet model for structural relaxation; 36 of them completed the relaxation successfully, a success rate of 94.7%. Figure 5 illustrates examples of these crystals.
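
The relaxation step can be sketched as follows, assuming the `m3gnet` package's `Relaxer` interface; a generation counts as successful if relaxation completes without error.

```python
from m3gnet.models import Relaxer
from pymatgen.core import Structure

relaxer = Relaxer()                          # M3GNet interatomic potential
structure = Structure.from_file("candidate.cif")
result = relaxer.relax(structure)            # energy minimization of the cell
relaxed = result["final_structure"]          # relaxed pymatgen Structure
```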

Table 1: Comparison of the top 3 crystals with the lowest predicted formation energies.

Method                     1st      2nd      3rd
Database                   -4.50    -4.43    -4.14
FTCP-VAE Random Search     -3.93    -3.59    -3.59
Random Search              -4.26    -4.25    -4.17
FTCP-VAE Local Search      -4.57    -4.56    -4.46
Crystal-Standard-LSBO      -6.05    -5.53    -5.45
Crystal-LCA-LSBO           -6.87    -6.49    -6.31
Figure 5: Examples of de novo crystals and their chemical compositions generated by Crystal-LCA-LSBO, showcasing diverse configurations: (a) Y3O4, (b) Sr2O4, (c) Er5, (d) Sr4O2.

6 Conclusion

In this study, we demonstrated the effectiveness of the Crystal-LSBO framework for generative modeling and the de novo design of crystal structures. Our approach produced explorable latent spaces, enabling the pioneering application of LSBO algorithms to crystal design. The Crystal-Standard-LSBO method, together with its extension via latent-space data augmentation in Crystal-LCA-LSBO, effectively explored and exploited these spaces. This led to the successful discovery of crystals with target properties, showcasing the potential of Crystal-LSBO to significantly advance automated crystal design and impact material discovery across various industries.

Limitations and Future Work: As discussed in §4.1, our study uses space group 1 representations of crystals for their simplicity and comprehensiveness, which come at the cost of omitting the symmetry information of crystals. While this representation streamlines model training, it sacrifices structural information, which can limit the generative and de novo design capabilities of the Crystal-LSBO framework. Future work could explore integrating other space group representations into our framework for a richer and more targeted exploration of the crystallographic landscape.

References

[1] Onur Boyar and Ichiro Takeuchi. Latent space Bayesian optimization with latent data augmentation for enhanced exploration. arXiv preprint arXiv:2302.02399, 2024.
[2] Carolyn Pratt Brock, T. Hahn, H. Wondratschek, U. Müller, U. Shmueli, E. Prince, A. Authier, V. Kopský, D. Litvin, E. Arnold, et al. International Tables for Crystallography Volume A: Space-Group Symmetry, 2016.
[3] Chi Chen and Shyue Ping Ong. A universal graph deep learning interatomic potential for the periodic table. Nature Computational Science, 2(11):718–728, 2022. doi: 10.1038/s43588-022-00349-3.
[4] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
[5] Naoya Chiba, Yuta Suzuki, Tatsunori Taniai, Ryo Igarashi, Yoshitaka Ushiku, Kotaro Saito, and Kanta Ono. Neural structure fields with application to crystal structure autoencoders. arXiv preprint arXiv:2212.13120, 2022.
[6] Callum J. Court, Batuhan Yildirim, Apoorv Jain, and Jacqueline M. Cole. 3-D inorganic crystal structure generation and property prediction via representation learning. Journal of Chemical Information and Modeling, 60(10):4518–4535, 2020.
[7] Rafael Gómez-Bombarelli, Jennifer N. Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276, 2018.
[8] Ryan-Rhys Griffiths and José Miguel Hernández-Lobato. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chemical Science, 11:577–586, 2020. doi: 10.1039/C9SC04026A.
[9] Antoine Grosnit, Rasul Tutunov, Alexandre Max Maraval, Ryan-Rhys Griffiths, Alexander Imani Cowen-Rivers, Lin Yang, Lin Zhu, Wenlong Lyu, Zhitang Chen, Jun Wang, Jan Peters, and Haitham Bou-Ammar. High-dimensional Bayesian optimisation with variational autoencoders and deep metric learning. arXiv preprint arXiv:2106.03609, 2021.
[10] Irina Higgins, Loïc Matthey, Arka Pal, Christopher P. Burgess, Xavier Glorot, Matthew M. Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2016.
[11] Jordan Hoffmann, Louis Maestrati, Yoshihide Sawada, Jian Tang, Jean Michel Sellier, and Yoshua Bengio. Data-driven approach to encoding and decoding 3-D crystal structures. arXiv preprint arXiv:1909.00949, 2019.
[12] Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, and Kristin A. Persson. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1:011002, 2013. doi: 10.1063/1.4812323.
[13] Sungwon Kim, Juhwan Noh, Geun Ho Gu, Alan Aspuru-Guzik, and Yousung Jung. Generative adversarial networks for crystal structure prediction. ACS Central Science, 6(8):1412–1420, 2020.
[14] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations (ICLR), 2014.
[15] Mario Krenn, Florian Häse, AkshatKumar Nigam, Pascal Friederich, and Alan Aspuru-Guzik. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology, 1(4):045024, 2020. doi: 10.1088/2632-2153/aba947.
[16] Natalie Maus, Haydn Jones, Juston Moore, Matt J. Kusner, John Bradshaw, and Jacob Gardner. Local latent space Bayesian optimization over structured inputs. Advances in Neural Information Processing Systems, 35:34505–34518, 2022.
[17] Juhwan Noh, Jaehoon Kim, Helge S. Stein, Benjamin Sanchez-Lengeling, John M. Gregoire, Alan Aspuru-Guzik, and Yousung Jung. Inverse design of solid-state materials via a continuous representation. Matter, 1(5):1370–1384, 2019. doi: 10.1016/j.matt.2019.08.017.
[18] Shyue Ping Ong, William Davidson Richards, Anubhav Jain, Geoffroy Hautier, Michael Kocher, Shreyas Cholia, Dan Gunter, Vincent L. Chevrier, Kristin A. Persson, and Gerbrand Ceder. Python Materials Genomics (pymatgen): A robust, open-source Python library for materials analysis. Computational Materials Science, 68:314–319, 2013. doi: 10.1016/j.commatsci.2012.10.028.
[19] Zekun Ren, Juhwan Noh, Siyu Tian, Felipe Oviedo, Guangzong Xing, Qiaohao Liang, Armin Aberle, Yi Liu, Qianxiao Li, Senthilnath Jayavelu, et al. Inverse design of crystals using generalized invertible crystallographic representation. arXiv preprint arXiv:2005.07609, 2020.
[20] Zekun Ren, Siyu Isaac Parker Tian, Juhwan Noh, Felipe Oviedo, Guangzong Xing, Jiali Li, Qiaohao Liang, Ruiming Zhu, Armin G. Aberle, Shijing Sun, et al. An invertible crystallographic representation for general inverse design of inorganic crystals with targeted properties. Matter, 5(1):314–335, 2022.
[21] Austin Tripp, Erik Daxberger, and José Miguel Hernández-Lobato. Sample-efficient optimization in the latent space of deep generative models via weighted retraining. In Advances in Neural Information Processing Systems, 2020.
[22] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, 1988. doi: 10.1021/ci00057a005.
[23] Tian Xie, Xiang Fu, Octavian-Eugen Ganea, Regina Barzilay, and Tommi Jaakkola. Crystal diffusion variational autoencoder for periodic material generation. arXiv preprint arXiv:2110.06197, 2021.
[24] Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Sasha Shysheya, Jonathan Crabbé, Lixin Sun, Jake Smith, et al. MatterGen: A generative model for inorganic materials design. arXiv preprint arXiv:2312.03687, 2023.
[25] Yong Zhao, Mohammed Al-Fahdi, Ming Hu, Edirisuriya M. D. Siriwardane, Yuqi Song, Alireza Nasiri, and Jianjun Hu. High-throughput discovery of novel cubic crystal materials using deep generative neural networks. Advanced Science, 8(20):2100566, 2021.
[26] Yong Zhao, Edirisuriya M. Dilanga Siriwardane, Zhenyao Wu, Nihang Fu, Mohammed Al-Fahdi, Ming Hu, and Jianjun Hu. Physics guided deep learning for generative design of crystal materials with symmetry constraints. npj Computational Materials, 9(1):38, 2023.

Appendix A Supplemental Material

A.1 LCA-LSBO

Along with the problem of limited labeled data obtained from BB function evaluations to update the model, discussed in §3.5, LCA-LSBO addresses another issue in current LSBO settings: latent consistency. A point in the latent space is latent consistent when its location remains unchanged after being processed through the decoder and then the encoder of the VAE, i.e., when $\hat{\bm{z}} \approx f^{\text{enc}}_{\phi}(f^{\text{dec}}_{\theta}(\hat{\bm{z}}))$. The reconstruction objective discussed in §3.5 therefore aims to improve latent consistency across the latent space. Latent inconsistencies prevent LSBO algorithms from operating as intended: since the GP is updated with the queried point $\hat{\bm{z}}$ while the VAE is updated with $f^{\text{dec}}_{\theta}(\hat{\bm{z}})$, any discrepancy between $\hat{\bm{z}}$ and $f^{\text{enc}}_{\phi}(f^{\text{dec}}_{\theta}(\hat{\bm{z}}))$ causes BO to lose the location information of previous iterations, including promising regions that are worth further exploitation. Furthermore, this discrepancy grows as the sampling density around $\hat{\bm{z}}$ decreases, leading to noticeable latent inconsistencies particularly during the exploration phase of LSBO. On the other hand, while retraining aims to enhance the VAE's ability to generate target instances by incorporating newly queried data, the limited number of such instances often fails to produce meaningful updates to the VAE. Given the high cost of BB function evaluations, it is impractical to expect enough BB function queries to gather sufficient new instances for significant VAE updates.

In LCA-LSBO, augmented instances are generated from $p^{\text{ref}}$, referred to as the latent reference distribution, with mean $\bm{\mu}^{\text{ref}}$ and standard deviation $\sigma^{\text{ref}}$. Setting $\sigma^{\text{ref}} < 1$ concentrates the augmented synthetic latent variables near regions of interest, allowing LCA-LSBO to target specific areas for updates. Once augmented instances are drawn, the VAE model is retrained with an additional reconstruction term for the latent data augmentations in its objective function, referred to as the Latent Consistency Loss (LCL). The LCL penalizes deviations between latent variables $\hat{\bm{z}}$ and $f^{\text{enc}}_{\phi}(f^{\text{dec}}_{\theta}(\hat{\bm{z}}))$, mitigating the negative impact of these latent inconsistencies on LSBO performance.

By incorporating label-free data augmentations in the latent space through the LCL, LCA-LSBO simultaneously addresses latent inconsistency and increases the number of instances available for retraining, focusing on the region of interest. The VAE variant utilizing the LCL is known as the Latent Consistent-Aware VAE (LCA-VAE), with its objective function defined as:

$\mathcal{L}^{\text{LCA}}_{\text{VAE}}(\phi,\theta) = \mathcal{L}_{\text{VAE}}(\phi,\theta) - \gamma\,\mathbb{E}_{\hat{\bm{z}} \sim p^{\text{ref}}(\hat{\bm{z}})}\left[\mathrm{LCL}(\hat{\bm{z}})\right],$   (5)

where $\mathrm{LCL}(\hat{\bm{z}}) = \|\hat{\bm{z}} - f^{\text{enc}}_{\phi}(f^{\text{dec}}_{\theta}(\hat{\bm{z}}))\|^2$ and $\gamma$ is a hyperparameter controlling the weight of the LCL term. By optimizing this objective function, LCA-VAE shapes the latent space by integrating the data augmentations into the retraining process, enhancing exploration capabilities and improving sample efficiency. Accordingly, in Crystal-LCA-LSBO, the Lattice-VAE, Coordinate-VAE, and Element-VAE models are retrained using the LCL term in their objective functions:

$\mathcal{L}^{\text{VAE}^{\text{RT}}}_{\text{Crystal-LSBO}}(\phi,\theta) = \mathcal{L}^{\text{VAE}}_{\text{Crystal-LSBO}}(\phi,\theta) - \gamma\,\mathbb{E}_{\hat{\bm{z}} \sim p^{\text{ref}}(\hat{\bm{z}})}\left[\mathrm{LCL}(\hat{\bm{z}})\right],$   (6)

where $\mathcal{L}^{\text{VAE}^{\text{RT}}}_{\text{Crystal-LSBO}}(\phi,\theta)$ denotes the updated objective function of the Lattice-VAE, Coordinate-VAE, and Element-VAE models during retraining.
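
A minimal PyTorch sketch of the LCL term in Eqs. (5)–(6) is given below; `encoder` and `decoder` are illustrative stand-ins for one of the component VAEs, with the encoder returning the posterior mean.

```python
import torch

def lcl_term(encoder, decoder, mu_ref: torch.Tensor,
             sigma_ref: float = 0.3, n_aug: int = 64) -> torch.Tensor:
    """Monte Carlo estimate of E_{z ~ p_ref}[ ||z - f_enc(f_dec(z))||^2 ]."""
    z_hat = mu_ref + sigma_ref * torch.randn(n_aug, mu_ref.shape[-1])
    z_cycle = encoder(decoder(z_hat))        # decode, then re-encode
    return ((z_hat - z_cycle) ** 2).sum(dim=1).mean()

# Since the objective in Eq. (6) is maximized, a retraining loop would minimize
#   loss = standard_vae_loss + gamma * lcl_term(encoder, decoder, mu_ref)
```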

A.2 Space Group 1 and Its Implications for VAE Training

Space groups are a way to categorize the patterns and symmetry seen in crystal structures. These patterns are formed by symmetry operations such as translations, rotations, reflections, and inversions that are applied to the arrangement of chemical elements. There are 230 different space groups, each representing a unique set of these operations, which create the repeating patterns in crystals [2]. Space group 1 is the simplest of all these groups. It involves only translations, which means the pattern repeats itself periodically without additional symmetry operations such as rotations or reflections. This group imposes the fewest geometric constraints: the angles and lengths of the edges of the basic unit of the crystal can vary freely. Thanks to these properties, all crystals can be represented using space group 1; that is, representations from other space groups can be converted to space group 1, albeit in a simplified form without the complex symmetry operations found in other groups.
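
As a practical illustration (an assumption about tooling, not part of our pipeline), pymatgen stores structures as explicit site lists, and writing a CIF without symmetry detection expresses any structure in space group 1:

```python
from pymatgen.core import Structure
from pymatgen.io.cif import CifWriter
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer

s = Structure.from_file("some_crystal.cif")            # hypothetical input file
print(SpacegroupAnalyzer(s).get_space_group_number())  # detected space group
CifWriter(s).write_file("p1_version.cif")              # default output uses P 1
```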

We chose to use space group 1 in our study because it is the simplest representation and offers the most flexibility. Since crystals in this group can take any shape or size, we can work with a wide range of structures. This is particularly useful when training our VAEs because it helps the models learn from a diverse set of crystal structures with simplified representations. This diversity improves a model's ability to generate new structures and to better understand ones it has not seen before. On the other hand, by using space group 1 representations, we inherently forgo incorporating symmetry information into our framework.

A.3 Training Details of VAE Models

We utilized the PyTorch library to train all our models. For the Lattice-VAE, given the simplicity of its inputs (angles and cell lengths), we used linear layers in both the encoder and decoder. In contrast, the Coordinate-VAE, Element-VAE, and Combined-VAE models featured Convolutional Neural Network (CNN) layers in their encoders and decoders, followed by batch normalization and leaky ReLU operations. Each VAE's property prediction model leveraged latent representations to estimate formation energy and band gap, using linear layers with ReLU activations. We calculated the MSE of these predictions and trained this model jointly with the VAE, using the RMSProp optimizer throughout.
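
A schematic of one such joint training step follows; `vae`, `prop_head`, and the loss weighting with $\beta$ and $w$ are illustrative stand-ins for the actual modules and hyperparameters.

```python
import torch
import torch.nn.functional as F

def train_step(vae, prop_head, batch, opt, beta=0.01, w=5.0):
    """One joint VAE + property-prediction update (names are illustrative)."""
    x, y_prop = batch                          # crystal tensor and (E_form, band gap)
    recon, mu, logvar = vae(x)                 # hypothetical VAE forward pass
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = F.mse_loss(recon, x) + beta * kl + w * F.mse_loss(prop_head(mu), y_prop)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# opt = torch.optim.RMSprop(list(vae.parameters()) + list(prop_head.parameters()), lr=1e-4)
```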

In our experiments, we utilized a computing environment equipped with two Intel® Xeon® Gold 6130 CPUs (32 cores in total), which are part of an internal cluster machine. In this setup, model training requires approximately 3 hours, each LSBO experiment completes in under one hour per seed, and random generation experiments conclude in less than 10 minutes per seed.

A.4 Hyperparameter Optimization for LCA-LSBO

In the Crystal-LCA-LSBO framework, we need to optimize the following hyperparameters: the standard deviation $\sigma^{\text{ref}}$ of $p^{\text{ref}}$, which is shared across $p^{\text{ref}}_{\text{Lattice}}$, $p^{\text{ref}}_{\text{Coordinate}}$, and $p^{\text{ref}}_{\text{Element}}$; the threshold $\tau$ for the region of interest; and the weight $\gamma$ of the LCL term. Optimization was conducted within the surrogate-LSBO experimental setup detailed in §5.1, where electronegativity was the target property. Using the model selected in §5.1, we ran LSBO experiments with $\sigma^{\text{ref}} \in \{0.3, 0.5, 0.7, 1\}$, $\gamma \in \{0.1, 1, 2.5, 5, 7.5, 10\}$, and $\tau$ set to the percentile thresholds $P_{50}$ (median), $P_{75}$, and $P_{100}$ (maximum) of the electronegativity values in our dataset. Each experiment was run for 200 iterations with 10 different seeds. The optimal combination of hyperparameters was $\sigma^{\text{ref}} = 0.3$, $\gamma = 2.5$, and $\tau = P_{100}$. The number of augmented samples, $N^{*}$, is set to match the batch size used in model training/retraining. A sketch of this grid search is given below.
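
The grid search can be summarized by the following loop, where `run_lca_lsbo` is a hypothetical wrapper that executes one Crystal-LCA-LSBO run and returns the best electronegativity value found:

```python
from itertools import product

grid = product([0.3, 0.5, 0.7, 1.0],          # sigma_ref
               [0.1, 1, 2.5, 5, 7.5, 10],     # gamma
               ["P50", "P75", "P100"])        # tau (percentile threshold)

results = {}
for sigma_ref, gamma, tau in grid:
    scores = [run_lca_lsbo(sigma_ref, gamma, tau, n_iter=200, seed=s)
              for s in range(10)]
    results[(sigma_ref, gamma, tau)] = sum(scores) / len(scores)

best = max(results, key=results.get)           # electronegativity is maximized
```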