Design Editing for Offline Model-based Optimization
Abstract
Offline model-based optimization (MBO) aims to maximize a black-box objective function using only an offline dataset of designs and scores. These tasks span various domains, such as robotics, material design, and protein and molecular engineering. A prevalent approach involves training a conditional generative model on existing designs and their associated scores, followed by the generation of new designs conditioned on higher target scores. However, these newly generated designs often underperform due to the lack of high-scoring training data. To address this challenge, we introduce a novel method, Design Editing for Offline Model-based Optimization (DEMO), which consists of two phases. In the first phase, termed pseudo-target distribution generation, we apply gradient ascent on the offline dataset using a trained surrogate model, producing a synthetic dataset where the predicted scores serve as new labels. A conditional diffusion model is subsequently trained on this synthetic dataset to capture a pseudo-target distribution, which enhances the accuracy of the conditional diffusion model in generating higher-scoring designs. Nevertheless, the pseudo-target distribution is susceptible to noise stemming from inaccuracies in the surrogate model, which predisposes the conditional diffusion model to generate suboptimal designs. We hence propose the second phase, existing design editing, to directly incorporate the high-scoring features from the offline dataset into design generation. In this phase, top designs from the offline dataset are edited by introducing noise and are then refined using the conditional diffusion model to produce high-scoring designs. Overall, new designs first inherit high-scoring features from existing top designs in the second phase and are then refined by the more accurate conditional diffusion model obtained in the first phase.
Empirical evaluations on 7 offline MBO tasks show that DEMO outperforms various baseline methods, achieving the highest mean rank of 1.7 and median rank of 1 among 16 methods.
1 Introduction
In numerous fields, a primary goal is to innovate and design new objects with specific desired traits [1]. This encompasses areas like robotics, material design, protein and molecular engineering [2, 3, 4, 5]. Conventionally, these objectives are pursued by iteratively testing a black-box objective function that maps a design to its property score. However, such testing can be expensive, time-consuming, or even hazardous [3, 4, 5, 6, 7]. Thus, it is more feasible to utilize an existing offline dataset of designs and their scores to find optimal solutions, without additional real-world testing [1]. This problem is known as offline model-based optimization (MBO). The aim of MBO is to identify a design that optimizes the black-box objective function using only the offline dataset.
A common strategy in MBO involves training a conditional generative model on the available offline dataset to capture the conditional probability distribution $p(x \mid y)$, where $x$ denotes designs and $y$ represents property scores. The model then generates new designs conditioned on higher target scores. Essentially, conditional generative models are designed to establish a one-to-many relationship, mapping property scores to all possible designs. This becomes particularly challenging when the black-box objective function operates over a high-dimensional space. Fortunately, previous research has demonstrated that generative techniques can be effective in solving offline MBO tasks. For instance, the CbAS method utilizes a variational autoencoder [8], while MIN applies a generative adversarial network (GAN) [9, 10]. DDOM extends these techniques by integrating a classifier-free conditional diffusion model to enhance generative capabilities [11].
Nonetheless, one important yet unexplored problem with these generative model-based methods is their reliance on merely training on the offline dataset. This training approach results in models that effectively mimic the distribution of the offline dataset they are trained on but fail to capture the information of designs with higher scores. Therefore, while these models learn to replicate the distribution of existing designs, they struggle to consistently produce new designs that significantly outperform those in the offline dataset.
To address this challenge, we introduce an innovative and effective approach, Design Editing for Offline Model-based Optimization (DEMO). DEMO is structured into two primary phases: pseudo-target distribution generation and existing design editing. In the phase of pseudo-target distribution generation, to address the scarcity of high-scoring training data, we first augment the offline dataset, which may contain, for example, pairs of superconductor materials and critical temperatures. To achieve this, a surrogate model, represented as $f_\theta(\cdot)$, is trained on the offline dataset $\mathcal{D}$, and gradient ascent is applied to existing designs with respect to the surrogate model, creating a synthetic dataset with predicted scores as new labels. As illustrated in Figure 1 (a), the surrogate model is fit to the offline data, and new data points are generated from existing ones through gradient ascent. Subsequently, a classifier-free conditional diffusion model is trained on the synthetic dataset to learn the conditional probability distribution of these synthetic designs along with their predicted scores. This diffusion model characterizes a pseudo-target distribution, which has improved accuracy in generating higher-scoring designs.
However, as shown in Figure 1 (a), the surrogate model may not accurately capture the black-box objective function, resulting in the pseudo-target distribution possibly containing noisy information stemming from the surrogate model. Thus, generating directly from the pseudo-target distribution could lead to some suboptimal designs, which have high predicted scores but low ground-truth scores, necessitating the second phase of DEMO. This phase, termed existing design editing, directly incorporates the high-scoring features from the offline dataset to provide more guidance to the design generation process. Specifically, we edit top designs from the offline dataset by introducing random noise to them and employing the conditional diffusion model from the first phase to remove the noise, guided by higher target scores. As illustrated in Figure 1 (a), after injecting noise, the distribution of top designs in the offline dataset (represented by the purple contour) has more overlap with the pseudo-target distribution (represented by the orange contour). By progressively removing the noise, we gradually project these existing top designs to the manifold of higher-scoring designs, as demonstrated in Figure 1 (b). In essence, DEMO produces new designs which first inherit high-scoring features from existing designs and then refine them by a more accurate conditional diffusion model.
In summary, this paper makes three principal contributions:
- We introduce a novel method, Design Editing for Offline Model-based Optimization (DEMO). DEMO operates in two main phases: the first, pseudo-target distribution generation, involves employing a surrogate model to create a synthetic dataset and training a conditional diffusion model on this synthetic dataset to serve as the pseudo-target distribution.
- The second phase, existing design editing, introduces random noise to existing top designs and uses the trained conditional diffusion model to refine them, resulting in designs which not only inherit high-scoring features from existing top designs but also achieve higher scores by leveraging information from the pseudo-target distribution.
- Extensive experiments demonstrate that DEMO effectively and reliably generates new designs, yielding state-of-the-art results across 7 offline MBO tasks, with a mean rank of 1.7 and a median rank of 1 among 16 methods.
2 Preliminary
2.1 Offline Model-based Optimization
Offline model-based optimization (MBO) addresses a range of optimization challenges with the aim of maximizing a black-box objective function based on an offline dataset. Mathematically, we define the valid design space as $\mathcal{X} \subseteq \mathbb{R}^d$, with $d$ representing the dimension of the design. Offline MBO is formulated as:
$$x^* = \arg\max_{x \in \mathcal{X}} f(x), \tag{1}$$
where $f(\cdot)$ is the black-box objective function, and $x \in \mathcal{X}$ is a potential design. For the optimization process, we utilize an offline dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, with $x_i$ representing an existing design, such as a superconductor material, and $y_i$ representing the associated property score, such as the critical temperature. Usually, this optimization process outputs $K$ candidates for optimal designs, where $K$ is a small budget to test the black-box objective function. The offline MBO problem also finds applications in other areas, like robot design, as well as protein and molecule engineering.
2.2 Classifier-free Conditional Diffusion Models
Diffusion models stand out in the family of generative models due to their unique approach involving forward diffusion and backward denoising processes. The essence of diffusion models is to gradually add noise to a sample, followed by training a neural network to reverse this noise addition, thus recovering the original data distribution. In this work, we follow the formulation of diffusion models with continuous time [12, 13]. Here, $x_t$ is a random variable denoting the state of a data point at time $t$. The diffusion process is defined by a stochastic differential equation (SDE):
$$dx = \mathbf{f}(x, t)\,dt + g(t)\,dw, \tag{2}$$
where $\mathbf{f}(\cdot, t)$ is the drift coefficient of $x_t$, $g(\cdot)$ is the diffusion coefficient of $x_t$, and $w$ is a standard Wiener process. The backward denoising process is given by the reverse time SDE:
$$dx = \left[\mathbf{f}(x, t) - g(t)^2 \nabla_x \log p_t(x)\right] dt + g(t)\,d\bar{w}, \tag{3}$$
where $dt$ represents a negative infinitesimal step in time, and $\bar{w}$ is a reverse time Wiener process. The gradient of the log probability, $\nabla_x \log p_t(x)$, is approximated by a neural network with score-matching objectives [14, 15].
Beyond basic diffusion models, our focus is to train a conditional diffusion model that learns the conditional probability distribution of designs based on their associated property scores. To incorporate conditions into diffusion models, Ho et al. [16] divide the score function into a combination of conditional and unconditional components, an approach known as classifier-free diffusion models. Specifically, a single neural network, $s_\phi(x_t, t, y)$, is trained to handle both components by utilizing $y$ as the condition or leaving it empty for unconditional functions. Formally, we can write this combination as follows:
$$\tilde{s}_\phi(x_t, t, y) = (1 + \omega)\, s_\phi(x_t, t, y) - \omega\, s_\phi(x_t, t), \tag{4}$$
where $\omega$ is a parameter that adjusts the influence of the conditions. A higher value of $\omega$ ensures that the generation process adheres more closely to the specified conditions, while a lower value allows greater flexibility in the outputs.
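The combination of conditional and unconditional components can be illustrated with a minimal sketch. It assumes the commonly used classifier-free guidance form, (1 + ω) · conditional − ω · unconditional; the function name `guided_score` and the numeric vectors standing in for network outputs are hypothetical and not taken from the paper.

```python
def guided_score(score_cond, score_uncond, omega):
    """Combine conditional and unconditional score estimates elementwise.

    Assumed form: (1 + omega) * s(x_t, t, y) - omega * s(x_t, t).
    """
    if len(score_cond) != len(score_uncond):
        raise ValueError("score vectors must have the same length")
    return [(1.0 + omega) * c - omega * u for c, u in zip(score_cond, score_uncond)]


# Hypothetical network outputs for a 3-dimensional design at some (x_t, t).
s_cond = [0.5, -1.0, 2.0]    # conditioned on the target score y
s_uncond = [0.1, -0.5, 1.0]  # condition left empty

no_guidance = guided_score(s_cond, s_uncond, omega=0.0)      # equals s_cond
strong_guidance = guided_score(s_cond, s_uncond, omega=2.0)  # pushed away from s_uncond
```

With ω = 0 the result reduces to the conditional score alone; larger ω extrapolates further along the direction from the unconditional to the conditional estimate.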
3 Related Works
3.1 Offline Model-based Optimization
Recent offline model-based optimization (MBO) techniques broadly fall into two categories: (i) those that employ gradient-based optimizations and (ii) those that create new designs via generative models. Gradient-based methods often employ regularization techniques that enhance either the surrogate model [17, 18, 19] or the design itself [20, 21], thus improving the model’s robustness and generalization capacity. It’s worth noting that while some approaches also involve synthesizing new data with pseudo labels [22, 23], they aim to identify useful information from these synthetic data to correct the surrogate model’s inaccuracies. The second category encompasses methods that learn to replicate the distribution of existing designs and include approaches such as MIN [9], CbAS [8], Auto CbAS [24], and DDOM [11]. These methods are known for their ability to generate innovative designs by sampling from learned distributions. DEMO distinguishes itself by training a conditional diffusion model that learns a pseudo-target distribution and incorporating features from existing top designs, which facilitates effectively and consistently generating new superior candidates.
3.2 Diffusion-Based Editing
Diffusion models have shown remarkable success in various generation tasks across multiple modalities, especially for their ability to control the generation process based on given conditions. For instance, recent advancements have utilized diffusion models for zero-shot, test-time editing in the domains of text-based image and video generation. SDEdit [25] employs an editing strategy to balance realism and faithfulness in image generation. To improve the reconstruction quality, methodologies such as DDIM Inversion [26], Null-text Inversion [27] and Negative-prompt Inversion [28] concentrate on deterministic mappings from source latents to initial noise, conditioned on source text. Building on these, CycleDiffusion [29] and Direct Inversion [30] leverage source latents from each inversion step and further improve the faithfulness of the target image to the source image. Following the image editing technique, several video editing methods [31, 32, 33, 34, 35, 36] adopt image diffusion models and enforce temporal consistency across frames, offering practical and efficient solutions for video editing. Inspired by the success of these editing techniques in the field of computer vision, we edit existing top designs towards a pseudo-target distribution in the context of the offline MBO problem, enhancing both the effectiveness and reliability of generating new designs.
4 Methodology
In this section, we elaborate on the details of our proposed Design Editing for Offline Model-based Optimization (DEMO), including two phases. We introduce the first phase, named pseudo-target distribution generation, in section 4.1. This phase trains a conditional diffusion model, serving as the pseudo-target distribution, on a synthetic dataset created by performing gradient ascent with respect to a surrogate model trained on the offline dataset. While the first phase achieves a more accurate conditional diffusion model capable of generating designs with higher scores than a model trained solely on the offline dataset, it is susceptible to noise caused by inaccuracies in the surrogate model. This motivates the second phase, termed Existing Design Editing, described in section 4.2, which explicitly incorporates high-scoring features from existing top designs. Intuitively, one can make an analogy of our method to writing code for a new research project. In coding for research, the initial step often involves sourcing and adapting useful existing code from previous projects, tailoring it to new requirements through modifications and enhancements. In a similar fashion, DEMO generates new designs by initially inheriting high-scoring features from top existing designs (akin to reusing existing code) and subsequently refining them through a more accurate conditional diffusion model (akin to modifying code for a new purpose). Algorithm 1 illustrates the complete process of DEMO.
4.1 Pseudo-target Distribution Generation
Due to the scarcity of high-scoring training data, conditional generative models trained only on the offline dataset often fail to consistently produce new designs that substantially surpass the existing ones. One promising yet underutilized approach to address this issue is to first generate a synthetic dataset by applying gradient ascent on existing designs using a trained surrogate model. Conditional generative models trained on this synthetic dataset capture a pseudo-target distribution and are therefore more adept at creating designs with higher scores.
Creation of Synthetic Dataset. Initially, a deep neural network (DNN), denoted as $f_\theta(\cdot)$ with parameters $\theta$, is trained on the offline dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ and $y_i$ denote a design and its associated score, respectively. The parameters are optimized as:
$$\theta^* = \arg\min_\theta \frac{1}{N} \sum_{i=1}^{N} \left(f_\theta(x_i) - y_i\right)^2. \tag{5}$$
The solution $f_{\theta^*}(\cdot)$ obtained from Eq. (5) serves as a surrogate for the unknown black-box objective function $f(\cdot)$ in Eq. (1). New data are then generated by performing gradient ascent on the existing designs with respect to the learned surrogate model $f_{\theta^*}(\cdot)$. For a design $x$ in $\mathcal{D}$, we update it as:
$$x_{\tau+1} = x_\tau + \eta \nabla_x f_{\theta^*}(x)\big|_{x = x_\tau}, \quad \tau \in \{0, \dots, T-1\}, \tag{6}$$
where $T$ is the total number of iterations, and $\eta$ is the step size for the gradient ascent update. The initial point $x_0$ is the same as $x$, and $x_T$ acquired at step $T$ is a synthetic design with an enhanced predicted score. By using each design in the offline dataset in turn as the initial point, a synthetic dataset of the same size as $\mathcal{D}$ is created, with predicted scores $f_{\theta^*}(x_T)$ as labels. This process is outlined in Algorithm 1.
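The construction of the synthetic dataset can be sketched as follows. This is only an illustration under stated assumptions: the trained DNN surrogate is replaced by a toy concave quadratic with an analytic gradient (in practice the gradient would come from automatic differentiation), and all names and numeric values are hypothetical.

```python
def surrogate(x, center):
    """Toy stand-in for a trained surrogate: a concave quadratic."""
    return -sum((xi - ci) ** 2 for xi, ci in zip(x, center))


def surrogate_grad(x, center):
    """Analytic gradient of the toy surrogate with respect to the design x."""
    return [-2.0 * (xi - ci) for xi, ci in zip(x, center)]


def gradient_ascent(x, center, num_steps, step_size):
    """Apply x <- x + step_size * grad f(x) for num_steps iterations."""
    x = list(x)
    for _ in range(num_steps):
        grad = surrogate_grad(x, center)
        x = [xi + step_size * gi for xi, gi in zip(x, grad)]
    return x


def build_synthetic_dataset(designs, center, num_steps, step_size):
    """Start from every offline design; label each result with its predicted score."""
    synthetic = []
    for x in designs:
        x_new = gradient_ascent(x, center, num_steps, step_size)
        synthetic.append((x_new, surrogate(x_new, center)))
    return synthetic


offline_designs = [[0.0, 0.0], [2.0, -1.0], [-1.5, 0.5]]
center = [1.0, 1.0]  # maximizer of the toy surrogate (illustrative only)
synthetic = build_synthetic_dataset(offline_designs, center, num_steps=10, step_size=0.1)
```

The synthetic dataset has one entry per offline design, and each label is a surrogate prediction rather than a ground-truth score, which is why the labels may be noisy when the surrogate is inaccurate.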
Training of Conditional Diffusion Model. We employ a classifier-free conditional diffusion model [16] to learn the conditional probability distribution of synthetic designs and their predicted scores in the synthetic dataset, which captures a pseudo-target distribution. Following the approach in DDOM [11], we use the Variance Preserving (VP) stochastic differential equation (SDE) for the forward diffusion process, as specified in [12]:
$$dx = -\frac{1}{2}\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw, \tag{7}$$
where $\beta(t)$ is a continuous time function for $t \in [0, 1]$. The forward process in DDPM [37] is proved to be a discretization of Eq. (7) [12]. To integrate conditions in the backward denoising process, we need to train a DNN $s_\phi(x_t, t, y)$ with parameters $\phi$, conditioned on the time $t$ and the score $y$ associated with the unperturbed design $x_0$ corresponding to $x_t$. The parameters are optimized as:
$$\phi^* = \arg\min_\phi \mathbb{E}_t \left\{ \lambda(t)\, \mathbb{E}_{x_0, y}\, \mathbb{E}_{x_t \mid x_0} \left[ \left\| s_\phi(x_t, t, y) - \nabla_{x_t} \log p(x_t \mid x_0) \right\|^2 \right] \right\}, \tag{8}$$
where $\lambda(t)$ is a positive weighting function depending on time. Since we train on the synthetic dataset, the model $s_{\phi^*}$ optimized according to Eq. (8) more accurately represents the gradient of the logarithm of a pseudo-target distribution. This distribution essentially reflects the marginal probability distribution of designs that have enhanced predicted scores. With the optimized model $s_{\phi^*}$, we thereby improve the accuracy in generating new high-scoring designs by simulating the backward denoising process. This part is described in Algorithm 1.
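The score-matching objective can be made concrete with a single-sample sketch. It relies on the known closed-form Gaussian transition kernel of the VP SDE and assumes a linear schedule β(t) = β_min + t(β_max − β_min) with the weighting λ(t) set to the kernel variance; the schedule constants, the weighting choice, and the placeholder `toy_score` network are assumptions for illustration, not the paper's settings.

```python
import math
import random


def vp_marginal(t, beta_min=0.1, beta_max=20.0):
    """Mean coefficient and std of p(x_t | x_0) under the VP SDE.

    Assumes a linear schedule beta(t) = beta_min + t * (beta_max - beta_min).
    """
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t * t
    mean_coef = math.exp(-0.5 * integral)
    std = math.sqrt(1.0 - math.exp(-integral))
    return mean_coef, std


def dsm_loss(score_fn, x0, y, t, rng):
    """Single-sample denoising score-matching loss with weight lambda(t) = std^2.

    Requires t > 0 so that the kernel standard deviation is nonzero.
    """
    mean_coef, std = vp_marginal(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [mean_coef * a + std * e for a, e in zip(x0, eps)]
    # Score of the Gaussian kernel: -(x_t - mean_coef * x_0) / std^2 = -eps / std.
    target = [-e / std for e in eps]
    pred = score_fn(xt, t, y)
    return (std ** 2) * sum((p - q) ** 2 for p, q in zip(pred, target))


def toy_score(xt, t, y):
    """Placeholder for the conditional network; ignores y and assumes x_0 = 0."""
    _, std = vp_marginal(t)
    return [-v / (std ** 2) for v in xt]


rng = random.Random(0)
loss_at_zero = dsm_loss(toy_score, x0=[0.0, 0.0], y=1.0, t=0.5, rng=rng)
```

Because `toy_score` is exact when the unperturbed design is the origin, the loss in this example is zero up to floating-point error; a trained network would instead minimize the average of this quantity over designs, scores, times, and noise draws.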
4.2 Existing Design Editing
Due to potential inaccuracies of the surrogate model in representing the black-box objective function, the synthetic dataset might include noisy data. Therefore, directly generating from the pseudo-target distribution could lead to suboptimal new designs. Driven by the success of editing techniques in image synthesis tasks [25, 38], we explore the potential of creating new designs from top existing designs, instead of initiating from a random latent variable sampled from the standard Gaussian prior. We perturb a top design $x$ by introducing noise at a specific time $m$ out of $M$, using auxiliary noise levels $\{\beta_i\}_{i=1}^{M}$:
$$x_{\text{perturb}} = \sqrt{\bar{\alpha}_m}\, x + \sqrt{1 - \bar{\alpha}_m}\, \epsilon, \tag{9}$$
where $\alpha_i = 1 - \beta_i$, $\bar{\alpha}_m = \prod_{i=1}^{m} \alpha_i$, and $\epsilon \sim \mathcal{N}(0, I)$. This results in a closed form that samples $x_{\text{perturb}} \sim \mathcal{N}(\sqrt{\bar{\alpha}_m}\, x, (1 - \bar{\alpha}_m) I)$. The perturbed design is then used as the starting point. Given a target property score $y$, a new design is synthesized using a second-order Heun's sampler [11] with the model $s_{\phi^*}$. To yield $K$ candidate optimal designs, we select the top $K$ designs from $\mathcal{D}$ to obtain various perturbed designs and denoise them conditioned on $y$. Algorithm 1 presents the process of this phase.
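The noise-injection step of the editing phase can be sketched as below. The sketch assumes a DDPM-style closed form, in which a design is scaled by the square root of a cumulative product of (1 − noise level) and mixed with standard Gaussian noise; the linear noise schedule, the time index, the toy dataset, and all function names are illustrative assumptions rather than the paper's configuration. The denoising pass with the conditional diffusion model is not shown.

```python
import math
import random


def alpha_bar(m, betas):
    """Cumulative product of (1 - beta_i) over the first m noise levels."""
    prod = 1.0
    for beta in betas[:m]:
        prod *= 1.0 - beta
    return prod


def perturb(design, m, betas, rng):
    """Sample sqrt(alpha_bar_m) * x + sqrt(1 - alpha_bar_m) * eps, eps ~ N(0, I)."""
    a_bar = alpha_bar(m, betas)
    return [
        math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
        for v in design
    ]


def select_top(dataset, k):
    """Return the k designs with the highest scores from (design, score) pairs."""
    ranked = sorted(dataset, key=lambda pair: pair[1], reverse=True)
    return [design for design, _ in ranked[:k]]


# Illustrative values only; none of these come from the paper.
M = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (M - 1) for i in range(M)]
dataset = [([0.0, 1.0], 0.2), ([1.0, 1.0], 0.9), ([2.0, -1.0], 0.7)]

rng = random.Random(0)
top_designs = select_top(dataset, k=2)
starts = [perturb(x, m=400, betas=betas, rng=rng) for x in top_designs]
# Each entry of `starts` would then be denoised by the conditional model,
# conditioned on the target score, to produce a candidate design.
```

A smaller time index keeps the starting point closer to the original top design, while a larger one moves it closer to pure noise, so the index trades faithfulness to existing designs against freedom to follow the learned distribution.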
5 Experiments
This section first describes the experiment setup, followed by the implementation details and results. We aim to answer the following questions in this section: (Q1) Is our proposed DEMO more effective than baseline methods in solving the offline MBO problem? (Q2) Are the two phases described in section 4 both necessary? (Q3) Compared to existing generative model-based approaches, can DEMO more reliably and consistently generate new higher-scoring designs?
5.1 Dataset and Tasks
We carry out experiments on 7 tasks selected from Design-Bench [1] and BayesO Benchmarks [39], including 4 continuous tasks and 3 discrete tasks. The continuous tasks are as follows: (i) Superconductor (SuperC) [5], where the goal is to create a superconductor with continuous components to maximize critical temperature, using designs; (ii) Ant Morphology (Ant) [1, 40], where the objective is to design a four-legged ant with continuous components to increase crawling speed, based on designs; (iii) D’Kitty Morphology (D’Kitty) [1, 41], where the focus is on designing a four-legged D’Kitty with continuous components to enhance crawling speed, using designs; (iv) Inverse Levy Function (Levy) [39], where the aim is to maximize function values of the inverse black-box Levy function with input dimensions, using designs. The discrete tasks include: (v) TF Bind 8 (TF8) [6], where the goal is to identify an 8-unit DNA sequence that maximizes binding activity score, with designs; (vi) TF Bind 10 (TF10) [6], where the objective is to find a 10-unit DNA sequence that optimizes binding activity score, using designs; (vii) NAS [42], where the aim is to discover the optimal neural network architecture to improve test accuracy on the CIFAR-10 dataset [43], using designs.
5.2 Evaluation and Metrics
Following the evaluation protocol used in previous studies [1, 11, 22], we assume the budget and generate new designs for each method. The 100-th (max) percentile normalized ground-truth score is reported in section 5.5, and the 50-th (median) percentile score is provided in Appendix A.1. This normalized score is calculated as $y_n = \frac{y - y_{\min}}{y_{\max} - y_{\min}}$, where $y_{\min}$ and $y_{\max}$ are the minimum and maximum scores in the entire offline dataset, respectively. For better comparison, we include the normalized score of the best design in the offline dataset, denoted as $\mathcal{D}$(best). Additionally, we provide mean and median rankings across all tasks for a comprehensive performance evaluation.
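The normalization and percentile reporting can be illustrated with a short sketch. It assumes min-max normalization against the offline dataset's extreme scores, as described above; the function names and the numeric scores are made up for illustration.

```python
import statistics


def normalize(score, y_min, y_max):
    """Min-max normalize a ground-truth score using offline-dataset extremes."""
    return (score - y_min) / (y_max - y_min)


def summarize(candidate_scores, offline_scores):
    """Return the 100th (max) and 50th (median) percentile normalized scores."""
    y_min, y_max = min(offline_scores), max(offline_scores)
    normalized = [normalize(s, y_min, y_max) for s in candidate_scores]
    return max(normalized), statistics.median(normalized)


offline_scores = [10.0, 20.0, 30.0, 50.0]  # made-up offline scores
candidate_scores = [40.0, 60.0, 30.0]      # made-up ground-truth scores of new designs
best, median = summarize(candidate_scores, offline_scores)
```

Under this normalization the best offline design scores exactly 1, so a normalized value above 1 indicates a new design that beats everything in the offline dataset.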
5.3 Comparison Methods
We benchmark DEMO against three groups of baseline approaches: (i) traditional methods, (ii) those utilizing gradient optimizations from current designs, and (iii) those employing generative models for sampling. Traditional methods include: (1) BO-qEI [44]: conducts Bayesian Optimization to maximize the surrogate, proposes designs using the quasi-Expected-Improvement acquisition function, and labels the designs using the surrogate model. (2) CMA-ES [45]: progressively adjusts the distribution toward the optimal design by altering the covariance matrix. (3) REINFORCE [46]: optimizes the distribution over the input space using the learned surrogate. The second category includes: (4) Grad: performs simple gradient ascent on existing designs to create new ones. (5) Mean: optimizes the average prediction of the ensemble of surrogate models. (6) Min: optimizes the lowest prediction from a group of learned objective functions. (7) COMs [18]: applies regularization to assign lower scores to designs derived through gradient ascent. (8) ROMA [17]: introduces smoothness regularization to the DNN. (9) NEMO [19]: limits the discrepancy between the surrogate and the black-box objective function using normalized maximum likelihood before performing gradient ascent. (10) BDI [21] employs forward and backward mappings to transfer knowledge from the offline dataset to the design. (11) IOM [47]: ensures representation consistency between the training dataset and the optimized designs. Generative model-based methods include: (12) CbAS [8], which adapts a VAE model to steer the design distribution toward areas with higher scores. (13) Auto CbAS [24], which uses importance sampling to update a regression model based on CbAS. (14) MIN [9], which establishes a relationship between scores and designs and seeks optimal designs within this framework. (15) DDOM [11], which learns a generative diffusion model conditioned on the score values.
5.4 Implementation Details
We follow the training protocols from [18] for all comparative methods unless stated otherwise. A -layer MLP with ReLU activation is used for both and , with a hidden layer size of . In Algorithm 1, the iteration count, , is established at for both continuous and discrete tasks. The Adam optimizer [48] is utilized to train the surrogate models over epochs with a batch size of , and a learning rate set at . The step size, , in equation 6 is configured at for continuous tasks and for discrete tasks. The conditional diffusion model, , undergoes training for epochs with a batch size of . For the existing design editing, following precedents set by previous studies [49, 11], we assign a target score, , of and at . The selected value of is , with further elaboration provided in Appendix A.2. Results from traditional methodologies are referenced from [1], and we conduct independent trials for other methods, reporting the mean and standard error. All experiments are conducted on a single NVIDIA V GPU, with execution times per trial ranging from minutes to hours, depending on the specific tasks.
Method | Superconductor | Ant Morphology | D’Kitty Morphology | Levy |
$\mathcal{D}$(best) | ||||
BO-qEI | ||||
CMA-ES | ||||
REINFORCE | ||||
Grad | ||||
Mean | ||||
Min | ||||
COMs | ||||
ROMA | ||||
NEMO | ||||
BDI | ||||
IOM | ||||
CbAS | ||||
Auto CbAS | ||||
MIN | ||||
DDOM | ||||
DEMO(ours) |
Method | TF Bind 8 | TF Bind 10 | NAS | Rank Mean | Rank Median |
$\mathcal{D}$(best) | |||||
BO-qEI | |||||
CMA-ES | |||||
REINFORCE | |||||
Grad | |||||
Mean | |||||
Min | |||||
COMs | |||||
ROMA | |||||
NEMO | |||||
BDI | |||||
IOM | |||||
CbAS | |||||
Auto CbAS | |||||
MIN | |||||
DDOM | |||||
DEMO(ours) | 1.7/16 | 1/16 |
5.5 Results
Performance in Continuous Tasks. Table 1 presents the results of the continuous tasks. DEMO reaches state-of-the-art performance on all of them. When compared to other generative model-based approaches, such as MIN and DDOM, DEMO generally outperforms them because these methods train models only on the offline dataset and may not learn characteristics of higher-scoring designs. DEMO achieves better performance by effectively mitigating this issue. Moreover, DEMO beats gradient-based methods, like Grad and COMs, by leveraging guidance from existing top designs and a higher target score simultaneously. This indicates that DEMO is effective for continuous tasks.
Performance in Discrete Tasks. Table 2 exhibits the results of the discrete tasks. DEMO attains top performances in TF Bind and TF Bind , where the results on TF surpass other methods by a significant margin, suggesting the ability of DEMO to solve discrete offline MBO tasks. Nonetheless, DEMO underperforms on NAS, which might be caused by two reasons. First, each neural network architecture is encoded as a sequence of one-hot vectors, which has a length of . This encoding process might be incapable of precisely representing all features of a given architecture, inducing undesirable performance on NAS. Furthermore, after checking the offline dataset of NAS, we find that many existing designs share commonalities. This redundancy means that the offline dataset of NAS contains less useful information than those of other tasks, which further explains why the performance of DEMO on NAS is not as strong.
Summary. These results on both continuous and discrete tasks answer Q1. DEMO attains the highest rankings, with a mean of 1.7 and a median of 1 among 16 methods, as detailed in Table 2 and Figure 3, and secures top performance on six of the seven tasks. We have further run a Welch’s t-test on the tasks where DEMO obtains state-of-the-art results. We obtain p-values of on SuperC, on Ant, on D’Kitty, on Levy, on TF8, and on TF10. This confirms that DEMO accomplishes statistically significant improvements in tasks.
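For reference, the statistic behind a Welch's t-test can be computed as in the sketch below. It shows only the t statistic and the Welch-Satterthwaite degrees of freedom; converting these to a p-value requires the Student t distribution, which is omitted here. The per-trial scores are made up and are not results from the paper.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(a), len(b)
    va = statistics.variance(a) / n_a  # squared standard error of mean(a)
    vb = statistics.variance(b) / n_b  # squared standard error of mean(b)
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t_stat, dof


method_scores = [2.0, 4.0, 6.0, 8.0, 10.0]   # made-up per-trial scores
baseline_scores = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up per-trial scores
t_stat, dof = welch_t(method_scores, baseline_scores)
```

Unlike Student's t-test, this variant does not assume equal variances across the two groups, which suits comparisons between methods whose trial-to-trial spread differs.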
5.6 Ablation Study
Task | D | DEMO | w/o pseudo-target | w/o editing |
SuperC | ||||
Ant | ||||
D’Kitty | ||||
Levy | ||||
TF8 | ||||
TF10 | ||||
NAS |
To rigorously assess the individual contributions of pseudo-target distribution generation (pseudo-target) and existing design editing (editing) within our DEMO method, ablation experiments are conducted by systematically removing each phase. The omission of the pseudo-target phase includes training a conditional diffusion model only on the offline dataset and then applying the editing phase. In contrast, the removal of the editing phase involves using the model trained during the pseudo-target phase to generate new designs starting from a random Gaussian noise.
The results, as summarized in Table 3, provide clear insights into the impact of these modifications. For the continuous tasks, DEMO consistently achieves higher performance compared to its ablated versions. For instance, in the task SuperC, DEMO achieves a score of , significantly higher than the versions without the pseudo-target phase () and without the editing phase (). Similar improvements are observed in Ant, D’Kitty, and Levy, underscoring the effectiveness of integrating both phases in continuous tasks. In the discrete tasks TF8, TF10, and NAS, DEMO also outperforms both ablated versions. Overall, the ablation studies validate the importance of the pseudo-target distribution generation and existing design editing within the DEMO method, answering Q2: both phases are necessary for DEMO. These phases collectively contribute to enhancements across a range of tasks and input dimensions.
5.7 Reliability Study
As previously noted, generative model-based methods, which train solely on the offline dataset, often fail to generate new designs that consistently score higher. In this subsection, we assess the ability of DEMO to reliably produce superior designs compared to DDOM, the most recent and strongest of the generative model-based approaches. We also discuss the comparison to gradient-based approaches in Appendix A.3. To measure reliability, we compute the proportion of new designs that exceed the best score in the offline dataset $\mathcal{D}$. The results are depicted in Figure 3. DEMO consistently outperforms DDOM across all tasks, achieving notable improvements, particularly in the SuperC and NAS tasks. This confirms DEMO’s enhanced reliability over the state-of-the-art generative model-based baseline in both continuous and discrete settings. The median scores included in Appendix A.1 further support these findings. DEMO achieves the top median-score rankings, affirming the reliability of DEMO and answering Q3.
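The reliability measure described above amounts to a simple proportion, sketched here with made-up scores; the function name `improvement_rate` is hypothetical.

```python
def improvement_rate(new_scores, offline_scores):
    """Fraction of new designs whose score exceeds the best offline score."""
    best_offline = max(offline_scores)
    return sum(s > best_offline for s in new_scores) / len(new_scores)


offline_scores = [0.2, 0.7, 1.0]   # made-up offline scores
new_scores = [0.5, 1.2, 0.9, 1.5]  # made-up ground-truth scores of new designs
rate = improvement_rate(new_scores, offline_scores)
```

A method can report a strong maximum score while having a low rate if only a few of its candidates beat the offline best, which is why this proportion complements the max-percentile results.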
6 Conclusion and Discussion
In this study, we introduce Design Editing for Offline Model-based Optimization (DEMO), which consists of two phases. The first phase, pseudo-target distribution generation, involves training a surrogate model on the offline dataset and applying gradient ascent to create a synthetic dataset where the predicted scores serve as new labels. A conditional diffusion model is subsequently trained on this synthetic dataset to learn a pseudo-target distribution. The second phase, existing design editing, introduces random noise to existing top designs and employs the learned diffusion model to denoise them, conditioned on higher target scores. Overall, DEMO generates new designs by inheriting high-scoring features from top existing designs in the second phase and refining them with a more accurate conditional diffusion model obtained in the first phase. Extensive experiments on diverse offline MBO tasks validate that DEMO outperforms various baseline approaches, yielding state-of-the-art performance. The limitations and potential negative impacts of this study are discussed in Appendix A.4 and Appendix A.5, respectively.
7 Acknowledgement
This research is partly facilitated by the computational resources provided by Compute Canada and Mila Cluster.
References
- [1] Brandon Trabucco, Xinyang Geng, Aviral Kumar, and Sergey Levine. Design-bench: Benchmarks for data-driven offline model-based optimization. arXiv preprint arXiv:2202.08450, 2022.
- [2] Thomas Liao, Grant Wang, Brian Yang, Rene Lee, Kristofer Pister, Sergey Levine, and Roberto Calandra. Data-efficient learning of morphology and controller for a microrobot. arXiv preprint arXiv:1905.01334, 2019.
- [3] Karen S Sarkisyan et al. Local fitness landscape of the green fluorescent protein. Nature, 2016.
- [4] Christof Angermueller, David Dohan, David Belanger, Ramya Deshpande, Kevin Murphy, and Lucy Colwell. Model-based reinforcement learning for biological sequence design. In Proc. Int. Conf. Learning Rep. (ICLR), 2019.
- [5] Kam Hamidieh. A data-driven statistical model for predicting the critical temperature of a superconductor. Computational Materials Science, 2018.
- [6] Luis A Barrera et al. Survey of variation in human transcription factors reveals prevalent DNA binding changes. Science, 2016.
- [7] Paul J Sample, Ban Wang, David W Reid, Vlad Presnyak, Iain J McFadyen, David R Morris, and Georg Seelig. Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nature Biotechnology, 2019.
- [8] David Brookes, Hahnbeom Park, and Jennifer Listgarten. Conditioning by adaptive sampling for robust design. In Proc. Int. Conf. Machine Lea. (ICML), 2019.
- [9] Aviral Kumar and Sergey Levine. Model inversion networks for model-based optimization. Proc. Adv. Neur. Inf. Proc. Syst (NeurIPS), 2020.
- [10] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
- [11] Siddarth Krishnamoorthy, Satvik Mehul Mashkaria, and Aditya Grover. Diffusion models for black-box optimization, 2023.
- [12] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations, 2021.
- [13] Chin-Wei Huang, Jae Hyun Lim, and Aaron Courville. A variational perspective on diffusion-based generative models and score matching, 2021.
- [14] Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011.
- [15] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution, 2020.
- [16] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance, 2022.
- [17] Sihyun Yu, Sungsoo Ahn, Le Song, and Jinwoo Shin. Roma: Robust model adaptation for offline model-based optimization. Proc. Adv. Neur. Inf. Proc. Syst (NeurIPS), 2021.
- [18] Brandon Trabucco, Aviral Kumar, Xinyang Geng, and Sergey Levine. Conservative objective models for effective offline model-based optimization, 2021.
- [19] Justin Fu and Sergey Levine. Offline model-based optimization via normalized maximum likelihood estimation. Proc. Int. Conf. Learning Rep. (ICLR), 2021.
- [20] Can Chen, Yingxue Zhang, Xue Liu, and Mark Coates. Bidirectional learning for offline model-based biological sequence design, 2023.
- [21] Can Chen, Yingxue Zhang, Jie Fu, Xue Liu, and Mark Coates. Bidirectional learning for offline infinite-width model-based optimization, 2023.
- [22] Ye Yuan, Can Chen, Zixuan Liu, Willie Neiswanger, and Xue Liu. Importance-aware co-teaching for offline model-based optimization, 2023.
- [23] Can Chen, Christopher Beckham, Zixuan Liu, Xue Liu, and Christopher Pal. Parallel-mentoring for offline model-based optimization, 2023.
- [24] Clara Fannjiang and Jennifer Listgarten. Autofocused oracles for model-based design. Proc. Adv. Neur. Inf. Proc. Syst (NeurIPS), 2020.
- [25] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations, 2022.
- [26] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models, 2022.
- [27] Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6038–6047, 2023.
- [28] Daiki Miyake, Akihiro Iohara, Yu Saito, and Toshiyuki Tanaka. Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models. arXiv preprint arXiv:2305.16807, 2023.
- [29] Chen Henry Wu and Fernando De la Torre. A latent space of stochastic diffusion models for zero-shot image editing and guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7378–7387, 2023.
- [30] Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Direct inversion: Boosting diffusion-based editing with 3 lines of code. arXiv preprint arXiv:2310.01506, 2023.
- [31] Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, and Qifeng Chen. Fatezero: Fusing attentions for zero-shot text-based video editing. arXiv preprint arXiv:2303.09535, 2023.
- [32] Duygu Ceylan, Chun-Hao P Huang, and Niloy J Mitra. Pix2video: Video editing using image diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 23206–23217, 2023.
- [33] Shuai Yang, Yifan Zhou, Ziwei Liu, and Chen Change Loy. Rerender a video: Zero-shot text-guided video-to-video translation. arXiv preprint arXiv:2306.07954, 2023.
- [34] Michal Geyer, Omer Bar-Tal, Shai Bagon, and Tali Dekel. Tokenflow: Consistent diffusion features for consistent video editing. arXiv preprint arXiv:2307.10373, 2023.
- [35] Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, and Sen He. Flatten: optical flow-guided attention for consistent text-to-video editing. arXiv preprint arXiv:2310.05922, 2023.
- [36] Youyuan Zhang, Xuan Ju, and James J Clark. Fastvideoedit: Leveraging consistency models for efficient text-to-video editing. arXiv preprint arXiv:2403.06269, 2024.
- [37] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020.
- [38] Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image translation. In International Conference on Learning Representations, 2023.
- [39] Jungtaek Kim. BayesO Benchmarks: Benchmark functions for Bayesian optimization, 2023.
- [40] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym. arXiv preprint arXiv:1606.01540, 2016.
- [41] Michael Ahn, Henry Zhu, Kristian Hartikainen, Hugo Ponte, Abhishek Gupta, Sergey Levine, and Vikash Kumar. Robel: Robotics benchmarks for learning with low-cost robots. In Conf. on Robot Lea. (CoRL), 2020.
- [42] Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2017.
- [43] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
- [44] James T Wilson, Riccardo Moriconi, Frank Hutter, and Marc Peter Deisenroth. The reparameterization trick for acquisition functions. arXiv preprint arXiv:1712.00424, 2017.
- [45] Nikolaus Hansen. The CMA evolution strategy: A comparing review. In Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms, 2006.
- [46] Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992.
- [47] Han Qi, Yi Su, Aviral Kumar, and Sergey Levine. Data-driven model-based optimization via invariant representation learning. In Proc. Adv. Neur. Inf. Proc. Syst (NeurIPS), 2022.
- [48] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017.
- [49] Minsu Kim, Federico Berto, Sungsoo Ahn, and Jinkyoo Park. Bootstrapped training of score-conditioned generator for offline design of biological sequences, 2023.
Appendix A Appendix
A.1 Median Normalized Scores
Method | Superconductor | Ant Morphology | D’Kitty Morphology | Levy |
(best) | ||||
BO-qEI | ||||
CMA-ES | ||||
REINFORCE | ||||
Grad | ||||
Mean | ||||
Min | ||||
COMs | ||||
ROMA | ||||
NEMO | ||||
BDI | ||||
IOM | ||||
CbAS | ||||
Auto CbAS | ||||
MIN | ||||
DDOM | ||||
DEMO(ours) |
Method | TF Bind 8 | TF Bind 10 | NAS | Rank Mean | Rank Median |
(best) | |||||
BO-qEI | |||||
CMA-ES | |||||
REINFORCE | |||||
Grad | |||||
Mean | |||||
Min | |||||
COMs | |||||
ROMA | |||||
NEMO | |||||
BDI | |||||
IOM | |||||
CbAS | |||||
Auto CbAS | |||||
MIN | |||||
DDOM | |||||
DEMO(ours) | ||| 4.0/16 | 4/16 |
Performance in Continuous Tasks. Table 4 reports the median normalized scores of all methods on the continuous tasks. DEMO does not rank first on every task, but it performs robustly throughout and consistently outperforms several baseline methods. For example, in the Ant Morphology task, DEMO attains the highest score among all approaches. This highlights DEMO’s capability to approximate the distribution of higher-scoring designs effectively. Notably, DEMO outperforms traditional generative models like CbAS and Auto CbAS by significant margins across all tasks, underscoring its advanced generative capabilities. It also maintains a competitive edge against more recent generative methods like MIN and DDOM.
Performance in Discrete Tasks. On the discrete tasks, detailed in Table 5, DEMO performs strongly in the TF Bind 8 task, substantially surpassing all baselines. However, in more complex tasks like TF Bind 10 and NAS, DEMO performs competitively but does not lead the field. This mixed performance can be attributed to DEMO’s methodology, which is highly effective in capturing a broad distribution of high-quality designs but might struggle in task environments with redundancy in design features.
Summary. The results presented in Tables 4 and 5 collectively validate DEMO’s efficacy across both continuous and discrete optimization tasks, providing further support for answering Q affirmatively. With a mean rank of 4.0/16 and a median rank of 4/16 in terms of the median normalized scores, DEMO stands out among competing methods. This comprehensive performance underscores DEMO’s capacity to integrate and leverage complex design distributions effectively.
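The rank statistics above can be illustrated with a minimal sketch. It assumes that each method is ranked per task by its normalized score (rank 1 is best) and that the mean and median are then taken over tasks; ties are broken by input order here, which is an assumption since the tie-handling rule is not stated. The function name `rank_methods` and the toy scores are hypothetical and are not results from this paper.

```python
from statistics import mean, median


def rank_methods(scores):
    """Rank methods per task and aggregate the ranks.

    scores: {method: [score on task 0, score on task 1, ...]}, higher is better.
    Returns {method: (mean_rank, median_rank)} where rank 1 is the best method.
    """
    methods = list(scores)
    n_tasks = len(next(iter(scores.values())))
    ranks = {m: [] for m in methods}
    for t in range(n_tasks):
        # Sort by descending score on task t; sorted() is stable, so ties
        # keep the input order (an assumption, not the paper's stated rule).
        ordered = sorted(methods, key=lambda m: -scores[m][t])
        for position, m in enumerate(ordered, start=1):
            ranks[m].append(position)
    return {m: (mean(r), median(r)) for m, r in ranks.items()}


# Hypothetical toy scores for three made-up methods on three tasks.
toy = {
    "method_a": [0.9, 0.5, 0.7],
    "method_b": [0.8, 0.6, 0.4],
    "method_c": [0.1, 0.2, 0.9],
}
summary = rank_methods(toy)
for name, (mean_rank, median_rank) in summary.items():
    print(f"{name}: mean rank {mean_rank:.2f}, median rank {median_rank}")
```

A method that is rarely first but never near the bottom can still obtain a good mean and median rank, which is the sense in which the summary statistics reward consistency across tasks.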
A.2 Sensitivity to the Choice of m
In Eq. (9), selecting a time m close to the end of the forward diffusion process makes the perturbed design resemble random Gaussian noise, which introduces greater flexibility into the new design generation process. Conversely, if m is closer to the start of the process, the resulting design retains more characteristics of the existing top design. Thus, m serves as a critical hyperparameter in our methodology. This section explores the robustness of DEMO to various choices of m. We perform experiments on one continuous task, SuperC, and one discrete task, TF, sweeping m over an evenly spaced grid of values. As illustrated in Figure 4, DEMO generally outperforms the baseline methods across the different choices of m. Nevertheless, extreme values of m, whether too high or too low, can diminish performance. An excessively low m causes the model to adhere too closely to the distribution of existing designs, while an overly high m biases the model towards the pseudo-target distribution and neglects the guidance of existing top designs. Choosing m from a mid-range balances the influences of the pseudo-target distribution and the top existing designs. Empirical results suggest that a mid-range m yields the best performance, and we therefore use a single mid-range value of m for all tasks.
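The trade-off controlled by m can be made concrete with a small sketch of the noising step. Eq. (9) is not reproduced in this appendix, so the code assumes a standard variance-preserving perturbation with a linear noise schedule on the time interval [0, 1]; the schedule constants, the function names `vp_coefficients` and `perturb`, and the example design are all hypothetical choices for illustration rather than the paper's settings. The subsequent denoising with the conditional diffusion model is omitted.

```python
import math
import random

# Assumed linear variance-preserving schedule; not taken from the paper.
BETA_MIN, BETA_MAX = 0.1, 20.0


def vp_coefficients(m):
    """Return (signal_scale, noise_std) of the perturbation at time m in [0, 1]."""
    integral = BETA_MIN * m + 0.5 * (BETA_MAX - BETA_MIN) * m ** 2
    signal_scale = math.exp(-0.5 * integral)
    noise_std = math.sqrt(1.0 - math.exp(-integral))
    return signal_scale, noise_std


def perturb(design, m, rng):
    """Noise an existing design up to time m; larger m keeps less of the design."""
    signal_scale, noise_std = vp_coefficients(m)
    return [signal_scale * x + noise_std * rng.gauss(0.0, 1.0) for x in design]


rng = random.Random(0)
top_design = [1.5, -0.3, 0.8]  # hypothetical normalized top design
for m in (0.1, 0.5, 0.9):
    scale, std = vp_coefficients(m)
    noisy = perturb(top_design, m, rng)
    print(f"m={m:.1f}  signal_scale={scale:.3f}  noise_std={std:.3f}  noisy={noisy}")
```

Under these assumptions the signal scale decays monotonically in m while the noise standard deviation approaches one, which matches the qualitative behavior described above: a small m leaves the edited design close to the original top design, and a large m hands almost all control to the conditional diffusion model.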
A.3 Extension of Reliability Study
This section extends the reliability study in Section 5.7 by comparing DEMO with a gradient-based approach. When compared to Grad, DEMO demonstrates greater consistency in 5 out of 7 tasks. However, Grad outperforms DEMO in the Levy and TF tasks, which can be attributed to the gradient-based method’s tendency to generate new designs within a narrower distribution. While Grad achieves a higher proportion of higher-scoring new designs in these two tasks, DEMO generates new designs within a wider distribution and thus produces candidates with higher maximum scores, as evidenced in Table 2.
A.4 Limitations
We have demonstrated the effectiveness of DEMO across a wide range of tasks. However, some evaluation methods may not fully capture real-world complexities. For example, in the superconductor task [5], we follow traditional practice by using a random forest regression model as the oracle, as done in prior studies [1]. Unfortunately, this model might not entirely reflect the intricacies of real-world situations, which could lead to discrepancies between our oracle and actual ground-truth outcomes. Engaging with domain experts in the future could help enhance these evaluation approaches. Nevertheless, given DEMO’s straightforward approach and the empirical evidence supporting its robustness and efficacy across various tasks detailed in the Design-Bench [1] and BayesO Benchmarks [39], we remain confident in its ability to generalize effectively to different contexts.
A.5 Negative Impacts
This study seeks to advance the field of Machine Learning. However, it’s important to recognize that advanced optimization techniques can be used for either beneficial or detrimental purposes, depending on their application. For example, while these methods can contribute positively to society through the development of drugs and materials, they also have the potential to be misused to create harmful substances or products. As researchers, we must stay aware and ensure that our contributions promote societal betterment, while also carefully assessing potential risks and ethical concerns.