
Design Editing for Offline Model-based Optimization

Ye Yuan$^{1,2,*}$, Youyuan Zhang$^{1,*}$, Can (Sam) Chen$^{1,2}$, Haolun Wu$^{1,2}$, Zixuan (Melody) Li$^{1,2}$, Jianmo Li$^{1}$, James J. Clark$^{1}$, Xue Liu$^{1,\dagger}$
$^1$McGill University, $^2$Mila - Quebec AI Institute
ye.yuan3@mail.mcgill.ca, youyuan.zhang@mail.mcgill.ca,
can.chen@mila.quebec, haolun.wu@mail.mcgill.ca,
zixuan.li3@mail.mcgill.ca, jianmo.li@mail.mcgill.ca,
james.clark1@mcgill.ca, xueliu@cs.mcgill.ca
$^*$Equal contribution with random order. $^\dagger$Corresponding author.
Abstract

Offline model-based optimization (MBO) aims to maximize a black-box objective function using only an offline dataset of designs and scores. Such tasks span domains such as robotics, material design, and protein and molecular engineering. A prevalent approach trains a conditional generative model on existing designs and their associated scores, then generates new designs conditioned on higher target scores. However, these newly generated designs often underperform due to the lack of high-scoring training data. To address this challenge, we introduce a novel method, Design Editing for Offline Model-based Optimization (DEMO), which consists of two phases. In the first phase, termed pseudo-target distribution generation, we apply gradient ascent to the designs in the offline dataset using a trained surrogate model, producing a synthetic dataset whose predicted scores serve as new labels. A conditional diffusion model is then trained on this synthetic dataset to capture a pseudo-target distribution, which makes the model more accurate at generating higher-scoring designs. Nevertheless, the pseudo-target distribution is susceptible to noise stemming from inaccuracies in the surrogate model, which predisposes the conditional diffusion model to generate suboptimal designs. We therefore propose a second phase, existing design editing, to directly incorporate the high-scoring features of the offline dataset into design generation. In this phase, top designs from the offline dataset are perturbed with noise and then refined by the conditional diffusion model to produce high-scoring designs. Overall, new designs first inherit high-scoring features from existing top designs in the second phase and are then refined by the more accurate conditional diffusion model trained in the first phase.
Empirical evaluations on 7 offline MBO tasks show that DEMO outperforms various baseline methods, achieving the highest mean rank of 1.7 and a median rank of 1.

1 Introduction

In numerous fields, a primary goal is to innovate and design new objects with specific desired traits [1]. This encompasses areas like robotics, material design, protein and molecular engineering [2, 3, 4, 5]. Conventionally, these objectives are pursued by iteratively testing a black-box objective function that maps a design to its property score. However, such testing can be expensive, time-consuming, or even hazardous [3, 4, 5, 6, 7]. Thus, it is more feasible to utilize an existing offline dataset of designs and their scores to find optimal solutions, without additional real-world testing [1]. This problem is known as offline model-based optimization (MBO). The aim of MBO is to identify a design that optimizes the black-box objective function using only the offline dataset.

A common strategy in MBO involves training a conditional generative model on the available offline dataset to capture the conditional probability distribution $p(\bm{x}|y)$, where $\bm{x}$ denotes designs and $y$ represents property scores. The model then generates new designs conditioned on higher target scores. Essentially, conditional generative models establish a one-to-many relationship, mapping a property score to all designs that could attain it. This becomes particularly challenging when the black-box objective function operates over a high-dimensional space. Fortunately, previous research has demonstrated that generative techniques can be effective in solving offline MBO tasks. For instance, the CbAS method utilizes a variational autoencoder [8], while MIN applies a generative adversarial network (GAN) [9, 10]. DDOM extends these techniques by integrating a classifier-free conditional diffusion model to enhance generative capabilities [11].

Nonetheless, one important yet unexplored problem with these generative model-based methods is that they are trained solely on the offline dataset. Models trained this way effectively mimic the distribution of the offline dataset but capture little information about designs that score higher than those in the dataset. Therefore, while these models learn to replicate the distribution of existing designs, they struggle to consistently produce new designs that significantly outperform those in the offline dataset.

To address this challenge, we introduce a novel and effective approach, Design Editing for Offline Model-based Optimization (DEMO). DEMO is structured into two primary phases: pseudo-target distribution generation and existing design editing. In the pseudo-target distribution generation phase, we address the scarcity of high-scoring training data by augmenting the offline dataset, which might contain, for example, pairs of superconductor materials and their critical temperatures. To achieve this, a surrogate model $f_{\bm{\theta}}(\cdot)$ is trained on the offline dataset $\mathcal{D}$, and gradient ascent is applied to existing designs with respect to the surrogate model, creating a synthetic dataset $\mathcal{D}'$ with predicted scores as new labels. As illustrated in Figure 1 (a), the surrogate model fits the offline data points $p_1$ to $p_5$ and generates new data points $p_a$ and $p_b$ through gradient ascent. Subsequently, a classifier-free conditional diffusion model is trained on $\mathcal{D}'$ to learn the conditional probability distribution of these synthetic designs given their predicted scores. This diffusion model characterizes a pseudo-target distribution, from which higher-scoring designs can be generated more accurately.

Figure 1: Illustration of DEMO: A conditional diffusion model, acting as the pseudo-target distribution, is trained on a synthetic dataset produced through a surrogate model. New designs are generated by modifying top existing designs using the diffusion model, under the guidance of target scores.

However, as shown in Figure 1 (a), the surrogate model may not accurately capture the black-box objective function, so the pseudo-target distribution may contain noise inherited from the surrogate model. Generating directly from the pseudo-target distribution could therefore lead to suboptimal designs that have high predicted scores but low ground-truth scores, which motivates the second phase of DEMO. This phase, termed existing design editing, directly incorporates the high-scoring features of the offline dataset to provide more guidance to the design generation process. Specifically, we edit top designs from the offline dataset by injecting random noise into them and then using the conditional diffusion model from the first phase to remove the noise, guided by higher target scores. As illustrated in Figure 1 (a), after noise is injected, the distribution of top designs in the offline dataset (purple contour) overlaps more with the pseudo-target distribution (orange contour). By progressively removing the noise, we gradually project these existing top designs onto the manifold of higher-scoring designs, as shown in Figure 1 (b). In essence, DEMO produces new designs that first inherit high-scoring features from existing designs and are then refined by a more accurate conditional diffusion model.

In summary, this paper makes three principal contributions:

  • We introduce a novel method, Design Editing for Offline Model-based Optimization (DEMO). DEMO operates in two main phases: the first, pseudo-target distribution generation, involves employing a surrogate model to create a synthetic dataset and training a conditional diffusion model on this synthetic dataset to serve as the pseudo-target distribution.

  • The second phase, existing design editing, introduces random noise to existing top designs and uses the trained conditional diffusion model to refine them, resulting in designs which not only inherit high-scoring features from existing top designs but also achieve higher scores by leveraging information from the pseudo-target distribution.

  • Extensive experiments demonstrate that DEMO effectively and reliably generates new designs, yielding state-of-the-art results across 7 offline MBO tasks, with a mean rank of 1.7 and a median rank of 1 among 16 methods.

2 Preliminary

2.1 Offline Model-based Optimization

Offline model-based optimization (MBO) addresses a range of optimization challenges with the aim of maximizing a black-box objective function based on an offline dataset. Mathematically, we define the valid design space as $\mathcal{X}=\mathbb{R}^{d}$, with $d$ representing the dimension of the design. Offline MBO is formulated as:

$$\bm{x}^{*} = \arg\max_{\bm{x}\in\mathcal{X}} f(\bm{x}), \tag{1}$$

where $f(\cdot)$ is the black-box objective function and $\bm{x}\in\mathcal{X}$ is a candidate design. For the optimization process, we use an offline dataset $\mathcal{D}=\{(\bm{x}_i, y_i)\}_{i=1}^{N}$, where $\bm{x}_i$ represents an existing design, such as a superconductor material, and $y_i$ represents the associated property score, such as the critical temperature. Typically, the optimization process outputs $K$ candidate optimal designs, where $K$ is a small budget for evaluating the black-box objective function. The offline MBO problem also finds applications in other areas, such as robot design and protein and molecule engineering.
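To make the setting concrete, here is a minimal sketch of an offline MBO instance. The objective `black_box_f`, the dimension, and the dataset size are hypothetical; in a real task the oracle would only be queried for the final $K$ candidates. The helper `top_k_designs` (also hypothetical) returns the $K$ best designs already in $\mathcal{D}$, a natural reference point for any MBO method.

```python
import random

def black_box_f(x):
    """Hypothetical black-box objective over R^d, maximized at x = (1, ..., 1)."""
    return -sum((xi - 1.0) ** 2 for xi in x)

def make_offline_dataset(n, d, seed=0):
    """Sample n designs in R^d and record their scores: D = {(x_i, y_i)}."""
    rng = random.Random(seed)
    designs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    return [(x, black_box_f(x)) for x in designs]

def top_k_designs(dataset, k):
    """Return the k designs with the highest recorded scores."""
    ranked = sorted(dataset, key=lambda pair: pair[1], reverse=True)
    return [x for x, _ in ranked[:k]]

D = make_offline_dataset(n=100, d=4)
candidates = top_k_designs(D, k=5)  # K = 5, the evaluation budget
print(max(black_box_f(x) for x in candidates))
```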

2.2 Classifier-free Conditional Diffusion Models

Diffusion models stand out in the family of generative models due to their unique approach involving forward diffusion and backward denoising processes. The essence of diffusion models is to gradually add noise to a sample and then train a neural network to reverse this noise addition, thus recovering the original data distribution. In this work, we follow the continuous-time formulation of diffusion models [12, 13]. Here, $\bm{x}_t$ is a random variable denoting the state of a data point at time $t\in[0,T]$. The diffusion process is defined by a stochastic differential equation (SDE):

$$\mathrm{d}\bm{x} = \bm{f}(\bm{x},t)\,\mathrm{d}t + g(t)\,\mathrm{d}\bm{w}, \tag{2}$$

where $\bm{f}(\cdot,t)$ is the drift coefficient of $\bm{x}_t$, $g(\cdot)$ is the diffusion coefficient of $\bm{x}_t$, and $\bm{w}$ is a standard Wiener process. The backward denoising process is given by the reverse-time SDE:

$$\mathrm{d}\bm{x} = \left[\bm{f}(\bm{x},t) - g(t)^{2}\nabla_{\bm{x}}\log p_t(\bm{x})\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{\bm{w}}, \tag{3}$$

where $\mathrm{d}t$ represents a negative infinitesimal time step and $\bar{\bm{w}}$ is a reverse-time Wiener process. The gradient of the log probability density, $\nabla_{\bm{x}}\log p_t(\bm{x})$, is approximated by a neural network $s_{\bm{\phi}}(\bm{x}_t, t)$ trained with score-matching objectives [14, 15].
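Eq. (3) is usually simulated numerically. The sketch below uses the Euler-Maruyama scheme, one standard discretization (not necessarily the sampler used later in this paper), on an assumed 1-D toy case where the score is known in closed form so that no network is needed: data drawn from $\mathcal{N}(2, 0.5^2)$, drift $\bm{f}=0$, and constant $g(t)=1$. In practice, `exact_score` would be replaced by the learned $s_{\bm{\phi}}(\bm{x}_t, t)$.

```python
import math
import random

# Toy 1-D setting (assumed for illustration): data ~ N(MU, S0^2), forward SDE with
# drift f(x, t) = 0 and constant diffusion coefficient g(t) = SIGMA.
MU, S0, SIGMA, T = 2.0, 0.5, 1.0, 1.0

def marginal_var(t):
    # With f = 0 and g = SIGMA, the forward marginal is p_t = N(MU, S0^2 + SIGMA^2 t).
    return S0 ** 2 + SIGMA ** 2 * t

def exact_score(x, t):
    # grad_x log p_t(x); a trained network s_phi(x_t, t) would approximate this quantity.
    return -(x - MU) / marginal_var(t)

def reverse_sde_sample(n_samples, n_steps=200, seed=0):
    """Euler-Maruyama discretization of the reverse-time SDE (Eq. 3), from t = T down to t = 0."""
    rng = random.Random(seed)
    dt = T / n_steps  # magnitude of the (negative) time step
    noise_scale = SIGMA * math.sqrt(dt)
    # Initialize from the forward marginal at time T.
    xs = [rng.gauss(MU, math.sqrt(marginal_var(T))) for _ in range(n_samples)]
    for k in range(n_steps):
        t = T - k * dt
        # With f = 0 and a step of -dt, Eq. (3) gives x <- x + g^2 * score * dt + g * sqrt(dt) * z.
        xs = [x + SIGMA ** 2 * exact_score(x, t) * dt + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs

samples = reverse_sde_sample(5000)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")  # targets: MU = 2.0, S0^2 = 0.25
```

Running the reverse SDE from the time-$T$ marginal recovers samples whose mean and variance approximately match the data distribution, up to discretization and Monte Carlo error.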

Beyond basic diffusion models, our focus is to train a conditional diffusion model that learns the conditional probability distribution of designs given their associated property scores. To incorporate conditions into diffusion models, Ho et al. [16] express the score function as a combination of conditional and unconditional components, yielding what are known as classifier-free diffusion models. Specifically, a single neural network $s_{\bm{\phi}}(\bm{x}_t, t, y)$ is trained to handle both components, taking $y$ as the condition or leaving it empty for the unconditional score. Formally, the combined (guided) score is:

$$\tilde{s}_{\bm{\phi}}(\bm{x}_t, t, y) = (1+\omega)\, s_{\bm{\phi}}(\bm{x}_t, t, y) - \omega\, s_{\bm{\phi}}(\bm{x}_t, t), \tag{4}$$

where $\tilde{s}_{\bm{\phi}}$ denotes the guided score and $\omega$ is a parameter that adjusts the influence of the condition. A higher value of $\omega$ makes the generation process adhere more closely to the specified condition, while a lower value allows greater flexibility in the outputs.
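Eq. (4) is a per-coordinate linear combination of two evaluations of the same network. A minimal sketch, assuming the conditional and unconditional score vectors have already been computed; the name `guided_score` is hypothetical:

```python
def guided_score(score_cond, score_uncond, omega):
    """Classifier-free guidance, Eq. (4): (1 + omega) * s(x_t, t, y) - omega * s(x_t, t).

    score_cond and score_uncond are score vectors (lists of floats) produced by the same
    network with and without the condition y.
    """
    return [(1.0 + omega) * c - omega * u for c, u in zip(score_cond, score_uncond)]

print(guided_score([1.0, -2.0], [0.5, -1.0], omega=0.0))  # [1.0, -2.0]
print(guided_score([1.0, -2.0], [0.5, -1.0], omega=2.0))  # [2.0, -4.0]
```

Rewriting Eq. (4) as $s_{\text{cond}} + \omega\,(s_{\text{cond}} - s_{\text{uncond}})$ shows why: $\omega = 0$ returns the purely conditional score, and larger $\omega$ extrapolates further along the direction from the unconditional to the conditional score.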

3 Related Works

3.1 Offline Model-based Optimization

Recent offline model-based optimization (MBO) techniques broadly fall into two categories: (i) those that employ gradient-based optimization and (ii) those that create new designs via generative models. Gradient-based methods often employ regularization techniques that enhance either the surrogate model [17, 18, 19] or the design itself [20, 21], thus improving the model’s robustness and generalization capacity. Some of these approaches also synthesize new data with pseudo labels [22, 23], but they do so to extract information from the synthetic data that corrects the surrogate model’s inaccuracies. The second category encompasses methods that learn to replicate the distribution of existing designs, including MIN [9], CbAS [8], Auto CbAS [24], and DDOM [11]. These methods generate new designs by sampling from the learned distributions. DEMO distinguishes itself by training a conditional diffusion model that learns a pseudo-target distribution and by incorporating features from existing top designs, which together enable it to generate superior candidates effectively and consistently.

3.2 Diffusion-Based Editing

Diffusion models have shown remarkable success in generation tasks across multiple modalities, notably for their ability to control the generation process based on given conditions. For instance, recent work has used diffusion models for zero-shot, test-time editing in text-based image and video generation. SDEdit [25] perturbs an input image with noise and then denoises it, balancing realism against faithfulness to the input. To improve reconstruction quality, methods such as DDIM Inversion [26], Null-text Inversion [27], and Negative-prompt Inversion [28] focus on deterministic mappings from source latents to initial noise, conditioned on the source text. Building on these, CycleDiffusion [29] and Direct Inversion [30] leverage the source latents from each inversion step to further improve the faithfulness of the target image to the source image. Following these image editing techniques, several video editing methods [31, 32, 33, 34, 35, 36] adopt image diffusion models and enforce temporal consistency across frames, offering practical and efficient solutions for video editing. Inspired by the success of these editing techniques in computer vision, we edit existing top designs toward a pseudo-target distribution in the context of offline MBO, improving both the effectiveness and the reliability of generating new designs.

4 Methodology

In this section, we elaborate on the details of our proposed Design Editing for Offline Model-based Optimization (DEMO), which consists of two phases. Section 4.1 introduces the first phase, pseudo-target distribution generation, which trains a conditional diffusion model, serving as the pseudo-target distribution, on a synthetic dataset created by performing gradient ascent with respect to a surrogate model trained on the offline dataset. Although the first phase yields a conditional diffusion model that generates higher-scoring designs more accurately than a model trained solely on the offline dataset, it is susceptible to noise caused by inaccuracies in the surrogate model. This motivates the second phase, existing design editing (Section 4.2), which explicitly incorporates high-scoring features from existing top designs. Intuitively, our method is analogous to writing code for a new research project. In research coding, the first step often involves finding useful code from previous projects and adapting it to new requirements through modifications and enhancements. Similarly, DEMO generates new designs by first inheriting high-scoring features from top existing designs (akin to reusing existing code) and then refining them with a more accurate conditional diffusion model (akin to modifying code for a new purpose). Algorithm 1 summarizes the complete process of DEMO.

4.1 Pseudo-target Distribution Generation

Due to the scarcity of high-scoring training data, conditional generative models trained only on the offline dataset often fail to consistently produce new designs that substantially surpass the existing ones. One promising yet underutilized way to address this issue is to first generate a synthetic dataset by applying gradient ascent to existing designs with a trained surrogate model. Conditional generative models trained on this synthetic dataset capture a pseudo-target distribution and are therefore more adept at creating designs with higher scores.

Creation of Synthetic Dataset. First, a deep neural network (DNN) $f_{\bm{\theta}}(\cdot)$ with parameters $\bm{\theta}$ is trained on the offline dataset $\mathcal{D}=\{(\bm{x}_i, y_i)\}_{i=1}^{N}$, where $\bm{x}_i$ and $y_i$ denote a design and its associated score, respectively. The parameters $\bm{\theta}$ are optimized as:

$$\bm{\theta}^{*} = \arg\min_{\bm{\theta}} \frac{1}{N}\sum_{i=1}^{N}\left(f_{\bm{\theta}}(\bm{x}_i) - y_i\right)^{2}. \tag{5}$$
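Eq. (5) is ordinary least-squares regression. A minimal sketch, assuming a linear surrogate $f_{\bm{\theta}}(\bm{x}) = \bm{w}^\top\bm{x} + b$ in place of the paper's DNN and plain full-batch gradient descent; the learning rate, epoch count, and toy objective are arbitrary illustrative choices:

```python
import random

def fit_linear_surrogate(dataset, lr=0.05, epochs=2000):
    """Minimize the mean squared error of Eq. (5) by full-batch gradient descent.

    The surrogate is f_theta(x) = w . x + b, a linear stand-in for the paper's DNN.
    """
    d = len(dataset[0][0])
    n = len(dataset)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in dataset:
            residual = sum(wj * xj for wj, xj in zip(w, x)) + b - y
            # Gradient of (1/N) * sum(residual^2): (2/N) * residual * x for w, (2/N) * residual for b.
            for j in range(d):
                grad_w[j] += 2.0 * residual * x[j] / n
            grad_b += 2.0 * residual / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy offline data from a hypothetical objective y = 3 x_1 - 2 x_2 + 1.
rng = random.Random(0)
data = []
for _ in range(50):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    data.append((x, 3.0 * x[0] - 2.0 * x[1] + 1.0))

w, b = fit_linear_surrogate(data)
print([round(v, 3) for v in w], round(b, 3))  # recovers the generating coefficients
```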

The solution $f_{\bm{\theta}^{*}}(\cdot)$ obtained from Eq. (5) serves as a surrogate for the unknown black-box objective function $f(\cdot)$ in Eq. (1). New data are then generated by performing gradient ascent on the existing designs with respect to the learned surrogate model $f_{\bm{\theta}^{*}}(\cdot)$. For a design $\bm{x}_i$ in $\mathcal{D}$, we update it as:

$$\bm{x}_{i,t} = \bm{x}_{i,t-1} + \eta\, \nabla_{\bm{x}} f_{\bm{\theta}^{*}}(\bm{x})\Big|_{\bm{x}=\bm{x}_{i,t-1}}, \quad \text{for } t\in\{1,\cdots,T\}, \tag{6}$$

where $T$ is the total number of iterations and $\eta$ is the step size of the gradient ascent update. The initial point $\bm{x}_{i,0}$ is the same as $\bm{x}_i$, and $\bm{x}_{i,T}$, obtained at step $T$, is a synthetic design with an enhanced predicted score. By using each design in the offline dataset $\mathcal{D}$ as the initial point in turn, we create a synthetic dataset $\mathcal{D}'$ of the same size as $\mathcal{D}$, with predicted scores as labels. This process is outlined in lines 2 to 8 of Algorithm 1.
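The construction of $\mathcal{D}'$ can be sketched as follows, assuming the surrogate's input gradient is available as a function (in practice it would come from automatic differentiation of the trained DNN). The quadratic surrogate, its center `C`, the offline labels, `eta`, and `num_steps` are all hypothetical values chosen only to make the behaviour easy to check.

```python
def build_synthetic_dataset(dataset, surrogate, surrogate_grad, eta=0.1, num_steps=20):
    """Apply the gradient ascent of Eq. (6) to every design and label it with the surrogate.

    surrogate(x) returns f_theta*(x); surrogate_grad(x) returns its gradient with respect to x.
    """
    synthetic = []
    for x, _ in dataset:                  # offline labels y_i are not needed here
        x_t = list(x)                     # x_{i,0} = x_i
        for _ in range(num_steps):        # t = 1, ..., T
            grad = surrogate_grad(x_t)    # gradient at the previous iterate x_{i,t-1}
            x_t = [xj + eta * gj for xj, gj in zip(x_t, grad)]
        synthetic.append((x_t, surrogate(x_t)))  # new label: predicted score of x_{i,T}
    return synthetic

# Hypothetical trained surrogate with an analytic gradient: f(x) = -||x - C||^2.
C = [1.0, -1.0]

def surrogate(x):
    return -sum((xj - cj) ** 2 for xj, cj in zip(x, C))

def surrogate_grad(x):
    return [-2.0 * (xj - cj) for xj, cj in zip(x, C)]

D = [([0.0, 0.0], 0.3), ([2.0, 1.0], -1.2)]
D_prime = build_synthetic_dataset(D, surrogate, surrogate_grad)
for (x, _), (x_new, y_new) in zip(D, D_prime):
    print(f"predicted score {surrogate(x):.4f} -> {y_new:.4f}")
```

Each synthetic design keeps a one-to-one link to its starting design, so $\mathcal{D}'$ has the same size as $\mathcal{D}$; how far designs move is governed by $\eta$ and $T$.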

Training of Conditional Diffusion Model. We employ a classifier-free conditional diffusion model [16] to learn the conditional probability distribution of the synthetic designs and their predicted scores in $\mathcal{D}'$, which captures a pseudo-target distribution. Following DDOM [11], we use the Variance Preserving (VP) stochastic differential equation (SDE) specified in [12] for the forward diffusion process:

$$\mathrm{d}\bm{x} = -\frac{\beta(t)}{2}\bm{x}\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}\bm{w}, \tag{7}$$

where $\beta(t)$ is a continuous function of time $t\in[0,1]$. The forward process of DDPM [37] has been shown to be a discretization of Eq. (7) [12]. To integrate conditions into the backward denoising process, we train a DNN $s_{\bm{\phi}}(\bm{x}_t, t, y)$ with parameters $\bm{\phi}$, conditioned on the time $t$ and on the score $y$ associated with the unperturbed design $\bm{x}_0$ from which $\bm{x}_t$ is obtained. The parameters $\bm{\phi}$ are optimized as:

$$\bm{\phi}^{*} = \arg\min_{\bm{\phi}} \mathbb{E}_{t}\Big[\lambda(t)\, \mathbb{E}_{\bm{x}_0, y}\Big[\mathbb{E}_{\bm{x}_t|\bm{x}_0}\Big[\big\| s_{\bm{\phi}}(\bm{x}_t, t, y) - \nabla_{\bm{x}}\log p_t(\bm{x}_t|\bm{x}_0)\big\|^{2}\Big]\Big]\Big], \tag{8}$$

where $\lambda(t)$ is a positive, time-dependent weighting function. Because we train on the synthetic dataset $\mathcal{D}'$, the model optimized according to Eq. (8) approximates the gradient of the log-density of a pseudo-target distribution rather than that of the offline data distribution. This distribution essentially reflects the marginal probability distribution of designs with enhanced predicted scores. By simulating the backward denoising process with the optimized model $s_{\bm{\phi}^{*}}(\bm{x}_t, t, y)$, we can thus generate new high-scoring designs more accurately. This part corresponds to line 9 of Algorithm 1.
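The objective in Eq. (8) can be made concrete because the VP perturbation kernel of Eq. (7) is Gaussian: $p_t(\bm{x}_t|\bm{x}_0) = \mathcal{N}(a(t)\bm{x}_0, (1-a(t)^2)\mathbf{I})$ with $a(t) = \exp(-\tfrac{1}{2}\int_0^t \beta(s)\,\mathrm{d}s)$, so the regression target is available in closed form. The sketch below assumes a linear schedule with $\beta_{\min}=0.1$ and $\beta_{\max}=20$ (defaults commonly used with the VP SDE), the weighting $\lambda(t) = \sigma(t)^2$, and a hypothetical label-dropout rate `p_uncond` for training the unconditional branch of Eq. (4); none of these values are taken from this paper. `score_model` stands in for the DNN $s_{\bm{\phi}}$.

```python
import math
import random

# Assumed linear schedule beta(t) = BETA_MIN + t * (BETA_MAX - BETA_MIN) on t in [0, 1].
BETA_MIN, BETA_MAX = 0.1, 20.0

def vp_kernel(t):
    """Mean coefficient a(t) and std of p_t(x_t | x_0) = N(a(t) x_0, (1 - a(t)^2) I) under Eq. (7)."""
    integral = BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t ** 2  # int_0^t beta(s) ds
    a = math.exp(-0.5 * integral)
    return a, math.sqrt(1.0 - a ** 2)

def dsm_loss(score_model, synthetic_data, rng, p_uncond=0.1, t_min=1e-3, n_draws=64):
    """Monte Carlo estimate of the objective in Eq. (8) over D' = [(x, y), ...].

    score_model(x_t, t, y) returns a score vector; y=None requests the unconditional score.
    The weighting lambda(t) = std(t)^2 and the dropout rate p_uncond are assumptions.
    """
    total = 0.0
    for _ in range(n_draws):
        x0, y = rng.choice(synthetic_data)
        t = rng.uniform(t_min, 1.0)  # avoid t = 0, where std(t) = 0
        a, std = vp_kernel(t)
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
        x_t = [a * xj + std * ej for xj, ej in zip(x0, eps)]
        # grad_{x_t} log p_t(x_t | x_0) = -(x_t - a x_0) / std^2 = -eps / std
        target = [-ej / std for ej in eps]
        # Randomly drop the label so one network learns both s(x_t, t, y) and s(x_t, t).
        cond = None if rng.random() < p_uncond else y
        pred = score_model(x_t, t, cond)
        total += std ** 2 * sum((pj - qj) ** 2 for pj, qj in zip(pred, target))
    return total / n_draws

# Sanity check: if every synthetic design equals C, the exact score is known in closed form.
C = [0.5, -0.5]
D_prime = [(C, 1.0)] * 10

def oracle_score(x_t, t, y):
    a, std = vp_kernel(t)
    return [-(xj - a * cj) / std ** 2 for xj, cj in zip(x_t, C)]

def zero_score(x_t, t, y):
    return [0.0] * len(x_t)

rng = random.Random(0)
print(dsm_loss(oracle_score, D_prime, rng))  # close to 0
print(dsm_loss(zero_score, D_prime, rng))    # about 2 (the dimension) in expectation
```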

4.2 Existing Design Editing

Due to potential inaccuracies of the surrogate model $f_{\bm{\theta}^{*}}(\cdot)$ in representing the black-box objective function, the synthetic dataset $\mathcal{D}'$ might include noisy data. Therefore, generating directly from the pseudo-target distribution could lead to suboptimal new designs. Motivated by the success of editing techniques in image synthesis [25, 38], we explore creating new designs from top existing designs instead of starting from a random latent variable sampled from the standard Gaussian prior. We perturb a top design $\bm{x}_{\mathrm{top}}$ from $\mathcal{D}$ by introducing noise at a chosen time step $m\in\{1,\cdots,M\}$ with noise levels $\beta_1,\cdots,\beta_M$:

$$\bm{x}_{\mathrm{perturb}} = \bm{x}_{\mathrm{top}} + \sqrt{1-\bar{\alpha}_m}\,\epsilon, \tag{9}$$

where $\alpha_m = 1-\beta_m$, $\bar{\alpha}_m = \prod_{s=1}^{m}\alpha_s$, and $\epsilon\sim\mathcal{N}(\bm{0}, \mathbf{I})$. This yields the closed form $\bm{x}_{\mathrm{perturb}}\sim\mathcal{N}(\bm{x}_{\mathrm{top}}, (1-\bar{\alpha}_m)\mathbf{I})$. The perturbed design is then used as the starting point of the backward denoising process. Given a target property score $\hat{y}$, a new design is synthesized using a second-order Heun sampler [11] with the model $s_{\bm{\phi}^{*}}(\cdot)$. To obtain $K$ candidate optimal designs, we perturb each of the top $K$ designs in $\mathcal{D}$ and denoise them conditioned on $\hat{y}$. Lines 11 to 16 of Algorithm 1 present the process of this phase.
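A minimal sketch of the perturbation step in Eq. (9), applied to the top-$K$ designs of $\mathcal{D}$. It follows Eq. (9) as written, adding noise to $\bm{x}_{\mathrm{top}}$ without rescaling it, and assumes a DDPM-style linear schedule of $M = 1000$ noise levels from $\beta_1 = 10^{-4}$ to $\beta_M = 0.02$, which this section does not specify; the toy dataset and the choice $m = 100$ are also illustrative. The subsequent denoising with the Heun sampler and $s_{\bm{\phi}^{*}}$ is not shown.

```python
import math
import random

def alpha_bar(m, betas):
    """bar{alpha}_m = prod_{s=1}^{m} (1 - beta_s); betas[0] holds beta_1."""
    prod = 1.0
    for beta in betas[:m]:
        prod *= 1.0 - beta
    return prod

def perturb_top_designs(dataset, k, m, betas, rng):
    """Eq. (9): x_perturb = x_top + sqrt(1 - bar{alpha}_m) * eps for each of the top-k designs."""
    top = sorted(dataset, key=lambda pair: pair[1], reverse=True)[:k]
    scale = math.sqrt(1.0 - alpha_bar(m, betas))
    return [[xj + scale * rng.gauss(0.0, 1.0) for xj in x] for x, _ in top]

# Assumed DDPM-style linear schedule: M = 1000 levels from beta_1 = 1e-4 to beta_M = 0.02.
M = 1000
betas = [1e-4 + (0.02 - 1e-4) * s / (M - 1) for s in range(M)]

D = [([0.0, 0.0], 0.1), ([1.0, 1.0], 0.9), ([2.0, 0.5], 0.5), ([-1.0, 3.0], 0.2)]
starts = perturb_top_designs(D, k=2, m=100, betas=betas, rng=random.Random(0))
# Each entry of `starts` would next be denoised by s_phi* conditioned on y_hat (not shown).
print(starts)
```

A larger $m$ injects more noise, giving the diffusion model more freedom to move away from $\bm{x}_{\mathrm{top}}$ at the cost of retaining fewer of its features; $m = 0$ returns the top designs unchanged.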

Algorithm 1 Design Editing for Offline Model-based Optimization

Input: Offline dataset $\mathcal{D}=\{(\bm{x}_{i},y_{i})\}_{i=1}^{N}$, a target score $\hat{y}$, a time $m$, and a budget $K$.
Output: $K$ candidate optimal designs.

1  /* Pseudo-target Distribution Generation */
2  Initialize a surrogate model $f_{\bm{\theta}}(\cdot)$ and optimize $\bm{\theta}$ with Eq. (5) to obtain $f_{\bm{\theta}^{*}}(\cdot)$.
3  $\mathcal{D}^{\prime}=\{\}$
4  for $i=1,2,\cdots,N$ do
5      $\bm{x}_{i,0}\leftarrow\bm{x}_{i}$
6      for $t=1,2,\cdots,T$ do
7          Update $\bm{x}_{i,t}$ with Eq. (6).
8      Append $(\bm{x}_{i,T},f_{\bm{\theta}^{*}}(\bm{x}_{i,T}))$ to $\mathcal{D}^{\prime}$.
9  Initialize $s_{\bm{\phi}}(\cdot)$ and optimize $\bm{\phi}$ with Eq. (8) on $\mathcal{D}^{\prime}$ to obtain $s_{\bm{\phi}^{*}}(\cdot)$.
10 /* Existing Design Editing */
11 Candidates $=\{\}$
12 for $k=1,2,\cdots,K$ do
13     Select the design $\bm{x}_{top}$ with the $k$-th best score among all designs in $\mathcal{D}$.
14     Perturb $\bm{x}_{top}$ with Eq. (9) and the given time $m$ to obtain $\bm{x}_{perturb}$.
15     Denoise $\bm{x}_{perturb}$ with Heun's method using $s_{\bm{\phi}^{*}}(\cdot)$ and $\hat{y}$ to generate $\bm{x}_{new}$.
16     Append $\bm{x}_{new}$ to Candidates.
17 return Candidates
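Lines 3 to 8 of Algorithm 1 (building the synthetic dataset $\mathcal{D}^{\prime}$) can be sketched in NumPy as below. The sketch vectorizes over all $N$ designs instead of looping over $i$. Eq. (6) is not reproduced in this section, so we assume it is plain gradient ascent on the trained surrogate, $\bm{x}_{i,t}=\bm{x}_{i,t-1}+\eta\nabla_{\bm{x}}f_{\bm{\theta}^{*}}(\bm{x}_{i,t-1})$. The quadratic surrogate is a hypothetical stand-in for the trained MLP $f_{\bm{\theta}^{*}}(\cdot)$. The defaults $\eta=10^{-3}$ and $T=100$ are the continuous-task values reported in Section 5.4.

```python
import numpy as np


def build_pseudo_target_dataset(X, surrogate, surrogate_grad, eta=1e-3, T=100):
    """Sketch of lines 3-8 of Algorithm 1, vectorized over all N designs.

    X: (N, d) offline designs. surrogate / surrogate_grad: a trained f_theta*
    and its input gradient (hypothetical callables). Eq. (6) is ASSUMED to be
    plain gradient ascent: x_t = x_{t-1} + eta * grad f(x_{t-1}).
    """
    X_t = np.array(X, dtype=float, copy=True)   # line 5: x_{i,0} <- x_i
    for _ in range(T):                          # line 6
        X_t = X_t + eta * surrogate_grad(X_t)   # line 7 (assumed form of Eq. (6))
    return X_t, surrogate(X_t)                  # line 8: relabel with f_theta* predictions


# Hypothetical stand-in surrogate: f(x) = -||x - c||^2, maximized at c.
C = np.array([1.0, 2.0])


def toy_surrogate(X):
    return -np.sum((X - C) ** 2, axis=1)


def toy_surrogate_grad(X):
    return -2.0 * (X - C)


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(5, 2))
    X_T, y_pseudo = build_pseudo_target_dataset(X, toy_surrogate, toy_surrogate_grad)
    D_prime = list(zip(X_T, y_pseudo))  # (x_{i,T}, f_theta*(x_{i,T})) pairs for line 9
    print(y_pseudo)
```

The predicted scores of the ascended designs, rather than the original labels, become the training targets for the conditional diffusion model in line 9.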

5 Experiments

This section first describes the experimental setup, followed by the implementation details and results. We aim to answer the following questions: (Q1) Is our proposed DEMO more effective than baseline methods in solving the offline MBO problem? (Q2) Are both phases described in Section 4 necessary? (Q3) Compared to existing generative model-based approaches, can DEMO more reliably and consistently generate new higher-scoring designs?

5.1 Dataset and Tasks

We carry out experiments on 7 tasks selected from Design-Bench [1] and BayesO Benchmarks [39], including 4 continuous tasks and 3 discrete tasks. The continuous tasks are as follows: (i) Superconductor (SuperC) [5], where the goal is to create a superconductor with 86 continuous components to maximize critical temperature, using 17,010 designs; (ii) Ant Morphology (Ant) [1, 40], where the objective is to design a four-legged ant with 60 continuous components to increase crawling speed, based on 10,004 designs; (iii) D’Kitty Morphology (D’Kitty) [1, 41], where the focus is on designing a four-legged D’Kitty with 56 continuous components to enhance crawling speed, using 10,004 designs; (iv) Inverse Levy Function (Levy) [39], where the aim is to maximize function values of the inverse black-box Levy function with 60 input dimensions, using 15,000 designs. The discrete tasks include: (v) TF Bind 8 (TF8) [6], where the goal is to identify an 8-unit DNA sequence that maximizes the binding activity score, with 32,898 designs; (vi) TF Bind 10 (TF10) [6], where the objective is to find a 10-unit DNA sequence that optimizes the binding activity score, using 50,000 designs; (vii) NAS [42], where the aim is to discover the optimal neural network architecture to improve test accuracy on the CIFAR-10 dataset [43], using 1,771 designs.

5.2 Evaluation and Metrics

Following the evaluation protocol used in previous studies [1, 11, 22], we assume a budget of $K=256$ and generate 256 new designs for each method. The 100th percentile (max) normalized ground-truth score is reported in Section 5.5, and the 50th percentile (median) score is provided in Appendix A.1. The normalized score is calculated as $y_{n}=\frac{y-y_{\text{min}}}{y_{\text{max}}-y_{\text{min}}}$, where $y_{\text{min}}$ and $y_{\text{max}}$ are the minimum and maximum scores of the full task dataset, which also contains high-scoring designs withheld from the offline dataset (hence $\mathcal{D}(\textbf{best})<1$ in Tables 1 and 2). For comparison, we also report the normalized score of the best design in the offline dataset, denoted as $\mathcal{D}(\textbf{best})$. Additionally, we provide mean and median rankings across all 7 tasks for a comprehensive performance evaluation.
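The normalization and percentile reporting described above can be sketched as follows. The function names are hypothetical, and $y_{\text{min}}$ and $y_{\text{max}}$ are passed in as constants because they come from the full task dataset rather than from the generated designs.

```python
import numpy as np


def normalized_scores(y, y_min, y_max):
    """y_n = (y - y_min) / (y_max - y_min), with y_min, y_max taken from the full task dataset."""
    return (np.asarray(y, dtype=float) - y_min) / (y_max - y_min)


def summarize(new_scores, y_min, y_max, best_offline):
    """100th (max) and 50th (median) percentile normalized scores of the K new
    designs, plus the normalized score of the best offline design, D(best)."""
    y_n = normalized_scores(new_scores, y_min, y_max)
    return {
        "max": float(np.percentile(y_n, 100)),
        "median": float(np.percentile(y_n, 50)),
        "D(best)": float(normalized_scores(best_offline, y_min, y_max)),
    }


if __name__ == "__main__":
    # Hypothetical ground-truth scores for K = 256 generated designs.
    scores = np.random.default_rng(0).uniform(0.0, 10.0, size=256)
    print(summarize(scores, y_min=0.0, y_max=10.0, best_offline=4.0))
```

Since the normalization is affine, it does not change which design is best; it only puts all tasks on a comparable scale.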

5.3 Comparison Methods

We benchmark DEMO against three groups of baseline approaches: (i) traditional methods, (ii) methods that apply gradient-based optimization to existing designs, and (iii) methods that sample from generative models. Traditional methods include: (1) BO-qEI [44]: performs Bayesian optimization to maximize the surrogate, proposes designs using the quasi-Expected-Improvement acquisition function, and labels the designs using the surrogate model. (2) CMA-ES [45]: progressively shifts the search distribution toward the optimal design by adapting its covariance matrix. (3) REINFORCE [46]: optimizes the distribution over the input space using the learned surrogate. The second category includes: (4) Grad: performs simple gradient ascent on existing designs to create new ones. (5) Mean: optimizes the average prediction of an ensemble of surrogate models. (6) Min: optimizes the lowest prediction from an ensemble of learned objective functions. (7) COMs [18]: applies regularization to assign lower scores to designs derived through gradient ascent. (8) ROMA [17]: introduces smoothness regularization to the DNN surrogate. (9) NEMO [19]: limits the discrepancy between the surrogate and the black-box objective function using normalized maximum likelihood before performing gradient ascent. (10) BDI [21]: employs forward and backward mappings to transfer knowledge from the offline dataset to the design. (11) IOM [47]: ensures representation consistency between the training dataset and the optimized designs. Generative model-based methods include: (12) CbAS [8]: adapts a VAE to steer the design distribution toward regions with higher scores. (13) Auto CbAS [24]: uses importance sampling to update a regression model based on CbAS. (14) MIN [9]: learns an inverse mapping from scores to designs and searches for optimal designs through this mapping. (15) DDOM [11]: learns a generative diffusion model conditioned on the score values.

5.4 Implementation Details

We follow the training protocols from [18] for all comparative methods unless stated otherwise. A 3-layer MLP with ReLU activations and a hidden size of 2048 is used for both $f_{\bm{\theta}}(\cdot)$ and $s_{\bm{\phi}}(\cdot)$. In Algorithm 1, the number of gradient-ascent iterations $T$ is set to 100 for both continuous and discrete tasks. The surrogate models are trained with the Adam optimizer [48] for 200 epochs with a batch size of 128 and a learning rate of $10^{-1}$. The step size $\eta$ in Eq. (6) is set to $10^{-3}$ for continuous tasks and $10^{-1}$ for discrete tasks. The conditional diffusion model $s_{\bm{\phi}}(\cdot)$ is trained for 1000 epochs with a batch size of 128. For existing design editing, following previous studies [49, 11], we set the target score $\hat{y}$ to 1 and $M$ to 1000. The selected value of $m$ is 400, with further elaboration provided in Appendix A.2. Results for the traditional methods are taken from [1]; for the other methods, we conduct 8 independent trials and report the mean and standard error. All experiments are conducted on a single NVIDIA V100 GPU, with execution times per trial ranging from 10 minutes to 20 hours depending on the task.
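If you want to reproduce these settings, the hyperparameters above can be collected in a single configuration object. The sketch below is a plain Python dataclass whose field names are hypothetical. Only the values come from this section, except the budget $K=256$, which comes from Section 5.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoConfig:
    """Hyperparameters reported in Section 5.4 (field names are hypothetical)."""
    mlp_layers: int = 3             # same MLP depth for f_theta and s_phi
    hidden_size: int = 2048
    activation: str = "relu"
    T: int = 100                    # gradient-ascent iterations in Algorithm 1
    surrogate_epochs: int = 200     # surrogate trained with Adam
    surrogate_batch_size: int = 128
    surrogate_lr: float = 1e-1
    eta_continuous: float = 1e-3    # step size in Eq. (6), continuous tasks
    eta_discrete: float = 1e-1      # step size in Eq. (6), discrete tasks
    diffusion_epochs: int = 1000
    diffusion_batch_size: int = 128
    target_score: float = 1.0       # y_hat used during design editing
    M: int = 1000                   # total number of diffusion steps
    m: int = 400                    # perturbation time for design editing
    budget_K: int = 256             # number of candidate designs (Section 5.2)
    n_trials: int = 8

    def __post_init__(self):
        if not 0 < self.m <= self.M:
            raise ValueError("perturbation time m must satisfy 0 < m <= M")

    def step_size(self, discrete: bool) -> float:
        """Return the gradient-ascent step size eta for the task type."""
        return self.eta_discrete if discrete else self.eta_continuous


if __name__ == "__main__":
    cfg = DemoConfig()
    print(cfg.step_size(discrete=True), cfg.m / cfg.M)
```

Freezing the dataclass keeps a run's settings immutable once created. Per-task changes, such as the discrete step size, are selected through `step_size` rather than by mutating the object.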

Table 1: Experimental results on continuous tasks for comparison.
| Method | Superconductor | Ant Morphology | D’Kitty Morphology | Levy |
|---|---|---|---|---|
| $\mathcal{D}$(best) | 0.399 | 0.565 | 0.884 | 0.613 |
| BO-qEI | 0.402 ± 0.034 | 0.819 ± 0.000 | 0.896 ± 0.000 | 0.810 ± 0.016 |
| CMA-ES | 0.465 ± 0.024 | **1.214 ± 0.732** | 0.724 ± 0.001 | 0.887 ± 0.025 |
| REINFORCE | 0.481 ± 0.013 | 0.266 ± 0.032 | 0.562 ± 0.196 | 0.564 ± 0.090 |
| Grad | 0.489 ± 0.018 | 0.927 ± 0.027 | 0.949 ± 0.014 | 0.948 ± 0.031 |
| Mean | 0.505 ± 0.013 | 0.940 ± 0.014 | **0.956 ± 0.014** | **0.984 ± 0.023** |
| Min | 0.501 ± 0.019 | 0.918 ± 0.034 | 0.942 ± 0.009 | 0.964 ± 0.023 |
| COMs | 0.481 ± 0.028 | 0.842 ± 0.037 | 0.926 ± 0.019 | 0.936 ± 0.025 |
| ROMA | **0.509 ± 0.015** | 0.916 ± 0.030 | 0.929 ± 0.013 | 0.976 ± 0.019 |
| NEMO | 0.502 ± 0.002 | 0.955 ± 0.006 | **0.952 ± 0.004** | 0.969 ± 0.019 |
| BDI | 0.513 ± 0.000 | 0.906 ± 0.000 | 0.919 ± 0.000 | 0.938 ± 0.000 |
| IOM | **0.518 ± 0.020** | 0.922 ± 0.030 | 0.944 ± 0.012 | **0.988 ± 0.021** |
| CbAS | **0.503 ± 0.069** | 0.876 ± 0.031 | 0.892 ± 0.008 | 0.938 ± 0.037 |
| Auto CbAS | 0.421 ± 0.045 | 0.882 ± 0.045 | 0.906 ± 0.006 | 0.797 ± 0.033 |
| MIN | 0.499 ± 0.017 | 0.445 ± 0.080 | 0.892 ± 0.011 | 0.761 ± 0.037 |
| DDOM | 0.486 ± 0.013 | 0.952 ± 0.007 | 0.941 ± 0.006 | 0.927 ± 0.031 |
| DEMO (ours) | **0.520 ± 0.006** | **0.971 ± 0.005** | **0.957 ± 0.006** | **1.005 ± 0.020** |
Table 2: Experimental results on discrete tasks, and ranking on all tasks for comparison.
| Method | TF Bind 8 | TF Bind 10 | NAS | Rank Mean | Rank Median |
|---|---|---|---|---|---|
| $\mathcal{D}$(best) | 0.439 | 0.467 | 0.436 | | |
| BO-qEI | 0.798 ± 0.083 | 0.652 ± 0.038 | **1.079 ± 0.059** | 11.1/16 | 13/16 |
| CMA-ES | 0.953 ± 0.022 | 0.670 ± 0.023 | 0.985 ± 0.079 | 7.1/16 | 3/16 |
| REINFORCE | 0.948 ± 0.028 | 0.663 ± 0.034 | -1.895 ± 0.000 | 12.1/16 | 16/16 |
| Grad | 0.898 ± 0.033 | 0.638 ± 0.022 | 0.611 ± 0.052 | 8.9/16 | 10/16 |
| Mean | 0.895 ± 0.020 | 0.654 ± 0.028 | 0.663 ± 0.058 | 6.4/16 | 5/16 |
| Min | 0.931 ± 0.036 | 0.634 ± 0.033 | 0.708 ± 0.027 | 8.0/16 | 8/16 |
| COMs | 0.474 ± 0.053 | 0.625 ± 0.010 | 0.796 ± 0.029 | 11.1/16 | 12/16 |
| ROMA | 0.921 ± 0.040 | 0.669 ± 0.035 | 0.934 ± 0.025 | 5.7/16 | 4/16 |
| NEMO | 0.942 ± 0.003 | 0.708 ± 0.010 | 0.735 ± 0.012 | 4.6/16 | 5/16 |
| BDI | 0.870 ± 0.000 | 0.605 ± 0.000 | 0.722 ± 0.000 | 9.7/16 | 10/16 |
| IOM | 0.870 ± 0.074 | 0.648 ± 0.025 | 0.411 ± 0.044 | 7.6/16 | 7/16 |
| CbAS | 0.927 ± 0.051 | 0.651 ± 0.060 | 0.683 ± 0.079 | 9.3/16 | 8/16 |
| Auto CbAS | 0.910 ± 0.044 | 0.630 ± 0.045 | 0.506 ± 0.074 | 12.4/16 | 13/16 |
| MIN | 0.905 ± 0.052 | 0.616 ± 0.021 | 0.717 ± 0.046 | 12.3/16 | 13/16 |
| DDOM | **0.961 ± 0.024** | 0.640 ± 0.029 | 0.737 ± 0.014 | 7.3/16 | 7/16 |
| DEMO (ours) | **0.980 ± 0.004** | **0.762 ± 0.058** | 0.766 ± 0.017 | 1.7/16 | 1/16 |

5.5 Results

Performance in Continuous Tasks. Table 1 presents the results on the 4 continuous tasks. DEMO reaches state-of-the-art performance on all of them. DEMO generally outperforms other generative model-based approaches, such as MIN and DDOM. These methods train models only on the offline dataset and therefore may not learn the characteristics of higher-scoring designs, an issue that DEMO mitigates. Moreover, DEMO beats gradient-based methods, such as Grad and COMs, by simultaneously leveraging guidance from existing top designs and a higher target score. This indicates that DEMO is effective for continuous tasks.

Performance in Discrete Tasks. Table 2 shows the results on the 3 discrete tasks. DEMO attains the top performance on TF Bind 8 and TF Bind 10, and on TF10 it surpasses the other methods by a large margin, suggesting that DEMO can solve discrete offline MBO tasks. Nonetheless, DEMO underperforms on NAS, which might have two causes. First, each neural network architecture is encoded as a sequence of one-hot vectors of length 64. This encoding may not precisely represent all features of a given architecture, which could degrade performance on NAS. Second, after inspecting the offline dataset of NAS, we find that many existing designs share commonalities. This redundancy means that the NAS offline dataset contains less useful information than those of the other tasks, which may further explain why DEMO's performance on NAS is weaker.

Summary. These results on both continuous and discrete tasks answer Q1. DEMO attains the best rankings, with a mean rank of 1.7/16 and a median rank of 1/16, as detailed in Table 2 and Figure 2, and it achieves top performance on six of the seven tasks. We further ran Welch’s t-test on the tasks where DEMO obtains state-of-the-art results, obtaining p-values of 0.007 on SuperC, 0.00003 on Ant, 0.08 on D’Kitty, 0.005 on Levy, 0.005 on TF8, and 0.02 on TF10. At a significance level of 0.05, DEMO therefore achieves statistically significant improvements on 5 of the 7 tasks.
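The rank columns in Table 2 can be recomputed from the mean scores in Tables 1 and 2. The sketch below ranks the 16 methods, excluding $\mathcal{D}$(best), on each task. It assumes that a method's rank equals 1 plus the number of methods with a strictly higher mean score; this tie-handling rule is our assumption. With this rule, the sketch reproduces, for example, DEMO's 1.7/16 and 1/16 and BO-qEI's 11.1/16 and 13/16.

```python
import numpy as np

TASKS = ["SuperC", "Ant", "D'Kitty", "Levy", "TF8", "TF10", "NAS"]
# Mean normalized 100th-percentile scores transcribed from Tables 1 and 2.
SCORES = {
    "BO-qEI":    [0.402, 0.819, 0.896, 0.810, 0.798, 0.652, 1.079],
    "CMA-ES":    [0.465, 1.214, 0.724, 0.887, 0.953, 0.670, 0.985],
    "REINFORCE": [0.481, 0.266, 0.562, 0.564, 0.948, 0.663, -1.895],
    "Grad":      [0.489, 0.927, 0.949, 0.948, 0.898, 0.638, 0.611],
    "Mean":      [0.505, 0.940, 0.956, 0.984, 0.895, 0.654, 0.663],
    "Min":       [0.501, 0.918, 0.942, 0.964, 0.931, 0.634, 0.708],
    "COMs":      [0.481, 0.842, 0.926, 0.936, 0.474, 0.625, 0.796],
    "ROMA":      [0.509, 0.916, 0.929, 0.976, 0.921, 0.669, 0.934],
    "NEMO":      [0.502, 0.955, 0.952, 0.969, 0.942, 0.708, 0.735],
    "BDI":       [0.513, 0.906, 0.919, 0.938, 0.870, 0.605, 0.722],
    "IOM":       [0.518, 0.922, 0.944, 0.988, 0.870, 0.648, 0.411],
    "CbAS":      [0.503, 0.876, 0.892, 0.938, 0.927, 0.651, 0.683],
    "Auto CbAS": [0.421, 0.882, 0.906, 0.797, 0.910, 0.630, 0.506],
    "MIN":       [0.499, 0.445, 0.892, 0.761, 0.905, 0.616, 0.717],
    "DDOM":      [0.486, 0.952, 0.941, 0.927, 0.961, 0.640, 0.737],
    "DEMO":      [0.520, 0.971, 0.957, 1.005, 0.980, 0.762, 0.766],
}


def rank_summary(scores):
    """Return {method: (mean rank, median rank)} over all tasks."""
    names = list(scores)
    S = np.array([scores[n] for n in names])   # shape (methods, tasks)
    # beats[i, j, t] is True when method j has a strictly higher score than method i on task t.
    beats = S[None, :, :] > S[:, None, :]
    ranks = 1 + beats.sum(axis=1)              # shape (methods, tasks)
    return {n: (float(ranks[i].mean()), float(np.median(ranks[i])))
            for i, n in enumerate(names)}


if __name__ == "__main__":
    for name, (mean_rank, median_rank) in rank_summary(SCORES).items():
        print(f"{name:10s} {mean_rank:.1f}/16 {median_rank:.0f}/16")
```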

5.6 Ablation Study

Table 3: Ablation studies on two phases of DEMO.
| Task | Dim. | DEMO | w/o pseudo-target | w/o editing |
|---|---|---|---|---|
| SuperC | 86 | **0.520 ± 0.006** | 0.487 ± 0.012 | 0.482 ± 0.013 |
| Ant | 60 | **0.971 ± 0.005** | 0.945 ± 0.016 | 0.963 ± 0.008 |
| D’Kitty | 56 | **0.957 ± 0.006** | 0.955 ± 0.005 | 0.933 ± 0.002 |
| Levy | 60 | **1.005 ± 0.020** | 0.901 ± 0.029 | 0.990 ± 0.020 |
| TF8 | 8 | **0.980 ± 0.004** | 0.757 ± 0.063 | 0.965 ± 0.008 |
| TF10 | 10 | **0.762 ± 0.058** | 0.626 ± 0.009 | 0.658 ± 0.019 |
| NAS | 64 | **0.766 ± 0.017** | 0.741 ± 0.022 | 0.668 ± 0.084 |

To assess the individual contributions of pseudo-target distribution generation (pseudo-target) and existing design editing (editing) within DEMO, we conduct ablation experiments that remove each phase in turn. Removing the pseudo-target phase means training the conditional diffusion model only on the offline dataset and then applying the editing phase. Removing the editing phase means using the model trained in the pseudo-target phase to generate new designs starting from random Gaussian noise.

Table 3 summarizes the results. On all 4 continuous tasks, DEMO outperforms both ablated versions. For instance, on SuperC, DEMO achieves a score of $0.520\pm 0.006$, clearly higher than the versions without the pseudo-target phase ($0.487\pm 0.012$) and without the editing phase ($0.482\pm 0.013$). Gains of varying size are also observed on Ant, D’Kitty, and Levy; the smallest gain is on D’Kitty relative to the version without the pseudo-target phase (0.957 vs. 0.955). On the discrete tasks TF8, TF10, and NAS, DEMO likewise outperforms both partial versions. Overall, the ablation studies confirm the importance of both pseudo-target distribution generation and existing design editing. This answers Q2: both phases are necessary for DEMO, and together they improve performance across a range of tasks and input dimensions.

Figure 2: The black triangles represent the mean rank, and the vertical sticks showcase the median rank. The whiskers indicate the minimum and maximum of the rank.
Figure 3: The proportion is calculated as the number of new designs that surpass $\mathcal{D}(\textbf{best})$ divided by the budget of 256, and it indicates how reliably a method generates new higher-scoring designs. This figure demonstrates that DEMO is more reliable than DDOM on all tasks.

5.7 Reliability Study

As noted previously, generative model-based methods, which train solely on the offline dataset, often fail to consistently generate new higher-scoring designs. In this subsection, we assess whether DEMO produces superior designs more reliably than DDOM, the most recent and strongest generative model-based approach among our baselines. The comparison with gradient-based approaches is discussed in Appendix A.3. To measure reliability, we compute the proportion of new designs that exceed the best score in the offline dataset, $\mathcal{D}(\textbf{best})$. The results, depicted in Figure 3, show that DEMO outperforms DDOM on all tasks, with particularly large improvements on SuperC and NAS. This confirms DEMO’s improved reliability over the state-of-the-art generative model-based baseline in both continuous and discrete settings. The median scores in Appendix A.1 further support these findings: DEMO achieves the top median-score ranking, which affirms its reliability and answers Q3.
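The reliability metric above is a simple proportion, sketched below with a hypothetical function name. We read "surpass" as strictly greater than $\mathcal{D}(\textbf{best})$. The scores and the reference must be on the same scale, either both raw or both normalized; because the normalization is monotone, either choice gives the same proportion.

```python
import numpy as np


def reliability(new_scores, best_offline, budget=256):
    """Fraction of the K = budget new designs whose score strictly exceeds D(best)."""
    new_scores = np.asarray(new_scores, dtype=float)
    if new_scores.shape != (budget,):
        raise ValueError(f"expected {budget} scores, got shape {new_scores.shape}")
    return float(np.count_nonzero(new_scores > best_offline)) / budget


if __name__ == "__main__":
    # Hypothetical normalized scores of 256 generated designs versus D(best) = 0.399.
    scores = np.random.default_rng(0).normal(0.42, 0.05, size=256)
    print(reliability(scores, best_offline=0.399))
```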

6 Conclusion and Discussion

In this study, we introduce Design Editing for Offline Model-based Optimization (DEMO), which consists of two phases. The first phase, pseudo-target distribution generation, trains a surrogate model on the offline dataset and applies gradient ascent to create a synthetic dataset in which the predicted scores serve as new labels. A conditional diffusion model is then trained on this synthetic dataset to learn a pseudo-target distribution. The second phase, existing design editing, adds random noise to existing top designs and uses the learned diffusion model to denoise them, conditioned on higher target scores. Overall, DEMO generates new designs that inherit high-scoring features from top existing designs in the second phase and refines them with the more accurate conditional diffusion model obtained in the first phase. Extensive experiments on diverse offline MBO tasks show that DEMO outperforms various baseline approaches, yielding state-of-the-art performance. The limitations and potential negative impacts of this study are discussed in Appendix A.4 and Appendix A.5, respectively.

7 Acknowledgement

This research is partly facilitated by the computational resources provided by Compute Canada and Mila Cluster.

References

  • [1] Brandon Trabucco, Xinyang Geng, Aviral Kumar, and Sergey Levine. Design-bench: Benchmarks for data-driven offline model-based optimization. arXiv preprint arXiv:2202.08450, 2022.
  • [2] Thomas Liao, Grant Wang, Brian Yang, Rene Lee, Kristofer Pister, Sergey Levine, and Roberto Calandra. Data-efficient learning of morphology and controller for a microrobot. arXiv preprint arXiv:1905.01334, 2019.
  • [3] Karen S Sarkisyan et al. Local fitness landscape of the green fluorescent protein. Nature, 2016.
  • [4] Christof Angermueller, David Dohan, David Belanger, Ramya Deshpande, Kevin Murphy, and Lucy Colwell. Model-based reinforcement learning for biological sequence design. In Proc. Int. Conf. Learning Rep. (ICLR), 2019.
  • [5] Kam Hamidieh. A data-driven statistical model for predicting the critical temperature of a superconductor. Computational Materials Science, 2018.
  • [6] Luis A Barrera et al. Survey of variation in human transcription factors reveals prevalent DNA binding changes. Science, 2016.
  • [7] Paul J Sample, Ban Wang, David W Reid, Vlad Presnyak, Iain J McFadyen, David R Morris, and Georg Seelig. Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nature Biotechnology, 2019.
  • [8] David Brookes, Hahnbeom Park, and Jennifer Listgarten. Conditioning by adaptive sampling for robust design. In Proc. Int. Conf. Machine Learning (ICML), 2019.
  • [9] Aviral Kumar and Sergey Levine. Model inversion networks for model-based optimization. Proc. Adv. Neur. Inf. Proc. Syst. (NeurIPS), 2020.
  • [10] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
  • [11] Siddarth Krishnamoorthy, Satvik Mehul Mashkaria, and Aditya Grover. Diffusion models for black-box optimization, 2023.
  • [12] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations, 2021.
  • [13] Chin-Wei Huang, Jae Hyun Lim, and Aaron Courville. A variational perspective on diffusion-based generative models and score matching, 2021.
  • [14] Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011.
  • [15] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution, 2020.
  • [16] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance, 2022.
  • [17] Sihyun Yu, Sungsoo Ahn, Le Song, and Jinwoo Shin. RoMA: Robust model adaptation for offline model-based optimization. Proc. Adv. Neur. Inf. Proc. Syst. (NeurIPS), 2021.
  • [18] Brandon Trabucco, Aviral Kumar, Xinyang Geng, and Sergey Levine. Conservative objective models for effective offline model-based optimization, 2021.
  • [19] Justin Fu and Sergey Levine. Offline model-based optimization via normalized maximum likelihood estimation. Proc. Int. Conf. Learning Rep. (ICLR), 2021.
  • [20] Can Chen, Yingxue Zhang, Xue Liu, and Mark Coates. Bidirectional learning for offline model-based biological sequence design, 2023.
  • [21] Can Chen, Yingxue Zhang, Jie Fu, Xue Liu, and Mark Coates. Bidirectional learning for offline infinite-width model-based optimization, 2023.
  • [22] Ye Yuan, Can Chen, Zixuan Liu, Willie Neiswanger, and Xue Liu. Importance-aware co-teaching for offline model-based optimization, 2023.
  • [23] Can Chen, Christopher Beckham, Zixuan Liu, Xue Liu, and Christopher Pal. Parallel-mentoring for offline model-based optimization, 2023.
  • [24] Clara Fannjiang and Jennifer Listgarten. Autofocused oracles for model-based design. Proc. Adv. Neur. Inf. Proc. Syst. (NeurIPS), 2020.
  • [25] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations, 2022.
  • [26] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models, 2022.
  • [27] Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6038–6047, 2023.
  • [28] Daiki Miyake, Akihiro Iohara, Yu Saito, and Toshiyuki Tanaka. Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models. arXiv preprint arXiv:2305.16807, 2023.
  • [29] Chen Henry Wu and Fernando De la Torre. A latent space of stochastic diffusion models for zero-shot image editing and guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7378–7387, 2023.
  • [30] Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Direct inversion: Boosting diffusion-based editing with 3 lines of code. arXiv preprint arXiv:2310.01506, 2023.
  • [31] Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, and Qifeng Chen. FateZero: Fusing attentions for zero-shot text-based video editing. arXiv preprint arXiv:2303.09535, 2023.
  • [32] Duygu Ceylan, Chun-Hao P Huang, and Niloy J Mitra. Pix2Video: Video editing using image diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 23206–23217, 2023.
  • [33] Shuai Yang, Yifan Zhou, Ziwei Liu, and Chen Change Loy. Rerender A Video: Zero-shot text-guided video-to-video translation. arXiv preprint arXiv:2306.07954, 2023.
  • [34] Michal Geyer, Omer Bar-Tal, Shai Bagon, and Tali Dekel. TokenFlow: Consistent diffusion features for consistent video editing. arXiv preprint arXiv:2307.10373, 2023.
  • [35] Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, and Sen He. FLATTEN: Optical flow-guided attention for consistent text-to-video editing. arXiv preprint arXiv:2310.05922, 2023.
  • [36] Youyuan Zhang, Xuan Ju, and James J Clark. FastVideoEdit: Leveraging consistency models for efficient text-to-video editing. arXiv preprint arXiv:2403.06269, 2024.
  • [37] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020.
  • [38] Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image translation. In International Conference on Learning Representations, 2023.
  • [39] Jungtaek Kim. BayesO Benchmarks: Benchmark functions for Bayesian optimization, 2023.
  • [40] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
  • [41] Michael Ahn, Henry Zhu, Kristian Hartikainen, Hugo Ponte, Abhishek Gupta, Sergey Levine, and Vikash Kumar. ROBEL: Robotics benchmarks for learning with low-cost robots. In Conf. on Robot Lea. (CoRL), 2020.
  • [42] Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2017.
  • [43] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
  • [44] James T Wilson, Riccardo Moriconi, Frank Hutter, and Marc Peter Deisenroth. The reparameterization trick for acquisition functions. arXiv preprint arXiv:1712.00424, 2017.
  • [45] Nikolaus Hansen. The CMA evolution strategy: A comparing review. In Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms, 2006.
  • [46] Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992.
  • [47] Han Qi, Yi Su, Aviral Kumar, and Sergey Levine. Data-driven model-based optimization via invariant representation learning. In Proc. Adv. Neur. Inf. Proc. Syst (NeurIPS), 2022.
  • [48] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017.
  • [49] Minsu Kim, Federico Berto, Sungsoo Ahn, and Jinkyoo Park. Bootstrapped training of score-conditioned generator for offline design of biological sequences, 2023.

Appendix A

A.1 Median Normalized Scores

Table 4: Experimental results (median normalized scores) on continuous tasks for comparison. Entries marked with * are highlighted as top results.
Method | Superconductor | Ant Morphology | D'Kitty Morphology | Levy
𝒟(best) | 0.399 | 0.565 | 0.884 | 0.613
BO-qEI | 0.300 ± 0.015 | 0.567 ± 0.000 | 0.883 ± 0.000* | 0.643 ± 0.009
CMA-ES | 0.379 ± 0.003 | -0.045 ± 0.004 | 0.684 ± 0.016 | 0.410 ± 0.009
REINFORCE | 0.463 ± 0.016* | 0.138 ± 0.032 | 0.356 ± 0.131 | 0.377 ± 0.065
Grad | 0.293 ± 0.010 | 0.463 ± 0.023 | 0.862 ± 0.007 | 0.613 ± 0.019
Mean | 0.334 ± 0.004 | 0.569 ± 0.011 | 0.876 ± 0.005 | 0.561 ± 0.007
Min | 0.364 ± 0.030 | 0.569 ± 0.021 | 0.873 ± 0.009 | 0.537 ± 0.006
COMs | 0.316 ± 0.024 | 0.564 ± 0.002 | 0.881 ± 0.002 | 0.511 ± 0.012
ROMA | 0.370 ± 0.019 | 0.477 ± 0.038 | 0.854 ± 0.007 | 0.558 ± 0.003
NEMO | 0.320 ± 0.008 | 0.592 ± 0.000 | 0.883 ± 0.000* | 0.538 ± 0.006
BDI | 0.412 ± 0.000 | 0.474 ± 0.000 | 0.855 ± 0.000 | 0.534 ± 0.003
IOM | 0.350 ± 0.023 | 0.513 ± 0.035 | 0.876 ± 0.006 | 0.562 ± 0.007
CbAS | 0.111 ± 0.017 | 0.384 ± 0.016 | 0.753 ± 0.008 | 0.479 ± 0.020
Auto CbAS | 0.131 ± 0.010 | 0.364 ± 0.014 | 0.736 ± 0.025 | 0.499 ± 0.022
MIN | 0.336 ± 0.016 | 0.618 ± 0.040* | 0.887 ± 0.004* | 0.681 ± 0.030*
DDOM | 0.346 ± 0.009 | 0.615 ± 0.007* | 0.861 ± 0.003 | 0.595 ± 0.012
DEMO (ours) | 0.412 ± 0.008 | 0.624 ± 0.014* | 0.875 ± 0.003 | 0.601 ± 0.006
Table 5: Experimental results (median normalized scores) on discrete tasks, and ranking on all tasks for comparison. Entries marked with * are highlighted as top results.
Method | TF Bind 8 | TF Bind 10 | NAS | Rank Mean | Rank Median
𝒟(best) | 0.439 | 0.467 | 0.436
BO-qEI | 0.439 ± 0.000 | 0.467 ± 0.000 | 0.544 ± 0.099 | 6.7/16 | 7/16
CMA-ES | 0.537 ± 0.014 | 0.484 ± 0.014 | 0.591 ± 0.102* | 8.9/16 | 6/16
REINFORCE | 0.462 ± 0.021 | 0.475 ± 0.008 | -1.895 ± 0.000 | 11.3/16 | 15/16
Grad | 0.556 ± 0.021 | 0.562 ± 0.017* | 0.227 ± 0.110 | 7.9/16 | 9/16
Mean | 0.539 ± 0.030 | 0.539 ± 0.010 | 0.494 ± 0.077 | 6.1/16 | 5/16
Min | 0.569 ± 0.050 | 0.485 ± 0.021 | 0.567 ± 0.006* | 5.3/16 | 5/16
COMs | 0.439 ± 0.000 | 0.467 ± 0.002 | 0.525 ± 0.003 | 8.7/16 | 8/16
ROMA | 0.555 ± 0.020 | 0.512 ± 0.020 | 0.525 ± 0.003 | 6.9/16 | 6/16
NEMO | 0.438 ± 0.001 | 0.454 ± 0.001 | 0.564 ± 0.016* | 8.1/16 | 9/16
BDI | 0.439 ± 0.000 | 0.476 ± 0.000 | 0.517 ± 0.000 | 8.3/16 | 8/16
IOM | 0.439 ± 0.000 | 0.477 ± 0.010 | -0.050 ± 0.011 | 8.0/16 | 7/16
CbAS | 0.428 ± 0.010 | 0.463 ± 0.007 | 0.292 ± 0.027 | 13.6/16 | 13/16
Auto CbAS | 0.419 ± 0.007 | 0.461 ± 0.007 | 0.217 ± 0.005 | 14.3/16 | 14/16
MIN | 0.421 ± 0.015 | 0.468 ± 0.006 | 0.433 ± 0.000 | 6.7/16 | 9/16
DDOM | 0.401 ± 0.008 | 0.464 ± 0.006 | 0.306 ± 0.017 | 9.4/16 | 10/16
DEMO (ours) | 0.826 ± 0.005* | 0.475 ± 0.004 | 0.541 ± 0.005 | 4.0/16 | 4/16

Performance in Continuous Tasks. Table 4 reports the median normalized scores of all methods on the 4 continuous tasks. Although DEMO does not achieve the top score on every task, it performs robustly across all four and outperforms many of the baselines. In the Ant Morphology task, for example, DEMO's score of 0.624 ± 0.014 is the highest among all approaches, suggesting that DEMO approximates the distribution of higher-scoring designs effectively. DEMO also outperforms the earlier generative models CbAS and Auto CbAS by large margins on every continuous task, and it remains competitive with the more recent generative methods MIN and DDOM.
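The "median normalized score" can be sketched as follows. This assumes the standard Design-Bench convention: raw oracle scores are min-max normalized by the lowest and highest scores in the task's full (unobserved) dataset, and the median is then taken over the evaluated candidates. The function name and the example numbers are hypothetical. Under this convention a normalized score falls below 0 whenever candidates score worse than the dataset minimum, which is consistent with the negative entries for CMA-ES on Ant Morphology in Table 4 and for REINFORCE and IOM on NAS in Table 5.

```python
import statistics


def median_normalized_score(candidate_scores, y_min, y_max):
    """Min-max normalize raw oracle scores, then take the median over candidates.

    Assumes y_min / y_max are the lowest / highest scores in the task's full
    dataset (Design-Bench convention); results outside [0, 1] are possible.
    """
    if y_max <= y_min:
        raise ValueError("y_max must exceed y_min")
    normalized = [(y - y_min) / (y_max - y_min) for y in candidate_scores]
    return statistics.median(normalized)


# Hypothetical raw scores for four candidates on a task with y_min = 0, y_max = 10.
print(median_normalized_score([2.0, 4.0, 6.0, 12.0], y_min=0.0, y_max=10.0))  # 0.5
```

In practice the candidate set would be the full evaluation budget (256 designs in this appendix), and the ± values in the tables reflect variation across repeated runs.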

Performance in Discrete Tasks. On the discrete tasks in Table 5, DEMO performs strongly on TF Bind 8, substantially surpassing all baselines with a score of 0.826 ± 0.005. On the more complex TF Bind 10 and NAS tasks, however, DEMO is competitive but does not lead. This mixed performance may stem from DEMO's methodology: although it is highly effective at capturing a broad distribution of high-quality designs, it might struggle in task environments whose design features are redundant.

Summary. Together, Tables 4 and 5 validate DEMO's efficacy on both continuous and discrete optimization tasks, further supporting an affirmative answer to Q1. In terms of median normalized scores, DEMO achieves a mean rank of 4.0/16 and a median rank of 4/16, the best mean and median ranks among the 16 methods compared. This consistent performance indicates that DEMO integrates and leverages complex design distributions effectively, making it a strong generative approach to offline model-based optimization.
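The mean and median ranks in Table 5 can be recomputed from the per-task scores in Tables 4 and 5. The sketch below assumes that methods are ranked within each task by median normalized score (higher is better), that tied methods share the better rank, and that 𝒟(best) is excluded from the ranking. Under these assumptions it recovers DEMO's mean rank of 4.0 and median rank of 4, and the reported values for Min, Mean and BDI. The helper names are hypothetical.

```python
import statistics

TASKS = ["Superconductor", "Ant", "DKitty", "Levy", "TFBind8", "TFBind10", "NAS"]

# Median normalized scores (run means) copied from Tables 4 and 5, in TASKS order.
SCORES = {
    "BO-qEI":    [0.300, 0.567, 0.883, 0.643, 0.439, 0.467, 0.544],
    "CMA-ES":    [0.379, -0.045, 0.684, 0.410, 0.537, 0.484, 0.591],
    "REINFORCE": [0.463, 0.138, 0.356, 0.377, 0.462, 0.475, -1.895],
    "Grad":      [0.293, 0.463, 0.862, 0.613, 0.556, 0.562, 0.227],
    "Mean":      [0.334, 0.569, 0.876, 0.561, 0.539, 0.539, 0.494],
    "Min":       [0.364, 0.569, 0.873, 0.537, 0.569, 0.485, 0.567],
    "COMs":      [0.316, 0.564, 0.881, 0.511, 0.439, 0.467, 0.525],
    "ROMA":      [0.370, 0.477, 0.854, 0.558, 0.555, 0.512, 0.525],
    "NEMO":      [0.320, 0.592, 0.883, 0.538, 0.438, 0.454, 0.564],
    "BDI":       [0.412, 0.474, 0.855, 0.534, 0.439, 0.476, 0.517],
    "IOM":       [0.350, 0.513, 0.876, 0.562, 0.439, 0.477, -0.050],
    "CbAS":      [0.111, 0.384, 0.753, 0.479, 0.428, 0.463, 0.292],
    "Auto CbAS": [0.131, 0.364, 0.736, 0.499, 0.419, 0.461, 0.217],
    "MIN":       [0.336, 0.618, 0.887, 0.681, 0.421, 0.468, 0.433],
    "DDOM":      [0.346, 0.615, 0.861, 0.595, 0.401, 0.464, 0.306],
    "DEMO":      [0.412, 0.624, 0.875, 0.601, 0.826, 0.475, 0.541],
}


def task_ranks(scores):
    """Per-task rank = 1 + number of methods with a strictly higher score (ties share the better rank)."""
    ranks = {name: [] for name in scores}
    for t in range(len(TASKS)):
        column = [vals[t] for vals in scores.values()]
        for name, vals in scores.items():
            ranks[name].append(1 + sum(v > vals[t] for v in column))
    return ranks


def rank_summary(scores):
    """Map each method to (mean rank rounded to one decimal, median rank)."""
    return {
        name: (round(sum(r) / len(r), 1), statistics.median(r))
        for name, r in task_ranks(scores).items()
    }


summary = rank_summary(SCORES)
print(summary["DEMO"])  # (4.0, 4)
```

DEMO's per-task ranks under this rule are 2, 1, 7, 4, 1, 8 and 5; its wins on Ant Morphology and TF Bind 8 offset its middling TF Bind 10 result.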

A.2 Sensitivity to the Choice of m

In Eq. (9), choosing a time m close to M makes x_perturb resemble random Gaussian noise, which gives the new design generation process more flexibility. Conversely, when m is closer to 0, the resulting design retains more characteristics of the existing top design. m is therefore a critical hyperparameter of our method, and this section examines the robustness of DEMO to its choice. We run experiments on one continuous task, SuperC, and one discrete task, TF8, with m ranging from 0 to 1000 in increments of 100. As illustrated in Figure 4, DEMO generally outperforms the baseline methods across different choices of m. Nevertheless, extreme values of m, whether too high or too low, can degrade performance. An excessively low m makes the model adhere too closely to the distribution of existing designs, whereas an excessively high m biases the model toward the pseudo-target distribution and neglects the guidance of the existing top designs. Choosing m from the mid-range balances the influence of the pseudo-target distribution and that of the top existing designs. Empirically, m within [200, 600] yields the best performance, so we set m = 400 for all tasks.
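The trade-off controlled by m can be illustrated numerically. The sketch below assumes a DDPM-style variance-preserving forward process with M = 1000 steps and a linear β schedule from 1e-4 to 0.02. These settings, and the names alpha_bar and perturb, are illustrative assumptions rather than the exact form of Eq. (9). The weight kept on the original top design, sqrt(ᾱ_m), falls from 1 at m = 0 to nearly 0 at m = M. A large m therefore leaves x_perturb close to pure Gaussian noise, while a small m keeps it close to the existing design.

```python
import math
import random

M = 1000                          # assumed number of diffusion steps (m sweeps 0..M)
BETA_MIN, BETA_MAX = 1e-4, 0.02   # assumed linear DDPM-style noise schedule


def alpha_bar(m):
    """Cumulative product prod_{i=1}^{m} (1 - beta_i); equals 1.0 when m == 0."""
    prod = 1.0
    for i in range(1, m + 1):
        beta_i = BETA_MIN + (BETA_MAX - BETA_MIN) * (i - 1) / (M - 1)
        prod *= 1.0 - beta_i
    return prod


def perturb(x_top, m, rng):
    """Hypothetical stand-in for Eq. (9): noise an existing top design to step m,
    x_m = sqrt(alpha_bar_m) * x_top + sqrt(1 - alpha_bar_m) * eps, eps ~ N(0, I)."""
    a = alpha_bar(m)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x_top]


# Weight kept on the original top design for the m values swept in this section.
for m in range(0, M + 1, 100):
    print(f"m={m:4d}  signal weight={math.sqrt(alpha_bar(m)):.3f}")

rng = random.Random(0)
x_perturb = perturb([0.8, -1.2, 0.5], m=400, rng=rng)  # m = 400, as used for all tasks
```

In DEMO, x_perturb would then be refined from step m back to step 0 by the conditional diffusion model trained on the pseudo-target distribution. That reverse process is not shown here.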

Figure 4: Selecting m near 0 results in generated designs that align closely with the distribution of existing designs. Conversely, setting m near 1000 steers the generated designs toward the pseudo-target distribution. Optimal designs are achieved by choosing m in the mid-range, effectively utilizing information from both the pseudo-target distribution and the top existing designs.

A.3 Extension of Reliability Study

This section extends the reliability study in Section 5.7 by comparing DEMO with the gradient-based approach Grad. DEMO is more consistent than Grad in 5 of the 7 tasks. Grad outperforms DEMO on the Levy and TF10 tasks, likely because gradient-based methods tend to generate new designs within a narrower distribution. Although Grad achieves a higher proportion of higher-scoring new designs on these two tasks, DEMO generates new designs from a wider distribution and thus produces candidates with higher maximum scores, as shown in Table 2.

Figure 5: The proportion is calculated as the number of new designs that surpass 𝒟(best) divided by the budget of 256, indicating how reliably a method generates new higher-scoring designs. This figure shows that DEMO is more reliable than Grad in 5/7 tasks.
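The proportion plotted in Figure 5 can be computed as sketched below. The sketch assumes the oracle scores of all 256 new designs and the offline best score 𝒟(best) are available as plain numbers, and it reads "surpass" as a strict inequality. The function name and the toy scores are hypothetical.

```python
def reliability_proportion(new_scores, best_offline_score, budget=256):
    """Fraction of the evaluation budget whose score strictly exceeds D(best)."""
    if len(new_scores) != budget:
        raise ValueError(f"expected {budget} scores, got {len(new_scores)}")
    n_better = sum(score > best_offline_score for score in new_scores)
    return n_better / budget


# Toy example: 64 of 256 hypothetical designs beat an offline best of 0.4.
scores = [0.5] * 64 + [0.3] * 192
print(reliability_proportion(scores, best_offline_score=0.4))  # 0.25
```

The metric complements the maximum score in Table 2. A method that samples from a narrow distribution can achieve a high proportion here while never reaching the extreme high scores that a wider distribution occasionally produces.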

A.4 Limitations

We have demonstrated the effectiveness of DEMO across a wide range of tasks. However, some evaluation methods may not fully capture real-world complexity. For example, in the Superconductor task [5], we follow prior studies [1] in using a random forest regression model as the oracle. This model might not fully reflect real-world conditions, which could lead to discrepancies between our oracle and the actual ground truth. Collaborating with domain experts in the future could help improve these evaluation approaches. Nevertheless, given DEMO's straightforward design and the empirical evidence of its robustness and efficacy across the tasks in Design-Bench [1] and the BayesO Benchmarks [39], we remain confident that it can generalize effectively to other settings.

A.5 Negative Impacts

This study seeks to advance the field of machine learning. However, it is important to recognize that advanced optimization techniques can be used for beneficial or detrimental purposes, depending on their application. For example, while these methods can benefit society through the development of drugs and materials, they could also be misused to create harmful substances or products. As researchers, we must remain vigilant, ensure that our contributions promote societal betterment, and carefully assess potential risks and ethical concerns.