Yuangang Pan (1, 2)
1 Centre for Frontier AI Research, Agency for Science, Technology and Research (A*STAR), 138632, Singapore
2 Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), 138632, Singapore
3 Department of Computing and Decision Sciences, Lingnan University, Hong Kong
PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation
Abstract
Recent advancements in deep generative models focus on generating samples that satisfy multiple desired properties. However, prevalent approaches optimize these property functions independently, thus ignoring the trade-offs among them. In addition, the property optimization is often improperly integrated into the generative models, resulting in an unnecessary compromise on generation quality (i.e., the quality of generated samples). To address these issues, we formulate a constrained optimization problem. It seeks to optimize generation quality while ensuring that generated samples reside on the Pareto front of multiple property objectives. Such a formulation enables the generation of samples that cannot be further improved simultaneously on the conflicting property functions, and it preserves the good quality of generated samples. Building upon this formulation, we introduce the PaRetO-gUided Diffusion model (PROUD), wherein the gradients in the denoising process are dynamically adjusted to enhance generation quality while the generated samples adhere to Pareto optimality. Experimental evaluations on image generation and protein generation tasks demonstrate that, compared with various baselines, PROUD consistently maintains superior generation quality while approaching Pareto optimality across multiple property functions.
Keywords: Multi-objective generation, diffusion model, Pareto optimality, generative model
1 Introduction
Deep generative models have developed rapidly over the last decade, with advances in variational autoencoders [27], generative adversarial networks [18, 61], normalizing flows [37], energy-based models [46], and diffusion models [44, 22]. In particular, controllable generative models can generate samples that satisfy multiple properties of interest. They show great promise in various applications, such as material design [26, 50] and controlled text/image generation [8, 32]. These properties of interest vary with the application domain. For example, in protein design, the properties can be specified structural or functional characteristics, such as solubility or binding affinity [56]. In image generation, the properties can be certain attributes or features, such as a specified hairstyle and makeup [55] or specified color patches [34]. In addition, for the sake of data naturalness, generated samples are expected to reside on the same data manifold as the training samples [19]. This relates to the manifold hypothesis that many real-world high-dimensional datasets lie on low-dimensional latent manifolds in the high-dimensional space [15].
Before delving into details, we first establish the problem setting. We are given a dataset $\mathcal{D}\subset\mathcal{M}$, where $\mathcal{M}$ denotes a low-dimensional manifold in the high-dimensional space $\mathbb{R}^d$. Suppose we have $m$ objective functions $F(x)=[f_1(x),\dots,f_m(x)]$, each of which returns a property value for the sample $x$. The aim of multi-objective generation is to learn a generative model whose samples achieve the best values across these $m$ functions while remaining within the manifold $\mathcal{M}$ (green cross in Fig. 1(a)). In other words, the quality of the generated samples (dubbed generation quality) should be good: they should be as realistic as the samples in the given dataset $\mathcal{D}$.
The multi-objective generation problem introduced above inherently requires reconciling optimization challenges in two spaces, the functionality space and the sample space, as shown in Fig. 1(a).

The first challenge comes from the need to handle multiple conflicting objectives to achieve generation with desired properties. The question is how to produce samples that cannot be further improved simultaneously across the objectives, a.k.a. Pareto optimality [6] (the Pareto front in Fig. 1(a)).

The second challenge arises from the manifold assumption: generated samples should lie within the data manifold $\mathcal{M}$, namely, they are supposed to be of good quality [40]. Optimizing multiple objectives without considering generation quality could produce Pareto solutions outside the data manifold (i.e., invalid samples on the Pareto front in Fig. 1(a)).

The third challenge is coordinating generation quality with multi-property optimization. To guarantee generation quality, generative models typically minimize a divergence between the distribution of generated data and that of real training data [58, 18]. This tends to disperse the generated data throughout the whole data manifold (the purple plane in Fig. 1(a)). However, only a limited fraction of the samples on the data manifold lies on the Pareto front. Achieving Pareto optimality therefore inevitably creates a distribution gap between the generated data and the training data, which compromises generation quality.
A large number of studies [28, 11, 54, 31] design controllable generative models with multiple properties by simply assuming that these properties are independent and aggregating the multiple property objectives into a single one for controlled generation. Notably, a very recent study [19] takes the trade-offs between multiple properties into consideration by incorporating multi-objective optimization techniques into generative models. It modifies the sampling gradient of vanilla diffusion models into a linear combination of two terms: the original diffusion gradient and the gradient obtained by multi-objective Bayesian optimization. However, a fixed combination coefficient can hardly coordinate generation quality with the optimization of multiple property objectives. This results in an unnecessary loss of generation quality while achieving Pareto optimality for the property objectives.
In this work, we propose the PaRetO-gUided Diffusion model (PROUD) for multi-objective generation. PROUD is formulated as a constrained optimization problem. It minimizes the Kullback–Leibler (KL) divergence between the distribution of the generated data and that of the training data, while constraining the distribution of the generated data to be close to the distribution of Pareto solutions under the KL divergence. This guarantees that generated samples are moved towards the Pareto set. The quality of these samples is then optimized within a neighborhood of the Pareto set. Specifically, the constrained optimization is implemented during the generative process of a pre-trained unconditional diffusion model: the multiple gradient descent direction for the multiple objectives and the original diffusion gradient are adaptively weighted to denoise samples. The contributions of this work are summarized as follows:
- We propose a novel constrained optimization formulation for controllable generation adhering to multiple properties, defined as multi-objective generation. It better coordinates generation quality and the optimization of multiple objectives.
- We introduce a new controllable diffusion model (PROUD) to solve the constrained optimization formulation. The guidance of multiple objectives is adaptively integrated with that of data likelihood. This reduces the needless compromise of generation quality while achieving Pareto optimality in terms of multiple property objectives.
- We apply our PROUD to optimizing multiple objectives in the tasks of controllable image generation and protein design. Additionally, we establish various baselines based on diffusion models to demonstrate the superiority of our PROUD.
2 Related Work
In this section, we summarize the related works based on their strategies for integrating the optimization of multiple property objectives into deep generative models.
Single-objective generation (SOG) refers to approaches that simply combine multiple objectives into a single one to guide generation. Extensive efforts have been devoted to controllable generation with multiple properties that are treated as independent of each other [28, 20, 26, 11, 54, 31]. Nevertheless, these methods fail to capture the correlations between properties and ignore their conflicting nature, leading to insufficient exploration of the solution space.
Multi-objective Generation (MOG) refers to approaches that introduce multi-objective optimization techniques into generative models.

Wang et al [53] adopted a weighted-sum strategy to deal with the trade-offs between properties. However, this strategy only works for convex Pareto fronts, and a uniformly distributed grid of weights cannot guarantee uniformly distributed points on the Pareto front [41, 33].

Stanton et al [49] proposed LaMBO (Latent Multi-objective Bayesian Optimization). It applies multi-objective Bayesian optimization in the latent space of a denoising autoencoder to optimize the generated samples with multiple black-box objectives. Although it can characterize the Pareto front, the data generated by the denoising autoencoder are of inferior quality.

Gruver et al [19] further applied LaMBO to the latent space of discrete diffusion models, generalizing classifier-guided diffusion models [14] by replacing the classifier gradient with the gradient obtained by LaMBO. Combining the score estimate of a diffusion model with this gradient requires manual tuning of the combination coefficient, which is theoretically inappropriate for non-convex functions [17].

Tagasovska et al [50] proposed to use multiple gradient descent [12] for sampling within compositional energy-based models (EBMs), where each EBM is conditioned on one specific property. However, training multiple conditional EBMs requires much more supervision than training discriminative models. Moreover, this paradigm cannot provide post-hoc control over pre-trained unconditional generative models.

Multi-objective generative flow networks (GFlowNets) [25] fully integrate guidance from multiple objectives into the training process. Consequently, they must be retrained whenever the objectives change and are not suitable for use with pre-trained generative models. In addition, such models are usually difficult to train [42].
Diffusion models [22, 43, 44, 48] represent the state of the art (SOTA) in deep generative models. Therefore, we build our multi-objective generation model upon diffusion models. Since most related works design their methods based on other deep generative models, we adapt their ideas to diffusion models as much as possible for the sake of comparison. Please refer to Section 5 for more details.
3 Preliminaries
Before delving into our method, we introduce the technical background about multi-objective optimization in Section 3.1 and diffusion models in Section 3.2, respectively.
3.1 Multi-objective Optimization
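Let $x\in\mathbb{R}^d$ be a decision variable, and let $F(x)=[f_1(x),\dots,f_m(x)]$ be a set of $m$ objective functions. Each objective represents a property and is preferred to have a smaller value. The multi-objective optimization problem [6, 9] can be conventionally expressed as:
$$\min_{x\in\mathbb{R}^d}\ F(x)=\big[f_1(x),\dots,f_m(x)\big]. \tag{1}$$
In this context, for $x,x'\in\mathbb{R}^d$, $x$ is said to dominate $x'$, i.e., $x\prec x'$, iff $f_i(x)\le f_i(x')$ for all $i\in\{1,\dots,m\}$ and $F(x)\neq F(x')$.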
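Definition 1 (Pareto optimality). A point $x^*\in\mathbb{R}^d$ is called Pareto optimal if there exists no $x\in\mathbb{R}^d$ such that $x\prec x^*$. The set of all Pareto optimal points is called the Pareto set, and its image under $F$ is called the Pareto front.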
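The Python sketch below makes the dominance relation and Definition 1 concrete. It checks dominance between two objective vectors (all objectives minimized, as in Eq.(1)) and keeps the non-dominated points of a finite candidate set. The function names are our own, not the paper's, and the quadratic pairwise scan favors clarity over speed.

```python
from typing import List, Sequence


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    """Return True if objective vector fa dominates fb (every objective is minimized)."""
    no_worse = all(a <= b for a, b in zip(fa, fb))
    # Given no_worse, "strictly better somewhere" is equivalent to F(x) != F(x').
    strictly_better = any(a < b for a, b in zip(fa, fb))
    return no_worse and strictly_better


def non_dominated_indices(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates (O(n^2) pairwise scan)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


if __name__ == "__main__":
    candidates = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
    print(non_dominated_indices(candidates))  # [0, 1, 3]; (3, 3) is dominated by (2, 2)
```

On a finite sample, the surviving indices form an empirical approximation of the Pareto set.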
Definition 2 (Pareto stationarity).
A point $x$ is called Pareto stationary if there exist scalars $\omega_1,\dots,\omega_m\ge0$ with $\sum_{i=1}^m\omega_i=1$ such that $\sum_{i=1}^m\omega_i\nabla f_i(x)=0$. Pareto stationarity is a necessary condition for Pareto optimality.
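Désidéri [12] proposed Multiple Gradient Descent (MGD) to find the Pareto optimal solutions of Eq.(1). Specifically, given any initial point $x_0\in\mathbb{R}^d$, we can iteratively update $x_t$ according to:
$$x_{t+1}=x_t-\xi\,\phi(x_t), \tag{2}$$
where $\xi$ is the step size. The update direction $\phi(x_t)$ should be as close as possible to each gradient $\nabla f_i(x_t)$. This requirement is formulated as the following problem:
$$\phi^*(x_t)=\arg\max_{\phi\in\mathbb{R}^d}\Big\{\min_{i\in\{1,\dots,m\}}\big\langle\nabla f_i(x_t),\phi\big\rangle-\frac{1}{2}\|\phi\|^2\Big\}. \tag{3}$$
By Lagrangian strong duality, the solution to Eq.(3) can be written as
$$\phi^*(x_t)=\sum_{i=1}^m\omega_i^*\nabla f_i(x_t), \tag{4}$$
where $\omega^*=\arg\min_{\omega}\big\|\sum_{i=1}^m\omega_i\nabla f_i(x_t)\big\|^2$ under the constraint that $\omega_i\ge0$ and $\sum_{i=1}^m\omega_i=1$.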
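The min-norm problem in Eq.(4) is a small quadratic program over the probability simplex. The sketch below solves it with the Frank-Wolfe method and exact line search; the paper does not specify a solver here, so this is our choice for illustration. It then runs the MGD iteration of Eq.(2) on two toy quadratic objectives whose Pareto set is the segment between their minimizers. All function names are hypothetical. The squared norm of the returned direction doubles as a Pareto stationarity check (Definition 2).

```python
from typing import Callable, List

Vec = List[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def min_norm_weights(grads: List[Vec], iters: int = 100) -> List[float]:
    """Frank-Wolfe solver for min_w ||sum_i w_i g_i||^2 with w on the probability simplex."""
    m = len(grads)
    gram = [[dot(gi, gj) for gj in grads] for gi in grads]  # G_ij = <g_i, g_j>
    w = [1.0 / m] * m
    for _ in range(iters):
        gw = [dot(gram[i], w) for i in range(m)]  # (G w)_i = <g_i, g_w>
        k = min(range(m), key=lambda i: gw[i])  # vertex e_k minimizing the linearization
        wgw = dot(w, gw)  # ||g_w||^2
        dist = gram[k][k] - 2.0 * gw[k] + wgw  # ||g_k - g_w||^2
        if dist <= 1e-12:
            break
        # Exact line search for the quadratic along w -> (1 - s) w + s e_k.
        s = min(max((wgw - gw[k]) / dist, 0.0), 1.0)
        if s <= 0.0:
            break
        w = [(1.0 - s) * wi for wi in w]
        w[k] += s
    return w


def mgd_direction(grads: List[Vec]) -> Vec:
    """phi*(x) = sum_i w_i* grad f_i(x), as in Eq.(4)."""
    w = min_norm_weights(grads)
    return [sum(wi * g[d] for wi, g in zip(w, grads)) for d in range(len(grads[0]))]


def mgd_descent(x: Vec, grad_fns: List[Callable[[Vec], Vec]], step: float = 0.1,
                iters: int = 200) -> Vec:
    """Iterate Eq.(2): x <- x - step * phi*(x)."""
    for _ in range(iters):
        phi = mgd_direction([g(x) for g in grad_fns])
        x = [xi - step * pi for xi, pi in zip(x, phi)]
    return x


# Toy objectives f1 = ||x - A||^2 and f2 = ||x - B||^2; their Pareto set is the segment [A, B].
A, B = [0.0, 0.0], [1.0, 0.0]


def grad_f1(x: Vec) -> Vec:
    return [2.0 * (xi - ai) for xi, ai in zip(x, A)]


def grad_f2(x: Vec) -> Vec:
    return [2.0 * (xi - bi) for xi, bi in zip(x, B)]


if __name__ == "__main__":
    x_final = mgd_descent([0.3, 2.0], [grad_f1, grad_f2])
    print(x_final)  # approaches [0.3, 0.0], the closest point of the segment
    phi = mgd_direction([grad_f1(x_final), grad_f2(x_final)])
    print(dot(phi, phi))  # ~0: the final point is Pareto stationary (Definition 2)
```

With two gradients, the line search along the simplex edge is exact, so a single Frank-Wolfe step already reaches the minimum-norm weights.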
3.2 Diffusion Models
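The idea of diffusion models is to progressively diffuse data to noise and then learn to reverse this process for sample generation. Consider a sequence of prescribed noise scales $\{\beta_t\}_{t=1}^T$ with $0<\beta_t<1$. The Denoising Diffusion Probabilistic Model (DDPM) [22] diffuses data to noise by constructing a discrete Markov chain $q(x_t\mid x_{t-1})=\mathcal{N}\big(x_t;\sqrt{\alpha_t}\,x_{t-1},\beta_t I\big)$, where $\alpha_t=1-\beta_t$. This process is called the forward process or diffusion process. In particular, $q(x_t\mid x_0)=\mathcal{N}\big(x_t;\sqrt{\bar\alpha_t}\,x_0,(1-\bar\alpha_t)I\big)$, where $\bar\alpha_t=\prod_{s=1}^{t}\alpha_s$.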
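The closed form of $q(x_t\mid x_0)$ lets one draw $x_t$ in a single step instead of simulating the chain. The sketch below assumes the linear schedule from $\beta_1=10^{-4}$ to $\beta_T=0.02$ used in the original DDPM paper [22]. The backbones in Section 5 may use other schedules, and the function names are illustrative.

```python
import math
import random
from typing import List, Optional

Vec = List[float]


def linear_betas(T: int, beta_1: float = 1e-4, beta_T: float = 0.02) -> List[float]:
    """Linearly spaced noise scales beta_1, ..., beta_T."""
    return [beta_1 + (beta_T - beta_1) * i / (T - 1) for i in range(T)]


def alpha_bars(betas: List[float]) -> List[float]:
    """abar_t = prod_{s <= t} (1 - beta_s); list entry t - 1 stores abar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: Vec, t: int, abar: List[float], eps: Optional[Vec] = None,
             rng: Optional[random.Random] = None) -> Vec:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in one shot."""
    if eps is None:
        rng = rng or random.Random(0)
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t - 1]
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * ei for xi, ei in zip(x0, eps)]


if __name__ == "__main__":
    abar = alpha_bars(linear_betas(1000))
    print(abar[0], abar[-1])  # close to 1 at t = 1, close to 0 at t = T
    print(q_sample([1.0, -2.0], 500, abar, rng=random.Random(1)))
```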
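The key of diffusion-based generative models is to train a reverse Markov chain so that data can be generated starting from Gaussian noise $x_T\sim\mathcal{N}(0,I)$. The reverse diffusion process is also known as the generative process. Its training loss minimizes a simplified variational bound of the negative log-likelihood, namely,
$$\min_{\theta}\ \mathbb{E}_{t,\,x_0,\,\epsilon}\Big[\big\|\epsilon-\epsilon_{\theta}\big(\sqrt{\bar\alpha_t}\,x_0+\sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\|^2\Big],\quad t\sim\mathcal{U}\{1,\dots,T\},\ x_0\sim\mathcal{D},\ \epsilon\sim\mathcal{N}(0,I), \tag{5}$$
where $\epsilon_\theta(x_t,t)$ is a neural network-based approximator that predicts the noise $\epsilon$ from $x_t$.
After training the neural network parameterized by $\theta$ to obtain the optimal $\theta^*$, samples can be generated by starting from $x_T\sim\mathcal{N}(0,I)$ and reversing the Markov chain:
$$x_{t-1}=\frac{1}{\sqrt{\alpha_t}}\Big(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_{\theta^*}(x_t,t)\Big)+\sigma_t z,\quad t=T,\dots,1, \tag{6}$$
where $z\sim\mathcal{N}(0,I)$ and $\sigma_t^2=\beta_t$. More variants of diffusion models can be found in Yang et al [58].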
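The following sketch implements ancestral sampling with Eq.(6). It assumes $\sigma_t^2=\beta_t$ and, as is common, no noise at the final step $t=1$. `eps_model` stands in for the trained $\epsilon_{\theta^*}$. To keep the example self-contained, we plug in the exact noise predictor of a toy point-mass data distribution, for which the reverse chain recovers the point exactly. All names are hypothetical.

```python
import math
import random
from typing import Callable, List

Vec = List[float]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """abar_t = prod_{s <= t} (1 - beta_s); list entry t - 1 stores abar_t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def ddpm_sample(eps_model: Callable[[Vec, int], Vec], betas: List[float], dim: int,
                rng: random.Random) -> Vec:
    """Ancestral sampling with Eq.(6): t = T, ..., 1, sigma_t^2 = beta_t, no noise at t = 1."""
    abar = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in range(len(betas), 0, -1):
        beta_t, abar_t = betas[t - 1], abar[t - 1]
        eps = eps_model(x, t)
        coef = beta_t / math.sqrt(1.0 - abar_t)
        sigma = math.sqrt(beta_t) if t > 1 else 0.0
        x = [(xi - coef * ei) / math.sqrt(1.0 - beta_t) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(x, eps)]
    return x


def point_mass_eps(c: Vec, betas: List[float]) -> Callable[[Vec, int], Vec]:
    """Exact noise predictor when the data distribution is a point mass at c (toy stand-in)."""
    abar = cumulative_alphas(betas)

    def eps_model(x: Vec, t: int) -> Vec:
        a = abar[t - 1]
        return [(xi - math.sqrt(a) * ci) / math.sqrt(1.0 - a) for xi, ci in zip(x, c)]

    return eps_model


if __name__ == "__main__":
    betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]
    target = [0.5, -1.0]
    print(ddpm_sample(point_mass_eps(target, betas), betas, 2, random.Random(0)))
```

With a learned network in place of `point_mass_eps`, the same loop produces approximate samples from the training distribution.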
An existing attempt [19] incorporates multiple desired properties into the diffusion model by straightforwardly adding the MGD direction in Eq.(4) to the noise predictor at each denoising step, namely,
$$x_{t-1}=\frac{1}{\sqrt{\alpha_t}}\Big(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\tilde\epsilon(x_t,t)\Big)+\sigma_t z, \tag{7}$$
where $\tilde\epsilon(x_t,t)=\epsilon_{\theta^*}(x_t,t)+\gamma\,\phi^*(x_t)$. Here $\gamma$ is a trade-off hyper-parameter that balances generation quality (i.e., the noise predictor $\epsilon_{\theta^*}$) against the multiple objectives (i.e., the MGD direction $\phi^*$). Note that an inappropriate $\gamma$ may lead to unsatisfactory samples that either suffer from low quality or fail to possess the required properties (see the experimental observations in Section 5).
4 Multi-Objective Generation
As discussed above, optimizing generative models in terms of $m$ objectives aims to produce samples that cannot be simultaneously improved on all objectives, namely, Pareto optimal samples (see Definition 1). Meanwhile, the generated samples are required to be as realistic as the training samples, which is usually achieved by enforcing distribution alignment between the generated samples and the training samples.
MOG compared with MOO
As shown in Table 1, MOO and MOG share the same objectives $F(x)$ but differ in the space where $x$ resides. This space is termed the “decision space” or “solution space” in the MOO problem [6] and the “data space” in the MOG problem [19, 54]. Specifically, the decision space of the MOO problem is the whole space $\mathbb{R}^d$ [5]. In contrast, the data space of the MOG problem is a low-dimensional manifold $\mathcal{M}$ embedded in $\mathbb{R}^d$ (a.k.a. the ambient space) [15, 38, 35]. This difference highlights that the objectives optimized in MOG are only meaningful within the data manifold. When MOO algorithms are simply applied to search for solutions in the high-dimensional sample space, the obtained solutions are not guaranteed to reside within the data manifold. The result is very low data quality (i.e., invalid samples in Fig. 1(a)) and a loss of practicability [40].
To sum up, the need to concurrently consider generation quality distinguishes the MOG problem from the MOO problem. Specifically, a dataset $\mathcal{D}$ of real samples is required to define the data manifold on which the generated samples are expected to reside (Eq.(8)).
| | Objectives | Decision/data space | Generation quality |
|---|---|---|---|
| MOO | $F(x)$ | $\mathbb{R}^d$ | ✗ |
| MOG | $F(x)$ | $\mathcal{M}\subset\mathbb{R}^d$ | ✓ |
4.1 Constrained Optimization for MOG
A straightforward solution to MOG is to treat generation quality as an additional objective and formulate an $(m+1)$-objective problem. However, the two parts are heterogeneous: multi-objective optimization is usually defined w.r.t. a single sample, whereas distribution alignment is defined w.r.t. a dataset. This heterogeneity makes the resulting MOO problem difficult to optimize. Some deep generative models [3] allow the distribution divergence w.r.t. a dataset to be simplified into quality scores for individual samples. Even so, optimizing $m+1$ objectives explores a much larger space, so it remains challenging to obtain solutions that achieve Pareto optimality on the original $m$ objectives, as empirically verified in the experiments. In addition, the complexity of multi-objective optimization increases significantly with the number of objectives [23].
Instead of formulating a complex and ineffective $(m+1)$-objective problem, we implement multi-objective generation through a tailored constrained optimization problem upon the $m$ property objectives. This formulation also lets us weigh data generation and $m$-objective optimization according to their respective significance, instead of treating them as equally important. Specifically, let $p_\theta$ denote the distribution of the generated data, parameterized by $\theta$. Let $p_{\text{data}}$ denote the distribution of the training data, and $p_F$ the distribution of the solution samples on the Pareto front. Our constrained optimization problem can be formulated as follows:
$$\min_{\theta}\ \mathbb{D}\big(p_\theta\,\|\,p_{\text{data}}\big)\quad\text{s.t.}\quad\mathbb{D}\big(p_\theta\,\|\,p_F\big)\le\delta, \tag{8}$$
where $\mathbb{D}(\cdot\|\cdot)$ denotes a distribution divergence (e.g., the KL divergence) and $\delta$ is a small positive value.
The objective in Eq.(8) controls generation quality, keeping the generated data as realistic as possible. The constraint in Eq.(8) ensures that the generated data are Pareto optimal up to a small tolerable error $\delta$. Overall, Eq.(8) provides a certain quality assurance while obtaining samples that approach Pareto optimality of multiple property objectives.
4.2 Langevin Dynamics for Data Distribution Approximation
It is difficult to solve Eq.(8) directly when both $p_{\text{data}}$ and $p_F$ are unknown. Motivated by widely used sampling algorithms for approximating data distributions [2, 44, 33], we develop Langevin dynamics-based sampling techniques to solve Eq.(8). Specifically, Langevin dynamics can generate samples from a given probability distribution $p(x)$ solely by utilizing its score function $\nabla_x\log p(x)$. Given an initial value $x_0$, the Langevin method recursively computes:
$$x_{t+1}=x_t-\xi\,\phi(x_t)+\sqrt{2\xi}\,z_t, \tag{9}$$
where $\xi>0$ is the step size (fixed or dynamic), $z_t$ is sampled from the standard normal distribution, and $\phi(x_t)$ is the update direction for $x_t$, equal to $-\nabla_x\log p(x_t)$. Under some regularity conditions, the distribution of $x_T$ will be close to the given distribution $p(x)$ when $\xi\to0$ and $T\to\infty$ [57].
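As an illustration of Eq.(9), the sketch below targets a one-dimensional Gaussian $\mathcal{N}(\mu,s^2)$. For this target, the update direction $\phi(x)=-\nabla_x\log p(x)=(x-\mu)/s^2$ is known in closed form. The target, step size, and numbers of chains and steps are illustrative assumptions. Because the step size is fixed and not annealed, the sample moments match the target only approximately.

```python
import math
import random
from typing import Callable


def langevin(phi: Callable[[float], float], x0: float, step: float, n_steps: int,
             rng: random.Random) -> float:
    """Eq.(9): x_{t+1} = x_t - step * phi(x_t) + sqrt(2 * step) * z_t with z_t ~ N(0, 1)."""
    x = x0
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x - step * phi(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x


MU, VAR = 2.0, 1.0  # illustrative Gaussian target N(MU, VAR)


def phi_gaussian(x: float) -> float:
    """Update direction -d/dx log N(x; MU, VAR) = (x - MU) / VAR."""
    return (x - MU) / VAR


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [langevin(phi_gaussian, 0.0, 0.01, 1000, rng) for _ in range(1000)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(round(mean, 2), round(var, 2))  # roughly 2.0 and 1.0
```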
Before deriving the proper gradient for approximating the distribution optimized in Eq.(8) as a whole, we separately investigate gradient-based strategies for optimizing $\mathbb{D}(p_\theta\|p_{\text{data}})$ and $\mathbb{D}(p_\theta\|p_F)$ via Langevin dynamics.
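Optimization of $\mathbb{D}(p_\theta\|p_{\text{data}})$ in Eq.(8). In fact, various generative models are derived to approximately minimize the KL divergence between the data distribution and the model distribution [27, 47, 37]. Here, we choose diffusion models as the representative for optimizing $\mathbb{D}(p_\theta\|p_{\text{data}})$, given their equivalence to Eq.(9) [22, 48]. In particular, the time-dependent predicted noise $\epsilon_{\theta^*}(x_t,t)$ in Eq.(6) serves as the update direction of annealed Langevin dynamics with a dynamic step size $\xi_t$. The reason is that it estimates the scaled negative score of the noisy marginal distribution $p_t(x_t)$:
$$\epsilon_{\theta^*}(x_t,t)\approx-\sqrt{1-\bar\alpha_t}\,\nabla_{x_t}\log p_t(x_t). \tag{10}$$
Consequently, the distribution of $x_0$ will approach $p_{\text{data}}$ [47].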
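Optimization of $\mathbb{D}(p_\theta\|p_F)$ in Eq.(8). We can also integrate MGD (Eq.(4)) into Langevin dynamics to optimize $\mathbb{D}(p_\theta\|p_F)$, aiming to approximate the distribution of the Pareto set upon convergence. Namely,
$$x_{t+1}=x_t-\xi\,\phi^*(x_t)+\sqrt{2\xi}\,z_t, \tag{11}$$
where $\phi^*(x_t)$ is the MGD direction in Eq.(4), $\xi$ is a fixed step size, and $z_t\sim\mathcal{N}(0,I)$. The distribution of $x_t$ will converge to $p_F$, as demonstrated in Theorem 3.3 of Liu et al [33].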
4.3 Pareto-guided Diffusion Model
Based on the above analysis, the key to solving the constrained optimization problem (Eq.(8)) is a proper strategy that unifies the optimization of $\mathbb{D}(p_\theta\|p_{\text{data}})$ and $\mathbb{D}(p_\theta\|p_F)$ within the framework of Langevin dynamics sampling. Therefore, we solve Eq.(8) indirectly by designing the following strategies for the update direction $\phi(x_t)$ in Eq.(9):
1) If the sample $x_t$ is far away from the Pareto front (constraint violation), $\phi(x_t)$ is chosen to ensure a Pareto improvement of $F(x_t)$ (i.e., decreasing all the objectives). The amount of Pareto improvement is determined by the distance of $x_t$ to the Pareto front.
2) If multiple directions yield a Pareto improvement (constraint violation), the direction that decreases $\mathbb{D}(p_\theta\|p_{\text{data}})$ the most (reducing the loss) is chosen as $\phi(x_t)$.
3) If $x_t$ is close to the Pareto front (constraint satisfaction), i.e., the MGD direction $\phi^*(x_t)$ of Eq.(4) has a small norm according to Definition 2, $\phi(x_t)$ is chosen to fully optimize $\mathbb{D}(p_\theta\|p_{\text{data}})$ (reducing the loss).
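Following Ye and Liu [60], we design a new objective based on the gradients to achieve the above conditions. Two gradients are available. The noise predictor $\epsilon_{\theta^*}(x_t,t)$ in Eq.(6) is the gradient for optimizing $\mathbb{D}(p_\theta\|p_{\text{data}})$, and the MGD direction $\phi^*(x_t)$ in Eq.(4) is the gradient for optimizing $\mathbb{D}(p_\theta\|p_F)$. The integrated gradient $\phi(x_t)$ is then obtained from the following objective:
$$
\begin{aligned}
\phi(x_t)=\arg\min_{\phi\in\mathbb{R}^d}\ &\frac{1}{2}\big\|\phi-\epsilon_{\theta^*}(x_t,t)\big\|^2\\
\text{s.t.}\ &\big\langle\nabla f_i(x_t),\phi\big\rangle\ge\psi(x_t),\quad i=1,\dots,m,\\
&\psi(x_t)=\begin{cases}\alpha\,\|\phi^*(x_t)\|^2, & \text{if }\|\phi^*(x_t)\|^2>e,\\ -\infty, & \text{otherwise},\end{cases}
\end{aligned}
\tag{12}
$$
where $\alpha$ and $e$ are positive hyper-parameters. Owing to Pareto stationarity (Definition 2), the constraint in Eq.(8) is approximated by a small gradient norm $\|\phi^*(x_t)\|^2\le e$. In particular, when $\|\phi^*(x_t)\|^2>e$, $\psi(x_t)$ is set to be proportional to $\|\phi^*(x_t)\|^2$. This encourages $\phi(x_t)$ to have positive inner products with all $\nabla f_i(x_t)$. An update along $-\phi(x_t)$ then decreases every objective, i.e., yields a Pareto improvement. Meanwhile, the amount of Pareto improvement depends on the distance of $x_t$ to the Pareto front, as measured by $\|\phi^*(x_t)\|^2$. A very small norm of $\phi^*(x_t)$ means that the sample is close to the Pareto front. In that case $\|\phi^*(x_t)\|^2\le e$, so $\psi(x_t)=-\infty$. Samples are therefore updated by pure gradient descent on $\mathbb{D}(p_\theta\|p_{\text{data}})$ without taking the objectives $F$ into account, namely, $\phi(x_t)=\epsilon_{\theta^*}(x_t,t)$.
In the case of $\|\phi^*(x_t)\|^2>e$, the solution of Eq.(12) is:
$$\phi(x_t)=\epsilon_{\theta^*}(x_t,t)+\sum_{i=1}^m\lambda_i^*\nabla f_i(x_t), \tag{13}$$
where $\lambda^*=[\lambda_1^*,\dots,\lambda_m^*]$ is the solution of the following dual problem:
$$\lambda^*=\arg\min_{\lambda\in\mathbb{R}_{\ge0}^m}\ \frac{1}{2}\Big\|\epsilon_{\theta^*}(x_t,t)+\sum_{i=1}^m\lambda_i\nabla f_i(x_t)\Big\|^2-\psi(x_t)\sum_{i=1}^m\lambda_i. \tag{14}$$
Substituting the derived gradient (Eq.(13)) into Eq.(9) and adopting a dynamic step size, we obtain a new kind of controllable diffusion model, named the PaRetO-gUided Diffusion model (PROUD):
$$x_{t-1}=\frac{1}{\sqrt{\alpha_t}}\Big(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\phi(x_t)\Big)+\sigma_t z. \tag{15}$$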
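The sketch below assembles Eqs.(12)–(15) into a single denoising step. It rests on several assumptions of ours, since the solver used in Algorithm 1 is not specified here:

- The MGD weights of Eq.(4) are computed with a Frank-Wolfe solver.
- The dual problem Eq.(14) is solved by projected gradient descent with step size $1/\mathrm{tr}(G)$, where $G$ is the Gram matrix of the objective gradients.
- The noise predictor output and the objective gradients are passed in as plain vectors.

Function names and the defaults for $\alpha$ and $e$ are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def min_norm_weights(grads: List[Vec], iters: int = 100) -> List[float]:
    """Frank-Wolfe solver for Eq.(4): min_w ||sum_i w_i g_i||^2 over the probability simplex."""
    m = len(grads)
    gram = [[dot(gi, gj) for gj in grads] for gi in grads]
    w = [1.0 / m] * m
    for _ in range(iters):
        gw = [dot(gram[i], w) for i in range(m)]
        k = min(range(m), key=lambda i: gw[i])
        wgw = dot(w, gw)
        dist = gram[k][k] - 2.0 * gw[k] + wgw
        if dist <= 1e-12:
            break
        s = min(max((wgw - gw[k]) / dist, 0.0), 1.0)  # exact line search towards e_k
        if s <= 0.0:
            break
        w = [(1.0 - s) * wi for wi in w]
        w[k] += s
    return w


def pareto_guided_direction(eps: Vec, grads: List[Vec], alpha: float = 0.5,
                            e: float = 1e-3, iters: int = 2000) -> Vec:
    """phi(x_t) from Eq.(12)-(14): eps_theta corrected towards a Pareto improvement."""
    m, d = len(grads), len(eps)
    w = min_norm_weights(grads)
    g_star = [sum(wi * g[k] for wi, g in zip(w, grads)) for k in range(d)]  # phi*(x_t)
    sq = dot(g_star, g_star)
    if sq <= e:  # near Pareto stationary: psi = -inf, so keep the diffusion direction
        return list(eps)
    psi = alpha * sq
    gram = [[dot(gi, gj) for gj in grads] for gi in grads]
    lin = [dot(g, eps) - psi for g in grads]  # dual gradient at lambda = 0
    lr = 1.0 / sum(gram[i][i] for i in range(m))  # 1/trace(G) <= 1/lambda_max(G)
    lam = [0.0] * m
    for _ in range(iters):  # projected gradient descent on the dual objective, Eq.(14)
        grad = [dot(gram[i], lam) + lin[i] for i in range(m)]
        lam = [max(0.0, li - lr * gi) for li, gi in zip(lam, grad)]
    return [eps[k] + sum(li * g[k] for li, g in zip(lam, grads)) for k in range(d)]  # Eq.(13)


def proud_step(x: Vec, phi: Vec, beta_t: float, abar_t: float, z: Vec) -> Vec:
    """Eq.(15): the DDPM update of Eq.(6) with eps_theta replaced by phi (sigma_t^2 = beta_t)."""
    coef = beta_t / math.sqrt(1.0 - abar_t)
    inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - beta_t)
    sigma = math.sqrt(beta_t)
    return [inv_sqrt_alpha * (xi - coef * pi) + sigma * zi for xi, pi, zi in zip(x, phi, z)]


if __name__ == "__main__":
    eps = [1.0, -1.0]  # stand-in for eps_theta(x_t, t)
    grads = [[0.0, 1.0], [1.0, 1.0]]  # stand-ins for grad f_1(x_t), grad f_2(x_t)
    phi = pareto_guided_direction(eps, grads)
    print([round(v, 4) for v in phi])  # [1.0, 0.5]
    print(proud_step([0.2, -0.1], phi, beta_t=0.02, abar_t=0.5, z=[0.0, 0.0]))
```

In this toy call, $\psi=0.5$ and only the constraint on $\nabla f_1$ is active ($\lambda_1^*=1.5$, $\lambda_2^*=0$). The returned $\phi$ is therefore the projection of the stand-in $\epsilon_{\theta^*}$ onto the half-space $\langle\nabla f_1,\phi\rangle\ge\psi$.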
PROUD does not modify the training process of diffusion models but only updates gradients during the generative process, as summarized in Algorithm 1. Therefore, our PROUD can be plugged into any pre-trained diffusion model to gain post-hoc control during the generative process.
In contrast to existing methods that crudely combine generative models with multi-objective optimization techniques using a predefined balance coefficient, our constrained optimization formulation (Eq.(8)) allows us to dynamically infer the balance coefficients (Eq.(14)), prioritizing the guarantee of Pareto optimality.
4.4 Diversity Regularization for Diversified Pareto Solutions
In practice, MGD integrated with Langevin dynamics fails to obtain diversified Pareto solutions, although it is guaranteed to obtain solutions on the Pareto front [33]. To distribute the solutions evenly on the Pareto front, we add a diversity regularization, which can be enforced either in the sample space or in the functionality space. Because we are interested in high-dimensional data generation, imposing large distances between samples can be challenging. Furthermore, a large separation between samples does not necessarily ensure a substantial distinction between their functionalities. Therefore, we define the diversity regularization based on the objective values.
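The paper's diversity regularizer (Eq.(16)) is not reproduced in this section. The sketch below is therefore only a hypothetical stand-in for an objective-space diversity term, not the paper's regularizer. It uses a pairwise RBF energy over the objective vectors $F(x_i)$, whose gradient pushes objective vectors apart. Mapping it back to the sample space would use the chain rule, $\sum_c \frac{\partial E}{\partial f_c(x_i)}\nabla f_c(x_i)$. The bandwidth $h$ and the function names are assumptions.

```python
import math
from typing import List

Vec = List[float]


def rbf_energy(F: List[Vec], h: float = 1.0) -> float:
    """Hypothetical diversity energy sum_{i<j} exp(-||F_i - F_j||^2 / h) over objective vectors."""
    n = len(F)
    return sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(F[i], F[j])) / h)
        for i in range(n)
        for j in range(i + 1, n)
    )


def rbf_energy_grad(F: List[Vec], h: float = 1.0) -> List[Vec]:
    """dE/dF_i = -(2/h) * sum_{j != i} k_ij (F_i - F_j); stepping against it spreads points out."""
    n, m = len(F), len(F[0])
    grads = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = [a - b for a, b in zip(F[i], F[j])]
            k_ij = math.exp(-sum(d * d for d in diff) / h)
            for c in range(m):
                grads[i][c] -= (2.0 / h) * k_ij * diff[c]
    return grads


if __name__ == "__main__":
    clustered = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
    spread = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
    print(rbf_energy(clustered), rbf_energy(spread))  # clustered objective values score higher
```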
5 Experiments
In this section, we evaluate the effectiveness of our PROUD on image generation and protein generation with multiple conflicting objectives. We study white-box objectives in this work and focus on MGD as the MOO technique for obtaining the gradient from multiple objectives. The black-box setting, as mentioned in [49], is discussed in the conclusion and left for future work.
Dataset. For image generation, we use the CIFAR10 [29] dataset, which consists of 60,000 color images of size 32×32 distributed across 10 classes. For protein generation, following Gruver et al [19], we conduct the experiment on the paired Observed Antibody Space (pOAS) dataset [36], which comprises antibody sequences, each processed to a fixed length of 300.
Baselines. First, we include the most closely related SOTA work in MOG, which applies the MOO technique to deep generative models [19]. This baseline is termed “DM+m-MGD”, where the MGD of the $m$ objectives guides the generation of diffusion models (DM). We also include a single-objective generation baseline, termed “DM+single”. It fuses the multiple objectives into a single objective and uses the gradient of this single objective to guide the generation of diffusion models. Another baseline is “(m+1)-MGD”. It treats the objective of the diffusion model as an additional objective and formulates multi-objective generation as the optimization of $m+1$ objectives, to which MGD is then applied directly. Finally, we include the MGD of the $m$ objectives alone as a baseline, called “m-MGD”. It stresses the necessity of quality assurance in the generation problem, which is the core difference between MOG and MOO.
For all methods equipped with MGD, the diversity regularization (Eq.(16)) is included. The exception is (m+1)-MGD, since its extra objective, i.e., the data likelihood, is not accessible for diffusion models.
Metrics. For generation quality, the Fréchet Inception Distance (FID) [21] is adopted as the image quality metric. Following Gruver et al [19], the log-likelihood assigned by ProtGPT [16] serves as the quality metric for protein sequences. For Pareto optimality, Hypervolume (HV) [62] is adopted to measure how well the methods approximate the Pareto set.
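For two objectives, HV can be computed exactly by a single sweep over the points sorted by the first objective. The sketch below assumes minimization and a user-chosen reference point. The reference points behind Table 2 are not stated in this section, and the three-objective CIFAR10 setting needs a more involved HV algorithm.

```python
from typing import List, Sequence, Tuple


def hypervolume_2d(points: List[Sequence[float]], ref: Tuple[float, float]) -> float:
    """Area dominated by `points` and bounded by the reference point `ref` (both objectives minimized)."""
    # Only points that strictly improve on the reference point contribute.
    pts = sorted((p[0], p[1]) for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:  # sweep in increasing f1
        if f2 < best_f2:  # skip points dominated by an earlier one
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv


if __name__ == "__main__":
    front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    print(hypervolume_2d(front, (4.0, 4.0)))  # 6.0
    print(hypervolume_2d(front + [(3.0, 3.0)], (4.0, 4.0)))  # still 6.0: (3, 3) is dominated
```

Because dominated points add no area, HV rewards both closeness to the Pareto front and coverage of it.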
5.1 Image Generation
We follow Liu et al [34] to optimize CIFAR10 images with objectives that force the middle of an image to be a square of a specified color. As demonstrated in Section 3 and Figure 3(b) of their study, an objective that forces the center of generated images to be a black square can be used for constrained sampling on CIFAR10. Accordingly, they obtain samples that lie on the CIFAR10 data manifold and exhibit a black square in the middle. Examples are “black plane” and “black dog” images that contain a black square (smaller than the object) in the middle. This task can be considered as image outpainting [59], namely, extrapolating images based on specified color patches on CIFAR10.
(1) Controllable generation on CIFAR10 with two objectives (Fig. 1(b)):
- $f_1(x)$, where $x$ represents the entire image, is defined on an image patch corresponding to the square at the center of the image. Similar to the practical relevance shown in Liu et al [34], this objective restricts the center of the generated images to be a white square, i.e., it samples CIFAR10 images that exhibit white color in their middle. The patch size is set to … in the experiment.
- $f_2(x)$ uses a similar setting and constrains the center to be a grey square.
The desired generations for these two objectives are CIFAR10-like images whose middle patches have normalized RGB color values (i.e., RGB values in [0, 255] divided by 255) between [0.5, 0.5, 0.5] (grey) and [1, 1, 1] (white), according to Ishibuchi et al [24] and Li et al [30]. Please refer to Appendix B for more details.
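The exact formulas of $f_1$ and $f_2$ are not recoverable here. The sketch below therefore assumes a mean squared error between the central patch and a target color, with a hypothetical patch size of 8; both are our assumptions for illustration. It shows why the two objectives conflict. A grey center scores 0 on the grey objective but 0.25 on the white one, while a color halfway between them scores 0.0625 on both.

```python
from typing import List, Sequence

Image = List[List[List[float]]]  # H x W x 3, normalized RGB values in [0, 1]

WHITE = (1.0, 1.0, 1.0)
GREY = (0.5, 0.5, 0.5)


def center_patch(img: Image, k: int) -> Image:
    """Return the k x k patch at the center of the image."""
    h, w = len(img), len(img[0])
    top, left = (h - k) // 2, (w - k) // 2
    return [row[left:left + k] for row in img[top:top + k]]


def patch_color_loss(img: Image, color: Sequence[float], k: int) -> float:
    """Hypothetical objective: mean squared distance between the center patch and `color`."""
    patch = center_patch(img, k)
    sq = [(px[c] - color[c]) ** 2 for row in patch for px in row for c in range(3)]
    return sum(sq) / len(sq)


def make_image(size: int, k: int, background: Sequence[float], center: Sequence[float]) -> Image:
    """Toy image: uniform `background` with a k x k square of `center` color in the middle."""
    img = [[list(background) for _ in range(size)] for _ in range(size)]
    top = (size - k) // 2
    for i in range(top, top + k):
        for j in range(top, top + k):
            img[i][j] = list(center)
    return img


if __name__ == "__main__":
    k = 8  # hypothetical patch size, not the value used in the experiments
    img = make_image(32, k, (0.0, 0.0, 0.0), GREY)
    print(patch_color_loss(img, WHITE, k), patch_color_loss(img, GREY, k))  # 0.25 0.0
```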
(2) Controllable generation on CIFAR10 with three objectives:
- $f_1(x)$, where $x$ represents the entire image, is defined on an image patch corresponding to the square at the center of the image. This objective restricts the center of the generated images to be a black square. The patch size is set to … in the experiment.
- $f_2(x)$ uses a similar setting and constrains the center to be a deep red square.
- $f_3(x)$ uses a similar setting and constrains the center to be a deep yellow square.
The desired generations for these three objectives are CIFAR10-like images whose middle patches have normalized RGB color values lying in the triangle formed by the points [0, 0, 0] (black), [0.5, 0, 0] (deep red) and [0.5, 0.5, 0] (deep yellow). Please refer to Appendix B for more details. We adopt the diffusion model used in Song and Ermon [45] as the backbone for CIFAR10 image generation.
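When each objective is a squared distance to a target color, the Pareto set in color space is the convex hull of the target colors, which explains why the desired patches above form a triangle. The sketch below checks membership in that triangle with barycentric coordinates and rejects colors that leave the triangle's plane. The tolerance and function names are assumptions.

```python
from typing import Sequence, Tuple

BLACK = (0.0, 0.0, 0.0)
DEEP_RED = (0.5, 0.0, 0.0)
DEEP_YELLOW = (0.5, 0.5, 0.0)


def _sub(u: Sequence[float], v: Sequence[float]) -> Tuple[float, ...]:
    return tuple(a - b for a, b in zip(u, v))


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def barycentric(p, a, b, c) -> Tuple[float, float, float]:
    """Barycentric coordinates (u, v, w) of the projection of p onto the plane of triangle abc."""
    v0, v1, v2 = _sub(b, a), _sub(c, a), _sub(p, a)
    d00, d01, d11 = _dot(v0, v0), _dot(v0, v1), _dot(v1, v1)
    d20, d21 = _dot(v2, v0), _dot(v2, v1)
    denom = d00 * d11 - d01 * d01  # nonzero for a non-degenerate triangle
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return 1.0 - v - w, v, w


def in_target_triangle(p: Sequence[float], tol: float = 1e-9) -> bool:
    """True if RGB color p lies in the triangle spanned by black, deep red and deep yellow."""
    u, v, w = barycentric(p, BLACK, DEEP_RED, DEEP_YELLOW)
    proj = tuple(u * a + v * b + w * c for a, b, c in zip(BLACK, DEEP_RED, DEEP_YELLOW))
    off_plane = _dot(_sub(p, proj), _sub(p, proj)) ** 0.5  # distance to the triangle's plane
    return min(u, v, w) >= -tol and off_plane <= tol


if __name__ == "__main__":
    for color in [(0.4, 0.2, 0.0), (0.2, 0.4, 0.0), (0.3, 0.1, 0.1)]:
        print(color, in_target_triangle(color))  # True, False, False
```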
We sample images from our PROUD and the baselines using the same random seeds for a fair comparison. From Fig. 2, we observe three things.

(1) Our PROUD and two baselines, DM+m-MGD and -MGD, successfully generate harmonious images consistent with the patch-level constraints imposed by the two conflicting objectives. Among them, the images generated by our PROUD exhibit better quality than those of DM+m-MGD in some instances. The latter lacks a mechanism to emphasize the quality of generated samples, so it tends to sacrifice generation quality to excessively meet Pareto optimality of the objectives.

(2) (m+1)-MGD fails to generate satisfactory images consistent with the patch-level constraints, because the new objective (i.e., generation quality) biases the optimization of the original two objectives. The Pareto set of the original $m$ objectives resides within that of the $m+1$ objectives [51]. However, the proportion of such samples is negligible even when sampling a large number of images; see Fig. 3(d)&(f) and Fig. 4(d)&(f) for details.

(3) m-MGD, which does not consider generation quality in its optimization, generates meaningless images. The reason is that optimizing multiple objectives in data generation is only meaningful within the data manifold, and image data usually concentrate on low-dimensional manifolds embedded in a high-dimensional space.
For the MOG setting on CIFAR10 with two objectives, we randomly select 1,000 generated images for each method and calculate their objective values $(f_1,f_2)$. Fig. 3 supports three observations.

(1) Our PROUD (Fig. 3(a)) and two baselines, DM+m-MGD (Fig. 3(b)) and m-MGD (Fig. 3(e)), generate samples that cover the entire Pareto front. Among them, the samples of our PROUD and m-MGD spread more evenly over the Pareto front.

(2) DM+single covers only part of the Pareto front, as shown in Fig. 3(c). Simply averaging multiple objectives into a single objective fails to explore the trade-off between them and leads to insufficient solutions.

(3) As discussed for Fig. 2, (m+1)-MGD explores a much larger solution space (Fig. 3(f)), while only a few of its samples are located on the Pareto front of the original objectives (Fig. 3(d)).
For the MOG setting on CIFAR10 with three objectives, we randomly select 5,000 generated images for each method and calculate their objective values $(f_1,f_2,f_3)$. Fig. 4 shows that our PROUD covers the Pareto front significantly more evenly under this more challenging setting. This is because our constrained optimization formulation better coordinates generation quality and multi-objective optimization while ensuring sample diversity (Eq.(8), Eq.(16)). The two baselines DM+m-MGD and m-MGD can be forced to exhibit better diversity by setting a large diversity coefficient. However, their generated samples then violate Pareto optimality, as shown in Fig. 8 and Fig. 9 in the Appendix.
| Method | CIFAR10 (2-obj) HV (↑) | CIFAR10 (2-obj) FID (↓) | CIFAR10 (3-obj) HV (↑) | CIFAR10 (3-obj) FID (↓) | pOAS HV (↑) | pOAS ProtGPT (↑) |
|---|---|---|---|---|---|---|
| PROUD (ours) | 5.21 ± 0.00 | 31.39 ± 0.05 | 3.26 ± 0.00 | 44.22 ± 0.13 | 2472.55 ± 60.15 | -645.93 ± 0.99 |
| DM+m-MGD | 5.20 ± 0.01 | 38.72 ± 0.36 | 3.26 ± 0.01 | 49.90 ± 0.14 | 2289.61 ± 65.12 | -692.80 ± 0.34 |
| DM+single | 4.77 ± 0.01 | 36.35 ± 0.47 | 2.21 ± 0.00 | 57.77 ± 0.05 | 2302.21 ± 58.25 | -682.26 ± 0.49 |
| (m+1)-MGD | 5.17 ± 0.00 | 11.21 ± 0.10 | 2.87 ± 0.03 | 11.80 ± 0.05 | 838.74 ± 14.08 | -662.86 ± 0.76 |
| m-MGD | 5.21 ± 0.00 | - | 3.26 ± 0.01 | - | - | - |
To further demonstrate the superiority of our PROUD on multi-objective generation, the left part of Table 2 reports the quantitative evaluation of Pareto approximation and image quality, obtained by sampling images. Our PROUD achieves the best or second-best values on both metrics, i.e., HV for Pareto approximation and FID for image quality. This supports our claim that PROUD provides a certain quality assurance for generated samples approaching the Pareto set of multiple properties. In contrast, both the single- and multi-objective generation baselines, i.e., DM+single and DM+m-MGD, inevitably sacrifice generation quality to excessively optimize the objectives.
5.2 Protein Sequence Generation
To further verify our model in a more challenging application, we design a multi-objective generation task on the pOAS dataset that optimizes two conflicting objectives of antibody sequences:
- the solvent accessible surface area (SASA) of the protein's predicted structure; see Ruffolo et al [39] for how the SASA value is computed from a protein sequence;
- the percentage of beta sheets (%Sheets), which is measured directly on the protein sequence [7].
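%Sheets is measured directly on the sequence with Biopython [7]. As a dependency-free sketch, the function below mirrors the residue-propensity heuristic that Biopython's `ProteinAnalysis.secondary_structure_fraction()` documents for its sheet fraction (residues E, M, A, L), scaled to a percentage. This is our reconstruction, not code from the paper; `percent_sheet` is a hypothetical name, and the residue set should be checked against the Biopython version in use.

```python
# Assumed sheet-forming residues, following the heuristic documented by Biopython.
SHEET_RESIDUES = frozenset("EMAL")


def percent_sheet(sequence: str) -> float:
    """Percentage of residues counted as sheet-forming; a sequence-only proxy for %Sheets."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_sheet = sum(1 for aa in seq if aa in SHEET_RESIDUES)
    return 100.0 * n_sheet / len(seq)
```

With Biopython installed, `ProteinAnalysis(seq).secondary_structure_fraction()` from `Bio.SeqUtils.ProtParam` returns the (helix, turn, sheet) fractions directly; 100 times the third entry should match `percent_sheet(seq)` for sequences of standard residues. Because this measure is a count, it is not differentiable, which is why a learned surrogate is needed for guidance.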
The ground-truth Pareto front is not available because of the complexity of these property objectives. Since the evaluation functions for SASA and %Sheets are not differentiable, all methods use network predictors as differentiable surrogate functions, while the HV values are computed on the generated samples with the ground-truth evaluation functions. We adopt the discrete diffusion model of Gruver et al [19] as the backbone for protein sequence generation.
To compare the methods on multi-objective protein generation, we sample protein sequences for each method and collect the non-dominated samples with respect to the two target properties, as shown in Fig. 5. We observe the following. (1) DM+single already covers a wide range of objective values, possibly because the noise in discrete diffusion models induces large sample diversity [19]. By incorporating MGD into diffusion models, PROUD and DM+-MGD cover an even larger range, which supports the advantage of MOG over SOG. PROUD and DM+-MGD each achieve Pareto improvements on different parts of the objective space; nevertheless, Table 2 shows that PROUD attains a higher HV. (2) As in the image generation task, -MGD approximates the Pareto front of the original objectives much more poorly. Moreover, -MGD fails to generate any valid protein sequence: the SASA evaluation is undefined for all of its generated samples. This further highlights the difference between MOG and MOO.
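Collecting the non-dominated samples for Fig. 5 amounts to a Pareto filter over the evaluated property pairs. Below is a minimal O(n²) sketch; it assumes every objective is minimized (negate any objective that should be maximized), and the function names are our own rather than the paper's.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]
```

For properties that are maximized, one would pass `[(-s, -t) for s, t in props]`, where `props` is a hypothetical list of (SASA, %Sheets) pairs. Identical points do not dominate each other, so duplicates are all kept.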
Furthermore, the right part of Table 2 reports HV (Pareto approximation) and ProtGPT (protein quality) computed on sampled protein sequences (we sample only a limited number of sequences because computing SASA values is very costly). Benefiting from our constrained optimization formulation, PROUD avoids the unnecessary loss of protein quality suffered by its MOG and SOG counterparts, DM+-MGD and DM+single, which makes its generated samples more useful in practice.
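Table 3: FID and HV of PROUD on CIFAR10 (two objectives) under different settings of the hyper-parameters in Eq.(12).
Metric | | | | | | |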
FID | 31.58963073 | 31.48232218 | 31.5896311 | 31.58966697 | 31.48232218 | 31.58966696 |
HV | 0.05211343 | 0.05211350 | 0.05211343 | 0.05211343 | 0.05211350 | 0.05211343 |
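Table 4: FID and HV of PROUD on CIFAR10 (two objectives) under different values of the diversity coefficient in Eq.(16).
Metric | | | | | | |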
FID | 34.80 | 30.98 | 31.80 | 31.48 | 31.63 | 33.59 |
HV | 0.0483 | 0.0498 | 0.0521 | 0.0521 | 0.0521 | 0.0521 |
5.3 Hyper-parameter Sensitivity Study
We study PROUD under different configurations of its hyper-parameters, namely the two hyper-parameters in Eq.(12) and the diversity coefficient in Eq.(16). The experiments are conducted on CIFAR10 with the same setting as Section 5.1.
We vary the two hyper-parameters in Eq.(12) over several values. Table 3 shows that PROUD is insensitive to their choice: across all settings, FID varies by less than 0.11 and HV by less than 10⁻⁷.
We set the diversity coefficient to 0, 0.1, 0.2, and 1. The results in Fig. 6(a) to Fig. 6(d) show that: (1) with an appropriate diversity coefficient, PROUD covers the Pareto front well; (2) without the diversity regularization, PROUD obtains only a small set of Pareto solutions, which demonstrates the necessity of the diversity loss and is consistent with the finding of prior work [33]; (3) with too large a diversity coefficient, the generated samples may fall off the Pareto front. The diversity coefficient has a similar effect on DM+-MGD (Fig. 6(e) to Fig. 6(h)).
To further investigate how the diversity coefficient affects generation quality, we report FID and HV in Table 4. We use the value that gives PROUD the best balance between FID and HV as the hyper-parameter in Section 5.1.
To show that single-objective generation fails to cover the Pareto front even with a uniform grid of weights, we vary the weight coefficient that DM+single uses to combine the two objectives into a single objective from 0 to 1 with a fixed step. Fig. 6(i) to Fig. 6(l) show the results for weights 0, 0.1, 0.5, and 1; the remaining results are in the Appendix. At either extreme weight, the combined objective is dominated by one of the two objectives, so the generated samples attain the smallest value of that objective but the largest value of the other. With equal weights, the generated samples should reach the compromise value between the two objectives, i.e., (0.0625, 0.0625). Indeed, the generated samples cover only a small range around this point; this spread likely results from the noise in the diffusion process.
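The behaviour described above can be reproduced in closed form under a simplifying assumption of ours: treat the constrained region as a uniform grey level c and take f1(c) = (c - 1)² and f2(c) = (c - 0.5)², the squared distances to white and grey (consistent with Appendix B and with the compromise point (0.0625, 0.0625)). Here `w` multiplies f1 by our convention, and `weighted_sum_optimum` is a hypothetical helper. The minimizer of a fixed weighted sum is a single point, so each weight targets exactly one point of the front, which is why one averaged objective cannot spread samples along it.

```python
from typing import Tuple


def weighted_sum_optimum(w: float) -> Tuple[float, float]:
    """(f1, f2) at the minimizer of w*f1 + (1-w)*f2 over a uniform grey level c."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # d/dc [w (c-1)^2 + (1-w)(c-0.5)^2] = 0  gives  c* = w*1 + (1-w)*0.5.
    c = w * 1.0 + (1.0 - w) * 0.5
    return (c - 1.0) ** 2, (c - 0.5) ** 2


for w in (0.0, 0.1, 0.5, 1.0):
    print(w, weighted_sum_optimum(w))
```

Equal weights give c* = 0.75 and hence the compromise point (0.0625, 0.0625), while w = 0 and w = 1 give the two extreme points of the front.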
6 Conclusion
This paper studies the problem of optimizing deep generative models with multiple conflicting objectives. We frame this problem setting by treating the optimization of samples toward multiple properties and the process of sample generation as a unified task. By analyzing its connections to and differences from multi-objective optimization, we introduce a constrained optimization formulation of the multi-objective generation problem and, based on it, develop PROUD. Our experiments demonstrate the efficacy of PROUD in both image and protein sequence generation. While this work considers white-box objectives, extending PROUD to the black-box setting is an interesting direction for future work; there, the multiple gradient descent technique could be replaced by methods such as Bayesian multi-objective optimization [49].
Declarations
Funding.
This work was supported by the Agency for Science, Technology and Research (A*STAR) Centre for Frontier AI Research, the A*STAR GAP project (Grant No. I23D1AG079), and the AISG Grand Challenge in AI for Materials Discovery (Grant No. AISG2-GC-2023-010).
Competing interests.
The authors have no financial or non-financial interests to disclose that are relevant to the content of this article.
Ethics approval.
Not applicable.
Consent to participate.
Not applicable.
Consent to publish.
Not applicable.
Availability of data and materials.
All datasets used in this work are available online and clearly cited.
Code availability.
The code of this work will be publicly released on GitHub.
Authors’ contributions.
Idea: YY; Methodology & Experiment: YY, YP; Writing - comments/edits: all.
References
- Afshari et al [2019] Afshari H, Hare W, Tesfamariam S (2019) Constrained multi-objective optimization algorithms: Review and comparison with application in reinforced concrete structures. Applied Soft Computing 83:105631. 10.1016/J.ASOC.2019.105631
- Andrieu et al [2003] Andrieu C, De Freitas N, Doucet A, et al (2003) An introduction to MCMC for machine learning. Machine Learning 50:5–43. 10.1023/A:1020281327116
- Arjovsky et al [2017] Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp 214–223, URL https://proceedings.mlr.press/v70/arjovsky17a.html
- Borghi et al [2023] Borghi G, Herty M, Pareschi L (2023) An adaptive consensus based method for multi-objective optimization with uniform Pareto front approximation. Applied Mathematics & Optimization 88(2):58. 10.1007/s00245-023-10036-y
- Cheng et al [2017] Cheng R, Li M, Tian Y, et al (2017) A benchmark test suite for evolutionary many-objective optimization. Complex & Intelligent Systems 3:67–81. 10.1007/s40747-017-0039-7
- Chinchuluun and Pardalos [2007] Chinchuluun A, Pardalos PM (2007) A survey of recent developments in multiobjective optimization. Annals of Operations Research 154(1):29–50. 10.1007/S10479-007-0186-0
- Cock et al [2009] Cock PJ, Antao T, Chang JT, et al (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11):1422–1423. 10.1093/bioinformatics/btp163
- Dathathri et al [2020] Dathathri S, Madotto A, Lan J, et al (2020) Plug and play language models: A simple approach to controlled text generation. In: International Conference on Learning Representations, URL https://openreview.net/forum?id=H1edEyBKDS
- Deb [2001] Deb K (2001) Multi-objective optimization using evolutionary algorithms, vol 16. John Wiley & Sons
- Demšar [2006] Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research 7:1–30. URL http://jmlr.org/papers/v7/demsar06a.html
- Deng et al [2020] Deng Y, Yang J, Chen D, et al (2020) Disentangled and controllable face image generation via 3D imitative-contrastive learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5154–5163, 10.1109/CVPR42600.2020.00520
- Désidéri [2012] Désidéri JA (2012) Multiple-gradient descent algorithm (MGDA) for multiobjective optimization. Comptes Rendus Mathematique 350(5-6):313–318. 10.1016/j.crma.2012.03.014
- Désidéri [2018] Désidéri JA (2018) Quasi-riemannian multiple gradient descent algorithm for constrained multiobjective differential optimization. PhD thesis, Inria Sophia-Antipolis; Project-Team Acumes, URL https://inria.hal.science/hal-01740075
- Dhariwal and Nichol [2021] Dhariwal P, Nichol A (2021) Diffusion models beat GANs on image synthesis. In: Advances in Neural Information Processing Systems, pp 8780–8794, URL https://proceedings.neurips.cc/paper_files/paper/2021/file/49ad23d1ec9fa4bd8d77d02681df5cfa-Paper.pdf
- Fefferman et al [2016] Fefferman C, Mitter S, Narayanan H (2016) Testing the manifold hypothesis. Journal of the American Mathematical Society 29(4):983–1049. 10.1090/jams/852
- Ferruz et al [2022] Ferruz N, Schmidt S, Höcker B (2022) ProtGPT2 is a deep unsupervised language model for protein design. Nature Communications 13(1):4348. 10.1038/s41467-022-32007-7
- Gong et al [2021] Gong C, Liu X, Liu Q (2021) Bi-objective trade-off with dynamic barrier gradient descent. In: Advances in Neural Information Processing Systems, pp 29630–29642, URL https://proceedings.neurips.cc/paper_files/paper/2021/file/f7b027d45fd7484f6d0833823b98907e-Paper.pdf
- Goodfellow et al [2014] Goodfellow IJ, Pouget-Abadie J, Mirza M, et al (2014) Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp 2672–2680, URL https://proceedings.neurips.cc/paper_files/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf
- Gruver et al [2023] Gruver N, Stanton S, Frey NC, et al (2023) Protein design with guided discrete diffusion. In: Advances in Neural Information Processing Systems, pp 12489–12517, URL https://proceedings.neurips.cc/paper_files/paper/2023/file/29591f355702c3f4436991335784b503-Paper-Conference.pdf
- Guo et al [2020] Guo X, Du Y, Zhao L (2020) Property controllable variational autoencoder via invertible mutual dependence. In: International Conference on Learning Representations, URL https://openreview.net/forum?id=tYxG_OMs9WE
- Heusel et al [2017] Heusel M, Ramsauer H, Unterthiner T, et al (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems, pp 6629–6640, URL https://proceedings.neurips.cc/paper_files/paper/2017/file/8a1d694707eb0fefe65871369074926d-Paper.pdf
- Ho et al [2020] Ho J, Jain A, Abbeel P (2020) Denoising diffusion probabilistic models. In: Advances in Neural Information Processing Systems, pp 6840–6851, URL https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf
- Ishibuchi et al [2008] Ishibuchi H, Tsukamoto N, Nojima Y (2008) Evolutionary many-objective optimization: A short review. In: IEEE Congress on Evolutionary Computation, pp 2419–2426, 10.1109/CEC.2008.4631121
- Ishibuchi et al [2013] Ishibuchi H, Yamane M, Akedo N, et al (2013) Many-objective and many-variable test problems for visual examination of multiobjective search. In: IEEE Congress on Evolutionary Computation, pp 1491–1498, 10.1109/CEC.2013.6557739
- Jain et al [2023] Jain M, Raparthy SC, Hernández-García A, et al (2023) Multi-objective GFlowNets. In: International Conference on Machine Learning, pp 14631–14653, URL https://proceedings.mlr.press/v202/jain23a.html
- Jin et al [2020] Jin W, Barzilay R, Jaakkola T (2020) Multi-objective molecule generation using interpretable substructures. In: International Conference on Machine Learning, pp 4849–4859, URL http://proceedings.mlr.press/v119/jin20b.html
- Kingma and Welling [2014] Kingma DP, Welling M (2014) Auto-encoding variational bayes. In: International Conference on Learning Representations, URL https://openreview.net/forum?id=33X9fd2-9FyZd
- Klys et al [2018] Klys J, Snell J, Zemel R (2018) Learning latent subspaces in variational autoencoders. In: Advances in Neural Information Processing Systems, pp 6445–6455, URL https://proceedings.neurips.cc/paper_files/paper/2018/file/73e5080f0f3804cb9cf470a8ce895dac-Paper.pdf
- Krizhevsky et al [2009] Krizhevsky A, Hinton G, et al (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto. URL https://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf
- Li et al [2017] Li M, Grosan C, Yang S, et al (2017) Multiline distance minimization: A visualized many-objective test problem suite. IEEE Transactions on Evolutionary Computation 22(1):61–78. 10.1109/TEVC.2017.2655451
- Li et al [2022] Li S, Liu M, Walder C (2022) EditVAE: Unsupervised parts-aware controllable 3D point cloud shape generation. In: AAAI Conference on Artificial Intelligence, pp 1386–1394, 10.1609/AAAI.V36I2.20027
- Liao et al [2020] Liao Y, Schwarz K, Mescheder L, et al (2020) Towards unsupervised learning of generative models for 3D controllable image synthesis. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5871–5880, 10.1109/CVPR42600.2020.00591
- Liu et al [2021a] Liu X, Tong X, Liu Q (2021a) Profiling Pareto front with multi-objective Stein variational gradient descent. In: Advances in Neural Information Processing Systems, pp 14721–14733, URL https://proceedings.neurips.cc/paper/2021/file/7bb16972da003e87724f048d76b7e0e1-Paper.pdf
- Liu et al [2021b] Liu X, Tong X, Liu Q (2021b) Sampling with trustworthy constraints: A variational gradient framework. In: Advances in Neural Information Processing Systems, pp 23557–23568, URL https://papers.nips.cc/paper/2021/file/c61aed648da48aa3893fb3eaadd88a7f-Paper.pdf
- McInnes et al [2018] McInnes L, Healy J, Melville J (2018) UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint URL http://arxiv.org/abs/1802.03426
- Olsen et al [2022] Olsen TH, Boyles F, Deane CM (2022) Observed antibody space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Science 31(1):141–146. 10.1002/pro.4205
- Papamakarios et al [2021] Papamakarios G, Nalisnick E, Rezende DJ, et al (2021) Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research 22(57):1–64. URL http://jmlr.org/papers/v22/19-1028.html
- Roweis and Saul [2000] Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326. 10.1126/science.290.5500.2323
- Ruffolo et al [2023] Ruffolo JA, Chu LS, Mahajan SP, et al (2023) Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nature Communications 14(1):2389. 10.5281/zenodo.7709609
- Sanchez-Lengeling and Aspuru-Guzik [2018] Sanchez-Lengeling B, Aspuru-Guzik A (2018) Inverse molecular design using machine learning: Generative models for matter engineering. Science 361(6400):360–365. 10.1126/science.aat2663
- Sener and Koltun [2018] Sener O, Koltun V (2018) Multi-task learning as multi-objective optimization. In: Advances in Neural Information Processing Systems, pp 525–536, URL https://proceedings.neurips.cc/paper/2018/file/432aca3a1e345e339f35a30c8f65edce-Paper.pdf
- Shen et al [2023] Shen MW, Bengio E, Hajiramezanali E, et al (2023) Towards understanding and improving GFlowNet training. In: International Conference on Machine Learning, pp 30956–30975, URL https://proceedings.mlr.press/v202/shen23a.html
- Sohl-Dickstein et al [2015] Sohl-Dickstein J, Weiss E, Maheswaranathan N, et al (2015) Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning, pp 2256–2265, URL http://proceedings.mlr.press/v37/sohl-dickstein15.html
- Song and Ermon [2019] Song Y, Ermon S (2019) Generative modeling by estimating gradients of the data distribution. In: Advances in Neural Information Processing Systems, pp 11918–11930, URL https://proceedings.neurips.cc/paper_files/paper/2019/file/3001ef257407d5a371a96dcd947c7d93-Paper.pdf
- Song and Ermon [2020] Song Y, Ermon S (2020) Improved techniques for training score-based generative models. In: Advances in Neural Information Processing Systems, pp 12438–12448, URL https://papers.neurips.cc/paper_files/paper/2020/file/92c3b916311a5517d9290576e3ea37ad-Paper.pdf
- Song and Kingma [2021] Song Y, Kingma DP (2021) How to train your energy-based models. arXiv preprint URL https://arxiv.org/abs/2101.03288
- Song et al [2021a] Song Y, Durkan C, Murray I, et al (2021a) Maximum likelihood training of score-based diffusion models. In: Advances in Neural Information Processing Systems, pp 1415–1428, URL https://papers.nips.cc/paper/2021/file/0a9fdbb17feb6ccb7ec405cfb85222c4-Paper.pdf
- Song et al [2021b] Song Y, Sohl-Dickstein J, Kingma DP, et al (2021b) Score-based generative modeling through stochastic differential equations. In: International Conference on Learning Representations, URL https://openreview.net/forum?id=PxTIG12RRHS
- Stanton et al [2022] Stanton S, Maddox W, Gruver N, et al (2022) Accelerating Bayesian optimization for biological sequence design with denoising autoencoders. In: International Conference on Machine Learning, pp 20459–20478, URL https://proceedings.mlr.press/v162/stanton22a.html
- Tagasovska et al [2022] Tagasovska N, Frey NC, Loukas A, et al (2022) A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences. In: NeurIPS 2022 Workshop AI for Science: Progress and Promises, URL https://openreview.net/forum?id=U2rNXaTTXPQ
- Tanabe and Ishibuchi [2020] Tanabe R, Ishibuchi H (2020) An easy-to-use real-world multi-objective optimization problem suite. Applied Soft Computing 89:106078. 10.1016/J.ASOC.2020.106078
- Van Veldhuizen et al [1998] Van Veldhuizen DA, Lamont GB, et al (1998) Evolutionary computation and convergence to a Pareto front. In: Late breaking papers at the genetic programming 1998 conference, pp 221–228, URL https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=f329eb18a4549daa83fae28043d19b83fe8356fa
- Wang et al [2022] Wang S, Guo X, Lin X, et al (2022) Multi-objective deep data generation with correlated property control. In: Advances in Neural Information Processing Systems, pp 28889–28901, URL https://proceedings.neurips.cc/paper_files/paper/2022/file/b9c2e8a0bbed5fcfaf62856a3a719ada-Paper-Conference.pdf
- Wang et al [2024] Wang S, Du Y, Guo X, et al (2024) Controllable data generation by deep learning: A review. ACM Comput Surv 56(9). 10.1145/3648609
- Wang et al [2023] Wang Z, Zhao L, Xing W (2023) StyleDiffusion: Controllable disentangled style transfer via diffusion models. In: IEEE/CVF International Conference on Computer Vision, pp 7677–7689, 10.1109/ICCV51070.2023.00706
- Watson et al [2023] Watson JL, Juergens D, Bennett NR, et al (2023) De novo design of protein structure and function with RFdiffusion. Nature 620(7976):1089–1100. 10.1038/s41586-023-06415-8
- Welling and Teh [2011] Welling M, Teh YW (2011) Bayesian learning via stochastic gradient Langevin dynamics. In: International Conference on Machine Learning, pp 681–688, URL https://icml.cc/2011/papers/398_icmlpaper.pdf
- Yang et al [2023] Yang L, Zhang Z, Song Y, et al (2023) Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys 56(4):1–39. 10.1145/3626235
- Yao et al [2022] Yao K, Gao P, Yang X, et al (2022) Outpainting by queries. In: European Conference on Computer Vision, pp 153–169, 10.1007/978-3-031-20050-2_10
- Ye and Liu [2022] Ye M, Liu Q (2022) Pareto navigation gradient descent: a first-order algorithm for optimization in Pareto set. In: Uncertainty in Artificial Intelligence, pp 2246–2255, URL https://proceedings.mlr.press/v180/ye22a.html
- Zhang et al [2023] Zhang S, Qian Z, Huang K, et al (2023) Robust generative adversarial network. Machine Learning 112:5135–5161. 10.1007/s10994-023-06367-0
- Zitzler and Thiele [1999] Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation 3(4):257–271. 10.1109/4235.797969
Appendix A Complete sensitivity analysis for single-objective generation
We vary the weight coefficient that DM+single uses to combine the two objectives from 0 to 1 with a fixed step. The results are shown in Fig. 7:
- At one extreme of the weight range, the combined objective is dominated by the second objective. Consequently, that objective is optimized to its best value: all generated samples attain the smallest value of the second objective but the largest value of the first.
- At the other extreme, the combined objective is dominated by the first objective. Therefore, the generated samples attain the smallest value of the first objective but the largest value of the second.
- With equal weights, the generated samples should reach the compromise value between the two objectives, i.e., (0.0625, 0.0625). The generated samples indeed cover a small range around this point; this spread likely results from the noise in the diffusion process.
Appendix B More Experimental Settings and Analyses
Image Generation
Following Ishibuchi et al [24] and Li et al [30] (our setting differs slightly in that we use the squared distance to obtain a non-linear Pareto front; Example 1 in Liu et al [33] defines the same two-objective problem with a 1-D decision variable and may be easier to follow), we obtain the following. (1) In the two-objective setting, the Pareto solutions are the points on the line segment between grey [0.5, 0.5, 0.5] and white [1, 1, 1], i.e., image patches whose normalized RGB values lie on this segment. To draw CIFAR10 images based on this Pareto set (Fig. 12), we follow Liu et al [34] and sample images in a small neighborhood around it. (2) In the three-objective setting, the Pareto solutions are the points in the convex polygon (a triangle) formed by three points. For ease of understanding, we adopt a simplifying assumption that, in effect, constrains the middle point of CIFAR10 images to certain colors.
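Why the Pareto set is exactly the grey-to-white segment can be checked numerically under an assumption of ours: take the two objectives as mean squared distances of a pixel's normalized RGB vector to white and to grey (the text states only that squared distances are used; the exact normalization is a guess). Each objective then equals (channel mean - target)² plus the channel variance, so replacing a colour by its channel mean, clipped to [0.5, 1], never worsens either objective and strictly improves both when the channels differ. The helper names below are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple

WHITE, GREY = 1.0, 0.5  # per-channel targets of the two assumed objectives


def objectives(rgb: Sequence[float]) -> Tuple[float, float]:
    """Assumed (f1, f2): mean squared distance of a normalized RGB vector to white and to grey."""
    return (fmean((c - WHITE) ** 2 for c in rgb),
            fmean((c - GREY) ** 2 for c in rgb))


def project_to_segment(rgb: Sequence[float]) -> Tuple[float, ...]:
    """Closest point on the grey-white segment: the channel mean, clipped to [0.5, 1]."""
    m = min(max(fmean(rgb), GREY), WHITE)
    return (m,) * len(rgb)


def weakly_dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    return a[0] <= b[0] and a[1] <= b[1]


off = (0.75, 0.75, 0.5)                 # a colour off the grey-white segment
on = project_to_segment(off)            # channel mean 2/3, repeated for each channel
print(objectives(off), objectives(on))  # the projection is lower on both objectives
```

Two distinct colours on the segment, e.g. grey level 0.6 and 0.9, trade one objective against the other and neither dominates, which is the defining property of the Pareto set.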
We visualize the Pareto fronts of both settings in Fig. 11. For the two-objective setting, the Pareto-optimal points lie on the line between [1, 1, 1] and [0.5, 0.5, 0.5] (Fig. 11(a)), which are normalized RGB values (raw values in [0, 255] divided by 255). We then compute the objective values of these points, shown in Fig. 11(b). Fig. 11(c) and (d) show the three-objective setting, plotted in the same way. Based on these Pareto fronts, we select [0.25, 0.25] and [0.2, 0.1, 0.2] as the reference points for computing the hypervolume (HV) in the two-objective and three-objective settings of Table 2, respectively.
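As a sanity check on the reference point, the HV of the exact two-objective Pareto front can be derived in closed form. This worked derivation rests on our assumption that, for a uniform grey level c on the segment, the objectives are f₁ = (1 - c)² and f₂ = (c - 1/2)², so that √f₁ + √f₂ = 1/2 along the front. With the reference point (1/4, 1/4) it gives about 0.0521, close to the HV values reported for the two-objective setting in Tables 3 and 4.

```latex
\begin{aligned}
&\text{Front: } f_1 = (1-c)^2,\; f_2 = \bigl(c-\tfrac12\bigr)^2,\; c\in\bigl[\tfrac12,1\bigr]
  \;\Longrightarrow\; f_2 = g(f_1) = \bigl(\tfrac12-\sqrt{f_1}\bigr)^2,\quad f_1\in\bigl[0,\tfrac14\bigr],\\
&\mathrm{HV} = \int_0^{1/4}\bigl(\tfrac14 - g(u)\bigr)\,du
  = \tfrac1{16} - \int_0^{1/4}\bigl(\tfrac14 - \sqrt{u} + u\bigr)\,du
  = \tfrac1{16} - \Bigl[\tfrac{u}{4} - \tfrac23\,u^{3/2} + \tfrac{u^2}{2}\Bigr]_0^{1/4}\\
&\phantom{\mathrm{HV}} = \tfrac1{16} - \Bigl(\tfrac1{16} - \tfrac1{12} + \tfrac1{32}\Bigr)
  = \tfrac1{16} - \tfrac1{96} = \tfrac5{96} \approx 0.0521.
\end{aligned}
```

The integrand is the height of the dominated region above the front at each value of f₁, capped by the reference value 1/4.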
Fig. 13 shows CIFAR10 images sampled under the constraint with different patch sizes. The smaller the constrained region, the more CIFAR10 images satisfy the constraint.
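The effect of the patch size can be illustrated with a hypothetical filter: an image is kept if every pixel of its central k×k patch lies within Euclidean distance `eps` of the grey-white segment. This is a sketch of one plausible form of the neighborhood constraint, not the exact constraint of Liu et al [34]; `keep_image`, `k`, and `eps` are our names, and the random images only stand in for normalized CIFAR10 images. With the floor-based centering used here, a smaller central patch is always contained in a larger one, so shrinking the patch can only admit more images.

```python
import numpy as np


def distance_to_segment(pixels: np.ndarray) -> np.ndarray:
    """Euclidean distance of RGB vectors (..., 3) in [0, 1] to the grey-white segment."""
    t = np.clip(pixels.mean(axis=-1, keepdims=True), 0.5, 1.0)  # projection onto the diagonal
    return np.linalg.norm(pixels - t, axis=-1)


def keep_image(img: np.ndarray, k: int, eps: float) -> bool:
    """Hypothetical constraint: every pixel of the central k x k patch is within eps of the segment."""
    h, w, _ = img.shape
    top, left = (h - k) // 2, (w - k) // 2
    patch = img[top:top + k, left:left + k]
    return bool(np.all(distance_to_segment(patch) <= eps))


rng = np.random.default_rng(0)
images = rng.uniform(0.0, 1.0, size=(500, 32, 32, 3))  # random stand-ins, not CIFAR10 data
for k in (1, 2, 4):
    print(k, sum(keep_image(img, k, eps=0.3) for img in images))
```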
Protein Sequence Generation
Our experiments in Section 5.2 use the same dataset and objectives as Section 5.2 of Gruver et al [19]. We do not include their other experiments: the experiment in their Section 5.1 is not a generation task with property optimization, and the data for their Sections 5.3 and 5.4 are private and have not been released. We select a reference point to calculate the HV for this task.
Justification of Our Experiment Designs
Our experimental design is well suited to the MOG problem. Both CIFAR10 and the protein dataset are real-world datasets whose data lie on low-dimensional manifolds in a high-dimensional space [29, 19], so they fit our MOG problem setting. The objectives for CIFAR10 are established multi-objective optimization benchmarks with clear evaluations [24], while the objectives for the protein design task reflect real-world scenarios [19]. Finally, Fig. 2 and Table 2 show why generation quality must be considered: the generation quality of every baseline suffers to some extent when optimizing multiple properties.
Significance Test
We apply the Friedman test, under the null hypothesis that all methods perform equally, together with the Nemenyi post-hoc test for pairwise comparisons among the four methods [10]. The number of methods is set to four because -MGD fails to produce qualified samples and is therefore excluded. The data comprise 30 instances: each of the four methods is evaluated independently five times on each of three datasets under two evaluation criteria. The Friedman test statistic exceeds the critical value at the 0.05 significance level, so the null hypothesis is rejected, indicating a statistically significant difference among the four methods. The subsequent Nemenyi post-hoc test (Fig. 14) shows that PROUD performs significantly better than the three baseline methods.
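The statistics described above follow Demšar [10]. The sketch below computes the mean ranks, the Friedman χ² statistic, the Iman-Davenport F statistic (compared against an F distribution with k-1 and (k-1)(N-1) degrees of freedom), and the Nemenyi critical difference. It assumes higher scores are better, so metrics where lower is better (e.g., FID) should be negated before ranking; the function names are ours, and the critical values must be looked up separately.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank methods within one instance (1 = best = highest score); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def friedman(scores: Sequence[Sequence[float]]) -> Tuple[float, float, List[float]]:
    """Friedman chi-square, Iman-Davenport F statistic, and mean ranks over N instances x k methods."""
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(r) for r in scores]
    mean_r = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_r) - k * (k + 1) ** 2 / 4)
    denom = n * (k - 1) - chi2
    f_f = (n - 1) * chi2 / denom if denom > 0 else math.inf  # infinite under perfect agreement
    return chi2, f_f, mean_r


def nemenyi_cd(k: int, n: int, q_alpha: float = 2.569) -> float:
    """Nemenyi critical difference; q_alpha = 2.569 is Demsar's alpha = 0.05 value for k = 4."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))
```

With k = 4 methods and N = 30 instances as above, `nemenyi_cd(4, 30)` gives a critical difference of about 0.86 in mean rank; two methods differ significantly when their mean ranks differ by more than this.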
Appendix C Discussions
The constrained MOO problem defines its decision space as a subset of the ambient space specified by linear, nonlinear, or box constraints [1, 13]. It therefore differs from our MOG problem, in which the data manifold is delineated implicitly by a given dataset. Nevertheless, in a broader sense, MOG problems can be understood as a type of constrained MOO problem.
objectives | decision/data space | generation quality | |||||
MOO | ✗ | ||||||
|
|
✗ | |||||
MOG | ✓ |