PROUD: PaRetO-gUided diffusion model for multi-objective generation


Abstract

Recent advancements in the realm of deep generative models focus on generating samples that satisfy multiple desired properties. However, prevalent approaches optimize these property functions independently, thus omitting the trade-offs among them. In addition, the property optimization is often improperly integrated into the generative models, resulting in an unnecessary compromise on generation quality (i.e., the quality of generated samples). To address these issues, we formulate a constrained optimization problem that seeks to optimize generation quality while ensuring that generated samples reside on the Pareto front of multiple property objectives. Such a formulation enables the generation of samples that cannot be further improved simultaneously on the conflicting property functions and preserves good quality of the generated samples. Building upon this formulation, we introduce the ParetO-gUided Diffusion model (PROUD), wherein the gradients in the denoising process are dynamically adjusted to enhance generation quality while the generated samples adhere to Pareto optimality. Experimental evaluations on image generation and protein generation tasks demonstrate that our PROUD consistently maintains superior generation quality while approaching Pareto optimality across multiple property functions compared to various baselines.
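For readers who want a concrete picture of the kind of gradient adjustment referred to above, the following minimal PyTorch sketch combines two property gradients at a noisy sample via the closed-form minimum-norm direction for two objectives (multiple-gradient descent in the sense of Désidéri, 2018). This is only a conceptual illustration, not the PROUD algorithm itself: the callables f1, f2 and the sample x_t are assumed placeholders, and the paper's noise schedule, quality objective, and Pareto-guided switching rule are omitted.

import torch

def common_descent_direction(g1, g2):
    # Minimum-norm convex combination alpha*g1 + (1-alpha)*g2 of two gradients
    # (closed form for m = 2). It decreases both objectives unless the current
    # point is already Pareto-stationary.
    diff = g1 - g2
    denom = diff.pow(2).sum()
    if denom < 1e-12:                      # gradients (almost) coincide
        return 0.5 * (g1 + g2)
    alpha = torch.clamp(((g2 - g1) * g2).sum() / denom, 0.0, 1.0)
    return alpha * g1 + (1.0 - alpha) * g2

def property_guidance(x_t, f1, f2):
    # Common-descent direction of the two property objectives at the current
    # noisy sample x_t; a guided sampler would subtract a scaled version of this
    # direction in its denoising update (step sizes are omitted in this sketch).
    x_t = x_t.detach().requires_grad_(True)
    g1 = torch.autograd.grad(f1(x_t).sum(), x_t)[0]
    g2 = torch.autograd.grad(f2(x_t).sum(), x_t)[0]
    return common_descent_direction(g1, g2)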


Availability of data and materials

All datasets used in this work are available online and clearly cited.

Code availability

The code of this work is available at https://github.com/EvaFlower/Pareto-guided-diffusion-model.

Notes

  1. This relates to the manifold hypothesis that many real-world high-dimensional datasets lie on low-dimensional latent manifolds in the high-dimensional space (Fefferman et al., 2016).

  2. In other words, the generated samples are as realistic as the samples in the given dataset \(\mathcal {X}\).


  3. As demonstrated in Sect. 3 and Fig. 3b of their study, an objective that forces the center of generated images to be a black square can be used for constrained sampling on CIFAR10. Accordingly, they obtain samples that lie on the CIFAR10 data manifold and exhibit a black square in the middle, such as “black plane” and “black dog” images that contain a black square (smaller than the object) in the middle. This task can be considered image outpainting (Yao et al., 2022), namely, extrapolating images based on specified color patches on CIFAR10.

  4. RGB values [0, 255] are divided by 255.

  5. We only sample 5,000 protein sequences since the computational cost of evaluating SASA values is very high.

  6. Our problem setting is slightly different in that we take the squared distance in order to obtain a non-linear Pareto front. We also refer the reader to Example 1 in Liu et al. (2021a), which defines the same two-objective problem but with a 1-D decision variable, for ease of understanding.

  7. We use \([0.5_{\Omega }, 1_{\Omega }]\) to denote image patches in normalized RGB color values between [0.5, 0.5, 0.5] (grey) and [1, 1, 1] (white).

References


Funding

This work was supported by the National Research Foundation, Singapore and DSO National Laboratories under the AI Singapore Programme (AISG Award No: AISG2-GC-2023-010-T), the A*STAR GAP project (Grant No. I23D1AG079), the A*STAR Career Development Fund (Grant No. C222812019), the A*STAR Pitchfest for ECR 232D800027, the A*STAR Centre for Frontier AI Research, the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), NSFC (Grant No. 62250710682), and the Program for Guangdong Provincial Key Laboratory (Grant No. 2020B121201001).

Author information

Authors and Affiliations

Authors

Contributions

Idea: YY; Methodology and Experiment: YY, YP; Writing—comments/edits: all.

Corresponding author

Correspondence to Yuangang Pan.

Ethics declarations

Conflict of interest

The authors have no financial or non-financial interests to disclose that are relevant to the content of this article.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent to publish

Not applicable.

Additional information

Editor: Myra Spiliopoulou.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Complete sensitivity analysis for single-objective generation

We set the weight coefficient w for combining the two objectives in DM+single, \(w\times f_1(x)+(1-w)\times f_2(x)\), from 0 to 1 with a step of 0.1. The results are shown in Fig. 7:

  • when \(w < 0.5\), the resultant final objective is dominated by \(f_2(x)\). Consequently, this dominating objective is optimized the most: all the generated samples attain the smallest value for \(f_2(x)\) but the largest one for \(f_1(x)\).

  • when \(w > 0.5\), the resultant final objective is dominated by \(f_1(x)\). Therefore, the generated samples achieve the smallest value for the first objective but the largest one for the second objective.

  • when \(w = 0.5 =\frac{1}{m}\), the generated samples are expected to attain the compromise value between \(f_1(x)\) and \(f_2(x)\), i.e., (0.0625, 0.0625); see the short derivation after this list. We notice that the generated samples cover a small range around this point. This diversity could result from the diffusion noise in diffusion models (Figs. 8, 9, 10).
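The compromise value (0.0625, 0.0625) can be verified directly if, as in Footnote 6, each objective is taken to be the squared distance of the patch value to its target (white and grey, respectively, in the 1-D illustration):

\[
f_1(x) = (x - 1)^2, \qquad f_2(x) = (x - 0.5)^2 .
\]

Minimizing the equally weighted sum \(0.5 f_1(x) + 0.5 f_2(x)\) gives the midpoint \(x^\star = 0.75\), so that

\[
f_1(x^\star) = f_2(x^\star) = (0.25)^2 = 0.0625 .
\]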

Fig. 7

Sensitivity analysis on the weight coefficient for combining the two objectives, i.e., \((1-w)f_1 + wf_2\), in DM+single. The depth of color represents sample density (the deeper, the higher)

Fig. 8

Different diversity coefficient \(\gamma\) for DM+m-MGD on CIFAR10 optimized with three objectives. 1000 generated samples are randomly selected for visualization

Fig. 9

Different diversity coefficient \(\gamma\) for m-MGD on CIFAR10 optimized with three objectives. 1000 generated samples are randomly selected for visualization

Fig. 10

Approximation of Pareto front of various methods on CIFAR10 optimized with three objectives. The first row presents 50,000 generated samples while the second row presents non-dominated points out of 50,000 sample points, verifying the HV results obtained in Table 2

Appendix B: More experimental settings and analyses

Image Generation

According to Ishibuchi et al. (2013) and Li et al. (2017),Footnote 6 we obtain that: (1) the Pareto solutions of the two-objective setting are the points on the line segment between \(1_{\Omega }\) and \(0.5_{\Omega }\), i.e., \(\{x \mid x_{\Omega }=\kappa _{\Omega },\ \kappa _{\Omega } \in [0.5_{\Omega }, 1_{\Omega }]\}\).Footnote 7 When selecting images from CIFAR10 based on this Pareto set (Fig. 12), we follow Liu et al. (2021b) and sample images in a small neighborhood around \(\kappa _{\Omega }\), namely \(\Vert x_\Omega -\kappa _\Omega \Vert _2^2 \le \epsilon\), where \(\epsilon =8\times 10^{-4}\). (2) The Pareto solutions of the three-objective setting are the points on the convex polygon formed by the three points \(a_{\Omega }, b_{\Omega }, c_{\Omega }\). For ease of understanding, we assume \(\Omega =3\times 1\times 1\), which amounts to constraining the middle pixel of CIFAR10 images to certain colors.

We visualize the Pareto fronts of these two settings in Fig. 11. Specifically, for the two-objective setting, the Pareto optimal points lie on the line between [1, 1, 1] and [0.5, 0.5, 0.5] (Fig. 11a), which denote normalized RGB values (RGB values in [0, 255] divided by 255). We then calculate the objective values \([f_1(x), f_2(x)]\) for these points, shown in Fig. 11b. Figure 11c, d are plotted for the three-objective setting in a similar way. According to their Pareto fronts, we select [0.25, 0.25] and [0.2, 0.1, 0.2] as reference points to calculate the hypervolume (HV) for the two-objective setting and the three-objective setting in Table 2, respectively.
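For the two-objective setting, the reference-point HV computation can be illustrated with a short numerical sketch. It assumes, as in Footnote 6, squared-distance objectives to the white and grey targets and parameterizes the Pareto set by the 1-D patch value \(\kappa \in [0.5, 1]\); the exact objective scaling used in our experiments may differ.

import numpy as np

# Assumed 1-D form of the two objectives (cf. Footnote 6): squared distances
# to the white (1.0) and grey (0.5) targets for a patch value kappa in [0.5, 1].
kappa = np.linspace(0.5, 1.0, 200)
front = np.stack([(kappa - 1.0) ** 2, (kappa - 0.5) ** 2], axis=1)  # ideal Pareto front

def hypervolume_2d(points, ref):
    # Hypervolume dominated by a 2-D non-dominated front w.r.t. a reference
    # point (minimization): sweep the points sorted by f1 and accumulate the
    # rectangular strips they dominate.
    pts = points[np.all(points <= ref, axis=1)]
    pts = pts[np.argsort(pts[:, 0])]
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

print(hypervolume_2d(front, np.array([0.25, 0.25])))  # HV of the ideal front w.r.t. [0.25, 0.25]

Applying the same routine to the non-dominated points of a method's generated samples gives its HV with respect to this reference point.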

We sample CIFAR10 images under constraints with different patch sizes to demonstrate the effect of the patch size in Fig. 13. With a smaller region \(\Omega\), more CIFAR10 images meet the constraint.

Fig. 11

Pareto front of two and three objectives in data space and functionality space optimized for CIFAR10 image generation

Protein Sequence Generation

Our experiments in Section 5.2 adopted the same dataset and objectives as those in Section 5.2 of Gruver et al. (2023). Note that we did not include their other experiments because the experiment in their Section 5.1 is not a generation task equipped with property optimization, and the datasets for the experiments in their Sections 5.3 and 5.4 have not been released as they contain private data. We select \([1\times 10^4, 0]\) as the reference point to calculate the HV for this task.

Justification of Our Experiment Designs

Our experiment designs appropriately reflect the motivation of the MOG problem. Both the CIFAR10 and protein datasets are real-world datasets whose data lie on low-dimensional manifolds in a high-dimensional space (Krizhevsky and Hinton, 2009; Gruver et al., 2023), and are thus applicable to our MOG problem setting. Meanwhile, the objectives considered for CIFAR10 are benchmark multi-objective optimization problems with clear evaluations (Ishibuchi et al., 2013), and the objectives considered for the protein design task represent real-world scenarios (Gruver et al., 2023). Lastly, Fig. 2 and Table 2 demonstrate the necessity of considering generation quality, as the generation quality of all baseline methods suffers to some extent when optimizing multiple properties.

Significance Test

We apply the Friedman test under the null hypothesis that all methods perform similarly, together with the Nemenyi post-hoc test for pairwise comparisons among the four methods (Demšar, 2006). The number of compared methods was set to four, since m-MGD failed to produce qualified samples and was excluded. The comparison comprised 30 instances: each of the four methods was independently evaluated five times across three datasets under two evaluation criteria. The Friedman test yields \(\tau _F=18.24\), which is greater than the critical value \(F_{3,87} = 2.709\) at \(\alpha = 0.05\). Therefore, the null hypothesis is rejected, indicating a statistically significant difference among the four methods at the 0.05 significance level. The subsequent Nemenyi post-hoc test in Fig. 14 demonstrates that our PROUD is markedly superior to the three baseline methods.
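The test statistics above can be reproduced from a score matrix of shape 30 x 4 (instances x methods) using the standard formulas in Demšar (2006). The sketch below (the per-instance scores themselves are not listed here) computes the Iman–Davenport corrected Friedman statistic \(\tau_F\), its critical value \(F_{3,87}\), and the Nemenyi critical difference.

import numpy as np
from scipy import stats

def friedman_iman_davenport(scores, alpha=0.05):
    # scores: (N instances x k methods); lower score is assumed to be better,
    # so rank 1 goes to the best method on each instance.
    N, k = scores.shape
    ranks = np.apply_along_axis(stats.rankdata, 1, scores)  # rank methods per instance
    R = ranks.mean(axis=0)                                   # average rank of each method
    chi2_f = 12.0 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    tau_f = (N - 1) * chi2_f / (N * (k - 1) - chi2_f)        # Iman-Davenport correction
    crit = stats.f.ppf(1 - alpha, k - 1, (k - 1) * (N - 1))  # F_{3,87} ~= 2.709 for N=30, k=4
    return tau_f, crit

def nemenyi_cd(k=4, N=30, q_alpha=2.569):
    # Critical difference of average ranks; q_alpha = 2.569 for k = 4, alpha = 0.05
    # (Demsar, 2006). Two methods differ significantly if their average ranks
    # differ by more than this value.
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))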

Fig. 12

Full resolution CIFAR10 images (\(3\times 32\times 32\)) in Fig. 1b of the manuscript. The red box denotes the region \(\Omega\) (\(3\times 8\times 8\)) in the two objectives in Sect. 5.1

Fig. 13

Sampling CIFAR-10 images with regions of different patch sizes

Fig. 14

Nemenyi post-hoc test over four methods

Appendix C: Discussions

The constrained MOO problem defines its decision space S as a constrained subset of \(\mathbb {R}^d\) expressed using specified linear, nonlinear, or box constraints (Afshari et al., 2019; Désidéri, 2018). It is therefore different from our MOG problem, whose feasible set is the manifold delineated by a given dataset \(\mathcal {X}\). Nevertheless, MOG problems could be understood as a type of constrained MOO problem in a broader context (Table 5).

Table 5 Comparison of the MOG problem with the relevant MOO problems

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yao, Y., Pan, Y., Li, J. et al. PROUD: PaRetO-gUided diffusion model for multi-objective generation. Mach Learn 113, 6511–6538 (2024). https://doi.org/10.1007/s10994-024-06575-2


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10994-024-06575-2

Keywords