License: arXiv.org perpetual non-exclusive license
arXiv:2311.00318v2 [cs.LG] 18 Mar 2024

Flooding Regularization for Stable Training of
Generative Adversarial Networks

Iu Yahiro (The University of Tokyo, Bunkyo-ku, Tokyo, Japan), Takashi Ishida (RIKEN AIP, Chuo-ku, Tokyo, Japan; The University of Tokyo, Bunkyo-ku, Tokyo, Japan), Naoto Yokoya (The University of Tokyo, Bunkyo-ku, Tokyo, Japan; RIKEN AIP, Chuo-ku, Tokyo, Japan)
Abstract

Generative Adversarial Networks (GANs) have shown remarkable performance in image generation. However, GAN training suffers from instability. One of the main approaches to this problem is to modify the loss function, often by adding regularization terms in addition to changing the type of adversarial loss. This paper focuses on directly regularizing the adversarial loss function. We propose a method that applies flooding, an overfitting suppression method in supervised learning, to GANs to directly prevent the discriminator's loss from becoming excessively low. Flooding requires tuning the flood level; for GANs, however, we propose that the appropriate range of flood levels is determined by the adversarial loss function, a proposal supported by a theoretical analysis of GANs using the binary cross entropy loss. We experimentally verify that flooding stabilizes GAN training and can be combined with other stabilization techniques. We also show that by restricting the discriminator's loss to be no less than the flood level, training proceeds stably even when the flood level is somewhat high.

Keywords: GANs · Flooding · Regularization

1 Introduction

Generative Adversarial Networks (GANs) are a framework for training generative models proposed by Goodfellow et al. [9], and they have shown remarkable performance in a wide range of image generation tasks [25, 42, 19, 6, 5]. GANs are based on a training strategy in which two models, a generator $G$ and a discriminator $D$, are trained adversarially. The generator takes as input a noise vector $\bm{z}$ sampled from a known distribution $\mathbb{P}_z$ (usually the standard normal distribution) and outputs generated data $G(\bm{z})$. The discriminator $D$ takes as input $\bm{x}$ either real data sampled from the underlying target distribution $\mathbb{P}_r$ or generated data, and outputs the probability $D(\bm{x})$ that the input is real. The discriminator aims to correctly distinguish between real and generated data, while the generator aims to degrade the discriminator's performance on generated data. With the loss function designed in this way, training minimizes the Jensen-Shannon divergence between the generated data distribution $\mathbb{P}_g$ and $\mathbb{P}_r$ [9]. Goodfellow et al. [9] defined this adversarial structure as the following min-max problem over a value function $V$ of the generator $G$ and the discriminator $D$:

$$\min_{G} \max_{D} V(D,G) = \mathbb{E}_{\bm{x}\sim\mathbb{P}_r}[\log D(\bm{x})] + \mathbb{E}_{\bm{z}\sim\mathbb{P}_z}[\log(1 - D(G(\bm{z})))]. \tag{1}$$
[Figure 1 panels: (a) Ideal training, (b) Training collapse, (c) Effect of flooding]
Figure 1: Illustration of how the discriminator's loss ($L_D$) evolves during GAN training and of the effect of flooding. In ideal training (a), the discriminator's loss rises steadily toward its value at convergence (dotted line), and training converges. In contrast, when training collapses (b), the discriminator's loss falls sharply and then stays at a very low value. Applying flooding with flood level $b$ (c) suppresses the rapid decline of the discriminator's loss and thereby stabilizes training.

Although GANs successfully tackle image generation with this approach, GAN training suffers from instability. Previous work [1, 28] attributed this instability to the discriminator: its loss becomes excessively low, and it overwhelms the generator. During training, the generator is updated based on the discriminator's predictions. However, Arjovsky et al. [1] showed theoretically that a discriminator that is always optimal causes the generator's gradient to vanish, making training unstable.

Previous research has proposed various methods to address this instability. They can be categorized as changes in adversarial losses, regularization, and architectural changes [36]. The adversarial loss is the loss that creates the adversarial structure between the generator and the discriminator. For example, the original GAN uses the binary cross entropy loss (BCE loss) as the adversarial loss, based on the min-max formulation in Eq. (1). However, previous research [21, 22, 2] showed that the BCE loss causes instability and proposed replacements for it.

Another main approach is to add to the adversarial loss a regularization term that stabilizes training. This can be combined with changes in adversarial losses without affecting the theoretical convergence ($\mathbb{P}_g = \mathbb{P}_r$). For example, gradient penalty [10] adds a regularization term that keeps the gradient norm close to 1, which prevents exploding and vanishing gradients. WGAN-GP [10] uses it to enforce the Lipschitz constraint. The loss functions of the discriminator and generator can be decomposed into the adversarial loss and additional regularization terms as

$$L_D = L_{D,\mathrm{adv}} + \sum_i \lambda_{D,i} L_{D,\mathrm{aux},i}, \qquad L_G = L_{G,\mathrm{adv}} + \sum_i \lambda_{G,i} L_{G,\mathrm{aux},i}, \tag{2}$$

where $L_{D,\mathrm{adv}}$ and $L_{G,\mathrm{adv}}$ are the adversarial losses for the discriminator and generator, respectively, and $L_{D,\mathrm{aux},i}$ and $L_{G,\mathrm{aux},i}$ are the $i$th regularization terms with weight coefficients $\lambda_{D,i}$ and $\lambda_{G,i}$. The adversarial losses are calculated from the discriminator's outputs for real or generated data as

$$L_{D,\mathrm{adv}} = L_{D,\text{real}} + L_{D,\text{fake}} = \mathbb{E}_{\bm{x}\sim\mathbb{P}_r}[f_{D,\text{real}}(\bm{x})] + \mathbb{E}_{\bm{x}\sim\mathbb{P}_g}[f_{D,\text{fake}}(\bm{x})], \qquad L_{G,\mathrm{adv}} = \mathbb{E}_{\bm{x}\sim\mathbb{P}_g}[f_G(\bm{x})], \tag{3}$$

where $L_{D,\text{real}}$ and $L_{D,\text{fake}}$ are the discriminator's losses for real and generated data, respectively, and $f_{D,\text{real}}$, $f_{D,\text{fake}}$, and $f_G$ are functions that map the discriminator's outputs to losses. Some studies further improved stability by combining a change in the type of adversarial loss with the addition of regularization terms. For example, WGAN-GP [10] improves performance by using gradient penalty as a regularization term and the Wasserstein loss [2] as the adversarial loss. However, adding regularization terms requires tuning the coefficients $\lambda_{D,i}$ and $\lambda_{G,i}$.

This paper proposes a technique that directly regularizes the adversarial loss values to stabilize GAN training. We explore a new technique that directly prevents the discriminator from becoming too accurate and attaining too low a loss. This low-loss state can be regarded as the discriminator overfitting to the current distribution. We therefore propose applying flooding [13], a method for preventing overfitting in supervised image classification, to GANs. To prevent the classification loss $L$ from decreasing excessively when the prediction model overfits, flooding replaces $L$ with the recalculated loss $h$ given by

$$h(L, b) = |L - b| + b. \tag{4}$$

Here, $b$ is called the flood level. Because of the absolute value, the sign of the gradient flips when the loss falls below $b$, so the model performs gradient ascent instead of descent, which prevents overfitting. Note that adding back $b$ in Eq. (4) does not affect the gradient but ensures that $h(L, b) = L$ whenever $L \geq b$ [13]. The flood level is a hyperparameter that requires tuning. Our contributions are as follows.
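The effect of Eq. (4) on gradient descent can be sketched on a one-parameter toy problem. This is a minimal illustration, not the authors' code: the toy loss $L(\theta) = \theta^2$, the learning rate, the flood level $b = 0.5$, and the helper names `flood`, `flood_grad_factor`, and `train` are assumptions chosen for the example. By the chain rule, the gradient of $h$ is the gradient of $L$ multiplied by $\mathrm{sign}(L - b)$.

```python
def flood(loss, b):
    """Flooded loss h(L, b) = |L - b| + b, as in Eq. (4)."""
    return abs(loss - b) + b


def flood_grad_factor(loss, b):
    """dh/dL = sign(L - b): +1 above the flood level (descent), -1 below it (ascent).

    Returns 0.0 exactly at L == b, following the sign(0) = 0 convention.
    """
    return float((loss > b) - (loss < b))


def train(theta, b=None, lr=0.1, steps=50):
    """Gradient descent on the hypothetical toy loss L(theta) = theta**2.

    If b is given, the gradient is taken through h(L, b) instead of L.
    Returns the (unflooded) loss L recorded at every step.
    """
    losses = []
    for _ in range(steps):
        loss = theta ** 2
        losses.append(loss)
        grad = 2.0 * theta  # dL/dtheta
        if b is not None:
            grad *= flood_grad_factor(loss, b)  # dh/dtheta = dh/dL * dL/dtheta
        theta -= lr * grad
    return losses


plain = train(1.0)
flooded = train(1.0, b=0.5)
print(f"without flooding: final L = {plain[-1]:.2e}")
print(f"with flooding: L after step 1 stays in [{min(flooded[1:]):.3f}, {max(flooded[1:]):.3f}]")
```

Without flooding the toy loss decays toward zero, whereas with flooding it keeps oscillating in a band around $b$ instead of collapsing, which is the behavior Figure 1(c) depicts for the discriminator's loss.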

  1. We propose applying flooding, a simple method that prevents overfitting in supervised learning, to GAN training. It stabilizes the training process, as depicted in Figure 1.

  2. Unlike in supervised learning, the discriminator's losses at training convergence are uniquely determined, as proven theoretically. We introduce a novel approach to setting the flood level based on these losses at training convergence.

  3. We experimentally demonstrate that flooding stabilizes GAN training. We also show that flooding is effective when the flood level is not too low, and that it can be combined with existing stabilization methods, such as changes in the adversarial loss type and architecture. Furthermore, we demonstrate that applying flooding only to either $L_{D,\text{real}}$ or $L_{D,\text{fake}}$ significantly affects performance, indicating whether the discriminator overfits real or generated data.

2 Related Work

We first review stabilization methods for GANs. We then review applications of adversarial architectures and methods for preventing overfitting in supervised learning.

2.1 Stabilization methods for training GANs

Stabilization methods for GAN training fall into three categories: changes in the adversarial loss type, regularization, and changes in architecture [36].

| Adversarial loss | $f_{D,\text{real}}$ | $f_{D,\text{fake}}$ | $f_G$ |
|---|---|---|---|
| BCE loss [9] | $-\log D(\bm{x})$ | $-\log(1 - D(G(\bm{z})))$ | $\log(1 - D(G(\bm{z})))$ |
| BCE loss (non-saturating) [9] | $-\log D(\bm{x})$ | $-\log(1 - D(G(\bm{z})))$ | $-\log D(G(\bm{z}))$ |
| Wasserstein loss [2] | $-D(\bm{x})$ | $D(G(\bm{z}))$ | $-D(G(\bm{z}))$ |
| Hinge loss [21] | $\max(0, 1 - D(\bm{x}))$ | $\max(0, 1 + D(G(\bm{z})))$ | $-D(G(\bm{z}))$ |
| Least squares loss [22] | $\frac{1}{2}(D(\bm{x}) - b_{\mathrm{LS}})^2$ | $\frac{1}{2}(D(G(\bm{z})) - a_{\mathrm{LS}})^2$ | $\frac{1}{2}(D(G(\bm{z})) - c_{\mathrm{LS}})^2$ |

Table 1: Examples of adversarial loss functions. One setting of $(a_{\mathrm{LS}}, b_{\mathrm{LS}}, c_{\mathrm{LS}})$ proposed by Mao et al. [22] is $(0, 1, 1)$.
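Table 1 maps directly onto code. The sketch below assumes each $f$ is applied to a scalar discriminator output `d` (a probability in $(0, 1)$ for the BCE rows, an unbounded score otherwise) and estimates Eq. (3) by averaging over a mini-batch; the names `ADV_LOSSES` and `adversarial_losses` are hypothetical, and the least squares targets use the $(0, 1, 1)$ setting from the caption.

```python
import math
from statistics import fmean

A_LS, B_LS, C_LS = 0.0, 1.0, 1.0  # one setting proposed by Mao et al. [22]

# name -> (f_D_real, f_D_fake, f_G), each a function of the discriminator output d
ADV_LOSSES = {
    "bce": (lambda d: -math.log(d), lambda d: -math.log(1.0 - d), lambda d: math.log(1.0 - d)),
    "bce_ns": (lambda d: -math.log(d), lambda d: -math.log(1.0 - d), lambda d: -math.log(d)),
    "wasserstein": (lambda d: -d, lambda d: d, lambda d: -d),
    "hinge": (lambda d: max(0.0, 1.0 - d), lambda d: max(0.0, 1.0 + d), lambda d: -d),
    "least_squares": (
        lambda d: 0.5 * (d - B_LS) ** 2,
        lambda d: 0.5 * (d - A_LS) ** 2,
        lambda d: 0.5 * (d - C_LS) ** 2,
    ),
}


def adversarial_losses(name, d_real, d_fake):
    """Mini-batch estimates of Eq. (3): returns (L_D_real, L_D_fake, L_G_adv).

    d_real: discriminator outputs on real samples, D(x).
    d_fake: discriminator outputs on generated samples, D(G(z)).
    """
    f_real, f_fake, f_g = ADV_LOSSES[name]
    l_real = fmean(f_real(d) for d in d_real)
    l_fake = fmean(f_fake(d) for d in d_fake)
    l_g = fmean(f_g(d) for d in d_fake)
    return l_real, l_fake, l_g


print(adversarial_losses("bce", [0.9, 0.8], [0.2, 0.1]))
print(adversarial_losses("hinge", [1.0, 2.0], [-1.0, 0.0]))
```

Keeping the three functions separate mirrors the split of $L_{D,\mathrm{adv}}$ into $L_{D,\text{real}}$ and $L_{D,\text{fake}}$, which is what allows flooding to be applied to each part separately later in the paper.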

Change of adversarial loss type. Goodfellow et al. [9] pointed out that with the BCE loss, once the discriminator correctly rejects the generated data ($D(G(\bm{z})) \simeq 0$), the generator's gradients saturate. To address this, they proposed a non-saturating BCE loss that modifies $f_G$, as shown in Table 1.
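The saturation can be seen by differentiating the two choices of $f_G$ with respect to the discriminator's logit. This sketch assumes the common parameterization $D = \mathrm{sigmoid}(s)$ for a logit $s$ (an assumption, not something the table fixes); then $\frac{d}{ds}\log(1 - D) = -D$, which vanishes as $D \to 0$, while $\frac{d}{ds}[-\log D] = -(1 - D)$, which approaches $-1$.

```python
import math


def sigmoid(s):
    """Discriminator probability D = sigmoid(s) for a logit s (assumed parameterization)."""
    return 1.0 / (1.0 + math.exp(-s))


def grad_saturating(s):
    """d/ds of f_G = log(1 - sigmoid(s)) (original loss): equals -sigmoid(s)."""
    return -sigmoid(s)


def grad_non_saturating(s):
    """d/ds of f_G = -log(sigmoid(s)) (non-saturating loss): equals -(1 - sigmoid(s))."""
    return -(1.0 - sigmoid(s))


# As the discriminator rejects generated data more confidently (s -> -inf, D -> 0),
# the saturating gradient vanishes while the non-saturating one approaches -1.
for s in (0.0, -5.0, -10.0):
    print(f"D={sigmoid(s):.2e}  saturating={grad_saturating(s):+.2e}  "
          f"non-saturating={grad_non_saturating(s):+.2e}")
```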

However, instability still remains [1, 28]. Goodfellow et al. [9] showed that training with the BCE loss minimizes the Jensen-Shannon divergence between the real data distribution $\mathbb{P}_r$ and the generated data distribution $\mathbb{P}_g$. Arjovsky et al. [2] showed that this minimization causes instability and proposed the Wasserstein loss, based on the Earth Mover's (EM) distance. Moreover, Lim et al. [21] proposed a hinge loss, and Mao et al. [22] proposed a least squares loss based on minimizing the Pearson $\chi^2$ divergence. Table 1 lists these losses. These methods can be regarded as integral probability metric (IPM)-based regularization [36], in which the generators and discriminators belong to a particular function class, such as models with Lipschitz continuity.
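A common illustration of why minimizing the Jensen-Shannon divergence can be problematic, used in the Wasserstein GAN paper [2], involves distributions with disjoint supports: the divergence is then fixed at $\log 2$ no matter how far apart they are, so it carries no information about the gap, whereas the EM distance grows with it. The sketch below reproduces this on hypothetical one-dimensional point masses on a grid of 10 bins; it illustrates that intuition only and is not the analysis in [2].

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions (terms with p_i = 0 contribute 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence with the natural log, so its maximum is log 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(p, q):
    """EM (W1) distance on bins 0..n-1 with unit spacing: sum of |CDF_p - CDF_q|."""
    total = cp = cq = 0.0
    for pi, qi in zip(p, q):
        cp += pi
        cq += qi
        total += abs(cp - cq)
    return total


def point_mass(k, n=10):
    """Hypothetical distribution with all mass in bin k."""
    return [1.0 if i == k else 0.0 for i in range(n)]


p_r = point_mass(0)
for k in (1, 3, 9):
    p_g = point_mass(k)
    print(f"shift={k}: JSD={jsd(p_r, p_g):.4f}, EM={wasserstein_1d(p_r, p_g):.1f}")
```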

When the adversarial loss function is changed, it is crucial to prove mathematically that $\mathbb{P}_g$ matches $\mathbb{P}_r$ at training convergence [9, 21, 22]. Note that such proofs assume infinite data from $\mathbb{P}_r$ and ideal models for training. Therefore, they cannot be perfectly reproduced in experiments, but they make theoretical analysis tractable and yield useful insights. Previous research [9, 22] follows the proof procedure described below.

  1. Find a discriminator $D^*$ that minimizes the discriminator's loss for a fixed generator $G$.

  2. Find a generator $G_{\mathrm{opt}}$ that minimizes the loss given $D^*$.

Let $L_{D^*} = L_{D^*,\mathrm{real}} + L_{D^*,\mathrm{fake}}$ denote the loss of $D^*$, let $L_{D_{\mathrm{opt}}}$ denote the discriminator's loss in step 2 (with the corresponding discriminator denoted $D_{\mathrm{opt}}$), and let $L_{G_{\mathrm{opt}}}$ denote the generator's loss in step 2.
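Step 1 of this procedure can be checked numerically at a single point $\bm{x}$. Because the expected discriminator loss is an integral over $\bm{x}$, it is minimized pointwise; for the BCE loss the minimizer is $D^*(\bm{x}) = p_r(\bm{x}) / (p_r(\bm{x}) + p_g(\bm{x}))$ [9], and for the least squares loss it is $(b_{\mathrm{LS}}\, p_r + a_{\mathrm{LS}}\, p_g) / (p_r + p_g)$. The sketch below compares these closed forms, using $(a_{\mathrm{LS}}, b_{\mathrm{LS}}) = (0, 1)$ from Table 1, against a brute-force grid search; the density values and helper names are illustrative assumptions.

```python
import math


def bce_pointwise(d, pr, pg):
    """BCE discriminator loss density at one x: -p_r(x) log D - p_g(x) log(1 - D)."""
    return -pr * math.log(d) - pg * math.log(1.0 - d)


def ls_pointwise(d, pr, pg, a_ls=0.0, b_ls=1.0):
    """Least squares discriminator loss density at one x (Table 1 targets b_LS, a_LS)."""
    return pr * 0.5 * (d - b_ls) ** 2 + pg * 0.5 * (d - a_ls) ** 2


def bce_optimal_d(pr, pg):
    """Closed-form step-1 minimizer for the BCE loss."""
    return pr / (pr + pg)


def ls_optimal_d(pr, pg, a_ls=0.0, b_ls=1.0):
    """Closed-form step-1 minimizer for the least squares loss."""
    return (b_ls * pr + a_ls * pg) / (pr + pg)


def grid_argmin(loss, steps=9999):
    """Brute-force minimizer of a 1-D loss over a uniform grid inside (0, 1)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=loss)


# Hypothetical density values (p_r(x), p_g(x)) at three points x.
for pr, pg in [(0.3, 0.1), (0.2, 0.2), (0.05, 0.4)]:
    d_bce = grid_argmin(lambda d: bce_pointwise(d, pr, pg))
    d_ls = grid_argmin(lambda d: ls_pointwise(d, pr, pg))
    print(f"p_r={pr}, p_g={pg}: BCE grid {d_bce:.4f} vs {bce_optimal_d(pr, pg):.4f}, "
          f"LS grid {d_ls:.4f} vs {ls_optimal_d(pr, pg):.4f}")
```

When $p_r = p_g$ (the middle case), the BCE optimum is $1/2$, which is the starting point for the convergence values in Table 2.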

| Adversarial loss | $L_{D_{\mathrm{opt}},\mathrm{real}}$ | $L_{D_{\mathrm{opt}},\mathrm{fake}}$ | $L_{D_{\mathrm{opt}}}$ | $L_{G_{\mathrm{opt}}}$ |
|---|---|---|---|---|
| BCE loss [9] | $\log 2$ | $\log 2$ | - | $-\log 2$ |
| BCE loss (non-saturating) [9] | $\log 2$ | $\log 2$ | - | $\log 2$ |
| Wasserstein loss [2] | - | - | $0$ | - |
| Hinge loss [21] | - | - | $2$ | $1$ |
| Least squares loss [22] | $\frac{(a_{\mathrm{LS}} - b_{\mathrm{LS}})^2}{8}$ | $\frac{(a_{\mathrm{LS}} - b_{\mathrm{LS}})^2}{8}$ | - | $\frac{(a_{\mathrm{LS}} + b_{\mathrm{LS}} - 2c_{\mathrm{LS}})^2}{8}$ |
Table 2: Theoretical loss values at training convergence for each adversarial loss. When $L_{D_{\mathrm{opt}},\mathrm{real}}$ and $L_{D_{\mathrm{opt}},\mathrm{fake}}$ cannot be determined uniquely, only $L_{D_{\mathrm{opt}}}$ is shown; when they can, $L_{D_{\mathrm{opt}}}$ ($= L_{D_{\mathrm{opt}},\mathrm{real}} + L_{D_{\mathrm{opt}},\mathrm{fake}}$) is omitted as redundant. $L_{G_{\mathrm{opt}}}$ for the Wasserstein loss is also omitted because it is not uniquely determined.

Table 2 shows $L_{D_{\mathrm{opt}},\mathrm{real}}$, $L_{D_{\mathrm{opt}},\mathrm{fake}}$, $L_{D_{\mathrm{opt}}}$, and $L_{G_{\mathrm{opt}}}$ for each adversarial loss.
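The values in Table 2 can be encoded as a small lookup, for example as reference points when choosing flood levels, which Section 3.2 discusses. The helper name and dictionary keys below are hypothetical; `None` stands for the "-" entries, and `total` is filled in as the sum of `real` and `fake` when both are known. As a sanity check, the least squares row is reproduced by plugging the constant optimal discriminator $(a_{\mathrm{LS}} + b_{\mathrm{LS}})/2$, which is what $D^*$ reduces to when $\mathbb{P}_g = \mathbb{P}_r$, into the Table 1 formulas.

```python
import math


def convergence_losses(name, a_ls=0.0, b_ls=1.0, c_ls=1.0):
    """Theoretical values from Table 2 for one adversarial loss.

    Keys: 'real', 'fake', 'total' (= L_Dopt) and 'gen' (= L_Gopt).
    None marks entries that Table 2 leaves out because they are not uniquely determined.
    """
    log2 = math.log(2.0)
    table = {
        "bce": {"real": log2, "fake": log2, "total": None, "gen": -log2},
        "bce_ns": {"real": log2, "fake": log2, "total": None, "gen": log2},
        "wasserstein": {"real": None, "fake": None, "total": 0.0, "gen": None},
        "hinge": {"real": None, "fake": None, "total": 2.0, "gen": 1.0},
        "least_squares": {
            "real": (a_ls - b_ls) ** 2 / 8,
            "fake": (a_ls - b_ls) ** 2 / 8,
            "total": None,
            "gen": (a_ls + b_ls - 2 * c_ls) ** 2 / 8,
        },
    }
    entry = dict(table[name])
    if entry["total"] is None and entry["real"] is not None:
        entry["total"] = entry["real"] + entry["fake"]
    return entry


# Cross-check the least squares row against the Table 1 losses at D* = (a + b) / 2.
for a, b, c in [(0.0, 1.0, 1.0), (-1.0, 1.0, 0.0)]:
    d_star = (a + b) / 2
    direct = (0.5 * (d_star - b) ** 2, 0.5 * (d_star - a) ** 2, 0.5 * (d_star - c) ** 2)
    row = convergence_losses("least_squares", a, b, c)
    print((a, b, c), direct, (row["real"], row["fake"], row["gen"]))
```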

Regularization. There are various regularization techniques for stabilizing GAN training. One major approach is to add regularization terms to the adversarial loss. Such terms directly counteract exploding and vanishing gradients [18, 26], model overfitting [3, 2], and mode collapse [4]. Gradient penalty [10] is one such term: it improves the discriminator's stability by adding the squared error between the gradient norm and 1, which drives the gradient norm toward 1. This contributes to Lipschitz continuity and stabilizes training. These methods require tuning the coefficients that balance the regularization terms against the adversarial loss. Label smoothing [28] instead regularizes through the target labels; however, its stabilizing effect with adversarial losses other than the BCE loss is unknown. Normalization is another common approach to regularization [37, 24]. Spectral normalization [24], a representative normalization technique for GANs, stabilizes training by normalizing the weight matrices, which enforces Lipschitz continuity of the discriminator. Unlike these existing regularization techniques for GANs, our method directly regularizes the adversarial loss.
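To make the gradient penalty concrete, the sketch below computes a WGAN-GP-style term $\lambda\, \mathbb{E}_{\hat{\bm{x}}}[(\|\nabla_{\hat{\bm{x}}} D(\hat{\bm{x}})\|_2 - 1)^2]$ at random interpolates $\hat{\bm{x}} = \epsilon \bm{x}_r + (1 - \epsilon)\bm{x}_g$ between real and generated samples, with $\lambda = 10$ as used in [10]. To stay dependency-free it assumes a hypothetical toy critic $D(\bm{x}) = \tanh(\bm{w}^\top \bm{x})$ whose input gradient has a closed form; a practical implementation would obtain the gradient by automatic differentiation instead.

```python
import math
import random

LAMBDA_GP = 10.0  # penalty weight; 10 is the value used in WGAN-GP [10]


def critic(w, x):
    """Hypothetical toy critic D(x) = tanh(w . x)."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def critic_input_grad(w, x):
    """Closed-form gradient of the toy critic w.r.t. its input: (1 - D(x)^2) * w."""
    d = critic(w, x)
    return [(1.0 - d * d) * wi for wi in w]


def gradient_penalty(w, real, fake, lam=LAMBDA_GP, rng=random):
    """lam * mean of (||grad_x D(x_hat)||_2 - 1)^2 over random interpolates x_hat."""
    total = 0.0
    for x_r, x_f in zip(real, fake):
        eps = rng.random()  # eps ~ U[0, 1), one per real/generated pair
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(x_r, x_f)]
        grad = critic_input_grad(w, x_hat)
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - 1.0) ** 2
    return lam * total / len(real)


real = [[1.0, 0.5], [0.8, -0.2]]
fake = [[-0.5, 0.3], [0.1, -1.0]]
print(gradient_penalty([2.0, -1.0], real, fake, rng=random.Random(0)))
```

The penalty is added to the discriminator's loss with its own weight, which is exactly the kind of coefficient the paragraph above notes must be tuned.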

Change of architecture. Changing the architecture is another commonly used approach to stabilization. For example, generating high-resolution images requires architectures that efficiently capture features of the entire image. Deep convolutional GAN (DCGAN) [27], which employs convolutional layers, and self-attention GAN (SAGAN) [39], which introduces an attention mechanism, have been proposed. Some studies [15, 3, 16, 17] generate high-resolution, photo-realistic images by devising new architectures. These architectural changes are relatively easy to combine with loss changes because they do not disrupt the competing structure represented by the adversarial losses.

2.2 Various applications of adversarial architectures

Adversarial architectures, which enable efficient training of generators for high-dimensional distributions, have been applied in many research fields. For instance, Mirza et al. [23] proposed conditional GANs to control generated images with labels. The idea has been employed in a wide range of applications, such as text-to-image generation [40] and image-to-image translation [14]. Additionally, domain-adversarial neural networks (DANN) [8], adversarial discriminative domain adaptation (ADDA) [35], and Wasserstein distance guided representation learning (WDGRL) [29] use adversarial frameworks for domain adaptation, which reduces the gap between the source and target domains in classification tasks. These architectures also face challenges in training multiple models.

2.3 Methods for preventing overfitting in supervised learning

In supervised learning, overfitting is a well-known phenomenon in which a model performs well on training data but fails on unseen data. Typical methods for preventing overfitting in supervised learning are dropout [32], batch normalization [12], data augmentation [30], and label smoothing [33]. Many of these techniques can be applied not only to supervised learning but also to other domains, and some are adopted in GANs as well [20]. Ishida et al. [13] proposed flooding, which recalculates the loss with Eq. (4) using a flood level $b$ so that the loss does not become extremely small, thereby avoiding overfitting. Xie et al. [38] proposed individual flood (iFlood), which applies flooding to each instance's loss before taking the expectation. The method is effective because it regularizes only the losses of overfitted instances. While these methods perform well in supervised learning, there is no standard for setting $b$, and it requires tuning.
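The difference between flooding and iFlood is where $h$ is applied relative to the average. A minimal sketch on a hypothetical two-sample mini-batch (the loss values and $b = 0.3$ are illustrative): flooding applies $h$ to the batch mean, so every sample's gradient gets the same sign, while iFlood applies $h$ per sample, so only the already well-fit sample has its gradient flipped.

```python
from statistics import fmean


def flood(loss, b):
    """h(L, b) = |L - b| + b from Eq. (4)."""
    return abs(loss - b) + b


def sign(v):
    return (v > 0) - (v < 0)


def flooding(losses, b):
    """Flooding [13]: apply h once, to the mini-batch mean of per-sample losses."""
    return flood(fmean(losses), b)


def iflood(losses, b):
    """iFlood [38]: apply h to every per-sample loss, then take the mean."""
    return fmean(flood(loss, b) for loss in losses)


def per_sample_signs(losses, b, individual):
    """Sign multiplying each sample's gradient dL_i/dtheta under each scheme."""
    if individual:
        return [sign(loss - b) for loss in losses]
    return [sign(fmean(losses) - b)] * len(losses)


batch = [0.05, 0.9]  # one already well-fit sample, one poorly fit sample (illustrative)
level = 0.3
print("flooding:", flooding(batch, level), per_sample_signs(batch, level, individual=False))
print("iFlood:  ", iflood(batch, level), per_sample_signs(batch, level, individual=True))
```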

3 Method

In this section, we first provide an overview of GAN training and then propose applying flooding to GANs. We show that flooding can be applied to GANs in several ways and discuss how to set the flood level.

3.1 Overview of GAN training

GAN training involves a generator $G$ and a discriminator $D$ and proceeds based on the losses defined in Eq. (2). To simplify the discussion, this section assumes simple GANs without any regularization terms. For example, with the BCE loss [9], $L_D$ and $L_G$ can be written as

$$L_D = \mathbb{E}_{\bm{x}\sim\mathbb{P}_r}[-\log D(\bm{x})] + \mathbb{E}_{\bm{x}\sim\mathbb{P}_g}[-\log(1 - D(\bm{x}))], \qquad L_G = \mathbb{E}_{\bm{x}\sim\mathbb{P}_g}[\log(1 - D(\bm{x}))]. \tag{5}$$
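In practice, Eq. (5) is usually evaluated from discriminator logits rather than probabilities, because computing $\log D$ directly can overflow or lose precision. This sketch assumes $D(\bm{x}) = \mathrm{sigmoid}(s(\bm{x}))$ for a logit $s$ and uses the identities $-\log \mathrm{sigmoid}(s) = \mathrm{softplus}(-s)$ and $-\log(1 - \mathrm{sigmoid}(s)) = \mathrm{softplus}(s)$; the function names are hypothetical, and expectations are replaced by mini-batch means.

```python
import math
from statistics import fmean


def softplus(s):
    """Numerically stable log(1 + exp(s))."""
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))


def bce_losses_from_logits(logits_real, logits_fake):
    """Eq. (5) as mini-batch means, assuming D(x) = sigmoid(s(x)).

    -log(D) = softplus(-s) and -log(1 - D) = softplus(s).
    """
    loss_d = fmean(softplus(-s) for s in logits_real) + fmean(softplus(s) for s in logits_fake)
    loss_g = fmean(-softplus(s) for s in logits_fake)  # log(1 - D) = -softplus(s)
    return loss_d, loss_g


print(bce_losses_from_logits([0.0, 0.0], [0.0, 0.0]))
# A naive 1 / (1 + math.exp(-s)) raises OverflowError for s = -1000; this form does not.
print(bce_losses_from_logits([1000.0], [-1000.0]))
```

Many deep learning frameworks ship a fused "BCE with logits" loss for the same numerical reason.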

3.2 Application of flooding to GANs

Instability in GANs can occur when the discriminator's loss is too low, indicating overfitting of the discriminator. We aim to improve stability by applying flooding, a method for preventing overfitting in supervised learning. To do so, we need to consider two questions, “how to apply flooding to GANs” and “how to set the flood level,” which are addressed in the following paragraphs.

How to apply flooding to GANs. We propose applying flooding to $L_D$ to avoid overfitting of the discriminator. There are three ways to apply flooding, depending on where the operation $h$ defined in Eq. (4) is inserted:

$$\begin{aligned}
L_{D,\text{flood},1} &= \mathbb{E}_{\bm{x}\sim\mathbb{P}_r}[h(f_{D,\text{real}}(D(\bm{x})), b_{\text{real}})] + \mathbb{E}_{\bm{x}\sim\mathbb{P}_g}[h(f_{D,\text{fake}}(D(\bm{x})), b_{\text{fake}})],\\
L_{D,\text{flood},2} &= h(\mathbb{E}_{\bm{x}\sim\mathbb{P}_r}[f_{D,\text{real}}(D(\bm{x}))], b_{\text{real}}) + h(\mathbb{E}_{\bm{x}\sim\mathbb{P}_g}[f_{D,\text{fake}}(D(\bm{x}))], b_{\text{fake}}),\\
L_{D,\text{flood},3} &= h(\mathbb{E}_{\bm{x}\sim\mathbb{P}_r}[f_{D,\text{real}}(D(\bm{x}))] + \mathbb{E}_{\bm{x}\sim\mathbb{P}_g}[f_{D,\text{fake}}(D(\bm{x}))], b_{\text{all}}),
\end{aligned} \tag{6}$$

where $b_{\text{real}}$ and $b_{\text{fake}}$ are the flood levels for the adversarial losses on real and generated data, respectively, and $b_{\text{all}}$ is the flood level for the sum of the adversarial losses. As with flooding and iFlood, both the flood level and the position at which $h$ is inserted can affect the resulting performance gain.
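To make the difference between the three flooding types in Eq. (6) concrete, the following NumPy sketch evaluates them on a minibatch, with minibatch means standing in for the expectations. It assumes the flooding function of the original flooding method, $h(x, b) = |x - b| + b$, and the BCE terms $f_{D,\text{real}}(D) = -\log D$ and $f_{D,\text{fake}}(D) = -\log(1 - D)$; the function names are hypothetical. Only loss values are computed here; in training, the same expressions would be written in an autodiff framework so that the gradient sign flips once a flooded quantity falls below its flood level.

```python
import numpy as np


def flood(loss, b):
    """Assumed flooding function h(loss, b) = |loss - b| + b.

    For loss >= b it returns loss unchanged; for loss < b it returns 2b - loss.
    """
    return np.abs(loss - b) + b


def bce_terms(d_real, d_fake, eps=1e-12):
    """Per-sample BCE discriminator terms: f_real = -log D(x), f_fake = -log(1 - D(x))."""
    f_real = -np.log(np.clip(d_real, eps, 1.0))
    f_fake = -np.log(np.clip(1.0 - d_fake, eps, 1.0))
    return f_real, f_fake


def flooded_d_loss(d_real, d_fake, kind, b_real=0.0, b_fake=0.0, b_all=0.0):
    """Discriminator loss for flooding types 1-3 of Eq. (6) on one minibatch."""
    f_real, f_fake = bce_terms(np.asarray(d_real, float), np.asarray(d_fake, float))
    if kind == 1:  # instance level: flood each sample's loss, then average
        return flood(f_real, b_real).mean() + flood(f_fake, b_fake).mean()
    if kind == 2:  # flood the mean real loss and the mean fake loss separately
        return flood(f_real.mean(), b_real) + flood(f_fake.mean(), b_fake)
    if kind == 3:  # flood the sum of both mean losses with a single level
        return flood(f_real.mean() + f_fake.mean(), b_all)
    raise ValueError("kind must be 1, 2, or 3")
```

When the discriminator is very confident on a few samples, type 1 raises the loss of those individual samples, while type 2 returns the unflooded loss as long as the minibatch means stay above the flood levels. This is one way to see why the insertion position matters.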

How to set the flood level The flood levels $b_{\text{real}}$, $b_{\text{fake}}$, and $b_{\text{all}}$ must be set appropriately. In supervised learning, the loss at convergence is not uniquely determined but depends on the model and dataset, so the flood level is a hyperparameter whose appropriate range has not yet been established. For GANs, in contrast, $L_{D_{\mathrm{opt}},\mathrm{real}}$, $L_{D_{\mathrm{opt}},\mathrm{fake}}$, or their sum $L_{D_{\mathrm{opt}}}$ is uniquely determined, as summarized in Table 2. Using this property, we propose the following hypotheses about how to set the flood level for GANs.

Hypothesis 1

If $L_{D_{\mathrm{opt}},\mathrm{real}}$ and $L_{D_{\mathrm{opt}},\mathrm{fake}}$ are uniquely determined, then $b_{\mathrm{real}}$ and $b_{\mathrm{fake}}$ should be set to satisfy two conditions, $b_{\mathrm{real}} < L_{D_{\mathrm{opt}},\mathrm{real}}$ and $b_{\mathrm{fake}} < L_{D_{\mathrm{opt}},\mathrm{fake}}$, for $L_{D,\mathrm{flood},1}$ or $L_{D,\mathrm{flood},2}$.

Hypothesis 2

If $L_{D_{\mathrm{opt}}}$ is uniquely determined, then $b_{\mathrm{real}}$ and $b_{\mathrm{fake}}$ should be set to satisfy the condition $b_{\mathrm{real}} + b_{\mathrm{fake}} < L_{D_{\mathrm{opt}}}$ for $L_{D,\mathrm{flood},1}$ or $L_{D,\mathrm{flood},2}$. Moreover, $b_{\mathrm{all}}$ should be set to satisfy the condition $b_{\mathrm{all}} < L_{D_{\mathrm{opt}}}$ for $L_{D,\mathrm{flood},3}$.
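Both hypotheses reduce to simple inequality checks on the flood levels. The sketch below encodes them with hypothetical helper names; the optimal losses are inputs, and the only concrete values used come from the BCE case discussed in the text, where $L_{D_{\mathrm{opt}},\mathrm{real}} = L_{D_{\mathrm{opt}},\mathrm{fake}} = \log 2$ and hence $L_{D_{\mathrm{opt}}} = 2\log 2$.

```python
import math


def hypothesis1_ok(b_real, b_fake, l_opt_real, l_opt_fake):
    """Hypothesis 1 (flooding types 1 and 2): each flood level is below its optimal loss."""
    return b_real < l_opt_real and b_fake < l_opt_fake


def hypothesis2_ok(l_opt, b_real=None, b_fake=None, b_all=None):
    """Hypothesis 2: b_real + b_fake < L_Dopt (types 1, 2) and/or b_all < L_Dopt (type 3).

    Only the flood levels that are actually passed in are checked.
    """
    ok = True
    if b_real is not None and b_fake is not None:
        ok = ok and (b_real + b_fake < l_opt)
    if b_all is not None:
        ok = ok and (b_all < l_opt)
    return ok


# BCE loss: L_Dopt,real = L_Dopt,fake = log 2, so L_Dopt = 2 log 2.
LOG2 = math.log(2.0)
```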

If $L_{D_{\mathrm{opt}},\mathrm{real}}$ and $L_{D_{\mathrm{opt}},\mathrm{fake}}$ are uniquely determined, we can identify what counts as an excessively low value for each loss separately. This is the inspiration for Hypothesis 1. For example, since $L_{D_{\mathrm{opt}},\mathrm{real}}$ and $L_{D_{\mathrm{opt}},\mathrm{fake}}$ are both $\log 2$ for GANs with the BCE loss, $b_{\text{real}}$ and $b_{\text{fake}}$ should be set lower than $\log 2$. On the other hand, for some losses, such as the hinge loss, only $L_{D_{\mathrm{opt}}}$ is uniquely determined, not $L_{D_{\mathrm{opt}},\mathrm{real}}$ and $L_{D_{\mathrm{opt}},\mathrm{fake}}$ individually.
In such cases, because $L_{D_{\mathrm{opt}}} = L_{D_{\mathrm{opt}},\mathrm{real}} + L_{D_{\mathrm{opt}},\mathrm{fake}}$, $b_{\text{real}}$ and $b_{\text{fake}}$ should be set according to the sum. Likewise, $b_{\text{all}}$ should be set lower than $L_{D_{\mathrm{opt}}}$, analogously to Hypothesis 1. This is the inspiration for Hypothesis 2.
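As an illustration of why only the sum can be unique, consider the hinge loss in its standard form, $f_{D,\text{real}}(D) = \max(0, 1 - D)$ and $f_{D,\text{fake}}(D) = \max(0, 1 + D)$ (assumed here, since the definitions are given outside this section). At $\mathbb{P}_g = \mathbb{P}_r$, the pointwise objective for a discriminator output $d$ is proportional to

$$
\max(0, 1-d) + \max(0, 1+d) =
\begin{cases}
2, & -1 \le d \le 1,\\
1 + |d| > 2, & |d| > 1.
\end{cases}
$$

Every output $d \in [-1, 1]$ is therefore optimal, which gives $L_{D_{\mathrm{opt}}} = 2$ under this assumed form, while the split into $L_{D_{\mathrm{opt}},\mathrm{real}} = \mathbb{E}_{\bm{x}\sim\mathbb{P}_r}[1 - D(\bm{x})]$ and $L_{D_{\mathrm{opt}},\mathrm{fake}} = \mathbb{E}_{\bm{x}\sim\mathbb{P}_g}[1 + D(\bm{x})]$ depends on which optimal $D$ is reached. Under Hypothesis 2, this would correspond to $b_{\text{real}} + b_{\text{fake}} < 2$ or $b_{\text{all}} < 2$.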

To provide theoretical support for Hypotheses 1 and 2, consider GANs with the BCE loss. For each adversarial loss, $\mathbb{P}_g = \mathbb{P}_r$ at training convergence can be proved following the procedure outlined in Section 2.1. In the early stages of training, $\mathbb{P}_g$ and $\mathbb{P}_r$ differ more, so $L_{D^*}$ is lower than $L_{D_{\mathrm{opt}}}$: the discriminator can solve the discrimination task well by exploiting the difference between the distributions. For example, Goodfellow et al. [9] showed that with the BCE loss and a fixed generator $G$,

$$
D^*(\bm{x}) = \frac{\mathbb{P}_r(\bm{x})}{\mathbb{P}_r(\bm{x}) + \mathbb{P}_g(\bm{x})}. \tag{7}
$$

It can also be proven that $L_{D^*,\mathrm{real}}$ and $L_{D^*,\mathrm{fake}}$ are smaller than $L_{D_{\mathrm{opt}},\mathrm{real}}$ and $L_{D_{\mathrm{opt}},\mathrm{fake}}$, respectively. We can now show the following theorem.
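One way to see the claim for the real-data term (a sketch, not necessarily the proof the authors have in mind) is to apply Jensen's inequality to the concave logarithm, using $f_{D,\text{real}}(D) = -\log D$ and Eq. (7):

$$
\begin{aligned}
L_{D^*,\mathrm{real}}
&= \mathbb{E}_{\bm{x}\sim\mathbb{P}_r}\left[\log\left(1 + \frac{\mathbb{P}_g(\bm{x})}{\mathbb{P}_r(\bm{x})}\right)\right]
\le \log\left(1 + \mathbb{E}_{\bm{x}\sim\mathbb{P}_r}\left[\frac{\mathbb{P}_g(\bm{x})}{\mathbb{P}_r(\bm{x})}\right]\right)\\
&= \log\left(1 + \int_{\mathrm{Supp}(\mathbb{P}_r)} \mathbb{P}_g(\bm{x})\, d\bm{x}\right)
\le \log 2 = L_{D_{\mathrm{opt}},\mathrm{real}}.
\end{aligned}
$$

Equality in both steps requires $\mathbb{P}_g/\mathbb{P}_r$ to be constant on $\mathrm{Supp}(\mathbb{P}_r)$ and $\mathbb{P}_g$ to put all of its mass there, i.e., $\mathbb{P}_g = \mathbb{P}_r$; otherwise the inequality is strict. The fake-data term follows symmetrically with $f_{D,\text{fake}}(D) = -\log(1 - D)$ and the roles of $\mathbb{P}_r$ and $\mathbb{P}_g$ swapped.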

Theorem 3.1

In training GANs with the BCE loss based on $L_{D,\textup{flood},1}$, on $\mathrm{Supp}(\mathbb{P}_r(x)) \cup \mathrm{Supp}(\mathbb{P}_g(x))$,

$$
D^*(\bm{x}) \in \{1 - e^{-b_{\mathrm{fake}}},\, e^{-b_{\mathrm{real}}}\}, \tag{8}
$$

when $e^{-b_{\mathrm{real}}} + e^{-b_{\mathrm{fake}}} \le 1$. On the other hand, when $e^{-b_{\mathrm{real}}} + e^{-b_{\mathrm{fake}}} > 1$,

$$
\begin{cases}
D^*(\bm{x}) = \dfrac{\mathbb{P}_r(\bm{x})}{\mathbb{P}_r(\bm{x}) + \mathbb{P}_g(\bm{x})} & \text{if } \mathbb{P}_g(\bm{x}) \text{ satisfies inequality (10)},\\[1ex]
D^*(\bm{x}) \in \{1 - e^{-b_{\mathrm{fake}}},\, e^{-b_{\mathrm{real}}}\} & \text{otherwise},
\end{cases} \tag{9}
$$

where the inequality is

$$
1 - e^{-b_{\mathrm{fake}}} < \frac{\mathbb{P}_r(\bm{x})}{\mathbb{P}_r(\bm{x}) + \mathbb{P}_g(\bm{x})} < e^{-b_{\mathrm{real}}}. \tag{10}
$$
Proof

See Supplementary Section 0.A.
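Theorem 3.1 can be checked numerically at a single point $\bm{x}$. The sketch below, with hypothetical helper names and $h(x, b) = |x - b| + b$ assumed, grid-searches the minimizer of the pointwise objective $\mathbb{P}_r(\bm{x})\,h(-\log d, b_{\mathrm{real}}) + \mathbb{P}_g(\bm{x})\,h(-\log(1-d), b_{\mathrm{fake}})$ over $d \in (0, 1)$ and compares it with the candidates given by Eqs. (8) to (10). The density values are arbitrary positive numbers chosen for illustration.

```python
import numpy as np


def flood(loss, b):
    """Assumed flooding function h(loss, b) = |loss - b| + b."""
    return np.abs(loss - b) + b


def pointwise_objective(d, p_r, p_g, b_real, b_fake):
    """Pointwise value of L_{D,flood,1} with BCE at one x, for candidate outputs d."""
    return p_r * flood(-np.log(d), b_real) + p_g * flood(-np.log(1.0 - d), b_fake)


def numeric_d_star(p_r, p_g, b_real, b_fake, n=200001):
    """Grid-search minimizer of the pointwise objective over d in (0, 1)."""
    d = np.linspace(1e-6, 1.0 - 1e-6, n)
    return float(d[np.argmin(pointwise_objective(d, p_r, p_g, b_real, b_fake))])


def predicted_d_star(p_r, p_g, b_real, b_fake):
    """Candidate set for D*(x) according to Theorem 3.1."""
    lo, hi = 1.0 - np.exp(-b_fake), np.exp(-b_real)
    ratio = p_r / (p_r + p_g)
    if np.exp(-b_real) + np.exp(-b_fake) > 1.0 and lo < ratio < hi:
        return {ratio}  # Eq. (9), first case: same as Eq. (7)
    return {float(lo), float(hi)}  # Eq. (8), or the second case of Eq. (9)
```

For example, with $b_{\mathrm{real}} = b_{\mathrm{fake}} = 0.5$ and equal densities the minimizer is $1/2$, as in Eq. (7), whereas with $b_{\mathrm{real}} = b_{\mathrm{fake}} = 1$ (so $2e^{-1} \le 1$) it lands on one of the two constants $e^{-1} \approx 0.368$ or $1 - e^{-1} \approx 0.632$ for any choice of densities.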

The theorem has three important implications.

  1.

    When $e^{-b_{\text{real}}} + e^{-b_{\text{fake}}} < 1$, the output of $D^*$ takes one of two constant values that do not depend on $\mathbb{P}_g$ or $\mathbb{P}_r$, so training collapses.

  2.

    When $e^{-b_{\text{real}}} + e^{-b_{\text{fake}}} > 1$ and inequality (10) holds, the output of $D^*$ is the same as Eq. (7). Moreover, if $b_{\text{real}} < \log 2$ and $b_{\text{fake}} < \log 2$, then $e^{-b_{\text{real}}} + e^{-b_{\text{fake}}} > 1$ and inequality (10) holds wherever $\mathbb{P}_g = \mathbb{P}_r$. Because the goal of GANs is for the generator to satisfy $\mathbb{P}_g = \mathbb{P}_r$, $b_{\text{real}} < \log 2$ and $b_{\text{fake}} < \log 2$ are required for GAN training to converge.

  3.

    When $e^{-b_{\text{real}}} + e^{-b_{\text{fake}}} > 1$, a higher flood level makes it more difficult for the discriminator to satisfy inequality (10).

The details are shown in Supplementary Section 0.B. We assume that higher flood levels increase the danger described in the third point, while lower flood levels diminish the stabilizing effect. Therefore, the appropriate flood level must be investigated experimentally.
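The conditions in the second and third points can be read off inequality (10) directly. Where $\mathbb{P}_g = \mathbb{P}_r$, the middle term of (10) equals $1/2$, so

$$
1 - e^{-b_{\mathrm{fake}}} < \tfrac{1}{2} < e^{-b_{\mathrm{real}}}
\iff b_{\mathrm{fake}} < \log 2 \ \text{and}\ b_{\mathrm{real}} < \log 2,
$$

and in that case $e^{-b_{\mathrm{real}}} + e^{-b_{\mathrm{fake}}} > \tfrac{1}{2} + \tfrac{1}{2} = 1$. More generally, the interval of admissible values of $\mathbb{P}_r/(\mathbb{P}_r + \mathbb{P}_g)$ in (10) has width

$$
e^{-b_{\mathrm{real}}} - \left(1 - e^{-b_{\mathrm{fake}}}\right) = e^{-b_{\mathrm{real}}} + e^{-b_{\mathrm{fake}}} - 1,
$$

which shrinks as either flood level grows. For illustrative values (not the settings used in the experiments), $b_{\mathrm{real}} = b_{\mathrm{fake}} = 0.1$ gives a width of about $0.81$, $b_{\mathrm{real}} = b_{\mathrm{fake}} = 0.5$ gives about $0.21$, and the width tends to $0$ as both levels approach $\log 2$.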

4 Experiment

We experimentally examine how flooding should be applied to GANs and what effect it has, as discussed in Section 3.

4.1 Implementation

We briefly summarize the implementation below. The details are provided in Supplementary Section 0.D.

Synthetic Dataset To examine the effect of flooding on GAN training, we used the ring of 2D Gaussians dataset (2D Ring), as in previous research [21, 7, 10, 34, 22]. We performed five runs and evaluated the diversity and quality of the generated samples with the ‘modes’ and ‘high quality (HQ)’ metrics proposed in [31]. For 2D Ring, modes $\leq 8$ and HQ $\leq 1$. Because we aim to measure how well flooding prevents mode collapse, we regard higher modes as better performance; when modes are equal, we regard the run with higher HQ as better.
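The following NumPy sketch shows one common way to build the 2D Ring and compute modes and HQ. The ring radius, the standard deviation `sigma`, the 3-sigma quality threshold, and the rule that a mode counts as captured when at least one high-quality sample is nearest to it are assumptions in the spirit of [31], not values taken from this paper; the function names are hypothetical.

```python
import numpy as np


def ring_centers(n_modes=8, radius=2.0):
    """Centers of the 2D Ring: n_modes points evenly spaced on a circle (radius assumed)."""
    angles = 2 * np.pi * np.arange(n_modes) / n_modes
    return np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)


def sample_ring(n, sigma=0.02, n_modes=8, radius=2.0, rng=None):
    """Draw n samples from an equal-weight mixture of isotropic Gaussians on the ring."""
    rng = np.random.default_rng() if rng is None else rng
    centers = ring_centers(n_modes, radius)
    idx = rng.integers(n_modes, size=n)
    return centers[idx] + sigma * rng.standard_normal((n, 2))


def modes_and_hq(samples, sigma=0.02, n_modes=8, radius=2.0):
    """HQ: fraction of samples within 3*sigma of their nearest center.
    Modes: number of centers that are nearest to at least one high-quality sample."""
    centers = ring_centers(n_modes, radius)
    dist = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    hq_mask = dist.min(axis=1) <= 3 * sigma
    modes = len(np.unique(nearest[hq_mask]))
    return modes, float(hq_mask.mean())
```

A fully collapsed generator that places every sample on one center scores HQ = 1 but modes = 1, which is why runs are ranked by modes first and by HQ second.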

DCGAN We used unconditional DCGAN [27] to evaluate image generation performance. We used CIFAR10 and CIFAR100 at 32×32, STL10 at 64×64, and CelebA at both 64×64 and 128×128. We conducted each experiment five times and evaluated the generated images using the Fréchet Inception Distance (FID) [11].
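For reference, FID [11] compares Gaussian fits to Inception features of real and generated images. The sketch below computes it from feature means and covariances with NumPy only; extracting the Inception features is outside its scope, and the helper names are hypothetical. It relies on the fact that, for positive semidefinite covariances, $\mathrm{Tr}\big((\Sigma_1\Sigma_2)^{1/2}\big)$ equals the sum of the square roots of the eigenvalues of $\Sigma_1\Sigma_2$, which avoids an explicit matrix square root.

```python
import numpy as np


def fid_from_stats(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    # Eigenvalues of sigma1 @ sigma2 are real and non-negative for PSD inputs;
    # clip tiny negative or imaginary round-off before taking square roots.
    eig = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def fid_from_features(feat_real, feat_fake):
    """FID from two (n_samples, dim) feature arrays."""
    return fid_from_stats(feat_real.mean(0), np.cov(feat_real, rowvar=False),
                          feat_fake.mean(0), np.cov(feat_fake, rowvar=False))
```

Lower is better, and identical feature statistics give an FID of 0.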

Large model We conducted experiments using StarGAN V2 [5] to investigate the effect of flooding when generating larger images. We followed the authors’ implementation and used CelebAHQ. We conducted each experiment five times and evaluated the generated images with FID and LPIPS [41].

Flooding type Eval w/o flooding Small Medium Near Opt Opt Over Opt
$L_{D,\text{flood},1}$ Modes 4.8 (1.2) 6.6 (0.8) 7.8 (0.4) 7.0 (0.0) 2.0 (0.6) 0.0 (0.0)
$L_{D,\text{flood},1}$ HQ 0.90 (0.11) 0.94 (0.03) 0.87 (0.07) 0.90 (0.03) 0.00 (0.00) 0.00 (0.00)
$L_{D,\text{flood},2}$ Modes - 4.0 (2.0) 4.2 (1.3) 7.0 (0.6) 0.2 (0.4) 0.0 (0.0)
$L_{D,\text{flood},2}$ HQ - 0.65 (0.35) 0.75 (0.18) 0.28 (0.18) 0.00 (0.00) 0.00 (0.00)
$L_{D,\text{flood},3}$ Modes - 4.0 (2.3) 5.2 (1.9) 7.2 (0.4) 2.8 (1.6) 0.0 (0.0)
$L_{D,\text{flood},3}$ HQ - 0.68 (0.35) 0.91 (0.06) 0.24 (0.19) 0.01 (0.01) 0.00 (0.00)
Table 3: Average (standard deviation) of modes and HQ on 2D Ring with the BCE loss for different flooding types and flood levels.

4.2 How to apply flooding

First, we investigated the stabilizing effect of flooding on GAN training with a synthetic dataset and determined which flooding type in Eq. (6) is appropriate. We used the non-saturating BCE loss, which we refer to simply as the BCE loss in the rest of the paper. Note that the discussion in Section 3.2 does not use the generator loss $f_G = \log(1 - D(G(z)))$, so the same arguments hold for the non-saturating BCE loss. To compare with training without flooding and to find the appropriate flood level, we conducted experiments with five settings (Small, Medium, Near Opt, Opt, and Over Opt) under the condition $b_{\text{real}} = b_{\text{fake}}$. The details of the settings are described in Supplementary Section 0.D.
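The usual motivation for the non-saturating form can be seen from the gradient with respect to the discriminator's logit $s$ for a generated sample, assuming $D = \sigma(s)$ with the logistic sigmoid (an assumption about the architecture, not stated in this section). The saturating generator loss $\log(1 - D)$ has derivative $-\sigma(s)$, which vanishes when the discriminator confidently rejects the sample ($s \to -\infty$), while the non-saturating loss $-\log D$ has derivative $-(1 - \sigma(s))$, which stays near $-1$ there. The sketch below, with hypothetical helper names, checks both closed forms against finite differences.

```python
import numpy as np


def sigmoid(s):
    """Logistic sigmoid, assumed here as the discriminator's output activation."""
    return 1.0 / (1.0 + np.exp(-s))


def generator_logit_grads(s):
    """Derivatives of the two generator losses w.r.t. the logit s of D(G(z)) = sigmoid(s).

    saturating:      d/ds log(1 - sigmoid(s)) = -sigmoid(s)
    non-saturating:  d/ds -log(sigmoid(s))    = -(1 - sigmoid(s))
    """
    d = sigmoid(s)
    return -d, -(1.0 - d)


def finite_diff(f, s, eps=1e-6):
    """Central finite difference, used only to check the closed forms above."""
    return (f(s + eps) - f(s - eps)) / (2 * eps)
```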

Table 3 shows the average and standard deviation of the modes and HQ. Without flooding, HQ is high (0.90) but modes are low (4.8). This suggests that only a few of the eight Gaussian centers are accurately represented, implying that mode collapse occurred. With flooding type $L_{D,\text{flood},1}$ and flood levels Small, Medium, and Near Opt, modes improve while HQ remains high. In particular, with flood level Medium, the generator expresses all of the Gaussian centers in four out of five runs. In contrast, with flooding types $L_{D,\text{flood},2}$ and $L_{D,\text{flood},3}$, flood levels Small and Medium yield poor modes, and flood level Near Opt yields poor HQ. This suggests that instance-level flooding $L_{D,\text{flood},1}$ with flood levels below $L_{D_{\mathrm{opt}},\mathrm{real}}$ and $L_{D_{\mathrm{opt}},\mathrm{fake}}$ is effective in GAN training. Because flooding prevents the discriminator from reaching losses below the flood level, this result also indicates that training becomes unstable when the instance-level loss becomes too low.
It is also worth noting that the flood level Near Opt with flooding type $L_{D,\text{flood},1}$ stabilizes training. This suggests that GAN training can progress even when the discriminator is prevented from reaching low losses. As shown in Theorem 3.1, a larger flood level has the drawback of reducing the probability of satisfying inequality (10). However, the experimental results suggest that the benefit of preventing destabilization outweighs this drawback. We can also see that training collapses completely when the flood levels $b_{\text{real}}$, $b_{\text{fake}}$, and $b_{\text{all}}$ are at or above $L_{D_{\mathrm{opt}},\mathrm{real}}$, $L_{D_{\mathrm{opt}},\mathrm{fake}}$, and $L_{D_{\mathrm{opt}}}$, respectively. As shown in Figure 2 (a) and (b), compared with the transitions of modes and HQ without flooding, the collapse with flood levels Opt and Over Opt began early in training. This indicates that such a configuration disrupts GAN training, as discussed in Section 3.2. We provide an analysis of the loss and gradient with flooding in Supplementary Section 0.F.

(Figure 2 panels: (a) Transition of modes; (b) Transition of HQ; (c) Generated samples)
Figure 2: (a), (b): Relationship between flood levels and the transitions of modes and HQ (moving average). (c): Transition of 1,000 generated samples. The horizontal axis indicates the iterations. In the early phase (10k, 30k), all experiments perform comparatively well. However, from the mid-phase (50k) onward, some modes disappear without flooding or with one-sided (real) flooding. In contrast, with two-sided flooding or one-sided (fake) flooding, training collapse is avoided.
Adversarial loss Eval w/o flooding Small Medium Near Opt Opt Over Opt
BCE loss Modes 4.8 (1.2) 6.6 (0.8) 7.8 (0.4) 7.0 (0.0) 2.0 (0.6) 0.0 (0.0)
BCE loss HQ 0.90 (0.11) 0.94 (0.03) 0.87 (0.07) 0.90 (0.03) 0.00 (0.00) 0.00 (0.00)
Hinge loss Modes 7.4 (0.8) 6.6 (0.8) 8.0 (0.0) 7.4 (0.8) 0.6 (0.5) 0.0 (0.0)
Hinge loss HQ 0.83 (0.13) 0.73 (0.20) 0.78 (0.04) 0.81 (0.07) 0.00 (0.00) 0.00 (0.00)
Least squares loss Modes 6.6 (0.8) 6.8 (0.4) 7.8 (0.4) 7.4 (0.5) 0.0 (0.0) 0.0 (0.0)
Least squares loss HQ 0.80 (0.09) 0.79 (0.09) 0.90 (0.02) 0.78 (0.07) 0.00 (0.00) 0.00 (0.00)
Wasserstein loss Modes 8.0 (0.0) 8.0 (0.0) 8.0 (0.0) 0.0 (0.0) 0.0 (0.0) 0.0 (0.0)
Wasserstein loss HQ 0.93 (0.01) 0.95 (0.01) 0.95 (0.01) 0.00 (0.00) 0.00 (0.00) 0.00 (0.00)
Table 4: Average (standard deviation) of modes and HQ with various adversarial losses and flood levels.

4.3 Flooding for various adversarial losses

We also tested the effect of flooding with other adversarial losses: the hinge loss, the least squares loss, and the Wasserstein loss with gradient penalty (WGAN-GP). The flood level settings are provided in Table 10.

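The results are shown in Table 4. With flood level Medium, flooding improves or maintains the modes for all four adversarial losses, indicating that it stabilizes training across loss types, whereas flood levels Opt and Over Opt cause training to collapse for every loss.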

Adversarial loss Eval w/o flooding Two-sided One-sided (real) One-sided (fake) Smoothing
BCE loss Modes 4.8 (1.2) 7.8 (0.4) 4.0 (1.8) 8.0 (0.0) 7.4 (0.8)
BCE loss HQ 0.90 (0.11) 0.87 (0.07) 0.77 (0.13) 0.89 (0.08) 0.88 (0.06)
Hinge loss Modes 7.4 (0.8) 8.0 (0.0) 5.4 (1.5) 8.0 (0.0) 7.4 (0.5)
Hinge loss HQ 0.83 (0.13) 0.78 (0.04) 0.79 (0.05) 0.81 (0.06) 0.87 (0.05)
Least squares loss Modes 6.6 (0.8) 7.8 (0.4) 5.6 (0.5) 7.8 (0.4) 2.6 (0.5)
Least squares loss HQ 0.80 (0.09) 0.90 (0.02) 0.83 (0.08) 0.85 (0.12) 0.92 (0.06)
Wasserstein loss Modes 8.0 (0.0) 8.0 (0.0) 8.0 (0.0) 8.0 (0.0) 0.2 (0.4)
Wasserstein loss HQ 0.93 (0.01) 0.95 (0.01) 0.95 (0.01) 0.93 (0.02) 0.00 (0.00)
Table 5: Average (standard deviation) of modes and HQ with various adversarial losses with one-sided or two-sided flooding with flood level Medium.

4.4 One-sided flooding

In Section 4.2, we confirmed the effect of flooding type $L_{D,\text{flood},1}$, in which flooding is applied to the discriminator’s loss for both real and generated data; we call this ‘two-sided flooding’. As special cases, we investigated what happens when flooding is applied only to the loss for real data (‘one-sided (real)’ flooding) or only to the loss for generated data (‘one-sided (fake)’ flooding), i.e., when only $b_{\mathrm{real}}$ or only $b_{\mathrm{fake}}$ is set, respectively. We conducted experiments on various adversarial losses with flood level Medium.
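One-sided flooding only changes which terms of $L_{D,\text{flood},1}$ pass through $h$. A minimal sketch, assuming the flooding function $h(x, b) = |x - b| + b$ of the original flooding method: it takes per-sample loss values, so it applies to any of the adversarial losses above, and uses `None` (a convention of this sketch, with a hypothetical function name) to mean that a side is left unflooded.

```python
import numpy as np


def flood(loss, b):
    """Assumed flooding function h(loss, b) = |loss - b| + b."""
    return np.abs(loss - b) + b


def d_loss_flood1(f_real, f_fake, b_real=None, b_fake=None):
    """Instance-level flooding L_{D,flood,1} with optional one-sided variants.

    f_real, f_fake: per-sample discriminator losses on real and generated data.
    b_real=None leaves the real-data term unflooded; b_fake=None does the same
    for the generated-data term. Setting both gives two-sided flooding.
    """
    f_real, f_fake = np.asarray(f_real, float), np.asarray(f_fake, float)
    real = f_real if b_real is None else flood(f_real, b_real)
    fake = f_fake if b_fake is None else flood(f_fake, b_fake)
    return real.mean() + fake.mean()
```

Only samples whose loss is already below the flood level are affected, so one-sided (fake) flooding acts exactly on the generated samples the discriminator has learned to reject too easily.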

Table 5 shows the results. The performance with one-sided (fake) flooding is comparable to that of two-sided flooding except for the Wasserstein loss. Notably, one-sided (real) flooding led to significant performance degradation with the BCE loss, the hinge loss, and the least squares loss. This suggests that, in this setting, the discriminator overfits the generated data rather than the real data. In other words, although it is difficult for the discriminator to overfit real samples drawn from the true distribution at each iteration, it is easier to overfit generated samples from a generator with low expressiveness. Figure 2 (c) shows the transition of generated samples with the BCE loss.

4.5 Comparison with label smoothing

Next, we compared flooding with label smoothing. Label smoothing and the proposed method are similar in that both are stabilization techniques that add no regularization term and preserve the type of adversarial loss. Label smoothing computes $L_{D,\text{real}}$ with $a = 0.9$ (as recommended in [28]) as

$$
L_{D,\text{real}} = \mathbb{E}_{\bm{x}\sim\mathbb{P}_r}\big[a \cdot f_{D,\text{real}}(D(\bm{x})) + (1 - a) \cdot f_{D,\text{fake}}(D(\bm{x}))\big]. \tag{11}
$$

Table 5 shows that label smoothing has a positive effect only for the BCE loss and the hinge loss and worsens performance for the least squares loss and the Wasserstein loss, whereas flooding performs well for all four losses.
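Both techniques keep the discriminator from driving its output on real data to $D = 1$, but in different ways. The sketch below, restricted to the BCE real-data term and assuming $h(x, b) = |x - b| + b$ for flooding, grid-searches the pointwise minimizer of each: label smoothing with Eq. (11) moves the target to $D = a$, where the loss equals the binary entropy of $a$, whereas flooding with level $b$ is minimized at $D = e^{-b}$, where the loss equals $b$. The grid and function names are illustrative.

```python
import numpy as np


def smoothed_real_loss(d, a=0.9):
    """Eq. (11) for one real sample with BCE: a * (-log d) + (1 - a) * (-log(1 - d))."""
    return -(a * np.log(d) + (1.0 - a) * np.log(1.0 - d))


def flooded_real_loss(d, b):
    """Flooded BCE real-data term h(-log d, b) with h(x, b) = |x - b| + b."""
    return np.abs(-np.log(d) - b) + b


def pointwise_minimizer(loss_fn, n=100001):
    """Return (argmin, min) of loss_fn over a grid of discriminator outputs in (0, 1)."""
    d = np.linspace(1e-4, 1.0 - 1e-4, n)
    values = loss_fn(d)
    i = int(np.argmin(values))
    return float(d[i]), float(values[i])
```

Under these assumptions, label smoothing changes the loss everywhere and moves its optimum, while flooding leaves the BCE term untouched wherever the loss stays above $b$ and only acts once it drops below.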

We provide additional experiments in Supplementary Section 0.E.

4.6 Flooding for DCGAN

| (a) Adversarial loss | w/o flooding | Two-sided | One-sided (real) | One-sided (fake) |
|---|---|---|---|---|
| BCE loss | 317.4 (129.0) | 236.2 (125.1) | 112.2 (145.1) | 392.4 (46.5) |
| Hinge loss | 74.4 (33.6) | 237.5 (143.6) | 44.6 (3.2) | 368.6 (37.8) |
| Least squares loss | 67.9 (34.8) | 204.8 (82.8) | 51.2 (13.3) | 417.8 (109.0) |
| Wasserstein loss | 82.7 (13.5) | 402.6 (35.5) | 75.7 (13.8) | 79.6 (3.1) |

| (b) Regularization | w/o flooding | Two-sided | One-sided (real) | One-sided (fake) |
|---|---|---|---|---|
| Gradient Penalty | 214.0 (151.8) | 184.9 (147.9) | 62.5 (8.8) | 195.5 (173.2) |
| Spectral Normalization | 270.5 (149.4) | 314.0 (56.9) | 34.9 (2.5) | 290.8 (95.9) |
Table 6: FID with CelebA (128×128) with or without flooding for (a) each adversarial loss and (b) regularization with the BCE loss.
[Figure 3 panels: (a) Training images, (b) Without flooding, (c) With flooding]
Figure 3: Comparison of generated images with CelebA (128×128) and the BCE loss. Without flooding, training collapses completely in four out of five trials and produces noise-like images. In the remaining trial, training does not collapse in this way. Compared with the training images (a), however, the generated images gradually degrade, for example by turning gray ((b), FID: 60.5). In contrast, with flooding at an appropriate flood level, the generated images do not collapse in four of the five trials ((c), FID: 38.4).

Flooding for various adversarial losses. Next, we examined the effect of flooding on image generation with various adversarial losses. We used DCGAN on CelebA (128×128) with flood level Medium.

Table 6 (a) shows the results, and Figure 3 shows the generated images. Combining one-sided (real) flooding with a change in the type of adversarial loss further stabilizes training. One-sided (real) flooding improves on the unflooded baseline for all four losses. In contrast, two-sided and one-sided (fake) flooding appear ineffective in most cases.

| Dataset | w/o flooding | Two-sided | One-sided (real) | One-sided (fake) |
|---|---|---|---|---|
| CIFAR10 | 35.5 (0.8) | 37.2 (1.5) | 34.3 (0.8) | 36.5 (1.0) |
| CIFAR100 | 41.7 (1.7) | 38.9 (1.4) | 36.2 (1.4) | 41.8 (1.6) |
| STL10 | 154.1 (9.8) | 133.6 (4.9) | 133.6 (3.4) | 144.3 (13.0) |
| CelebA (64×64) | 91.0 (2.7) | 89.8 (1.4) | 87.3 (1.1) | 179.8 (105.6) |
| CelebA (128×128) | 317.4 (129.0) | 236.2 (125.1) | 112.2 (145.1) | 392.4 (46.5) |
Table 7: FID for each dataset with or without flooding.

Flooding for various datasets. Next, we tested whether the stabilizing effect of flooding holds across datasets. In addition to CelebA, we conducted experiments on CIFAR10, CIFAR100, and STL10, using DCGAN, the BCE loss, and flood level Medium.

Table 7 shows the results. The difficulty of image generation differs across datasets, yet one-sided (real) flooding improves FID on every dataset. One possible reason is that it prevents discriminator memorization, which is hypothesized in Section 4.2 of [3]. If the discriminator memorizes the limited dataset, its loss on real data drops sharply, and flooding prevents this drop.
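A deliberately tiny toy model can illustrate the memorization argument. It is an assumption for illustration, not the paper's experiment. The model has a single real sample whose discriminator logit $z$ is a free parameter. Plain gradient descent can therefore drive the BCE loss $-\log\sigma(z)$ toward zero, mimicking memorization. With flooding, the gradient of $h(L,b)=|L-b|+b$ is $\operatorname{sign}(L-b)\,\nabla L$, so the update becomes gradient ascent whenever $L<b$. The values of `b`, `lr`, and `steps` are arbitrary.

```python
import math


def real_loss(z):
    """BCE loss -log sigmoid(z) of the discriminator on one real sample with logit z."""
    return math.log1p(math.exp(-z))


def train_logit(b=None, lr=0.1, steps=2000):
    """Gradient descent on the logit z, optionally on the flooded loss |L - b| + b."""
    z = 0.0
    for _ in range(steps):
        loss = real_loss(z)
        grad = -1.0 / (1.0 + math.exp(z))  # dL/dz = -sigmoid(-z)
        if b is not None:
            # d/dz (|L - b| + b) = sign(L - b) * dL/dz: the step reverses once L < b.
            grad *= 1.0 if loss > b else -1.0
        z -= lr * grad
    return real_loss(z)


loss_plain = train_logit()          # keeps shrinking toward 0 as steps grow
loss_flooded = train_logit(b=0.1)   # oscillates in a narrow band around b = 0.1
```

In a real discriminator the parameters are shared across samples. This toy model therefore shows only the mechanism, not the size of the effect.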

4.7 Flooding with other techniques

Additional regularization terms and architectural changes can also stabilize GAN training. We examined flooding in combination with two commonly used techniques, spectral normalization and gradient penalty, using DCGAN, CelebA (128×128), and the BCE loss. Here we used the gradient penalty with the BCE loss, as in ‘GAN-GP’ of [24]. In Section 3, by contrast, we used it with the Wasserstein loss, which is necessary for the theoretical proof of training convergence [10].

Table 6 (b) shows the results. Neither the gradient penalty with the BCE loss nor spectral normalization alone prevented the collapse of GAN training on this dataset. Combining either regularizer with one-sided (real) flooding markedly improves performance: 62.5 with gradient penalty and 34.9 with spectral normalization, versus 214.0 and 270.5 without flooding. This demonstrates the effect of flooding.

| Application | w/o flooding | Two-sided | One-sided (real) | One-sided (fake) |
|---|---|---|---|---|
| CDCGAN | 90.0 (19.6) | 88.7 (16.2) | 66.2 (4.9) | 108.7 (14.3) |
| ADDA | 0.60 (0.07) | 0.76 (0.05) | 0.70 (0.06) | 0.24 (0.25) |
| DANN | 0.74 (0.02) | 0.67 (0.05) | 0.72 (0.03) | 0.71 (0.02) |
| WDGRL | 0.65 (0.08) | 0.57 (0.07) | 0.67 (0.07) | 0.66 (0.08) |
Table 8: Results with or without flooding in combination with various adversarial frameworks. We evaluate conditional generation (CDCGAN) with FID and domain adaptation (ADDA, DANN, and WDGRL) with accuracy on the target data.

4.8 Flooding with other adversarial applications

We examined the effect of flooding in other adversarial applications, namely conditional GANs [23] and domain adaptation. For conditional GANs, we used conditional DCGAN (CDCGAN) on CIFAR10. For adversarial domain adaptation, we used ADDA [35], DANN [8], and WDGRL [29], with MNIST as the source domain and MNIST-M as the target domain.

Table 8 shows the results. First, flooding performs well with CDCGAN. Second, two-sided and one-sided (real) flooding are markedly effective with ADDA, whereas flooding is not effective with DANN or WDGRL. One possible interpretation is that only the discriminator in ADDA overfits. DANN and WDGRL update the model that extracts features from source-domain data during training, whereas ADDA keeps this model fixed. The ADDA discriminator may therefore overfit the fixed source-domain features, and flooding prevents this. Overall, these results support the use of flooding in a wide range of adversarial applications to prevent overfitting and improve performance.

4.9 Flooding with large models

We verified the effect of flooding on a large-scale GAN. StarGAN V2 [5] achieves high performance in image-to-image translation. Its generator produces style codes from random latent codes or reference images, and then generates images from source images and these style codes. We used one-sided (real) flooding because it performed best for image generation in Section 4.6.

Table 9 shows the performance. One-sided (real) flooding improved three of the four metrics. FID (reference) is the only metric that worsened, and it varies much more across runs than the other metrics. We therefore consider it less reliable than the others. Figure 4 shows examples of generated images.

| | FID (latent) (↓) | LPIPS (latent) (↑) | FID (reference) (↓) | LPIPS (reference) (↑) |
|---|---|---|---|---|
| w/o flooding | 13.92 (0.38) | 0.4494 (0.004) | 23.51 (1.59) | 0.3886 (0.001) |
| Flooding | 13.23 (0.37) | 0.4555 (0.003) | 24.64 (0.99) | 0.3925 (0.003) |

Table 9: Results of image generation from random latent vectors (latent) and reference images (reference) with StarGAN V2.
Figure 4: Generated images with StarGAN V2 and CelebAHQ. The latent vector or reference images are fixed in the rows, while the source images are fixed in the columns.

4.10 Limitations

Flooding stabilizes the training of various GANs, but we still observed a collapsed result with CelebA (128×128) in one of five runs. Flooding therefore cannot prevent all instabilities of GAN training. For example, well-known methods such as gradient penalty and spectral normalization enforce Lipschitz continuity of the discriminator, which is beyond the scope of our approach. Flooding should therefore be combined appropriately with other stabilization techniques.

5 Conclusion and future work

We proposed applying flooding, a method for preventing overfitting in supervised learning, to GANs. Our method introduces an additional hyperparameter, the flood level $b$, and we showed how to determine an appropriate range for it. We supported this proposal with a theoretical analysis of the relationship between the flood level and the distribution of generated data. Experiments demonstrated the stabilizing effect of flooding and the validity of the proposal. We also showed that flooding is effective when combined with existing training stabilization methods.

Further investigation is necessary to understand why GAN training with flooding can progress stably.

Acknowledgment

We appreciate Johannes Ackermann for reviewing the paper.

References

  • [1] Arjovsky, M., Bottou, L.: Towards principled methods for training generative adversarial networks. In: ICLR (2017)
  • [2] Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: ICML (2017)
  • [3] Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. In: ICLR (2019)
  • [4] Che, T., Li, Y., Jacob, A., Bengio, Y., Li, W.: Mode regularized generative adversarial networks. In: ICLR (2017)
  • [5] Choi, Y., Uh, Y., Yoo, J., Ha, J.W.: StarGAN v2: Diverse image synthesis for multiple domains. In: CVPR (2020)
  • [6] Demir, U., Ünal, G.B.: Patch-based image inpainting with generative adversarial networks. arXiv preprint arXiv:1803.07422 (2018)
  • [7] Eghbal-zadeh, H., Zellinger, W., Widmer, G.: Mixture density generative adversarial networks. In: CVPR (2019)
  • [8] Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., March, M., Lempitsky, V.: Domain-adversarial training of neural networks. Journal of Machine Learning Research 17(59), 1–35 (2016)
  • [9] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NeurIPS (2014)
  • [10] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of Wasserstein GANs. In: NeurIPS (2017)
  • [11] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: NeurIPS (2017)
  • [12] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: ICML (2015)
  • [13] Ishida, T., Yamane, I., Sakai, T., Niu, G., Sugiyama, M.: Do we need zero training loss after achieving zero training error? In: ICML (2020)
  • [14] Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: CVPR (2017)
  • [15] Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. In: ICLR (2018)
  • [16] Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: CVPR (2019)
  • [17] Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of StyleGAN. In: CVPR (2020)
  • [18] Kodali, N., Abernethy, J.D., Hays, J., Kira, Z.: How to train your DRAGAN. CoRR abs/1705.07215 (2017), http://arxiv.org/abs/1705.07215
  • [19] Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., Shi, W.: Photo-realistic single image super-resolution using a generative adversarial network. In: CVPR (2017)
  • [20] Li, Z., Usman, M., Tao, R., Xia, P., Wang, C., Chen, H., Li, B.: A systematic survey of regularization and normalization in GANs. ACM Comput. Surv. 55(11) (2023)
  • [21] Lim, J.H., Ye, J.C.: Geometric GAN. arXiv preprint arXiv:1705.02894 (2017)
  • [22] Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Paul Smolley, S.: Least squares generative adversarial networks. In: ICCV (2017)
  • [23] Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
  • [24] Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. In: ICLR (2018)
  • [25] Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: CVPR (2019)
  • [26] Petzka, H., Fischer, A., Lukovnikov, D.: On the regularization of Wasserstein GANs. In: ICLR (2018)
  • [27] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2016)
  • [28] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., Chen, X.: Improved techniques for training GANs. In: NeurIPS (2016)
  • [29] Shen, J., Qu, Y., Zhang, W., Yu, Y.: Wasserstein distance guided representation learning for domain adaptation. In: AAAI (2018)
  • [30] Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. Journal of Big Data 6(1), 60 (2019)
  • [31] Srivastava, A., Valkov, L., Russell, C., Gutmann, M.U., Sutton, C.: VEEGAN: Reducing mode collapse in GANs using implicit variational learning. In: NeurIPS (2017)
  • [32] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(56), 1929–1958 (2014), http://jmlr.org/papers/v15/srivastava14a.html
  • [33] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: CVPR (2016)
  • [34] Thanh-Tung, H., Tran, T.: Catastrophic forgetting and mode collapse in GANs. In: IJCNN. pp. 1–10 (2020)
  • [35] Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain adaptation. In: CVPR (2017)
  • [36] Wang, Z., She, Q., Ward, T.E.: Generative adversarial networks in computer vision: A survey and taxonomy. ACM Comput. Surv. 54(2) (2021)
  • [37] Xiang, S., Li, H.: On the effects of batch and weight normalization in generative adversarial networks. arXiv preprint arXiv:1704.03971 (2017)
  • [38] Xie, Y., Wang, Z., Li, Y., Zhang, C., Zhou, J., Ding, B.: iFlood: A stable and effective regularizer. In: ICLR (2022)
  • [39] Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. In: ICML (2019)
  • [40] Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D.: StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In: ICCV (2017)
  • [41] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)
  • [42] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: ICCV (2017)

Appendix 0.A Proof of Theorem 1

Proof

The discriminator’s loss $L_{D,\text{flood},1}$ with the BCE loss is expressed as

$$\begin{aligned}
L_{D,\text{flood},1} &= \mathbb{E}_{\bm{x}\sim\mathbb{P}_r}\left[h(f_{D,\text{real}}(D(\bm{x})),b_{\text{real}})\right]+\mathbb{E}_{\bm{x}\sim\mathbb{P}_g}\left[h(f_{D,\text{fake}}(D(\bm{x})),b_{\text{fake}})\right]\\
&= \mathbb{E}_{\bm{x}\sim\mathbb{P}_r}\left[h(-\log(D(\bm{x})),b_{\text{real}})\right]+\mathbb{E}_{\bm{x}\sim\mathbb{P}_g}\left[h(-\log(1-D(\bm{x})),b_{\text{fake}})\right]\\
&= \int \Big(\mathbb{P}_r(\bm{x})\left[h(-\log(D(\bm{x})),b_{\text{real}})\right]+\mathbb{P}_g(\bm{x})\left[h(-\log(1-D(\bm{x})),b_{\text{fake}})\right]\Big)\,\mathrm{d}\bm{x}.
\end{aligned}\tag{12}$$

To examine the relationship between $D$ and $L_{D,\text{flood},1}$, we introduce the integrand of (12), $f(D(\bm{x}))$, defined as

$$f(D(\bm{x}))=\mathbb{P}_r(\bm{x})\left[h(-\log(D(\bm{x})),b_{\text{real}})\right]+\mathbb{P}_g(\bm{x})\left[h(-\log(1-D(\bm{x})),b_{\text{fake}})\right]. \tag{13}$$

We analyze the minimization of $f(D(\bm{x}))$ in two cases, depending on whether $e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}\leq 1$ holds.

Case 1. If the flood level satisfies $e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}\leq 1$, then $b_{\text{real}}$ and $b_{\text{fake}}$ satisfy

$$e^{-b_{\text{real}}}\leq 1-e^{-b_{\text{fake}}}. \tag{14}$$

We divide $D(\bm{x})\in(0,1)$ into three intervals using the breakpoints $e^{-b_{\text{real}}}$ and $1-e^{-b_{\text{fake}}}$.
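The helper below is hypothetical and for illustration only. It checks the Case 1 condition and returns the two breakpoints $e^{-b_{\text{real}}}$ and $1-e^{-b_{\text{fake}}}$ that split $(0,1)$ into the three intervals used below.

```python
import math


def case1_breakpoints(b_real, b_fake):
    """Return (e^{-b_real}, 1 - e^{-b_fake}) if e^{-b_real} + e^{-b_fake} <= 1, else None.

    The intervals are then (0, lo], [lo, hi], and [hi, 1).
    """
    lo = math.exp(-b_real)
    hi = 1.0 - math.exp(-b_fake)
    if lo + math.exp(-b_fake) <= 1.0:  # Case 1 condition, equivalent to Eq. (14): lo <= hi
        return lo, hi
    return None


print(case1_breakpoints(math.log(4), math.log(4)))  # breakpoints near 0.25 and 0.75
print(case1_breakpoints(0.5, 0.5))                   # None: e^{-0.5} + e^{-0.5} > 1
```

With equal flood levels $b_{\text{real}}=b_{\text{fake}}=b$, the condition reduces to $2e^{-b}\leq 1$, that is, $b\geq\log 2\approx 0.693$.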

Case 1(a). If $0<D(\bm{x})\leq e^{-b_{\text{real}}}$, then $-\log(D(\bm{x}))\geq b_{\text{real}}$. By (14), $1-D(\bm{x})\geq e^{-b_{\text{fake}}}$, so $-\log(1-D(\bm{x}))\leq b_{\text{fake}}$. Flooding leaves a loss $L\geq b$ unchanged and maps $L<b$ to $2b-L$. We can therefore rewrite $f(D(\bm{x}))$ as

$$f(D(\bm{x}))=\mathbb{P}_r(\bm{x})\left[-\log(D(\bm{x}))\right]+\mathbb{P}_g(\bm{x})\left[\log(1-D(\bm{x}))+2b_{\text{fake}}\right]. \tag{15}$$

Therefore, the derivative with respect to $D(\bm{x})$ is expressed as

$$f'(D(\bm{x}))=-\frac{\mathbb{P}_r(\bm{x})}{D(\bm{x})}-\frac{\mathbb{P}_g(\bm{x})}{1-D(\bm{x})}. \tag{16}$$

We obtain $f'(D(\bm{x}))<0$ because $(\mathbb{P}_r(\bm{x}),\mathbb{P}_g(\bm{x}))\in\mathbb{R}_{\geq 0}^{2}\setminus\{(0,0)\}$ and $D(\bm{x})\in(0,1)$. Therefore, $f(D(\bm{x}))$ decreases monotonically on the interval $0<D(\bm{x})\leq e^{-b_{\text{real}}}$, and $D(\bm{x})=e^{-b_{\text{real}}}$ gives its minimum on this interval.

Case 1(b). If $e^{-b_{\text{real}}}\leq D(\bm{x})\leq 1-e^{-b_{\text{fake}}}$, then both $-\log(D(\bm{x}))\leq b_{\text{real}}$ and $-\log(1-D(\bm{x}))\leq b_{\text{fake}}$. Both terms are therefore flooded, and we obtain $f(D(\bm{x}))$ and $f'(D(\bm{x}))$ as

$$\begin{aligned}
f(D(\bm{x})) &= \mathbb{P}_r(\bm{x})\left[\log(D(\bm{x}))+2b_{\text{real}}\right]+\mathbb{P}_g(\bm{x})\left[\log(1-D(\bm{x}))+2b_{\text{fake}}\right],\\
f'(D(\bm{x})) &= \frac{\mathbb{P}_r(\bm{x})}{D(\bm{x})}-\frac{\mathbb{P}_g(\bm{x})}{1-D(\bm{x})}.
\end{aligned}\tag{17}$$
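As a sanity check on Eqs. (15) and (17), the sketch below compares them numerically with the definition in Eq. (13). It takes $h(L,b)=|L-b|+b$, which is consistent with both closed forms. The illustrative values are $b_{\text{real}}=b_{\text{fake}}=\log 4$, $\mathbb{P}_r(\bm{x})=0.3$, and $\mathbb{P}_g(\bm{x})=0.7$, for which the breakpoints are $0.25$ and $0.75$.

```python
import math

B = math.log(4)      # illustrative flood level for both terms
P_R, P_G = 0.3, 0.7  # illustrative density values at a fixed x


def h(loss, b):
    return abs(loss - b) + b


def f_eq13(d):
    """Pointwise objective, Eq. (13)."""
    return P_R * h(-math.log(d), B) + P_G * h(-math.log(1.0 - d), B)


def f_eq15(d):
    """Case 1(a) closed form, Eq. (15); valid for 0 < d <= e^{-b_real}."""
    return P_R * -math.log(d) + P_G * (math.log(1.0 - d) + 2.0 * B)


def f_eq17(d):
    """Case 1(b) closed form, Eq. (17); valid for e^{-b_real} <= d <= 1 - e^{-b_fake}."""
    return P_R * (math.log(d) + 2.0 * B) + P_G * (math.log(1.0 - d) + 2.0 * B)


for d in (0.05, 0.1, 0.2, 0.25):
    assert math.isclose(f_eq13(d), f_eq15(d))
for d in (0.25, 0.4, 0.6, 0.75):
    assert math.isclose(f_eq13(d), f_eq17(d))
```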

Setting $f'(D(\bm{x}))=0$, we obtain

$$D(\bm{x})=\frac{\mathbb{P}_r(\bm{x})}{\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})}. \tag{18}$$
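The stationary point in Eq. (18) follows by clearing denominators in $f'(D(\bm{x}))=0$ from Eq. (17), which is valid because $D(\bm{x})\in(0,1)$:

$$\frac{\mathbb{P}_r(\bm{x})}{D(\bm{x})}=\frac{\mathbb{P}_g(\bm{x})}{1-D(\bm{x})}
\iff \mathbb{P}_r(\bm{x})\bigl(1-D(\bm{x})\bigr)=\mathbb{P}_g(\bm{x})\,D(\bm{x})
\iff D(\bm{x})\bigl(\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})\bigr)=\mathbb{P}_r(\bm{x}).$$

This is the same expression as the optimal discriminator of the standard, unflooded BCE GAN [9]. Here, however, it is only a candidate, since it may lie outside the interval of Case 1(b).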

Moreover, $f'(e^{-b_{\text{real}}})$ and $f'(1-e^{-b_{\text{fake}}})$ are expressed as

$$\begin{aligned}
f'(e^{-b_{\text{real}}}) &= \frac{\mathbb{P}_r(\bm{x})}{e^{-b_{\text{real}}}}-\frac{\mathbb{P}_g(\bm{x})}{1-e^{-b_{\text{real}}}}
= \frac{(1-e^{-b_{\text{real}}})\mathbb{P}_r(\bm{x})-e^{-b_{\text{real}}}\mathbb{P}_g(\bm{x})}{e^{-b_{\text{real}}}(1-e^{-b_{\text{real}}})}\\
&= \frac{\mathbb{P}_r(\bm{x})-e^{-b_{\text{real}}}\bigl(\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})\bigr)}{e^{-b_{\text{real}}}(1-e^{-b_{\text{real}}})},\\
f'(1-e^{-b_{\text{fake}}}) &= \frac{\mathbb{P}_r(\bm{x})}{1-e^{-b_{\text{fake}}}}-\frac{\mathbb{P}_g(\bm{x})}{e^{-b_{\text{fake}}}}
= \frac{e^{-b_{\text{fake}}}\mathbb{P}_r(\bm{x})-(1-e^{-b_{\text{fake}}})\mathbb{P}_g(\bm{x})}{e^{-b_{\text{fake}}}(1-e^{-b_{\text{fake}}})}\\
&= \frac{\mathbb{P}_r(\bm{x})-(1-e^{-b_{\text{fake}}})\bigl(\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})\bigr)}{e^{-b_{\text{fake}}}(1-e^{-b_{\text{fake}}})}.
\end{aligned}\tag{19}$$

If $\mathbb{P}_g(\bm{x})$ satisfies the inequality

$$
e^{-b_{\text{real}}} < \frac{\mathbb{P}_r(\bm{x})}{\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})} < 1-e^{-b_{\text{fake}}}, \tag{20}
$$

we obtain $f'(e^{-b_{\text{real}}})>0$ and $f'(1-e^{-b_{\text{fake}}})<0$. Since $f''(D(\bm{x})) = -\mathbb{P}_r(\bm{x})/D(\bm{x})^2 - \mathbb{P}_g(\bm{x})/(1-D(\bm{x}))^2 < 0$ on this interval, $f$ is concave there, so its minimum is attained at an endpoint, i.e., $D(\bm{x})\in\{1-e^{-b_{\text{fake}}}, e^{-b_{\text{real}}}\}$ gives the minimum. If $\mathbb{P}_g(\bm{x})$ does not satisfy the inequality, $f(D(\bm{x}))$ is monotonically increasing or decreasing on the interval, and again $D(\bm{x})\in\{1-e^{-b_{\text{fake}}}, e^{-b_{\text{real}}}\}$ gives the minimum.
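As a numerical sanity check of Case 1(b), the sketch below (plain Python, standard library only) compares the direct derivative $\mathbb{P}_r/D - \mathbb{P}_g/(1-D)$ with the simplified forms in (19), and then confirms on a grid that the middle-piece objective attains its minimum at an endpoint when (20) holds. The densities and flood levels ($\mathbb{P}_r=1$, $\mathbb{P}_g=0.8$, $b_{\text{real}}=1$, $b_{\text{fake}}=1.2$) are arbitrary illustrative values, not settings from the paper, and the helper names are hypothetical.

```python
import math
import random

def f_middle(d, p_r, p_g, b_real, b_fake):
    """f(D) on [e^{-b_real}, 1 - e^{-b_fake}] in Case 1(b): both terms are flooded."""
    return p_r * (math.log(d) + 2 * b_real) + p_g * (math.log(1 - d) + 2 * b_fake)

def fprime_middle(d, p_r, p_g):
    """Derivative of f_middle: P_r / D - P_g / (1 - D)."""
    return p_r / d - p_g / (1 - d)

def fprime_real_break(p_r, p_g, b_real):
    """Simplified form of f'(e^{-b_real}) from (19)."""
    a = math.exp(-b_real)
    return (p_r - a * (p_r + p_g)) / (a * (1 - a))

def fprime_fake_break(p_r, p_g, b_fake):
    """Simplified form of f'(1 - e^{-b_fake}) from (19)."""
    c = math.exp(-b_fake)
    return (p_r - (1 - c) * (p_r + p_g)) / (c * (1 - c))

# 1) The algebraic simplifications in (19) agree with the direct derivative.
random.seed(0)
max_err = 0.0
for _ in range(1000):
    p_r, p_g = random.uniform(0.01, 2.0), random.uniform(0.01, 2.0)
    b_real, b_fake = random.uniform(0.05, 3.0), random.uniform(0.05, 3.0)
    a, c = math.exp(-b_real), math.exp(-b_fake)
    max_err = max(
        max_err,
        abs(fprime_middle(a, p_r, p_g) - fprime_real_break(p_r, p_g, b_real)),
        abs(fprime_middle(1 - c, p_r, p_g) - fprime_fake_break(p_r, p_g, b_fake)),
    )

# 2) With (20) satisfied, f is concave on the interval, so the grid minimum is an endpoint.
p_r, p_g, b_real, b_fake = 1.0, 0.8, 1.0, 1.2   # e^{-1} + e^{-1.2} < 1: Case 1
lo, hi = math.exp(-b_real), 1 - math.exp(-b_fake)
ratio = p_r / (p_r + p_g)                         # about 0.556, inside (lo, hi)
n = 2001
ds = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
values = [f_middle(d, p_r, p_g, b_real, b_fake) for d in ds]
i_min = min(range(n), key=values.__getitem__)

print(f"max |difference| in (19): {max_err:.1e}")
print("minimum at an endpoint:", i_min in (0, n - 1))
```

Because $f$ is concave on the middle interval, every interior value is at least the smaller endpoint value, which is why the check only needs to look at the grid's first and last indices.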

Case 1(c). If we assume $1-e^{-b_{\text{fake}}} \leq D(\bm{x}) < 1$, we obtain $f(D(\bm{x}))$ and $f'(D(\bm{x}))$ as

$$
\begin{aligned}
f(D(\bm{x})) &= \mathbb{P}_r(\bm{x})\left[\log D(\bm{x}) + 2b_{\text{real}}\right] + \mathbb{P}_g(\bm{x})\left[-\log(1-D(\bm{x}))\right], \\
f'(D(\bm{x})) &= \frac{\mathbb{P}_r(\bm{x})}{D(\bm{x})} + \frac{\mathbb{P}_g(\bm{x})}{1-D(\bm{x})}.
\end{aligned} \tag{21}
$$

As in Case 1(a), both terms of $f'$ have the same sign; here they are positive, so $f'(D(\bm{x}))>0$. Therefore, $f(D(\bm{x}))$ is monotonically increasing, and $D(\bm{x})=1-e^{-b_{\text{fake}}}$ gives the minimum of $f(D(\bm{x}))$ in the interval.

Finally, combining Cases 1(a), 1(b), and 1(c) proves Eq. (8).
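Cases 1(a) to 1(c) together say that, when the breakpoints are ordered as $e^{-b_{\text{real}}} \leq 1-e^{-b_{\text{fake}}}$, the pointwise objective is minimized at one of the two breakpoints. The brute-force sketch below checks this. It assumes the pointwise objective is $\mathbb{P}_r(\bm{x})\,(|{-\log D} - b_{\text{real}}| + b_{\text{real}}) + \mathbb{P}_g(\bm{x})\,(|{-\log(1-D)} - b_{\text{fake}}| + b_{\text{fake}})$, i.e., flooding applied separately to the real and fake BCE terms, which reproduces the piecewise expressions used in the cases. The flood levels, random densities, and helper names are illustrative assumptions.

```python
import math
import random

def flooded_bce_pointwise(d, p_r, p_g, b_real, b_fake):
    """Assumed pointwise objective f(D): flooding |L - b| + b on each BCE term."""
    real_term = abs(-math.log(d) - b_real) + b_real
    fake_term = abs(-math.log(1 - d) - b_fake) + b_fake
    return p_r * real_term + p_g * fake_term

def case1_min_at_breakpoint(p_r, p_g, b_real, b_fake, n=5001, eps=1e-4):
    """True if no grid point in (0, 1) beats the better of the two breakpoints."""
    a, c = math.exp(-b_real), math.exp(-b_fake)
    if a + c > 1:
        raise ValueError("flood levels fall under Case 2, not Case 1")
    best_break = min(
        flooded_bce_pointwise(d, p_r, p_g, b_real, b_fake) for d in (a, 1 - c)
    )
    grid = (eps + (1 - 2 * eps) * i / (n - 1) for i in range(n))
    best_grid = min(flooded_bce_pointwise(d, p_r, p_g, b_real, b_fake) for d in grid)
    return best_break <= best_grid + 1e-9   # small tolerance for rounding

random.seed(1)
b_real, b_fake = 1.0, 1.2   # illustrative Case 1 flood levels
checks = [
    case1_min_at_breakpoint(random.uniform(0.05, 2.0), random.uniform(0.05, 2.0), b_real, b_fake)
    for _ in range(50)
]
print(all(checks))
```

A grid search cannot prove the claim, but it would quickly expose a sign error in any of the three piecewise derivatives.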

Case 2. If the flood levels satisfy $e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}>1$, then $b_{\text{real}}$ and $b_{\text{fake}}$ satisfy

$$
1-e^{-b_{\text{fake}}} < e^{-b_{\text{real}}}, \tag{22}
$$

and we can divide $(0,1)$ into three intervals with respect to $1-e^{-b_{\text{fake}}}$ and $e^{-b_{\text{real}}}$.

Case 2(a). If we assume $0<D(\bm{x})\leq 1-e^{-b_{\text{fake}}}$, we have

$$
\begin{aligned}
f(D(\bm{x})) &= \mathbb{P}_r(\bm{x})\left[-\log D(\bm{x})\right] + \mathbb{P}_g(\bm{x})\left[\log(1-D(\bm{x})) + 2b_{\text{fake}}\right], \\
f'(D(\bm{x})) &= -\frac{\mathbb{P}_r(\bm{x})}{D(\bm{x})} - \frac{\mathbb{P}_g(\bm{x})}{1-D(\bm{x})}.
\end{aligned} \tag{23}
$$

As in Case 1(a), both terms of $f'$ are negative, so $f'(D(\bm{x}))<0$ in the interval. Therefore, $D(\bm{x})=1-e^{-b_{\text{fake}}}$ gives the minimum of $f(D(\bm{x}))$ in the interval.

Case 2(b). If we assume $1-e^{-b_{\text{fake}}} \leq D(\bm{x}) \leq e^{-b_{\text{real}}}$, we have

$$
\begin{aligned}
f(D(\bm{x})) &= \mathbb{P}_r(\bm{x})\left[-\log D(\bm{x})\right] + \mathbb{P}_g(\bm{x})\left[-\log(1-D(\bm{x}))\right], \\
f'(D(\bm{x})) &= -\frac{\mathbb{P}_r(\bm{x})}{D(\bm{x})} + \frac{\mathbb{P}_g(\bm{x})}{1-D(\bm{x})}.
\end{aligned} \tag{24}
$$

Setting $f'(D(\bm{x}))=0$ gives $D(\bm{x})=\frac{\mathbb{P}_r(\bm{x})}{\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})}$, which satisfies $0<D(\bm{x})<1$. We can calculate $f'(1-e^{-b_{\text{fake}}})$ and $f'(e^{-b_{\text{real}}})$ as

$$
\begin{aligned}
f'(e^{-b_{\text{real}}}) &= -\frac{\mathbb{P}_r(\bm{x})}{e^{-b_{\text{real}}}} + \frac{\mathbb{P}_g(\bm{x})}{1-e^{-b_{\text{real}}}} = -\frac{(1-e^{-b_{\text{real}}})\mathbb{P}_r(\bm{x}) - e^{-b_{\text{real}}}\mathbb{P}_g(\bm{x})}{e^{-b_{\text{real}}}(1-e^{-b_{\text{real}}})} \\
&= -\frac{\mathbb{P}_r(\bm{x}) - e^{-b_{\text{real}}}\left(\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})\right)}{e^{-b_{\text{real}}}(1-e^{-b_{\text{real}}})}, \\
f'(1-e^{-b_{\text{fake}}}) &= -\frac{\mathbb{P}_r(\bm{x})}{1-e^{-b_{\text{fake}}}} + \frac{\mathbb{P}_g(\bm{x})}{e^{-b_{\text{fake}}}} = -\frac{e^{-b_{\text{fake}}}\mathbb{P}_r(\bm{x}) - (1-e^{-b_{\text{fake}}})\mathbb{P}_g(\bm{x})}{e^{-b_{\text{fake}}}(1-e^{-b_{\text{fake}}})} \\
&= -\frac{\mathbb{P}_r(\bm{x}) - (1-e^{-b_{\text{fake}}})\left(\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})\right)}{e^{-b_{\text{fake}}}(1-e^{-b_{\text{fake}}})}.
\end{aligned} \tag{25}
$$

Now we obtain $f'(1-e^{-b_{\text{fake}}})<0$ and $f'(e^{-b_{\text{real}}})>0$ if $\mathbb{P}_g(\bm{x})$ satisfies the inequality

$$
1-e^{-b_{\text{fake}}} < \frac{\mathbb{P}_r(\bm{x})}{\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})} < e^{-b_{\text{real}}}. \tag{26}
$$

Since $f''(D(\bm{x})) = \mathbb{P}_r(\bm{x})/D(\bm{x})^2 + \mathbb{P}_g(\bm{x})/(1-D(\bm{x}))^2 > 0$, $f$ is convex on this interval, so in this case the stationary point $D(\bm{x})=\frac{\mathbb{P}_r(\bm{x})}{\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})}$ gives the minimum in the interval. On the other hand, if $\mathbb{P}_g(\bm{x})$ does not satisfy the inequality, $f(D(\bm{x}))$ is monotonically increasing or decreasing on the interval, and $D(\bm{x})\in\{1-e^{-b_{\text{fake}}}, e^{-b_{\text{real}}}\}$ gives the minimum in the interval.
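For Case 2(b), the sketch below evaluates the unflooded middle piece (24) on a grid over $[1-e^{-b_{\text{fake}}}, e^{-b_{\text{real}}}]$ and checks that, when (26) holds, the grid minimizer lies within one grid step of $\mathbb{P}_r/(\mathbb{P}_r+\mathbb{P}_g)$ and the endpoint derivatives have the signs stated above. The values $b_{\text{real}}=b_{\text{fake}}=0.5$, $\mathbb{P}_r=1$, $\mathbb{P}_g=1.2$ and the helper names are arbitrary illustrations.

```python
import math

def f_middle(d, p_r, p_g):
    """f(D) in Case 2(b): neither term is flooded, so this is the plain BCE objective (24)."""
    return -p_r * math.log(d) - p_g * math.log(1 - d)

def fprime_middle(d, p_r, p_g):
    """Derivative from (24): -P_r / D + P_g / (1 - D)."""
    return -p_r / d + p_g / (1 - d)

p_r, p_g = 1.0, 1.2
b_real, b_fake = 0.5, 0.5                      # e^{-0.5} + e^{-0.5} > 1: Case 2
lo, hi = 1 - math.exp(-b_fake), math.exp(-b_real)
ratio = p_r / (p_r + p_g)                      # about 0.4545, inside (lo, hi): (26) holds

n = 2001
step = (hi - lo) / (n - 1)
ds = [lo + step * i for i in range(n)]
d_min = min(ds, key=lambda d: f_middle(d, p_r, p_g))   # convex, so a unique grid minimum

print(f"grid minimizer {d_min:.4f} vs P_r/(P_r+P_g) {ratio:.4f}")
print(fprime_middle(lo, p_r, p_g) < 0 < fprime_middle(hi, p_r, p_g))
```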

Case 2(c). If we assume $e^{-b_{\text{real}}} \leq D(\bm{x}) < 1$, we have

$$
\begin{aligned}
f(D(\bm{x})) &= \mathbb{P}_r(\bm{x})\left[\log D(\bm{x}) + 2b_{\text{real}}\right] + \mathbb{P}_g(\bm{x})\left[-\log(1-D(\bm{x}))\right], \\
f'(D(\bm{x})) &= \frac{\mathbb{P}_r(\bm{x})}{D(\bm{x})} + \frac{\mathbb{P}_g(\bm{x})}{1-D(\bm{x})}.
\end{aligned} \tag{27}
$$

As in Case 1(a), both terms of $f'$ have the same sign; here they are positive, so $f'(D(\bm{x}))>0$. Therefore, $D(\bm{x})=e^{-b_{\text{real}}}$ gives the minimum of $f(D(\bm{x}))$ in the interval.

Finally, combining Cases 2(a), 2(b), and 2(c) proves Eq. (9). ∎
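Read together, Cases 2(a) to 2(c) imply that, when $e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}>1$, the minimizer of the pointwise objective is $\mathbb{P}_r/(\mathbb{P}_r+\mathbb{P}_g)$ clipped to $[1-e^{-b_{\text{fake}}}, e^{-b_{\text{real}}}]$: the ratio itself when (26) holds, and the nearer breakpoint otherwise. This compact restatement is our own reading of the case analysis, not a formula stated in the text. The sketch below compares it with a brute-force grid search, assuming the pointwise objective applies flooding $|L-b|+b$ separately to the real term $-\log D$ and the fake term $-\log(1-D)$ (which reproduces the piecewise forms (23), (24), and (27)); the flood levels, random densities, and helper names are illustrative.

```python
import math
import random

def flooded_bce_pointwise(d, p_r, p_g, b_real, b_fake):
    """Assumed pointwise objective f(D): flooding |L - b| + b on each BCE term."""
    real_term = abs(-math.log(d) - b_real) + b_real
    fake_term = abs(-math.log(1 - d) - b_fake) + b_fake
    return p_r * real_term + p_g * fake_term

def case2_minimizer(p_r, p_g, b_real, b_fake):
    """Ratio P_r / (P_r + P_g) clipped to [1 - e^{-b_fake}, e^{-b_real}] (Case 2 only)."""
    lo, hi = 1 - math.exp(-b_fake), math.exp(-b_real)
    if lo >= hi:
        raise ValueError("flood levels fall under Case 1, not Case 2")
    return min(max(p_r / (p_r + p_g), lo), hi)

def grid_minimizer(p_r, p_g, b_real, b_fake, n=10001, eps=1e-4):
    """Brute-force argmin over a uniform grid in (0, 1); also returns the grid step."""
    step = (1 - 2 * eps) / (n - 1)
    ds = [eps + step * i for i in range(n)]
    return min(ds, key=lambda d: flooded_bce_pointwise(d, p_r, p_g, b_real, b_fake)), step

random.seed(2)
b_real, b_fake = 0.5, 0.4   # e^{-0.5} + e^{-0.4} > 1: Case 2
max_gap = 0.0
for _ in range(20):
    p_r, p_g = random.uniform(0.05, 3.0), random.uniform(0.05, 3.0)
    d_grid, step = grid_minimizer(p_r, p_g, b_real, b_fake)
    max_gap = max(max_gap, abs(d_grid - case2_minimizer(p_r, p_g, b_real, b_fake)))
print(f"largest gap between grid and clipped minimizer: {max_gap:.1e} (grid step {step:.1e})")
```

Since the objective decreases on Case 2(a), is convex on Case 2(b), and increases on Case 2(c), it is unimodal, so the grid argmin should always sit within one step of the clipped ratio.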

Appendix 0.B Discussions of Theorem 1

Since the BCE loss is non-negative, we can assume $b_{\text{fake}}>0$ when flooding is applied to the BCE loss. Under the assumption $e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}>1$, $\mathbb{P}_g(\bm{x})$ satisfies inequality (10) if and only if

$$
\mathbb{P}_g(\bm{x}) \in \left(\left(\frac{1}{e^{-b_{\text{real}}}}-1\right)\mathbb{P}_r(\bm{x}),\ \left(\frac{1}{1-e^{-b_{\text{fake}}}}-1\right)\mathbb{P}_r(\bm{x})\right) \tag{28}
$$

on $\mathrm{Supp}(\mathbb{P}_r(\bm{x}))$. We refer to this interval as $I(\mathbb{P}_r(\bm{x}), b_{\text{real}}, b_{\text{fake}})$ and obtain

$$
\begin{aligned}
&b_{\text{real},0} \leq b_{\text{real},1} \Rightarrow I(\mathbb{P}_r(\bm{x}), b_{\text{real},0}, b_{\text{fake}}) \supseteq I(\mathbb{P}_r(\bm{x}), b_{\text{real},1}, b_{\text{fake}}), \\
&b_{\text{fake},0} \leq b_{\text{fake},1} \Rightarrow I(\mathbb{P}_r(\bm{x}), b_{\text{real}}, b_{\text{fake},0}) \supseteq I(\mathbb{P}_r(\bm{x}), b_{\text{real}}, b_{\text{fake},1}).
\end{aligned} \tag{29}
$$

In this sense, the higher the flood levels, the narrower the interval $I$, and the harder it is for $\mathbb{P}_g(\bm{x})$ to satisfy inequality (10).
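The interval (28) and the nesting property (29) are easy to check numerically. In the sketch below, `interval_I` is a hypothetical helper returning the endpoints of $I(\mathbb{P}_r(\bm{x}), b_{\text{real}}, b_{\text{fake}})$; the comments note the equivalent forms $(e^{b_{\text{real}}}-1)\mathbb{P}_r(\bm{x})$ and $\mathbb{P}_r(\bm{x})/(e^{b_{\text{fake}}}-1)$. A short calculation shows that the interval is nonempty exactly when $(e^{b_{\text{real}}}-1)(e^{b_{\text{fake}}}-1)<1$, which is equivalent to $e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}>1$, the assumption behind inequality (10); the sketch checks this as well. All numerical values are illustrative.

```python
import math

def interval_I(p_r, b_real, b_fake):
    """Endpoints of I(P_r(x), b_real, b_fake) from (28); requires b_fake > 0."""
    lower = (1 / math.exp(-b_real) - 1) * p_r          # = (e^{b_real} - 1) P_r(x)
    upper = (1 / (1 - math.exp(-b_fake)) - 1) * p_r    # = P_r(x) / (e^{b_fake} - 1)
    return lower, upper

def is_nonempty(interval):
    return interval[0] < interval[1]

def contains(outer, inner):
    """True if the (nonempty) interval `inner` lies within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

p_r = 1.0
base = interval_I(p_r, 0.3, 0.4)
higher_real = interval_I(p_r, 0.5, 0.4)   # raise b_real only
higher_fake = interval_I(p_r, 0.3, 0.6)   # raise b_fake only
print(base, higher_real, higher_fake)
print(contains(base, higher_real), contains(base, higher_fake))

# Nonemptiness matches the Case 2 condition e^{-b_real} + e^{-b_fake} > 1.
for b_real, b_fake in [(0.3, 0.4), (0.7, 0.7), (1.0, 1.2), (2.0, 0.1)]:
    case2 = math.exp(-b_real) + math.exp(-b_fake) > 1
    print(b_real, b_fake, case2, is_nonempty(interval_I(p_r, b_real, b_fake)))
```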

Moreover, Theorem 3.1 indicates that even if either $b_{\text{real}}$ or $b_{\text{fake}}$ is greater than $\log 2$, there exist settings $(b_{\text{real}}, b_{\text{fake}})$ with $e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}>1$ for which the optimal discriminator is $D^*(\bm{x})=\frac{\mathbb{P}_r(\bm{x})}{\mathbb{P}_r(\bm{x})+\mathbb{P}_g(\bm{x})}$, provided that inequality (10) holds. This can explain why some training runs in Tables 11 and 12 (detailed in Supplementary Section 0.E) that use the BCE loss with either $b_{\text{real}}$ or $b_{\text{fake}}$ greater than $\log 2$ do not collapse completely, i.e., their averages of modes and HQ are nonzero. However, it remains unclear why the overall performance is poor under such settings.
We offer one explanation based on $I(\mathbb{P}_r(\bm{x}), b_{\text{real}}, b_{\text{fake}})$. The ultimate goal of GANs is to achieve $\mathbb{P}_r(\bm{x})=\mathbb{P}_g(\bm{x})$. We therefore check when $I(\mathbb{P}_r(\bm{x}), b_{\text{real}}, b_{\text{fake}})$ contains $\mathbb{P}_r(\bm{x})$ on $\mathrm{Supp}(\mathbb{P}_r(\bm{x}))$:

$$
\begin{aligned}
&\mathbb{P}_r(\bm{x}) = \mathbb{P}_g(\bm{x}) \in I(\mathbb{P}_r(\bm{x}), b_{\text{real}}, b_{\text{fake}}) \\
&\Leftrightarrow \left(\left(\frac{1}{e^{-b_{\text{real}}}}-1\right)\mathbb{P}_r(\bm{x}) < \mathbb{P}_r(\bm{x})\right) \land \left(\mathbb{P}_r(\bm{x}) < \left(\frac{1}{1-e^{-b_{\text{fake}}}}-1\right)\mathbb{P}_r(\bm{x})\right) \\
&\Leftrightarrow \left(\frac{1}{e^{-b_{\text{real}}}}-1 < 1\right) \land \left(1 < \frac{1}{1-e^{-b_{\text{fake}}}}-1\right) \\
&\Leftrightarrow \left(e^{b_{\text{real}}} < 2\right) \land \left(e^{b_{\text{fake}}} < 2\right) \\
&\Leftrightarrow \left(b_{\text{real}} < \log 2\right) \land \left(b_{\text{fake}} < \log 2\right).
\end{aligned} \tag{30}
$$

This indicates that the flood levels $b_{\text{real}}$ and $b_{\text{fake}}$ should be set below $L_{D_{\mathrm{opt}},\mathrm{real}}$ and $L_{D_{\mathrm{opt}},\mathrm{fake}}$, respectively (both equal to $\log 2$ for the BCE loss), and that training does not converge to $\mathbb{P}_r(\bm{x})=\mathbb{P}_g(\bm{x})$ if the flood levels violate this condition.
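The chain of equivalences (30) can be spot-checked by sampling random flood levels and densities. The sketch below assumes, as (30) implies for the BCE loss, that $L_{D_{\mathrm{opt}},\mathrm{real}} = L_{D_{\mathrm{opt}},\mathrm{fake}} = \log 2$, i.e., the per-term BCE losses of the optimal discriminator $D^*(\bm{x})=1/2$ at $\mathbb{P}_g=\mathbb{P}_r$. The helper names and sampling ranges are arbitrary choices for illustration.

```python
import math
import random

def interval_I(p_r, b_real, b_fake):
    """Endpoints of I(P_r(x), b_real, b_fake) from (28), in simplified form."""
    lower = (math.exp(b_real) - 1) * p_r
    upper = p_r / (math.exp(b_fake) - 1)
    return lower, upper

def equilibrium_in_I(p_r, b_real, b_fake):
    """True if P_g(x) = P_r(x) lies inside I(P_r(x), b_real, b_fake)."""
    lower, upper = interval_I(p_r, b_real, b_fake)
    return lower < p_r < upper

# Per-term BCE losses of the optimal discriminator D*(x) = 1/2 at P_g = P_r.
l_opt_real = -math.log(0.5)
l_opt_fake = -math.log(1 - 0.5)

random.seed(2)
agree = True
for _ in range(10000):
    b_real, b_fake = random.uniform(0.01, 2.0), random.uniform(0.01, 2.0)
    p_r = random.uniform(0.01, 5.0)
    predicted = b_real < l_opt_real and b_fake < l_opt_fake   # right-hand side of (30)
    agree &= equilibrium_in_I(p_r, b_real, b_fake) == predicted
print(agree)
```

The density $\mathbb{P}_r(\bm{x})$ cancels from both sides of each inequality in (30), which is why the outcome depends only on the flood levels.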

Appendix 0.C Code Implementation

We mainly built on the following code bases and customized them to run experiments with various losses and models.

Appendix 0.D Details of Experiments

0.D.1 Flood level settings

According to Hypotheses 1 and 2, we consider the discriminator's losses at the theoretical convergence, $L_{D_{\mathrm{opt}},\mathrm{real}}$, $L_{D_{\mathrm{opt}},\mathrm{fake}}$, and $L_{D_{\mathrm{opt}}}$, to be crucial for setting the flood level $b$. To explore the optimal flood level, we tested the settings listed in Table 10. For flooding types 1 and 2, we tried the five settings of $b_{\text{real}}$ in Table 10, and we computed $b_{\text{fake}}$ from Table 10 by replacing $L_{D_{\mathrm{opt}},\mathrm{real}}$ with $L_{D_{\mathrm{opt}},\mathrm{fake}}$. For flooding type 3, we examined the five settings of $b_{\text{all}}$ in Table 10.
Note that for the Wasserstein loss we cannot scale the optimum as $m\cdot L_{D_{\mathrm{opt}}}$, as we do for the other adversarial losses, because $L_{D_{\mathrm{opt}}}=0$ for the Wasserstein loss. We therefore set the values by hand while keeping the Opt flood level equal to $0$ ($=L_{D_{\mathrm{opt}}}$).

| Flooding type | Small | Medium | Near Opt | Opt | Over Opt |
|---|---|---|---|---|---|
| BCE loss and least squares loss | | | | | |
| 1, 2 | $0.1\cdot L_{D_{\mathrm{opt}},\mathrm{real}}$ | $0.5\cdot L_{D_{\mathrm{opt}},\mathrm{real}}$ | $0.9\cdot L_{D_{\mathrm{opt}},\mathrm{real}}$ | $L_{D_{\mathrm{opt}},\mathrm{real}}$ | $1.5\cdot L_{D_{\mathrm{opt}},\mathrm{real}}$ |
| 3 | $0.1\cdot L_{D_{\mathrm{opt}}}$ | $0.5\cdot L_{D_{\mathrm{opt}}}$ | $0.9\cdot L_{D_{\mathrm{opt}}}$ | $L_{D_{\mathrm{opt}}}$ | $1.5\cdot L_{D_{\mathrm{opt}}}$ |
| Hinge loss | | | | | |
| 1, 2 | $0.05\cdot L_{D_{\mathrm{opt}}}$ | $0.25\cdot L_{D_{\mathrm{opt}}}$ | $0.45\cdot L_{D_{\mathrm{opt}}}$ | $0.5\cdot L_{D_{\mathrm{opt}}}$ | $0.75\cdot L_{D_{\mathrm{opt}}}$ |
| 3 | $0.1\cdot L_{D_{\mathrm{opt}}}$ | $0.5\cdot L_{D_{\mathrm{opt}}}$ | $0.9\cdot L_{D_{\mathrm{opt}}}$ | $L_{D_{\mathrm{opt}}}$ | $1.5\cdot L_{D_{\mathrm{opt}}}$ |
| Wasserstein loss | | | | | |
| 1, 2 | -0.50 | -0.25 | -0.05 | 0.0 | 0.25 |
| 3 | -1.0 | -0.5 | -0.1 | 0.0 | 0.5 |

Table 10: Flood level settings for each pair of adversarial loss and flooding type. $L_{D_{\mathrm{opt}},\mathrm{real}}$ is the discriminator's loss on real data at the theoretical convergence, and $L_{D_{\mathrm{opt}}}$ is the sum of the discriminator's losses on real and generated data at the theoretical convergence. When flooding is applied to generated data, $L_{D_{\mathrm{opt}},\mathrm{real}}$ is replaced with $L_{D_{\mathrm{opt}},\mathrm{fake}}$.
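The settings in Table 10 reduce to a few multipliers. The hypothetical helper `flood_level` below is a sketch, not the authors' implementation, that computes one cell of the table. It assumes the standard BCE loss (so $L_{D_{\mathrm{opt}},\mathrm{real}}=-\log\tfrac12=\log 2$), the standard hinge loss (so $L_{D_{\mathrm{opt}},\mathrm{real}}=1$ at $D=0$), and that the real and fake terms share the same optimum, so $L_{D_{\mathrm{opt}}}=2\,L_{D_{\mathrm{opt}},\mathrm{real}}$. The least squares optimum depends on the label coding, so it has to be passed in explicitly.

```python
import math
from typing import Optional

# Multipliers behind the five settings of Table 10.
MULTIPLIERS = {"Small": 0.1, "Medium": 0.5, "Near Opt": 0.9, "Opt": 1.0, "Over Opt": 1.5}

# Assumed per-term optimum L_{D_opt,real} (= L_{D_opt,fake}) for the standard
# BCE loss (-log(1/2)) and hinge loss (max(0, 1 - 0)); L_{D_opt} is twice this.
PER_TERM_OPT = {"bce": math.log(2.0), "hinge": 1.0}

# Hand-set Wasserstein values from Table 10 (L_{D_opt} = 0, so nothing to scale).
WASSERSTEIN = {
    (1, 2): {"Small": -0.50, "Medium": -0.25, "Near Opt": -0.05, "Opt": 0.0, "Over Opt": 0.25},
    (3,): {"Small": -1.0, "Medium": -0.5, "Near Opt": -0.1, "Opt": 0.0, "Over Opt": 0.5},
}


def flood_level(loss: str, flooding_type: int, setting: str,
                per_term_opt: Optional[float] = None) -> float:
    """Return the flood level of one (loss, flooding type, setting) cell of Table 10.

    Types 1 and 2 scale the per-term optimum L_{D_opt,real};
    type 3 scales the total optimum L_{D_opt} = 2 * L_{D_opt,real}.
    Pass per_term_opt for losses not listed in PER_TERM_OPT (e.g. least squares).
    """
    if loss == "wasserstein":
        key = (1, 2) if flooding_type in (1, 2) else (3,)
        return WASSERSTEIN[key][setting]
    opt = per_term_opt if per_term_opt is not None else PER_TERM_OPT[loss]
    base = opt if flooding_type in (1, 2) else 2.0 * opt
    return MULTIPLIERS[setting] * base


print(flood_level("bce", 1, "Medium"))   # 0.5 * log 2
print(flood_level("hinge", 1, "Small"))  # 0.05 * L_{D_opt} with L_{D_opt} = 2
```

For example, `flood_level("bce", 1, "Medium")` is $0.5\log 2\approx 0.347$, and `flood_level("hinge", 1, "Small")` equals $0.05\cdot L_{D_{\mathrm{opt}}}=0.1$, matching the hinge row of Table 10.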

0.D.2 Implementation

Synthetic Dataset To examine the effect of flooding on GAN training, we used the ring of 2D Gaussians dataset (2D Ring), as in previous research [21, 7, 10, 34, 22]. The data are sampled from a mixture of eight Gaussian components with the same standard deviation $\sigma$, arranged in a circle, as shown in Figure 2 (c) (original). At each iteration, we sampled training data from this distribution. We used multi-layer perceptrons as the generator and the discriminator. After training, we evaluated the diversity and quality of the generated samples with the ‘modes’ and ‘high quality (HQ)’ metrics proposed in [31]. We drew 2,500 generated samples and counted modes as the number of Gaussian centers that have at least one sample within $3\sigma$ in $L_2$ distance. We calculated HQ as the fraction of samples that lie within $3\sigma$ of some center. For 2D Ring, therefore, modes $\leq 8$ and HQ $\leq 1$. Because our focus is how well flooding prevents mode collapse, we regard higher modes as better performance, and among settings with the same modes, we regard higher HQ as better.
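The modes and HQ criteria can be sketched in plain Python as follows. This is an illustrative reimplementation of the definitions above, not the code of [31]; the ring radius (2.0) and $\sigma$ (0.02) are hypothetical values because the appendix does not state them, and the sampling helpers are likewise hypothetical.

```python
import math
import random


def ring_centers(n: int = 8, radius: float = 2.0):
    """Centers of the 2D Ring mixture, evenly spaced on a circle (radius is hypothetical)."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def sample_mixture(centers, num: int, sigma: float, rng: random.Random):
    """Draw num points from an equal-weight isotropic Gaussian mixture."""
    points = []
    for _ in range(num):
        cx, cy = rng.choice(centers)
        points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
    return points


def modes_and_hq(samples, centers, sigma: float):
    """modes: centers with at least one sample within 3*sigma (L2 distance).
    HQ: fraction of samples lying within 3*sigma of some center."""
    threshold = 3.0 * sigma
    covered = [False] * len(centers)
    high_quality = 0
    for x, y in samples:
        near = [j for j, (cx, cy) in enumerate(centers)
                if math.hypot(x - cx, y - cy) <= threshold]
        if near:
            high_quality += 1
            for j in near:
                covered[j] = True
    return sum(covered), high_quality / len(samples)


SIGMA = 0.02  # hypothetical; the appendix does not state sigma
centers = ring_centers()
real = sample_mixture(centers, 2500, SIGMA, random.Random(0))
collapsed = [(0.0, 0.0)] * 2500  # a generator that collapsed to the origin
print(modes_and_hq(real, centers, SIGMA))
print(modes_and_hq(collapsed, centers, SIGMA))
```

Samples drawn from the true mixture should reach 8 modes and an HQ close to 0.99 (a 2D Gaussian puts about $1-e^{-4.5}\approx 98.9\%$ of its mass within $3\sigma$), whereas the collapsed generator scores 0 on both.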

DCGAN We used unconditional DCGAN [27] to evaluate performance on image generation. We generated CIFAR10 and CIFAR100 at 32×32, STL10 at 64×64, and CelebA at both 64×64 and 128×128. We followed Radford et al. [27] for batch normalization layers and Miyato et al. [24] for spectral normalization layers. When adding the gradient penalty, implementation details such as loss weights also followed Gulrajani et al. [10]. For CDCGAN, we split the first convolution layer of DCGAN into two convolution layers, one for the input noise and one for the class label, and concatenated their outputs. The second and subsequent layers are the same as in DCGAN, and the concatenated output is fed to the second layer. The batch size was 128, the learning rate was 0.0002, and the Adam optimizer ($\beta=(0.5,0.999)$) was used. Training ran on one GPU for 100,000 iterations. We conducted each experiment five times and evaluated the generated images with the Fréchet Inception Distance (FID) [11].

Domain adaptation When we apply flooding to ADDA [35], DANN [8], and WDGRL [29], we treat features from the source domain as the real distribution and features from the target domain as the generated distribution. Note that our results without flooding are worse than the official scores reported on the repository page listed in Section 0.C. However, this degradation occurs even with the unmodified source code, and reproducibility issues with other experiments have also been reported on the repository page. We therefore regard the officially reported score as a best score rather than an average.

Large model We conducted experiments with StarGAN V2 [5] to investigate the effect of flooding when generating larger images. We followed the authors' implementation and used CelebA-HQ. Training ran on two GPUs for 100,000 iterations. We conducted each experiment five times and evaluated the generated images with FID and LPIPS [41].

Appendix 0.E Additional synthetic dataset experiments

We verified the effect of flooding in Section 4. In this section, we conduct further experiments on synthetic datasets to explore its potential effects.

0.E.1 Flooding with various combinations of $b_{\text{real}}$ and $b_{\text{fake}}$

| $b_{\text{real}}$ ↓ / $b_{\text{fake}}$ → | w/o flooding | Small | Medium | Near Opt | Opt | Over Opt |
|---|---|---|---|---|---|---|
| w/o flooding | 4.8 (1.2) | 5.0 (2.6) | 8.0 (0.0) | 8.0 (0.0) | 8.0 (0.0) | 8.0 (0.0) |
| Small | 2.4 (2.1) | 6.6 (0.8) | 8.0 (0.0) | 8.0 (0.0) | 8.0 (0.0) | 8.0 (0.0) |
| Medium | 4.0 (1.8) | 5.6 (0.5) | 7.8 (0.4) | 8.0 (0.0) | 8.0 (0.0) | 8.0 (0.0) |
| Near Opt | 2.2 (1.3) | 5.0 (0.0) | 5.6 (0.5) | 7.0 (0.0) | 7.8 (0.4) | 0.0 (0.0) |
| Opt | 2.0 (1.7) | 4.2 (0.4) | 5.0 (0.6) | 6.8 (0.4) | 2.0 (0.6) | 0.0 (0.0) |
| Over Opt | 1.4 (0.8) | 3.6 (0.8) | 5.0 (0.6) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) |

Table 11: Average (standard deviation) of modes with the BCE loss and $L_{D,\text{flood},1}$ under various flooding settings. Values are colored blue when $b_{\text{real}}$ and $b_{\text{fake}}$ do not satisfy $1<e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}$.
| $b_{\text{real}}$ ↓ / $b_{\text{fake}}$ → | w/o flooding | Small | Medium | Near Opt | Opt | Over Opt |
|---|---|---|---|---|---|---|
| w/o flooding | 0.90 (0.11) | 0.72 (0.36) | 0.89 (0.05) | 0.82 (0.10) | 0.87 (0.09) | 0.51 (0.05) |
| Small | 0.53 (0.43) | 0.94 (0.03) | 0.86 (0.05) | 0.87 (0.06) | 0.88 (0.08) | 0.49 (0.12) |
| Medium | 0.77 (0.13) | 0.87 (0.05) | 0.87 (0.07) | 0.82 (0.06) | 0.82 (0.14) | 0.50 (0.09) |
| Near Opt | 0.69 (0.37) | 0.91 (0.04) | 0.84 (0.07) | 0.90 (0.03) | 0.83 (0.08) | 0.00 (0.00) |
| Opt | 0.53 (0.44) | 0.90 (0.06) | 0.87 (0.09) | 0.80 (0.12) | 0.00 (0.00) | 0.00 (0.00) |
| Over Opt | 0.71 (0.36) | 0.86 (0.12) | 0.74 (0.13) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) |

Table 12: Average (standard deviation) of HQ with the BCE loss and $L_{D,\text{flood},1}$ under various flooding settings. Values are colored blue when $b_{\text{real}}$ and $b_{\text{fake}}$ do not satisfy $1<e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}$.
| $b_{\text{real}}$ ↓ / $b_{\text{fake}}$ → | w/o flooding | Small | Medium | Near Opt | Opt | Over Opt |
|---|---|---|---|---|---|---|
| w/o flooding | - | - | - | - | - | - |
| Small | - | 1.87 | 1.64 | 1.47 | 1.43 | 1.29 |
| Medium | - | 1.64 | 1.41 | 1.24 | 1.21 | 1.06 |
| Near Opt | - | 1.47 | 1.24 | 1.07 | 1.04 | 0.89 |
| Opt | - | 1.43 | 1.21 | 1.04 | 1.00 | 0.85 |
| Over Opt | - | 1.29 | 1.06 | 0.89 | 0.85 | 0.71 |

Table 13: Values of $e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}$ for the BCE loss. Values are colored blue when $b_{\text{real}}$ and $b_{\text{fake}}$ do not satisfy $1<e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}$.

In Sections 4.2 and 4.4, we investigated the effect of flooding on GAN training under the condition $b_{\text{real}}=b_{\text{fake}}$ or with one-sided flooding. In this section, we drop this condition and vary $b_{\text{real}}$ and $b_{\text{fake}}$ independently. We assigned each of the six flood level settings (w/o flooding, Small, Medium, Near Opt, Opt, and Over Opt) to $b_{\text{real}}$ and $b_{\text{fake}}$ and ran experiments for all 6×6 combinations.

Tables 11 and 12 show the results for modes and HQ, respectively. First, one-sided (fake) flooding ($b_{\text{real}}$: w/o flooding, $b_{\text{fake}}$: Medium) shows the best performance, which suggests that on the synthetic dataset the discriminator overfits to the generated data. Furthermore, when both $b_{\text{real}}$ and $b_{\text{fake}}$ exceed Opt, training collapses completely: the average modes and HQ are both zero. In contrast, when only one of them exceeds Opt, some runs do not collapse completely. One explanation for this difference is the inequality $e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}\leq 1$ in Theorem 3.1. Table 13 lists $e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}$ for each combination of $b_{\text{real}}$ and $b_{\text{fake}}$, and based on these values we colored blue the entries of Tables 11 and 12 whose settings do not satisfy $1<e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}$.
The settings that do not satisfy this inequality correspond to those where training completely collapsed, which supports Theorem 3.1. Moreover, when either $b_{\text{real}}$ or $b_{\text{fake}}$ exceeds Opt, performance remains low in modes or HQ even when training does not collapse, which supports the argument in Supplementary Section 0.B.
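Because every BCE flood level in Table 10 has the form $b=m\log 2$, each entry of Table 13 is simply $2^{-m_{\text{real}}}+2^{-m_{\text{fake}}}$. The sketch below, our own illustration, recomputes the table (omitting the w/o flooding row and column) and marks with `*` the combinations that violate $1<e^{-b_{\text{real}}}+e^{-b_{\text{fake}}}$, i.e., the cells that the captions of Tables 11, 12, and 13 describe as colored blue.

```python
# Flood-level multipliers m from Table 10; for the BCE loss, b = m * log 2.
MULTIPLIERS = {"Small": 0.1, "Medium": 0.5, "Near Opt": 0.9, "Opt": 1.0, "Over Opt": 1.5}


def exp_sum(m_real: float, m_fake: float) -> float:
    """e^{-b_real} + e^{-b_fake} with b = m * log 2, i.e. 2^{-m_real} + 2^{-m_fake}."""
    return 2.0 ** (-m_real) + 2.0 ** (-m_fake)


def satisfies(m_real: float, m_fake: float) -> bool:
    """The inequality 1 < e^{-b_real} + e^{-b_fake} used to color Tables 11 to 13."""
    return exp_sum(m_real, m_fake) > 1.0


names = list(MULTIPLIERS)
for r in names:  # rows: b_real
    cells = []
    for f in names:  # columns: b_fake
        value = exp_sum(MULTIPLIERS[r], MULTIPLIERS[f])
        mark = "" if satisfies(MULTIPLIERS[r], MULTIPLIERS[f]) else "*"
        cells.append(f"{value:.2f}{mark}")
    print(f"{r:>8}: " + "  ".join(cells))
```

Using powers of two instead of `math.exp(-m * math.log(2))` keeps the Opt/Opt cell at exactly 1.0, so the strict inequality is not decided by rounding.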

0.E.2 Change of flooding type with various adversarial losses

In Section 4.2, we examined how the flooding type ($L_{D,\text{flood},1}$, $L_{D,\text{flood},2}$, or $L_{D,\text{flood},3}$) relates to performance with the BCE loss. In this section, we investigate the best flooding type for adversarial losses other than the BCE loss. The flood level settings are given in Table 10.

Table 14 shows the results. For all adversarial losses, $L_{D,\text{flood},1}$ with flood level Medium achieved the best performance. For adversarial losses other than the Wasserstein loss, the range of sensible flood levels is essentially fixed in advance: it lies between the lower bound of the loss and the losses at the theoretical convergence, $L_{D_{\mathrm{opt}},\mathrm{real}}$ and $L_{D_{\mathrm{opt}},\mathrm{fake}}$. This leads to an empirical rule: apply flooding with $L_{D,\text{flood},1}$ and choose a flood level within this range that is not too close to either bound.
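The three flooding types can be contrasted on a toy batch. The sketch below is our reading, not the authors' implementation: type 1 floods each per-sample loss inside the expectation (by analogy with Eq. (31) for the generator), type 2 floods each batch-averaged term with $b_{\text{real}}$ and $b_{\text{fake}}$, and type 3 floods their sum with $b_{\text{all}}$; the exact definitions are given in the main text. $D(\bm{x})$ is assumed to output a probability.

```python
import math


def flood(loss: float, b: float) -> float:
    """Flooding function h(L, b) = |L - b| + b (Eq. (32))."""
    return abs(loss - b) + b


def mean(values):
    return sum(values) / len(values)


def d_loss_bce(d_real, d_fake, flooding_type=None, b_real=0.0, b_fake=0.0, b_all=0.0):
    """BCE discriminator loss with an optional flooding variant (our reading of types 1-3).

    d_real, d_fake: discriminator probabilities D(x) in (0, 1) for real / generated samples.
    """
    real = [-math.log(d) for d in d_real]        # per-sample loss on real data
    fake = [-math.log(1.0 - d) for d in d_fake]  # per-sample loss on generated data
    if flooding_type is None:                    # plain BCE, no flooding
        return mean(real) + mean(fake)
    if flooding_type == 1:  # flood every per-sample loss, then average
        return (mean([flood(v, b_real) for v in real])
                + mean([flood(v, b_fake) for v in fake]))
    if flooding_type == 2:  # flood each batch-averaged term separately
        return flood(mean(real), b_real) + flood(mean(fake), b_fake)
    if flooding_type == 3:  # flood the summed loss with a single level
        return flood(mean(real) + mean(fake), b_all)
    raise ValueError(f"unknown flooding type: {flooding_type}")


b = 0.5 * math.log(2.0)  # Medium for types 1 and 2 (Table 10); Medium for type 3 is 2 * b
d_real, d_fake = [0.99, 0.30], [0.01, 0.20]
for t in (None, 1, 2, 3):
    print(t, round(d_loss_bce(d_real, d_fake, t, b_real=b, b_fake=b, b_all=2 * b), 4))
```

On this batch, type 1 gives the largest loss because it reflects the two confident per-sample losses even though the batch-averaged real loss stays above $b$; type 3 leaves the loss unchanged because the total stays above $b_{\text{all}}$.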

| Flooding type | Eval | w/o flooding | Small | Medium | Near Opt | Opt | Over Opt |
|---|---|---|---|---|---|---|---|
| BCE loss | | | | | | | |
| $L_{D,\text{flood},1}$ | Modes | 4.8 (1.2) | 6.6 (0.8) | 7.8 (0.4) | 7.0 (0.0) | 2.0 (0.6) | 0.0 (0.0) |
| | HQ | 0.90 (0.11) | 0.94 (0.03) | 0.87 (0.07) | 0.90 (0.03) | 0.00 (0.00) | 0.00 (0.00) |
| $L_{D,\text{flood},2}$ | Modes | - | 4.0 (2.0) | 4.2 (1.3) | 7.0 (0.6) | 0.2 (0.4) | 0.0 (0.0) |
| | HQ | - | 0.65 (0.35) | 0.75 (0.18) | 0.28 (0.18) | 0.00 (0.00) | 0.00 (0.00) |
| $L_{D,\text{flood},3}$ | Modes | - | 4.0 (2.3) | 5.2 (1.9) | 7.2 (0.4) | 2.8 (1.6) | 0.0 (0.0) |
| | HQ | - | 0.68 (0.35) | 0.91 (0.06) | 0.24 (0.19) | 0.01 (0.01) | 0.00 (0.00) |
| Hinge loss | | | | | | | |
| $L_{D,\text{flood},1}$ | Modes | 7.4 (0.8) | 6.6 (0.8) | 8.0 (0.0) | 7.4 (0.8) | 0.6 (0.5) | 0.0 (0.0) |
| | HQ | 0.83 (0.13) | 0.73 (0.20) | 0.78 (0.04) | 0.81 (0.07) | 0.00 (0.00) | 0.00 (0.00) |
| $L_{D,\text{flood},2}$ | Modes | - | 7.2 (0.4) | 6.6 (0.5) | 7.6 (0.5) | 0.0 (0.0) | 0.0 (0.0) |
| | HQ | - | 0.85 (0.10) | 0.83 (0.06) | 0.33 (0.16) | 0.00 (0.00) | 0.00 (0.00) |
| $L_{D,\text{flood},3}$ | Modes | - | 5.8 (0.7) | 7.6 (0.5) | 7.6 (0.5) | 2.4 (0.8) | 0.0 (0.0) |
| | HQ | - | 0.90 (0.03) | 0.79 (0.12) | 0.53 (0.31) | 0.00 (0.00) | 0.00 (0.00) |
| LS loss | | | | | | | |
| $L_{D,\text{flood},1}$ | Modes | 6.6 (0.8) | 6.8 (0.4) | 7.8 (0.4) | 7.4 (0.5) | 0.0 (0.0) | 0.0 (0.0) |
| | HQ | 0.80 (0.09) | 0.79 (0.09) | 0.90 (0.02) | 0.78 (0.07) | 0.00 (0.00) | 0.00 (0.00) |
| $L_{D,\text{flood},2}$ | Modes | - | 7.0 (0.0) | 7.2 (0.7) | 6.8 (0.4) | 0.6 (0.8) | 0.2 (0.4) |
| | HQ | - | 0.82 (0.07) | 0.79 (0.13) | 0.44 (0.14) | 0.00 (0.00) | 0.00 (0.00) |
| $L_{D,\text{flood},3}$ | Modes | - | 6.3 (0.8) | 6.4 (0.5) | 6.8 (0.4) | 2.8 (1.5) | 0.0 (0.0) |
| | HQ | - | 0.87 (0.04) | 0.80 (0.13) | 0.40 (0.11) | 0.01 (0.01) | 0.00 (0.00) |
| Wasserstein loss | | | | | | | |
| $L_{D,\text{flood},1}$ | Modes | 8.0 (0.0) | 8.0 (0.0) | 8.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) |
| | HQ | 0.93 (0.01) | 0.95 (0.01) | 0.95 (0.01) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) |
| $L_{D,\text{flood},2}$ | Modes | - | 8.0 (0.0) | 8.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) |
| | HQ | - | 0.93 (0.02) | 0.94 (0.02) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) |
| $L_{D,\text{flood},3}$ | Modes | - | 8.0 (0.0) | 7.6 (0.8) | 8.0 (0.0) | 0.8 (1.6) | 0.0 (0.0) |
| | HQ | - | 0.90 (0.06) | 0.91 (0.10) | 0.88 (0.06) | 0.02 (0.05) | 0.00 (0.00) |

Table 14: Average (standard deviation) of modes and HQ with various adversarial losses, different flooding types, and flood levels.
| Flooding for $G$ (rows) / $D$ (columns) | Eval | w/o flooding | Two-sided | One-sided (real) | One-sided (fake) |
|---|---|---|---|---|---|
| w/o flooding | Modes | 4.8 (1.2) | 7.8 (0.4) | 4.0 (1.8) | 8.0 (0.0) |
| | HQ | 0.90 (0.11) | 0.87 (0.07) | 0.77 (0.13) | 0.89 (0.08) |
| With flooding | Modes | 3.2 (2.6) | 7.6 (0.5) | 3.8 (0.7) | 8.0 (0.0) |
| | HQ | 0.46 (0.39) | 0.89 (0.06) | 0.83 (0.12) | 0.89 (0.07) |

Table 15: Average (standard deviation) of modes and HQ with or without flooding for the generator $G$ (rows). All experiments use the BCE loss and flood level Medium, with the flooding type for the discriminator $D$ varied across columns.

0.E.3 Flooding for the generator’s loss

In the previous sections, we applied flooding to the discriminator because the discriminator is often the source of instability. In this section, we investigate the performance when flooding is applied to the generator. Analogously to flooding type $L_{D,\text{flood},1}$ for the discriminator's loss, we adopt the following recalculated loss with flood level $b_{G}$:

$$
L_{G,\text{flood},1}=\mathbb{E}_{\bm{x}\sim\mathbb{P}_{g}}\bigl[h\bigl(f_{G}(D(\bm{x})),\,b_{G}\bigr)\bigr].\tag{31}
$$

We conducted experiments with the BCE loss and flood level $0.5\cdot\log 2$, since $L_{G,\mathrm{opt}}=\log 2$ for the BCE loss.
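Eq. (31) can be sketched for a batch as below. We assume the non-saturating form $f_G(D(\bm{x}))=-\log D(\bm{x})$ with $D$ outputting a probability, which is consistent with $L_{G,\mathrm{opt}}=\log 2$ at $D=\tfrac12$; the helper names are hypothetical.

```python
import math

B_G = 0.5 * math.log(2.0)  # flood level used in this experiment


def flood(loss: float, b: float) -> float:
    """Flooding function h(L, b) = |L - b| + b."""
    return abs(loss - b) + b


def g_loss_flood(d_fake, b_g: float = B_G) -> float:
    """Batch estimate of Eq. (31) with the assumed f_G(D(x)) = -log D(x).

    d_fake holds discriminator probabilities D(x) in (0, 1) for generated samples.
    """
    return sum(flood(-math.log(d), b_g) for d in d_fake) / len(d_fake)


# At D(x) = 1/2 the per-sample loss is log 2 > b_G, so flooding leaves it unchanged.
print(g_loss_flood([0.5]))
# If D(x) = 0.9, the loss 0.105 is below b_G, so flooding reflects it upward.
print(g_loss_flood([0.9]))
```

Because the flood level sits at half of $L_{G,\mathrm{opt}}$, flooding only acts on samples for which the generator already fools the discriminator strongly.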

Table 15 shows the results. In this setting, applying flooding to the generator gave no benefit and in some cases degraded performance. Nevertheless, flooding for the generator may still be beneficial in settings where the generator is prone to overfitting.

0.E.4 Change of experimental settings with flooding

Next, we evaluated performance when varying the experimental settings of the code described in Supplementary Section 0.C, namely the number of updates, the batch size, the number of layers, and the dataset size.

Table 16 shows the results. Notably, without flooding, performance degrades in many cases when the settings change, which indicates how sensitive GANs are to experimental settings. In contrast, both two-sided flooding and one-sided (fake) flooding generally improved performance. From these findings, we conclude that flooding contributes to robustness against changes in experimental settings.

| Setting | Eval | w/o flooding | Two-sided | One-sided (real) | One-sided (fake) |
|---|---|---|---|---|---|
| (a) Number of updates for the generator or discriminator | | | | | |
| Baseline | Modes | 4.8 (1.2) | 7.8 (0.4) | 4.0 (1.8) | 8.0 (0.0) |
| | HQ | 0.90 (0.11) | 0.87 (0.07) | 0.77 (0.13) | 0.89 (0.08) |
| $G$ updates (5×) | Modes | 0.0 (0.0) | 7.0 (0.9) | 1.2 (0.4) | 8.0 (0.0) |
| | HQ | 0.00 (0.00) | 0.87 (0.05) | 0.79 (0.17) | 0.82 (0.08) |
| $D$ updates (5×) | Modes | 0.0 (0.0) | 4.4 (1.0) | 0.0 (0.0) | 0.0 (0.0) |
| | HQ | 0.00 (0.00) | 0.84 (0.09) | 0.00 (0.00) | 0.00 (0.00) |
| (b) Batch size | | | | | |
| 4 | Modes | 3.0 (2.5) | 6.6 (0.8) | 1.0 (2.0) | 5.0 (2.7) |
| | HQ | 0.24 (0.21) | 0.50 (0.09) | 0.11 (0.21) | 0.36 (0.22) |
| 16 | Modes | 2.2 (1.8) | 6.6 (0.5) | 1.8 (2.2) | 7.4 (0.5) |
| | HQ | 0.44 (0.36) | 0.74 (0.02) | 0.22 (0.27) | 0.79 (0.07) |
| 64 | Modes | 2.6 (2.2) | 7.2 (0.4) | 1.6 (2.1) | 8.0 (0.0) |
| | HQ | 0.49 (0.40) | 0.84 (0.08) | 0.35 (0.43) | 0.77 (0.22) |
| 256 (default) | Modes | 4.8 (1.2) | 7.8 (0.4) | 4.0 (1.8) | 8.0 (0.0) |
| | HQ | 0.90 (0.11) | 0.87 (0.07) | 0.77 (0.13) | 0.89 (0.08) |
| 512 | Modes | 2.4 (2.9) | 8.0 (0.0) | 5.0 (1.1) | 8.0 (0.0) |
| | HQ | 0.30 (0.39) | 0.85 (0.06) | 0.90 (0.06) | 0.91 (0.02) |
| (c) Number of layers | | | | | |
| 2 | Modes | 6.2 (3.1) | 8.0 (0.0) | 7.8 (0.4) | 6.4 (3.2) |
| | HQ | 0.24 (0.16) | 0.47 (0.12) | 0.38 (0.13) | 0.30 (0.20) |
| 4 (default) | Modes | 4.8 (1.2) | 7.8 (0.4) | 4.0 (1.8) | 8.0 (0.0) |
| | HQ | 0.90 (0.11) | 0.87 (0.07) | 0.77 (0.13) | 0.89 (0.08) |
| 6 | Modes | 0.0 (0.0) | 6.4 (0.5) | 0.0 (0.0) | 4.6 (3.8) |
| | HQ | 0.00 (0.00) | 0.78 (0.12) | 0.00 (0.00) | 0.30 (0.32) |
| 8 | Modes | 0.0 (0.0) | 5.8 (1.2) | 0.0 (0.0) | 3.2 (3.9) |
| | HQ | 0.00 (0.00) | 0.61 (0.28) | 0.00 (0.00) | 0.32 (0.39) |
| (d) Dataset size | | | | | |
| 1000 | Modes | 0.0 (0.0) | 7.6 (0.5) | 0.0 (0.0) | 8.0 (0.0) |
| | HQ | 0.00 (0.00) | 0.87 (0.06) | 0.00 (0.00) | 0.90 (0.04) |
| 10000 | Modes | 1.0 (1.3) | 8.0 (0.0) | 3.8 (2.5) | 8.0 (0.0) |
| | HQ | 0.28 (0.36) | 0.75 (0.19) | 0.73 (0.37) | 0.86 (0.07) |
| 100000 | Modes | 2.2 (2.0) | 8.0 (0.0) | 4.4 (1.4) | 8.0 (0.0) |
| | HQ | 0.56 (0.46) | 0.93 (0.01) | 0.81 (0.13) | 0.87 (0.05) |
| ∞ (default) | Modes | 4.8 (1.2) | 7.8 (0.4) | 4.0 (1.8) | 8.0 (0.0) |
| | HQ | 0.90 (0.11) | 0.87 (0.07) | 0.77 (0.13) | 0.89 (0.08) |

Table 16: Average (standard deviation) of modes and HQ with the BCE loss and flood level Medium under various experimental settings and flooding types. (a) “$G$ updates (5×)” and “$D$ updates (5×)” mean that the number of generator or discriminator updates per iteration, respectively, is increased from one to five. (d) The dataset size of the default setting is denoted as ∞ because each iteration in the default setting samples fresh data from the true distribution. For finite dataset sizes, we first draw a fixed number of samples from the true distribution and use them as the dataset.
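The changes in Table 16 (a) and (d) amount to a different update schedule and a different data source. The sketch below uses hypothetical helpers (`make_batch_sampler`, `train`) with stand-in update functions instead of real networks; drawing batches from the finite pool with replacement is our assumption, as the appendix does not say how the fixed dataset is iterated.

```python
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def make_batch_sampler(sample_true: Callable[[random.Random], Point],
                       batch_size: int,
                       dataset_size: Optional[int] = None,
                       seed: int = 0) -> Callable[[], List[Point]]:
    """Return a zero-argument function that yields one training batch.

    dataset_size=None mimics the default (infinite) setting: every batch is drawn
    fresh from the true distribution. A finite size first fixes a pool of samples
    and then draws batches from that pool with replacement (an assumption).
    """
    rng = random.Random(seed)
    if dataset_size is None:
        return lambda: [sample_true(rng) for _ in range(batch_size)]
    pool = [sample_true(rng) for _ in range(dataset_size)]
    return lambda: [rng.choice(pool) for _ in range(batch_size)]


def train(iterations: int, next_batch, d_step, g_step, n_d: int = 1, n_g: int = 1) -> None:
    """Per iteration, run n_d discriminator updates and then n_g generator updates."""
    for _ in range(iterations):
        for _ in range(n_d):
            d_step(next_batch())
        for _ in range(n_g):
            g_step()


def gaussian_point(rng: random.Random) -> Point:
    """Stand-in for the true distribution."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


calls = {"d": 0, "g": 0}
seen: List[Point] = []


def d_step(batch: List[Point]) -> None:  # hypothetical stand-in for a discriminator update
    calls["d"] += 1
    seen.extend(batch)


def g_step() -> None:  # hypothetical stand-in for a generator update
    calls["g"] += 1


finite = make_batch_sampler(gaussian_point, batch_size=256, dataset_size=1000)
train(10, finite, d_step, g_step, n_d=5)  # "D updates (5x)" with a 1000-point dataset
print(calls, len(set(seen)))
```

With a finite pool, the discriminator sees at most 1000 distinct points however long training runs, which is one plausible route to the overfitting that the (d) rows probe.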
| Dataset | Eval | w/o flooding | Two-sided | One-sided (real) | One-sided (fake) |
|---|---|---|---|---|---|
| 2D Ring | Modes | 4.8 (1.2) | 7.8 (0.4) | 4.0 (1.8) | 8.0 (0.0) |
| | HQ | 0.90 (0.11) | 0.87 (0.07) | 0.77 (0.13) | 0.89 (0.08) |
| 2D Grid | Modes | 10.6 (5.5) | 19.6 (0.8) | 12.8 (1.5) | 21.6 (2.2) |
| | HQ | 0.70 (0.35) | 0.85 (0.06) | 0.77 (0.12) | 0.87 (0.03) |

Table 17: Average (standard deviation) of modes and HQ with the BCE loss and flood level Medium on 2D Ring and 2D Grid for different flooding types.

0.E.5 Change of datasets

Next, we evaluated the effect of flooding on a dataset other than 2D Ring. In 2D Grid, the Gaussian centers are arranged in a 5×5 grid, so modes $\leq 25$ rather than $\leq 8$ as in 2D Ring. We again use the BCE loss and flood level Medium.
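For reference, a 5×5 grid of Gaussian centers can be generated as below. The spacing (2.0, giving coordinates from -4 to 4) and $\sigma$ (0.05) are hypothetical, since the appendix does not state them; the same modes and HQ criteria as for 2D Ring then apply with 25 centers.

```python
import itertools
import random


def grid_centers(side: int = 5, spacing: float = 2.0):
    """Centers of a side x side grid centered on the origin (spacing is hypothetical)."""
    offset = (side - 1) / 2.0
    return [((i - offset) * spacing, (j - offset) * spacing)
            for i, j in itertools.product(range(side), repeat=2)]


def sample_grid(num: int, sigma: float = 0.05, side: int = 5,
                spacing: float = 2.0, seed: int = 0):
    """Draw num points from an equal-weight Gaussian mixture over the grid centers."""
    rng = random.Random(seed)
    centers = grid_centers(side, spacing)
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma))
            for cx, cy in (rng.choice(centers) for _ in range(num))]


centers = grid_centers()
print(len(centers), centers[0], centers[-1])  # 25 (-4.0, -4.0) (4.0, 4.0)
```

Only the center layout changes relative to 2D Ring, which is why the maximum number of modes rises from 8 to 25.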

Table 17 presents the results. As on 2D Ring, both two-sided and one-sided (fake) flooding improved performance, which suggests that the effect of flooding is not specific to one dataset.

0.E.6 Comparison of the flooding function $h$ with other functions

The flooding function is defined as

$$
h(L,b)=|L-b|+b=\begin{cases}-L+2b & \text{if } L<b,\\ L & \text{otherwise.}\end{cases}\tag{32}
$$
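The effect of $h$ on optimization can be seen on a toy problem that is not from the paper: gradient descent on $h(L(\theta),b)$ with a single parameter $\theta$ and $L(\theta)=\theta^2$. Below $b$, $\partial h/\partial L=-1$, so each step moves uphill on $L$; the loss therefore hovers around $b$ instead of shrinking to zero. The helper names are hypothetical.

```python
def flood_slope(loss: float, b: float) -> float:
    """dh/dL for h(L, b) = |L - b| + b: +1 above the flood level, -1 below it.
    At L = b we pick -1 from the subgradient [-1, 1]."""
    return 1.0 if loss > b else -1.0


def descend(theta: float, b: float, lr: float = 0.05, steps: int = 200) -> float:
    """Gradient descent on h(L(theta), b) with the toy loss L(theta) = theta**2.
    Returns the final value of L."""
    for _ in range(steps):
        loss = theta ** 2
        grad = flood_slope(loss, b) * 2.0 * theta  # chain rule: dh/dL * dL/dtheta
        theta -= lr * grad
    return theta ** 2


print(descend(1.0, b=0.25))  # stays in a band around b = 0.25
print(descend(1.0, b=0.0))   # flooding inactive: the loss keeps shrinking toward 0
```

With $b=0.25$ the final loss stays between roughly 0.20 and 0.30 for this step size, while with $b=0$ we have $h(L,0)=L$ for $L\geq 0$, so this reduces to plain gradient descent and the loss decays toward zero.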

Besides flooding, there are other ways to modify the loss when it falls below a value $b$. We tried three alternatives (max, log, and 10%) that replace the flooding function $h$ with the following functions:

$$
\begin{aligned}
h_{\text{max}}(L,b)&=\begin{cases}b & \text{if } L<b,\\ L & \text{otherwise,}\end{cases}\\
h_{\text{log}}(L,b)&=\begin{cases}b-\log(1+b-L) & \text{if } L<b,\\ L & \text{otherwise,}\end{cases}\\
h_{\text{10\%}}(L,b)&=\begin{cases}b-0.1\cdot(b-L) & \text{if } L<b,\\ L & \text{otherwise.}\end{cases}
\end{aligned}\tag{33}
$$
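The functions in Eq. (33) differ mainly in their slope below $b$. The sketch below, our own illustration with the loss value written as $L$ throughout, evaluates each function and a central finite-difference estimate of $\partial h/\partial L$ at a loss below the flood level.

```python
import math


def h_flood(L, b):
    return abs(L - b) + b


def h_max(L, b):
    return b if L < b else L


def h_log(L, b):
    return b - math.log(1.0 + b - L) if L < b else L


def h_10(L, b):
    return b - 0.1 * (b - L) if L < b else L


def slope(h, L, b, eps=1e-6):
    """Central finite-difference estimate of dh/dL."""
    return (h(L + eps, b) - h(L - eps, b)) / (2.0 * eps)


b, L = 0.5, 0.2  # a loss value below the flood level
for name, h in [("flooding", h_flood), ("max", h_max), ("log", h_log), ("10%", h_10)]:
    print(f"{name:>8}: h = {h(L, b):.3f}, dh/dL = {slope(h, L, b):+.3f}")
```

Only the flooding function has a negative slope ($-1$) below $b$; $h_{\text{max}}$ has zero slope, and $h_{\text{log}}$ (slope $1/(1+b-L)\approx 0.77$ here) and $h_{\text{10\%}}$ (slope $0.1$) keep the original sign with a smaller magnitude.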

The effect of each function is illustrated in Figure 5. We applied these functions ($h_{\text{max}}$, $h_{\text{log}}$, and $h_{\text{10\%}}$) to the discriminator's loss.
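The three alternatives, together with flooding, can be sketched as plain Python functions of a scalar loss $L$ and threshold $b$. This is a minimal sketch, not the authors' code: it assumes the flooding function has the standard form $h(L,b)=|L-b|+b$ of the original flooding method, and all function names are hypothetical.

```python
import math


def h_flood(L: float, b: float) -> float:
    """Flooding (assumed standard form): reflects a loss below b back above b."""
    return abs(L - b) + b


def h_max(L: float, b: float) -> float:
    """Clamp: the loss is held constant at b once it falls below b."""
    return b if L < b else L


def h_log(L: float, b: float) -> float:
    """Below b, the loss keeps decreasing, but only logarithmically."""
    return b - math.log(1.0 + b - L) if L < b else L


def h_10(L: float, b: float) -> float:
    """Below b, the loss keeps decreasing at 10% of its original slope."""
    return b - 0.1 * (b - L) if L < b else L


if __name__ == "__main__":
    b = 0.5  # hypothetical threshold
    for L in (1.0, 0.5, 0.3, 0.0):
        print(L, h_flood(L, b), h_max(L, b), round(h_log(L, b), 4), h_10(L, b))
```

For $L<b$, only `h_flood` returns a value above $b$; `h_max` pins the output at $b$, while `h_log` and `h_10` let it keep falling below $b$, more slowly than $L$ itself.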

Figure 5: Illustration of how the original loss (graph) is changed by flooding (blue), max (yellow), log (green), and 10% (red). The horizontal dotted line (black) indicates $b$.
Adversarial loss | Eval  | w/o flooding | flooding    | max         | log         | 10%
BCE loss         | Modes | 4.8 (1.2)    | 7.8 (0.4)   | 4.0 (1.4)   | 4.8 (0.7)   | 5.2 (1.2)
                 | HQ    | 0.90 (0.11)  | 0.87 (0.07) | 0.90 (0.11) | 0.80 (0.12) | 0.85 (0.09)
Table 18: Average (standard deviation) of Modes and HQ with the BCE loss at flood level Medium, comparing flooding with the functions in Figure 5 that modify the loss below $b$.

Table 18 shows the results. None of the alternative functions $h_{\text{max}}$, $h_{\text{log}}$, and $h_{\text{10\%}}$ yielded a meaningful improvement, and some of them degraded performance. This supports the advantage of the flooding function $h$ in preventing training instability. Notably, even $h_{\text{max}}$ cannot prevent the instability: as Figure 5 shows, both $h_{\text{max}}$ and $h$ keep the loss from dropping below $b$, yet only $h$ prevents the instability. The difference lies in the gradient below $b$: $h_{\text{max}}$ has zero gradient there, and $h_{\text{log}}$ and $h_{\text{10\%}}$ still decrease the loss, only more slowly, whereas $h$ flips the sign of the gradient so that the loss is pushed back up toward $b$. We therefore conclude that the gradient flipping caused by $h$ is crucial to GAN training.
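The role of the gradient below $b$ can be checked numerically. The sketch below is an illustration under the same assumed form $h(L,b)=|L-b|+b$; the dictionary and helper names are hypothetical. It estimates $\partial h/\partial L$ by central differences at a loss that has already fallen below the threshold. A negative value means that minimizing $h$ by gradient descent pushes $L$ back up, which is the gradient flipping discussed above.

```python
import math
from typing import Callable, Dict

Transform = Callable[[float, float], float]

# Loss transforms h(L, b); flooding is assumed to be |L - b| + b.
TRANSFORMS: Dict[str, Transform] = {
    "flood": lambda L, b: abs(L - b) + b,
    "max": lambda L, b: max(L, b),
    "log": lambda L, b: b - math.log(1.0 + b - L) if L < b else L,
    "10%": lambda L, b: b - 0.1 * (b - L) if L < b else L,
}


def numerical_grad(h: Transform, L: float, b: float, eps: float = 1e-6) -> float:
    """Central-difference estimate of dh/dL."""
    return (h(L + eps, b) - h(L - eps, b)) / (2 * eps)


if __name__ == "__main__":
    b, L = 0.5, 0.3  # the loss has already fallen below the threshold b
    for name, h in TRANSFORMS.items():
        g = numerical_grad(h, L, b)
        if g < 0:
            effect = "pushes L back up (gradient flipped)"
        elif g == 0:
            effect = "no update from this loss"
        else:
            effect = "still pushes L down"
        print(f"{name:>5}: dh/dL = {g:+.3f} -> {effect}")
```

Analytically, below $b$ the derivatives are $-1$ for flooding, $0$ for max, $1/(1+b-L)$ for log, and $0.1$ for 10%, so only flooding reverses the direction of the update.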

[Figure 6 panels: (a) Without flooding; (b) Two-sided flooding; (c) One-sided (real) flooding; (d) One-sided (fake) flooding]
Figure 6: Transition of the losses $L_{D,\mathrm{real}}$ (blue), $L_{D,\mathrm{fake}}$ (yellow), and $L_G$ (green). Because these quantities fluctuate strongly during training, which reduces the visibility of the plot, we show their moving averages with a window size of 1000. The black dashed lines mark the values at theoretical convergence: $L_{D,\mathrm{opt,real}}$ (for the blue curve), $L_{D,\mathrm{opt,fake}}$ (for the yellow curve), and $L_{G,\mathrm{opt}}$ $(=\log 2)$ (for the green curve). The red dashed line indicates the flood level for the discriminator. Note that the discriminator's loss shown in the figure is the value before flooding is applied.
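The smoothing used for Figure 6 can be sketched as a trailing simple moving average. The window size of 1000 comes from the caption; whether the authors used a trailing or a centered window is not stated, so the trailing, full-window-only variant below is an assumption, and `moving_average` is a hypothetical helper name.

```python
from collections import deque
from typing import Deque, Iterable, List


def moving_average(values: Iterable[float], window: int = 1000) -> List[float]:
    """Trailing simple moving average; emits one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf: Deque[float] = deque()
    total = 0.0
    out: List[float] = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that left the window
        if len(buf) == window:
            out.append(total / window)
    return out


if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4], window=2))
```

Keeping a running sum makes the cost linear in the number of logged iterations, which matters for curves with tens of thousands of points.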
[Figure 7 panels: (a) Without flooding; (b) Two-sided flooding; (c) One-sided (real) flooding; (d) One-sided (fake) flooding]
Figure 7: Transition of the gradient norm of the discriminator.

Appendix 0.F Analysis of the loss and gradient

We have shown experimentally that flooding stabilizes GAN training. However, it remains unclear why GAN training with flooding succeeds even though flooding prevents the discriminator's loss from becoming low. For instance, the flood level Medium is not particularly low, yet training with this flood level was stable rather than collapsing. To understand the training dynamics of the two models, in this section we examine the losses of the generator and the discriminator during training on the synthetic dataset with the BCE loss and $L_{D,\mathrm{flood},1}$.

Figure 6 shows the results. Without flooding (Figure 6 (a)), the discriminator's loss ($L_{D,\mathrm{real}}$ and $L_{D,\mathrm{fake}}$) is low around 10,000 iterations, while the generator's loss $L_G$ is high. This is because the generated samples are still inaccurate early in training, which makes it easy to distinguish real from generated samples and thus lowers the discriminator's loss. Subsequently, as the quality of the generated samples improves, the discriminator's loss increases steadily; however, around 40,000 iterations it begins to decrease again and keeps decreasing until the end of training. This loss trajectory can be interpreted as overfitting of the discriminator to either the real or the generated samples. In contrast, with two-sided flooding (Figure 6 (b)), no such collapse occurs, and the losses converge stably to their values at theoretical convergence ($L_{D,\mathrm{opt,real}}$, $L_{D,\mathrm{opt,fake}}$, and $L_{G,\mathrm{opt}}$). Moreover, comparing the one-sided flooding experiments (Figure 6 (c) and (d)), overfitting is observed with one-sided (real) flooding, whereas one-sided (fake) flooding converges stably. This suggests that, in the synthetic dataset experiments, the discriminator overfits to the generated data, and that preventing this overfitting through flooding leads to stabilization.
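The flooding variants compared in Figure 6 differ only in which discriminator terms are flooded. The following is a minimal sketch, not necessarily the exact form of $L_{D,\mathrm{flood},1}$: it assumes $L_{D,\mathrm{real}}=-\frac{1}{n}\sum\log D(x)$ and $L_{D,\mathrm{fake}}=-\frac{1}{n}\sum\log(1-D(G(z)))$, a single shared flood level $b$, the standard flooding form $|L-b|+b$, and a hypothetical `mode` argument that selects the variant.

```python
import math
from typing import Sequence

MODES = ("none", "two-sided", "real", "fake")


def flood(loss: float, b: float) -> float:
    """Flooding in its standard form (assumed here): |L - b| + b."""
    return abs(loss - b) + b


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float],
                       b: float, mode: str = "two-sided") -> float:
    """BCE discriminator loss with flooding applied to the selected term(s).

    d_real / d_fake are the discriminator's sigmoid outputs on real and
    generated samples; `mode` is a hypothetical switch for the variants.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    l_real = -sum(math.log(p) for p in d_real) / len(d_real)
    l_fake = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    if mode in ("two-sided", "real"):
        l_real = flood(l_real, b)
    if mode in ("two-sided", "fake"):
        l_fake = flood(l_fake, b)
    return l_real + l_fake


if __name__ == "__main__":
    # A discriminator that is already very confident on generated samples.
    d_real, d_fake = [0.6] * 4, [0.02] * 4
    for mode in MODES:
        print(f"{mode:>9}: {discriminator_loss(d_real, d_fake, b=0.3, mode=mode):.4f}")
```

In this toy batch the real term is already above $b$, so flooding it has no effect; only the variants that flood $L_{D,\mathrm{fake}}$ change the loss, which is consistent with the observation that flooding the fake term is what prevents the collapse in the synthetic experiments.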
We also consider that Figure 6 supports the concept illustrated in Figure 1, namely that flooding prevents a rapid decline in the discriminator's loss, although Figure 6 (a) does not show a decline as rapid as the one in Figure 1 (a). We believe that the drop in the batch loss in Figure 6 (a) is smaller because overfitting occurs only occasionally at the instance level, as shown in Figure 6 (a), and the losses of overfitted and non-overfitted instances are averaged when the batch loss is computed.
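The averaging argument can be made concrete with a small worked example. The batch size, the number of overfitted instances, and the per-instance loss values below are hypothetical and chosen only for illustration: a few instances whose loss collapses toward zero move the batch mean only modestly.

```python
import math


def batch_mean_with_overfitting(batch_size: int, n_overfit: int,
                                normal_loss: float, overfit_loss: float) -> float:
    """Mean per-instance loss when n_overfit instances have collapsed to overfit_loss."""
    if not 0 <= n_overfit <= batch_size:
        raise ValueError("n_overfit must lie in [0, batch_size]")
    losses = [overfit_loss] * n_overfit + [normal_loss] * (batch_size - n_overfit)
    return sum(losses) / batch_size


if __name__ == "__main__":
    normal = math.log(2)  # per-instance BCE when D outputs 0.5
    mean = batch_mean_with_overfitting(64, 8, normal, 0.01)
    print(f"drop on the overfitted instances: {1 - 0.01 / normal:.1%}")
    print(f"drop in the batch loss:           {1 - mean / normal:.1%}")
```

With 8 of 64 instances overfitted, the batch loss drops by only about 12% even though the overfitted instances lose almost 99% of their loss.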

Another interesting observation in Figure 6 is that, compared with training without flooding, the discriminator's loss around 10,000 iterations is raised in line with the flood level Medium, which is not a particularly low flood level. This suggests that the discriminator does not perfectly separate the real and generated data distributions, which departs from the proof procedure in [9] that assumes an optimal discriminator minimizing the loss for a fixed generator. On the other hand, given the result of Arjovsky et al. [1] that such an optimal discriminator can cause instability, preventing the discriminator from becoming optimal through flooding can be regarded as an advantage. Note that Figure 6 thus also shows that GANs exhibit loss dynamics that do not follow the proof procedure in [9]; a detailed analysis is left for future work.
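For discrete distributions, the optimal discriminator assumed in the proof of [9] can be computed in closed form, which makes the gap between its loss and a flooded loss easy to inspect. The sketch below uses $D^*(x)=p_{\mathrm{data}}(x)/(p_{\mathrm{data}}(x)+p_g(x))$ and checks the standard identity that the discriminator's BCE loss at $D^*$ equals $\log 4-2\,\mathrm{JSD}(p_{\mathrm{data}}\,\|\,p_g)$. The toy distributions and helper names are hypothetical, and the per-term definitions of $L_{D,\mathrm{real}}$ and $L_{D,\mathrm{fake}}$ are assumed.

```python
import math
from typing import List, Sequence, Tuple


def optimal_discriminator(p_data: Sequence[float], p_g: Sequence[float]) -> List[float]:
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)); 0.5 where both are zero."""
    return [pd / (pd + pg) if pd + pg > 0 else 0.5 for pd, pg in zip(p_data, p_g)]


def bce_terms(p_data: Sequence[float], p_g: Sequence[float],
              d: Sequence[float]) -> Tuple[float, float]:
    """Expected L_D,real = -E_data[log D] and L_D,fake = -E_g[log(1 - D)]."""
    l_real = -sum(pd * math.log(dx) for pd, dx in zip(p_data, d) if pd > 0)
    l_fake = -sum(pg * math.log(1.0 - dx) for pg, dx in zip(p_g, d) if pg > 0)
    return l_real, l_fake


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log) of two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: Sequence[float], c: Sequence[float]) -> float:
        return sum(x * math.log(x / y) for x, y in zip(a, c) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    p_data = [0.5, 0.5]
    for p_g in ([0.5, 0.5], [0.9, 0.1]):
        l_real, l_fake = bce_terms(p_data, p_g, optimal_discriminator(p_data, p_g))
        print(p_g, round(l_real, 4), round(l_fake, 4),
              round(math.log(4) - 2 * jsd(p_data, p_g), 4))
```

When $p_g=p_{\mathrm{data}}$, each term equals $\log 2$. For a mismatched generator, $D^*$ reaches a total loss below $\log 4$, so a discriminator whose loss is held at a flood level above that value cannot coincide with $D^*$.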

Moreover, we investigated the gradient norms of the models, whose stability affects the stability of training.

Figure 7 shows the transition of the gradient norms during the experiment in Figure 6. Without flooding for $L_{D,\mathrm{fake}}$ ((a) and (c)), the gradient norm is large from the beginning and grows further as training progresses. This indicates training instability, which can be detected even while the loss appears stable, as between 20,000 and 40,000 iterations in Figure 6. In contrast, with flooding for $L_{D,\mathrm{fake}}$ ((b) and (d)), the gradient norm peaks early in training and then gradually decreases, and the peak value itself is relatively small. This suggests that flooding also suppresses the gradient and thereby stabilizes training.
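Gradient-norm curves like those in Figure 7 are commonly obtained by taking the global L2 norm over all discriminator parameters at each step. The sketch below does this for a hypothetical one-dimensional logistic discriminator $D(x)=\sigma(wx+c)$, with optional two-sided flooding applied per term as $|L-b|+b$. The norm type, the toy model, and the per-term flooding are assumptions for illustration, not the paper's setup.

```python
import math
from typing import Optional, Sequence, Tuple


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def term_loss_and_grad(w: float, c: float, xs: Sequence[float],
                       real: bool) -> Tuple[float, float, float]:
    """Mean BCE of one term (real or fake) and its gradient w.r.t. (w, c)."""
    n = len(xs)
    loss = gw = gc = 0.0
    for x in xs:
        p = sigmoid(w * x + c)
        loss -= math.log(p if real else 1.0 - p) / n
        dl_ds = p - 1.0 if real else p  # derivative of the BCE w.r.t. the logit
        gw += dl_ds * x / n
        gc += dl_ds / n
    return loss, gw, gc


def disc_grad_norm(w: float, c: float, xs_real: Sequence[float],
                   xs_fake: Sequence[float], b: Optional[float] = None) -> float:
    """Global L2 norm of the gradient of L_D,real + L_D,fake.

    If b is given, each term is flooded separately as |L - b| + b, which
    multiplies that term's gradient by sign(L - b).
    """
    gw_total = gc_total = 0.0
    for xs, real in ((xs_real, True), (xs_fake, False)):
        loss, gw, gc = term_loss_and_grad(w, c, xs, real)
        scale = -1.0 if b is not None and loss < b else 1.0
        gw_total += scale * gw
        gc_total += scale * gc
    return math.sqrt(gw_total ** 2 + gc_total ** 2)


if __name__ == "__main__":
    xs_real, xs_fake = [1.0, 1.5, 2.0], [-1.0, 0.0, 0.5]
    print(disc_grad_norm(2.0, 0.0, xs_real, xs_fake))
    print(disc_grad_norm(2.0, 0.0, xs_real, xs_fake, b=0.5))
```

In this formulation flooding never changes the magnitude of an individual term's gradient, only its sign, so any change in the total norm comes from how the real and fake terms combine and from the resulting training trajectory.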

Appendix 0.G Generated images

We provide the generated images that we could not fully display in Figures 3 and 4. Figure 8 shows images generated with DCGAN on CelebA (128×128) with the BCE loss. Figures 9 and 10 show images generated with StarGAN V2 on CelebAHQ.

[Figure 8 panels: (a) w/o flooding (collapse); (b) w/o flooding; (c) With flooding]
Figure 8: Generated images with DCGAN, CelebA (128×128), and the BCE loss that we could not fully display in Figure 3. Without flooding, the generated images collapsed in four out of five trials (a). In the remaining trial, the generated images did not collapse but exhibited graying (b). In contrast, the generated images of the four trials with flooding did not collapse (c).
Figure 9: Generated images with source images and random latent vectors for style codes. We arrange the source images in the top row, and below them, we show the generated images, both with and without flooding.
Figure 10: Generated images with source images and reference images for style codes. We arrange the source images in the top row, and below them, we show the generated images, both with and without flooding. We also arrange the reference images in the left column.