
Analysis of Evolutionary Diversity Optimisation for the Maximum Matching Problem

Jonathan Gadea Harder
Algorithm Engineering
Hasso Plattner Institute
University of Potsdam
Potsdam, Germany

Aneta Neumann
Optimisation and Logistics
School of Computer and Mathematical Sciences
University of Adelaide, Australia
Adelaide, SA 5005
Australia

Frank Neumann
Optimisation and Logistics
School of Computer and Mathematical Sciences
University of Adelaide, Australia
Adelaide, SA 5005
Australia
Abstract

This paper investigates how to maximize solution diversity in evolutionary algorithms (EAs) for the maximum matching problem, with a particular focus on complete bipartite graphs and paths. We encode matchings as binary strings and use the Hamming distance as the diversity measure to be maximized. Central to our research are the $(\mu+1)$-EAD and the 2P-EAD, which we apply to diversity optimization and analyze both theoretically and empirically.

For complete bipartite graphs, our runtime analysis shows that, for reasonably small $\mu$, the $(\mu+1)$-EAD achieves maximal diversity in expected time $O(\mu^2 m^4 \log m)$ in the small gap case (where the difference in the sizes of the two partitions does not exceed the population size $\mu$) and $O(\mu^2 m^2 \log m)$ otherwise. For paths we give an upper bound of $O(\mu^3 m^3)$. For the 2P-EAD we give stronger bounds of $O(\mu^2 m^2 \log m)$ in the small gap case, $O(\mu^2 n^2 \log n)$ otherwise, and $O(\mu^3 m^2)$ for paths. Here $n$ is the total number of vertices and $m$ the number of edges. Our empirical studies, which examine the scaling behavior with respect to $m$ and $\mu$, complement these theoretical results and suggest that the runtime bounds may be refined further.

1 Introduction

Evolutionary algorithms (EAs) are a robust class of heuristics that have been applied in domains ranging from combinatorial optimization to bioinformatics, and they have proven especially valuable for problems in graph theory [24]. Diversity is a central concept in EAs: it has been pivotal for improving the search process and for preventing premature convergence to suboptimal solutions [11].

1.1 Related work

Recent research in evolutionary computation investigates various connections between quality and diversity. Quality Diversity (QD) has gained recognition as a widely adopted search paradigm, particularly in robotics and games [28, 5, 16, 1, 4]. The goal of QD is to illuminate the space of solution behaviours by exploring various niches in a feature space and maximizing quality within each niche. In particular, the popular MAP-Elites algorithm divides the feature space into cells and seeks the solution of highest possible quality for each cell [18, 31, 1, 32].

The area of evolutionary diversity optimization (EDO) aims to find a maximally diverse set of solutions that all meet a given quality criterion. EDO approaches have been applied in a wide range of settings. Whereas diversity is typically a means to avoid stagnation in the search for a single optimal solution, here it is leveraged to obtain a set of diverse, high-quality solutions. This is advantageous for decision-makers who value a variety of options from which to select the most fitting solution, accounting for different practical considerations and trade-offs [29, 30]. For example, different diversity measures have been explored for evolving diverse sets of TSP instances that exhibit differences in the performance of algorithms for the traveling salesperson problem, as well as for evolving variations of a given image that differ in terms of their features [6]. In the classical context of combinatorial optimization, EDO algorithms have been designed for problems such as the knapsack problem [2], the computation of minimum spanning trees [3], communication networks [15, 23], the computation of sets of problem instances [12, 21, 22], and the computation of diverse sets of solutions for monotone submodular functions under given constraints [20, 8]. Furthermore, Pareto Diversity Optimization (PDO) [19] is a coevolutionary approach that optimizes the quality of the best possible solution while also computing a diverse set of solutions meeting a given threshold value. EDO approaches have been analyzed theoretically for simple single- and multi-objective pseudo-Boolean functions [10], for simple scenarios of the traveling salesperson problem [6, 26, 25], the minimum spanning tree problem [3], the traveling thief problem [27], permutation problems [7], and the optimization of submodular functions [20].

1.2 Our contribution

This paper builds upon the methodology of [13], applying the theoretical runtime analysis framework to the maximum matching problem, specifically in complete bipartite graphs and paths. We aim to provide a deeper understanding of how diversity mechanisms influence the efficiency with which population-based EAs converge to a diverse set of maximum matchings.

To achieve this, we adopt a binary string representation for matchings and use the Hamming distance as the measure of diversity. We then examine the theoretical underpinnings of evolutionary diversity optimization for the maximum matching problem, in particular structural properties that affect the performance of diversity-enhancing mechanisms within EAs. We provide runtime analyses for evolutionary algorithms that shed light on their scalability across problem instances. Finally, we present experiments that assess how closely the theoretical runtime bounds match the observed runtimes.

In summary, our research provides theoretical insights and empirical evidence to understand how diversity can be effectively maximized for the maximum matching problem. Our findings contribute to a deeper understanding of the interplay between diversity and optimization in EAs and pave the way for further research in this direction.

The paper is organized as follows. In Section 2, we introduce the maximum matching problem and the evolutionary diversity optimization approaches analyzed in this study. We then explore structural properties and present runtime analyses for diversity optimization on complete bipartite graphs and paths (Section 3). Experimental investigations for both unconstrained and constrained scenarios are detailed in Sections 4 and 5, followed by concluding remarks and directions for future research (Section 6).

2 Preliminaries

In this part of the paper, we present the core concepts related to diversity optimization for matchings in bipartite graphs. We start by establishing the definitions and measures of diversity that will be used throughout our discussion.

2.1 Maximum matching problem and diversity optimization

Our study is concerned with the matching problem in bipartite graphs, described by a graph $G=(V,E)$. The aim is to find a maximum matching $M$, that is, a matching (a set of edges no two of which share a vertex) of maximum cardinality. We assume that each individual in the initial population represents a valid maximum matching. Our analysis determines how long evolutionary algorithms take to obtain a population that is not only diverse but also meets a specified quality criterion.

Let $x\in\{0,1\}^{|E|}$ be a bitstring in which each bit corresponds to an edge in $E$ and indicates whether that edge is included in the matching. We define the fitness function $f(x)$ as follows, adapting the approach introduced by Giel and Wegener [14]:

$$f(x)=\begin{cases}-\mathrm{col}(x) & \text{if } x \text{ represents an invalid matching},\\ |x| & \text{if } x \text{ represents a valid matching}.\end{cases}$$

Here, $\mathrm{col}(x)$ is the collision number, i.e., the number of pairs of edges included in $x$ that share a common endpoint (any such pair makes $x$ an invalid matching), and $|x|$ is the number of edges included in the matching represented by $x$.

This fitness function penalizes invalid matchings in proportion to the number of edge conflicts, thereby encouraging the evolution of valid matchings. The goal is to maximize $f(x)$, which corresponds to finding a maximum matching without edge collisions.
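The fitness function above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes edges are given as a list of vertex pairs indexed in the same order as the bits of $x$, and the names `fitness` and `edges` are hypothetical. Collisions are counted naively over all pairs of selected edges, which is quadratic but sufficient for small examples.

```python
from itertools import combinations


def fitness(x, edges):
    """f(x): -col(x) for an invalid matching, |x| for a valid one.

    x     -- sequence of 0/1 bits, one per edge
    edges -- list of (u, v) pairs; edges[k] is the edge encoded by bit x[k]
    """
    chosen = [edges[k] for k, bit in enumerate(x) if bit]
    # col(x): number of pairs of selected edges that share an endpoint
    collisions = sum(1 for e1, e2 in combinations(chosen, 2) if set(e1) & set(e2))
    if collisions > 0:
        return -collisions  # invalid matching: penalty grows with the conflicts
    return len(chosen)  # valid matching: number of selected edges


# Path 0-1-2-3: edges (0,1), (1,2), (2,3)
path = [(0, 1), (1, 2), (2, 3)]
print(fitness([1, 0, 1], path), fitness([1, 1, 1], path))  # 2 -2
```

Selecting all three path edges creates two collisions (at vertices 1 and 2), so it scores $-2$, while the maximum matching $\{(0,1),(2,3)\}$ scores 2.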

The divergence between individuals is gauged using the Hamming distance, which is appropriate given our binary string representation of solutions. This distance measures how many bits differ between two strings.

2.2 Diversity measure

The diversity of a multiset (duplicates allowed) of search points $P$ (called the population in the following) is defined as the cumulative Hamming distance over all pairs of distinct individuals in $P$. This is expressed as

$$D(P)=\sum_{(x,y)\in\tilde{P}\times\tilde{P}}H(x,y),$$

where $\tilde{P}$ is the set (no duplicates) containing all solutions in $P$, and $H(x,y)$ is the Hamming distance between two solutions $x$ and $y$. The contribution of a solution $x$ to the population is the loss in diversity if $x$ were excluded:

$$c(x)=D(P)-D(P\setminus\{x\}).$$
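A direct Python sketch of $D(P)$ and $c(x)$ follows, assuming individuals are stored as tuples of bits in a list that plays the role of the multiset $P$; the function names are illustrative. It implements the formula literally, so each unordered pair $\{x,y\}$ is counted twice (once per order); counting unordered pairs instead only halves all values and does not change which population is maximally diverse.

```python
def hamming(x, y):
    """Number of positions in which the bitstrings x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def diversity(population):
    """D(P): sum of H(x, y) over ordered pairs (x, y) of the duplicate-free set P~."""
    distinct = set(population)  # individuals are tuples of 0/1 bits
    return sum(hamming(x, y) for x in distinct for y in distinct)


def contribution(population, index):
    """c(x) = D(P) - D(P with one copy of population[index] removed)."""
    rest = population[:index] + population[index + 1:]
    return diversity(population) - diversity(rest)


# A duplicated individual contributes nothing: removing one copy leaves P~ unchanged.
P = [(1, 0, 0), (0, 1, 0), (0, 1, 0)]
print(diversity(P), contribution(P, 0), contribution(P, 1))  # 4 4 0
```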

2.3 Algorithms

The $(\mu+1)$-EAD (see Algorithm 1) maintains and increases the diversity of a population. Starting from an initial population, it evolves solutions through mutation: in each iteration, it selects a solution uniformly at random and mutates it, and if the offspring meets the quality criterion, it is added to the population. To maintain the population size, an individual with minimal contribution (chosen uniformly at random if there are several) is then removed. This process continues until the termination criterion is met. In our setting, the quality criterion is that a solution is a valid maximum matching, and the algorithm terminates once maximal diversity is achieved.

Input: A population size $\mu$, individual length $m$, mutation probability $1/m$
Output: A diverse population of solutions $P$
Initialize $P$ with $\mu$ $m$-bit binary strings
while termination criterion not met do
    Choose $s\in P$ uniformly at random
    Produce $s'$ by flipping each bit of $s$ independently with probability $1/m$
    if $s'$ meets the quality criteria then
        Add $s'$ to $P$
        Choose a solution $z\in P$ with $c(z)=\min_{x\in P}c(x)$ u.a.r.
        Set $P:=P\setminus\{z\}$
    end if
end while
Algorithm 1: $(\mu+1)$-EAD
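The loop of Algorithm 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code: the termination criterion is modelled by a hypothetical target diversity plus a hypothetical iteration cap, the quality criterion is passed in as a predicate, and all helper names are illustrative. The demo uses the toy graph $K_{2,3}$ (two edge-disjoint maximum matchings of size 2 give $D=8$ with the ordered-pair sum) only to keep the run short.

```python
import random


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def diversity(pop):
    """D(P): sum of Hamming distances over ordered pairs of distinct individuals."""
    distinct = set(pop)
    return sum(hamming(x, y) for x in distinct for y in distinct)


def contribution(pop, i):
    """c(x) = D(P) - D(P without one copy of the i-th individual)."""
    return diversity(pop) - diversity(pop[:i] + pop[i + 1:])


def is_matching_of_size(x, edges, size):
    """Quality criterion: x encodes a valid matching with exactly `size` edges."""
    chosen = [edges[k] for k, bit in enumerate(x) if bit]
    endpoints = [v for e in chosen for v in e]
    return len(chosen) == size and len(endpoints) == len(set(endpoints))


def mu_plus_one_ead(pop, quality_ok, target, max_iters, rng):
    """Run the (mu+1)-EAD loop until D(P) >= target or max_iters offspring."""
    pop = [tuple(x) for x in pop]
    m = len(pop[0])
    for t in range(max_iters):
        if diversity(pop) >= target:
            return pop, t
        parent = rng.choice(pop)  # uniform parent selection
        # Standard bit mutation: flip each bit independently with prob. 1/m.
        child = tuple(1 - b if rng.random() < 1 / m else b for b in parent)
        if quality_ok(child):
            pop.append(child)
            contribs = [contribution(pop, i) for i in range(len(pop))]
            worst = min(contribs)
            # Remove one individual of minimal contribution, chosen u.a.r.
            pop.pop(rng.choice([i for i, c in enumerate(contribs) if c == worst]))
    return pop, max_iters


# Usage: K_{2,3} with R = {0, 1}, L = {2, 3, 4}; maximum matchings have 2 edges.
edges = [(r, l) for r in (0, 1) for l in (2, 3, 4)]
start = tuple(1 if e in {(0, 2), (1, 3)} else 0 for e in edges)
final, steps = mu_plus_one_ead(
    [start, start],
    lambda x: is_matching_of_size(x, edges, 2),
    target=8,  # two edge-disjoint matchings: H = 4, counted in both orders
    max_iters=100_000,
    rng=random.Random(1),
)
```

Because removing a minimum-contribution individual maximizes the diversity of the remaining $\mu$ individuals, and removing the offspring itself is one of the options, $D(P)$ never decreases in this loop.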

The Two-Phase Matching EAD (see Algorithm 2) is also designed to generate diverse solutions. Its variation operator works in two phases: the first phase 'unmatches' a random subset of vertices in a solution, and the second phase 'rematches' these vertices to unmatched neighbors. Like the $(\mu+1)$-EAD, the algorithm adds each newly formed solution to the population if it meets the quality criterion and then removes a solution with minimal contribution to maintain the population size. It continues until the termination criterion is met, aiming for a diverse set of high-quality matchings.

Input: A population size $\mu$, individual length $m$
Output: A diverse population of solutions $P$
Initialize $P$ with $\mu$ $m$-bit binary strings
while termination criterion not met do
    Choose $s\in P$ uniformly at random (u.a.r.)
    Create $s'$ as a duplicate of $s$
    Select a subset of vertices $S\subseteq V$, including each vertex independently with probability $\frac{1}{|V|}$
    foreach vertex $v\in S$ do
        Unmatch $v$ in $s'$ by setting the bits of its incident edges to 0
    end foreach
    foreach vertex $v\in S$ do
        if $v$ has unmatched neighbors then
            Match $v$ in $s'$ with an unmatched neighbor chosen u.a.r.
        end if
    end foreach
    if $s'$ meets the quality criteria then
        Add $s'$ to $P$
        Choose a solution $z\in P$ with $c(z)=\min_{x\in P}c(x)$ u.a.r.
        Set $P:=P\setminus\{z\}$
    end if
end while
Algorithm 2: Two-Phase Matching EAD (2P-EAD)
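The two-phase variation step of Algorithm 2 can be sketched as follows; the selection and removal steps are the same as in Algorithm 1 and are omitted. This is an illustrative sketch with hypothetical names, assuming the graph is given as an edge list plus a vertex list. The pseudocode does not say what happens when a vertex of $S$ has already been matched as the partner of an earlier vertex of $S$; this sketch assumes such a vertex is skipped.

```python
import random


def two_phase_mutation(parent, edges, vertices, rng):
    """Return the offspring s' of the 2P-EAD variation step.

    parent   -- tuple of 0/1 bits, one per edge in `edges`
    edges    -- list of (u, v) pairs
    vertices -- list of all vertices V
    """
    child = list(parent)
    p = 1 / len(vertices)
    # Select S: each vertex is included independently with probability 1/|V|.
    S = [v for v in vertices if rng.random() < p]
    in_S = set(S)
    # Phase 1: unmatch every v in S by setting the bits of its incident edges to 0.
    for k, (u, w) in enumerate(edges):
        if u in in_S or w in in_S:
            child[k] = 0
    # Phase 2: rematch each v in S with an unmatched neighbour chosen u.a.r.
    for v in S:
        matched = {x for k, e in enumerate(edges) if child[k] for x in e}
        if v in matched:
            continue  # assumption: already rematched by an earlier vertex of S
        options = [k for k, (a, b) in enumerate(edges)
                   if v in (a, b) and (b if a == v else a) not in matched]
        if options:
            child[rng.choice(options)] = 1
    return tuple(child)


# Usage on K_{2,3} with R = {0, 1} and L = {2, 3, 4}.
edges = [(r, l) for r in (0, 1) for l in (2, 3, 4)]
parent = tuple(1 if e in {(0, 2), (1, 3)} else 0 for e in edges)
child = two_phase_mutation(parent, edges, [0, 1, 2, 3, 4], random.Random(7))
```

Phase 2 only ever joins two currently unmatched vertices, so the offspring is always a valid matching; it may still fail the quality criterion if some vertex of $S$ finds no unmatched neighbor and the matching is no longer maximum.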

2.4 Drift theorems

We analyse the considered algorithms with respect to their runtime behaviour. The expected runtime is the expected number of generated offspring until a given goal has been achieved (usually until a valid population of maximal diversity has been computed). For our analysis, we use the additive and multiplicative drift theorems, which we state in the following.

Theorem 2.1 (Additive Drift Theorem [17]).

Let $S\subseteq\mathbb{R}$ be a finite set of positive numbers and let $(X^t)_{t\in\mathbb{N}}$ be a sequence of random variables over $S\cup\{0\}$. Let $T$ be the random variable that denotes the first point in time $t\in\mathbb{N}$ for which $X^t\le 0$. Suppose that there exists a constant $\delta_1>0$ such that

$$E[X^t-X^{t+1}\mid T>t]\ge\delta_1$$

holds. Then

$$E[T\mid X^0]\le\frac{X^0}{\delta_1}.$$

If there exists a constant $\delta_2>0$ such that

$$E[X^t-X^{t+1}\mid T>t]\le\delta_2$$

holds, then

$$E[T\mid X^0]\ge\frac{X^0}{\delta_2}.$$
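A small simulation can make Theorem 2.1 concrete. The sketch below uses a toy process that is not from the paper: while positive, it decreases by 1 with probability $\delta$ and otherwise stays put, so the drift equals $\delta$ exactly and the upper and lower bounds coincide at $E[T]=X^0/\delta$. The function name and parameters are illustrative.

```python
import random


def mean_hitting_time(x0, delta, runs, seed=0):
    """Average first time T with X_t <= 0 for the toy process started at x0.

    While X_t > 0, the process decreases by 1 with probability delta and
    otherwise stays put, so E[X_t - X_{t+1} | T > t] = delta exactly.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        x, t = x0, 0
        while x > 0:
            t += 1
            if rng.random() < delta:
                x -= 1
        total += t
    return total / runs


x0, delta = 10, 0.25
estimate = mean_hitting_time(x0, delta, runs=2000)
bound = x0 / delta  # upper and lower bound of Theorem 2.1 coincide: 40.0
```

The empirical mean should lie close to 40; in the runtime analyses below the drift is only bounded, not exact, so the theorems yield bounds rather than exact values.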
Theorem 2.2 (Multiplicative Drift Theorem [9]).

Let $(X_t)_{t\in\mathbb{N}}$ be random variables over $\mathbb{R}$, let $x_{\min}>0$, and let $T=\min\{t\mid X_t<x_{\min}\}$. Furthermore, suppose that

(a) $X_0\ge x_{\min}$ and, for all $t\le T$, it holds that $X_t\ge 0$, and that

(b) there is some value $\delta>0$ such that, for all $t<T$, it holds that $X_t-E[X_{t+1}\mid X_0,\ldots,X_t]\ge\delta X_t$.

Then

$$E[T\mid X_0]\le\frac{1+\ln\left(\frac{X_0}{x_{\min}}\right)}{\delta}.$$
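The bound of Theorem 2.2 is easy to evaluate for concrete parameters. The helper below is a hypothetical convenience function, not part of the paper; the example plugs in values in the style of Theorem 3.2 ($\delta=1/(e\mu^2m^2)$, $X_0\le m^{1.5}$, $x_{\min}=1$) for an arbitrary choice of $\mu=2$ and $m=100$.

```python
import math


def multiplicative_drift_bound(x0, x_min, delta):
    """Upper bound (1 + ln(x0 / x_min)) / delta on E[T | X_0 = x0] (Theorem 2.2)."""
    if not (x0 >= x_min > 0 and delta > 0):
        raise ValueError("requires x0 >= x_min > 0 and delta > 0")
    return (1 + math.log(x0 / x_min)) / delta


# Hypothetical example in the style of Theorem 3.2: mu = 2, m = 100,
# delta = 1 / (e * mu^2 * m^2), X_0 <= m^1.5, x_min = 1.
mu, m = 2, 100
bound = multiplicative_drift_bound(m ** 1.5, 1, 1 / (math.e * mu**2 * m**2))
```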

3 Runtime Analysis for complete bipartite graphs

This section presents our theoretical results for complete bipartite graphs. We begin with a lemma that characterizes when a population has maximal diversity. We then present theorems that bound the expected time to achieve this maximal diversity for the $(\mu+1)$-EAD and the 2P-EAD, providing a quantitative basis for comparing the two algorithms.

Lemma 3.1 (Diversity of a Population).

Maximal diversity $D(P)$ on a complete bipartite graph $G=((L,R),E)$ for a population $P$ of size $\mu<\frac{|R|}{2}$ is attained if and only if all matchings in $P$ are pairwise edge-disjoint.

Proof.

Consider a set of matchings in G𝐺Gitalic_G, where each matching is a solution in the population. Let the diversity of this set be denoted by D𝐷Ditalic_D, defined as the sum of pairwise Hamming distances between all matchings.

Assuming $|R|\le|L|$, a maximum matching in $G$ pairs each vertex in $R$ with a distinct vertex in $L$, so every matching in the population has $|R|$ edges. The Hamming distance between two matchings is the number of edges contained in exactly one of them.

To maximize $D$, each pair of matchings should differ in as many edges as possible. Two matchings with $|R|$ edges each have the maximum Hamming distance $2|R|$, which occurs exactly when they share no common edge.

Given $\mu$ matchings, the number of distinct pairs of matchings is $\binom{\mu}{2}$. If all matchings are pairwise disjoint, each pair contributes $2|R|$ to $D$, leading to $D=\binom{\mu}{2}\cdot 2|R|=\mu(\mu-1)|R|$.

If any pair of matchings shares at least one edge, the Hamming distance for that pair is strictly less than $2|R|$, which reduces $D$. Therefore, $D$ is maximized if and only if all $\mu$ matchings are pairwise edge-disjoint.

This argument relies on the fact that $\mu<\frac{|R|}{2}$, which ensures that pairwise disjoint matchings exist in $G$: each matching uses $|R|$ edges and there are $|R||L|$ possible edges in $G$. Indeed, since $\mu<|R|\le|L|$, the $k$-th matching ($0\le k<\mu$) can pair the $i$-th vertex of $R$ with the $((i+k)\bmod|L|)$-th vertex of $L$, and different shifts $k$ never use the same edge. Consequently, it is possible to construct $\mu$ disjoint matchings, each using a different subset of $|R|$ edges from the total pool. ∎
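One concrete way to obtain $\mu$ pairwise edge-disjoint maximum matchings is a cyclic-shift construction, chosen here for illustration; the sketch below builds them and checks the value $\binom{\mu}{2}\cdot 2|R|=\mu(\mu-1)|R|$ summed over unordered pairs. Function names and the edge ordering are assumptions. The ordered-pair sum $D(P)$ of Section 2.2 is exactly twice this value, which does not affect maximality.

```python
from itertools import combinations


def shifted_matchings(r_size, l_size, mu):
    """mu pairwise edge-disjoint maximum matchings of K_{|R|,|L|} (|R| <= |L|, mu <= |L|).

    Bits follow the edge order (i, j) for i in R, j in L. The k-th matching
    pairs vertex i of R with vertex (i + k) mod |L| of L.
    """
    edges = [(i, j) for i in range(r_size) for j in range(l_size)]
    population = []
    for k in range(mu):
        chosen = {(i, (i + k) % l_size) for i in range(r_size)}
        population.append(tuple(1 if e in chosen else 0 for e in edges))
    return population


def pairwise_diversity(population):
    """Sum of Hamming distances over unordered pairs {x, y} of the population."""
    return sum(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(population, 2)
    )


R_SIZE, L_SIZE, MU = 10, 13, 4  # satisfies mu < |R|/2 and |R| < |L|
pop = shifted_matchings(R_SIZE, L_SIZE, MU)
```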

In the following theorem we show that, if the difference in size between the two partitions is larger than the population size, there is always a local improvement consisting of a 2-bit flip towards a population with maximum diversity.

Theorem 3.2.

Let $G=((L,R),E)$ be a complete bipartite graph with $\mu<\frac{|R|}{2}$, $\mu<|L|-|R|$ and $|R|<|L|$. For the $(\mu+1)$-EAD applied to $G$, the expected time until the diversity is maximized is $O(\mu^2m^2\log m)$.

Proof.

We define the potential function $X_t$ as the difference between the optimal diversity $\text{div}_{\text{opt}}$ and the current diversity $\text{div}(t)$ at time $t$:

$$X_t\coloneqq\text{div}_{\text{opt}}-\text{div}(t).$$

In each solution, exactly $|R|$ vertices of $L$ are incident to a matching edge, so each solution leaves $|L|-|R|$ vertices of $L$ unmatched. Moreover, each vertex in $R$ is matched to at most $\mu<|L|-|R|$ different vertices of $L$ across all solutions. Hence, for every solution and every vertex $r\in R$, at least one of the $|L|-|R|$ vertices of $L$ left unmatched by that solution is not matched with $r$ in any solution.

To show that there are always at least $\frac{X_t}{\mu}$ improving 2-bit flips, we consider a sequence of improving 2-bit flips. Each 2-bit flip changes the partner of a vertex in $R$: it deactivates one edge (currently part of the matching) and activates another edge (currently not part of the matching). This amounts to reassigning a vertex in $R$ to a different, unmatched vertex in $L$.

Consider an edge $e$ used in $i$ solutions. When this edge is deactivated (removed from the matching), the diversity changes by $-(\mu-i)$, since $\mu-i$ solutions lose a unique edge, reducing diversity. Conversely, when a new edge that is unused by all other solutions is activated (added to the matching), it contributes $\mu-1$ to the diversity.

Thus, for each such 2-bit flip involving edge $e$, the total change in diversity is

$$-(\mu-i)+(\mu-1)=-\mu+i+\mu-1=i-1.$$

This calculation shows that the diversity improvement achieved by the 2-bit flip for an edge in the sequence can only decrease or remain unchanged if that flip is applied later in the sequence. Note that in each step of the sequence the new maximum matching contains an edge unused by any other matching, so the offspring is always valid, and the diversity improvement is at least 1, since this is achieved by replacing the parent. Moreover, once every edge is used by at most one solution, the population is optimal, so the total change along the sequence equals the distance to the optimum $X_t$.

Let $\overline{e}$ denote the number of "imperfect" edges (edges used in more than one solution). Applying the 2-bit flip for one edge of the sequence directly yields at least the diversity increase it achieves within the sequence, since the value of $i$ can only decrease or remain unchanged along the sequence; moreover, this increase is at most $\mu$. Thus $\overline{e}\mu\ge X_t$, which implies $\overline{e}\ge\frac{X_t}{\mu}$. The expected drift then is

$$E[X_t-X_{t+1}\mid X_t]\ge\frac{\overline{e}}{\mu m^2}\left(1-\frac{1}{m}\right)^{m-2}\ge\frac{X_t}{\mu^2m^2e}.$$

Since $\binom{\mu}{2}2|R|$ is the maximum diversity, attained when all matchings are pairwise edge-disjoint, it holds that $X_0\le\binom{\mu}{2}2|R|\le\mu^2|R|\le|R|^3\le m^{1.5}$ (using $m=|L||R|\ge|R|^2$). Applying the multiplicative drift theorem therefore yields an expected runtime of $O(\mu^2m^2\log m)$ to achieve maximum diversity. ∎
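For concreteness, the final step of this proof can be written out as follows, taking $x_{\min}=1$ (diversity values are integers, so $X_t<1$ means $X_t=0$), $\delta=1/(e\mu^2m^2)$ from the drift bound above, and $X_0\le m^{1.5}$:

```latex
E[T \mid X_0]
  \le \frac{1 + \ln(X_0 / 1)}{\delta}
  \le e\,\mu^2 m^2 \left(1 + \tfrac{3}{2}\ln m\right)
  = O\!\left(\mu^2 m^2 \log m\right).
```

The same calculation with $\delta=1/(e\mu^2n^2)$ and $X_0\le n^3$ gives $e\mu^2n^2(1+3\ln n)$ for Theorem 3.3.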

We now show that the Two-Phase Matching EAD achieves a significant speedup, since changing the partner of a single vertex no longer requires flipping two specific edges.

Theorem 3.3.

Let $G=((L,R),E)$ be a complete bipartite graph with $\mu<\frac{|R|}{2}$, $\mu<|L|-|R|$ and $|R|<|L|$. For the Two-Phase Matching EAD (2P-EAD) applied to $G$, the expected time until the diversity is maximized is $O(\mu^2n^2\log n)$, where $n=|L|+|R|$.

Proof.

We define the potential function $X_t$ as the difference between the optimal diversity $\text{div}_{\text{opt}}$ and the current diversity $\text{div}(t)$ at time $t$:

$$X_t\coloneqq\text{div}_{\text{opt}}-\text{div}(t).$$

The maximal diversity is achieved when all matchings in the population are pairwise edge-disjoint. We analyze the drift of the potential function $X_t$ at each step of the algorithm as follows.

In each step, the algorithm first selects a solution and a subset of vertices, which it rematches with unmatched vertices. As before, let $\overline{e}$ denote the number of "imperfect" edges (edges used in more than one solution). As shown in the proof of Theorem 3.2, it holds that $\overline{e}\ge\frac{X_t}{\mu}$. A drift bound is obtained by selecting a solution corresponding to any of the $\overline{e}$ imperfect edges, unmatching the adjacent vertex in $R$, and rematching it so that it uses an edge unused by any solution. The probability that a particular vertex in $R$ is the only vertex selected is $\frac{1}{n}\left(1-\frac{1}{n}\right)^{n-1}\ge\frac{1}{en}$, and the probability of matching it with an appropriate unmatched vertex in $L$ is at least $\frac{1}{n}$.
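The inequality $\frac{1}{n}\left(1-\frac{1}{n}\right)^{n-1}\ge\frac{1}{en}$ used above follows from $(1-1/n)^{n-1}\ge 1/e$. The sketch below is only a numerical spot check over a finite range of $n$, not a proof; the function name is illustrative.

```python
import math


def only_vertex_selected_prob(n):
    """P(a fixed vertex is the only one in S) when each of n vertices joins S w.p. 1/n."""
    return (1 / n) * (1 - 1 / n) ** (n - 1)


# Spot-check (1/n)(1 - 1/n)^(n-1) >= 1/(e n) for a range of graph sizes.
holds = all(only_vertex_selected_prob(n) >= 1 / (math.e * n) for n in range(2, 5001))
```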

The expected decrease of the potential function $X_t$ per step, i.e., the expected drift, is then bounded by:

$$E[X_t-X_{t+1}\mid X_t]\geq\frac{\overline{e}}{\mu n^2e}\geq\frac{X_t}{\mu^2n^2e},$$

where $\frac{1}{\mu}$ accounts for selecting the right solution and $\frac{1}{en^2}$ for unmatching the right vertex and making a beneficial rematch.

Since $\binom{\mu}{2}2|R|$ is the maximum diversity, attained when all edges are pairwise distinct, we have $X_0\leq\binom{\mu}{2}2|R|\leq\mu^2|R|\leq|R|^3\leq m^{1.5}\leq n^3$. The multiplicative drift theorem then yields an expected runtime of $O(\mu^2n^2\log(n))$ to achieve maximum diversity. ∎
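The last step uses the multiplicative drift theorem in its standard form. If $E[X_t-X_{t+1}\mid X_t]\geq\delta X_t$ and $X_t\in\{0\}\cup[1,\infty)$, then the expected time until $X_t=0$ is at most $(1+\ln X_0)/\delta$. The sketch below plugs in $\delta=1/(e\mu^2n^2)$ and the bound $X_0\leq\binom{\mu}{2}2|R|$ from the proof. The function names and the example parameters ($|L|=40$, $|R|=20$, $\mu=5$) are arbitrary illustrations.

```python
import math


def multiplicative_drift_bound(x0, delta):
    """Upper bound (1 + ln x0) / delta on the expected time until X_t = 0."""
    return (1 + math.log(x0)) / delta


def two_vertex_rematch_bound(mu, n_left, n_right):
    """Bound from the proof above: delta = 1 / (e mu^2 n^2) and X_0 <= C(mu, 2) * 2|R|."""
    n = n_left + n_right
    x0 = math.comb(mu, 2) * 2 * n_right
    delta = 1 / (math.e * mu**2 * n**2)
    return multiplicative_drift_bound(x0, delta)


print(round(two_vertex_rematch_bound(5, 40, 20)))
```

Because $X_0\leq n^3$, the logarithmic factor is at most $1+3\ln n$, which gives the $O(\mu^2n^2\log(n))$ form.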

Theorem 3.5 covers the case $\mu\geq|L|-|R|$, which is missing from the previous theorem, and gives a much larger runtime bound. Intuitively, once $\mu$ is at least the gap $|L|-|R|$, it is no longer guaranteed that a vertex can always be rematched via an edge that no other solution uses. More than two bit flips may then be necessary. Theorem 3.4 exhibits such a situation and gives a lower bound for it.

Theorem 3.4.

Let $G=((L,R),E)$ be a complete bipartite graph with $|R|<|L|$. Consider a population size $\mu$ satisfying $\mu<\frac{|R|}{2}$ and $\mu\geq|L|-|R|$. There exists a starting population $P_w$ such that, when the $(\mu+1)$-EAD is applied to $G$, the expected time to reach a population with maximal diversity is $\Omega(m^{3.5})$.

Proof.

Consider a bipartite graph $G=(L\cup R,E)$ with vertex partitions $L=\{l_1,l_2,\ldots,l_{|L|}\}$ and $R=\{r_1,r_2,\ldots,r_{|R|}\}$. Define a matrix $M\in L^{\mu\times|R|}$ representing solutions to the matching problem. Each row of $M$ corresponds to a solution, and the entry in column $j$ ($1\leq j\leq|R|$) is the vertex of $L$ matched to $r_j\in R$.

The matrix M𝑀Mitalic_M is constructed as follows:

  1. The first column of $M$, denoted $M_{*,1}$, is defined as:

    $$M_{*,1}=(l_{|L|},l_{|L|},l_{|L|-1},\ldots,l_{|L|-\mu+2})^T.$$
  2. For each row $i$ ($1\leq i\leq\mu$), the remaining entries are filled by rotating the elements of $L$:

    $$M_{i,j}=l_{((j+i-2)\bmod|L|)}\quad\text{for}\quad 2\leq j\leq|R|.$$
  3. As a result, all rows of $M$ share the same sequence of vertices from $L$, except for the first entry. The sequence is cyclically shifted by one position in each subsequent row.

This matrix $M$ represents distinct solutions of the bipartite matching problem. Each row corresponds to a different solution, and each column represents a match between a vertex in $R$ and a vertex in $L$, arranged according to the rotating pattern specified above.

The following matrix illustrates the construction for $\mu=5$, $|R|=11$, $|L|=12$. Each row depicts a distinct solution of the bipartite matching problem.

$$M=\begin{pmatrix}
l_{12}&l_{1}&l_{2}&l_{3}&l_{4}&l_{5}&l_{6}&l_{7}&l_{8}&l_{9}\\
l_{12}&l_{2}&l_{3}&l_{4}&l_{5}&l_{6}&l_{7}&l_{8}&l_{9}&l_{1}\\
l_{11}&l_{3}&l_{4}&l_{5}&l_{6}&l_{7}&l_{8}&l_{9}&l_{1}&l_{2}\\
l_{10}&l_{4}&l_{5}&l_{6}&l_{7}&l_{8}&l_{9}&l_{1}&l_{2}&l_{3}\\
l_{9}&l_{5}&l_{6}&l_{7}&l_{8}&l_{9}&l_{1}&l_{2}&l_{3}&l_{4}
\end{pmatrix}$$

For each such matrix, only the first column contains two solutions using the same edge, so the distance to optimal diversity is $2$. Selecting any solution other than these two cannot increase the diversity. Moreover, for each of these two rows, the assignment of $r_1$ cannot be changed without creating another duplicate edge or an invalid matching.

Thus $r_1$ has to be rematched via one of the $|L|-\mu$ edges not part of the row. The edge currently covering the new partner of $r_1$ has to be deactivated, and the vertex of $R$ it matched has to be rematched via one of the $|L|-|R|\leq\mu$ edges to vertices not used in the row.

The probability of doing this in one step is at most $\frac{2}{\mu}\cdot\frac{1}{m}\cdot\frac{|L|-\mu}{m}\cdot\frac{1}{m}\cdot\frac{|L|-|R|}{m}\leq\frac{2|R|\mu}{\mu m^4}\leq\frac{2}{m^{3.5}}$. Here we used $|L|-\mu\leq|R|$ (as $\mu\geq|L|-|R|$), $|L|-|R|\leq\mu$, and $|R|\leq\sqrt{m}$. Hence the remaining expected runtime is $\Omega(m^{3.5})$. ∎
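The final chain of inequalities can be checked mechanically. The following sketch evaluates the probability bound from the proof with exact rational arithmetic for all small parameter triples satisfying $|R|<|L|$, $\mu<\frac{|R|}{2}$ and $\mu\geq|L|-|R|$. Since $m^{3.5}$ is generally irrational, it checks the equivalent squared form $p^2\leq 4/m^7$. The search range and function names are arbitrary choices for illustration.

```python
from fractions import Fraction


def flip_probability(mu, n_left, n_right):
    """(2/mu)(1/m)((|L|-mu)/m)(1/m)((|L|-|R|)/m) from the proof, as an exact fraction."""
    m = n_left * n_right
    return (Fraction(2, mu) * Fraction(1, m) * Fraction(n_left - mu, m)
            * Fraction(1, m) * Fraction(n_left - n_right, m))


def check_bound(max_r=30):
    """Check p <= 2|R|/m^4 and p^2 <= 4/m^7 (i.e. p <= 2/m^3.5) on all admissible triples."""
    checked = 0
    for n_right in range(1, max_r + 1):
        for mu in range(1, (n_right + 1) // 2):                  # 2 * mu < |R|
            for n_left in range(n_right + 1, n_right + mu + 1):  # 1 <= |L| - |R| <= mu
                m = n_left * n_right
                p = flip_probability(mu, n_left, n_right)
                assert p <= Fraction(2 * n_right, m**4)
                assert p * p <= Fraction(4, m**7)
                checked += 1
    return checked


print(check_bound())
```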

For this hard instance there is no improving 2-bit flip. There is, however, an improving 4-bit flip that changes two matches. In the first solution, we change the match of $r_6$ from $l_5$ to $l_{10}$ ($l_5\to l_{10}$), which frees $l_5$. We then match $l_5$ to $r_1$, giving an edge that is unique among all solutions:

$$M=\begin{pmatrix}
{\color{red}l_{5}}&l_{1}&l_{2}&l_{3}&l_{4}&{\color{red}l_{10}}&l_{6}&l_{7}&l_{8}&l_{9}\\
l_{12}&l_{2}&l_{3}&l_{4}&l_{5}&l_{6}&l_{7}&l_{8}&l_{9}&l_{1}\\
l_{11}&l_{3}&l_{4}&l_{5}&l_{6}&l_{7}&l_{8}&l_{9}&l_{1}&l_{2}\\
l_{10}&l_{4}&l_{5}&l_{6}&l_{7}&l_{8}&l_{9}&l_{1}&l_{2}&l_{3}\\
l_{9}&l_{5}&l_{6}&l_{7}&l_{8}&l_{9}&l_{1}&l_{2}&l_{3}&l_{4}
\end{pmatrix}$$

The following theorem shows that such an improving 4-bit flip can always be found.

Theorem 3.5.

For a complete bipartite graph $G=((L,R),E)$ with $|R|<|L|$, let the population size $\mu$ satisfy $\mu<\frac{|R|}{2}$ and $\mu\geq|L|-|R|$. When the $(\mu+1)$-EAD is applied to $G$, the expected time to achieve maximal diversity is bounded by $O(\mu^2m^4\log(m))$.

Proof.

We analyze the expected time for the $(\mu+1)$-EAD to maximize diversity on a complete bipartite graph satisfying the given conditions. First, note that every maximum matching leaves exactly $|L|-|R|$ vertices of the left partition unmatched.

Let $M$ be a maximum matching in the current population. Suppose that full diversity has not been achieved yet, so that some edge $e_{rl}\in M$ is part of multiple maximum matchings. We define $\overline{L}\subseteq L$ as the set of vertices in $L$ that are matched to $r$ in at least one maximum matching of the population. We have $\mu<\frac{|R|}{2}<\frac{|L|}{2}$, and each matching pairs $r$ with at most one vertex in $L$. Hence more than $\frac{|L|}{2}$ vertices in $L$ are not paired with $r$ in any maximum matching. Let $R'\subseteq R$ be the set of vertices in $R$ that are adjacent to these unpaired vertices in $L$.

For the $(\mu+1)$-EAD, we show that reassigning suitable pairs in $M$ increases diversity without decreasing the matching size. We denote by $M(r)$ the vertex in $L$ to which a vertex $r\in R$ is matched under $M$.

Now, for the sake of contradiction, assume that $M(r')\in\overline{L}$ for all $r'\in R'$, i.e., every vertex in $R'$ is matched to a vertex in $\overline{L}$ under $M$. Since distinct vertices of $R'$ have distinct partners in a matching, this requires $|R'|\leq|\overline{L}|$. However, $|\overline{L}|<\mu<\frac{|R|}{2}$ and $|R'|>\frac{|R|}{2}$, so this situation is not possible.

Therefore, there must exist a vertex $r'\in R'$ such that $M(r')\notin\overline{L}$. We can then rematch $r'$ to a vertex in $L$ that is unmatched in $M$ and match $r$ to the freed vertex $M(r')$ instead of $l$. This flips four bits, does not reduce the size of the matching, and increases diversity. As in Theorem 3.2, each such 4-bit flip only decreases or leaves unchanged the multiplicities of the other edges, since both newly activated edges are unique over all solutions. Moreover, successively applying these 4-bit flips at most $\overline{e}$ times results in optimal diversity, so $\overline{e}\mu\geq X_t$ holds.

Define $X_t$ as the difference between the optimal diversity and the current diversity at time $t$. As in Theorem 3.2, the expected drift, i.e., the expected decrease of $X_t$ per time step, can be bounded from below by:

$$E[X_t-X_{t+1}\mid X_t]\geq\frac{\overline{e}}{\mu m^4}\left(1-\frac{1}{m}\right)^{m-4}\geq\frac{X_t}{\mu^2m^4e}.$$

Here, $\frac{1}{\mu}$ is the probability of selecting the correct individual for reassignment. The term $\frac{1}{m^4}\left(1-\frac{1}{m}\right)^{m-4}$ is the probability of flipping exactly the four appropriate edges (two activations and two deactivations) while leaving all other bits unchanged.

Since $\binom{\mu}{2}2|R|$ is the maximum diversity, attained when all edges are pairwise distinct, we have $X_0\leq\binom{\mu}{2}2|R|\leq\mu^2|R|\leq|R|^3\leq m^{1.5}$. The multiplicative drift theorem then provides a runtime bound of $O(\mu^2m^4\log(m))$ to achieve maximum diversity. ∎
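The existence argument can be sanity-checked by brute force on small instances. The following sketch is for illustration only. It stores the population as rows in the matrix notation of Theorem 3.4 (`rows[i][j]` is the partner of $r_{j+1}$ in solution $i$, vertex $l_v$ written as $v$). It then searches exhaustively for improving 2-bit flips (rematching one vertex of $R$ to a free vertex) and for 4-bit flips of the kind used in the proof. The instance with $|L|=8$, $|R|=7$, $\mu=3$ is a hypothetical example. It satisfies $\mu<\frac{|R|}{2}$ and $\mu\geq|L|-|R|$ and follows the rotating pattern of the hard instance. All function names are hypothetical.

```python
from collections import Counter


def distance_to_optimum(rows):
    """X = 2 * sum over edges of C(c_e, 2): the diversity lost to shared edges."""
    counts = Counter((j, l) for row in rows for j, l in enumerate(row))
    return 2 * sum(c * (c - 1) // 2 for c in counts.values())


def replace_row(rows, i, new_row):
    """Copy of the population with row i replaced."""
    return rows[:i] + [new_row] + rows[i + 1:]


def improving_two_bit_flip(rows, n_left):
    """Rematch one r_j to a vertex free in its row; return (i, j, v) if X drops, else None."""
    x = distance_to_optimum(rows)
    for i, row in enumerate(rows):
        free = sorted(set(range(1, n_left + 1)) - set(row))
        for j in range(len(row)):
            for v in free:
                cand = row[:j] + [v] + row[j + 1:]
                if distance_to_optimum(replace_row(rows, i, cand)) < x:
                    return i, j, v
    return None


def improving_four_bit_flip(rows, n_left):
    """Move r_k to a free vertex v and give r_j the vertex freed by r_k; (i, j, k, v) or None."""
    x = distance_to_optimum(rows)
    for i, row in enumerate(rows):
        free = sorted(set(range(1, n_left + 1)) - set(row))
        for j in range(len(row)):
            for k in range(len(row)):
                if k == j:
                    continue
                for v in free:
                    cand = row[:]
                    cand[j], cand[k] = row[k], v
                    if distance_to_optimum(replace_row(rows, i, cand)) < x:
                        return i, j, k, v
    return None


# Hypothetical instance with |L| = 8, |R| = 7, mu = 3.
rows = [
    [8, 1, 2, 3, 4, 5, 6],
    [8, 2, 3, 4, 5, 6, 1],
    [7, 3, 4, 5, 6, 1, 2],
]
print(distance_to_optimum(rows))         # 2
print(improving_two_bit_flip(rows, 8))   # None
print(improving_four_bit_flip(rows, 8))  # (0, 0, 1, 7)
```

The returned 4-bit flip moves $r_2$ to the free vertex $l_7$ and gives $r_1$ the freed vertex $l_1$, mirroring the argument in the proof.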

A similar speedup can be shown for the small gap case by applying the 2P-EAD.

Theorem 3.6.

Given a complete bipartite graph $G=((L,R),E)$ with $|R|<|L|$, consider a population size $\mu$ that fulfills $\mu<\frac{|R|}{2}$ and $\mu\geq|L|-|R|$. For the 2P-EAD, the expected time to reach maximal diversity is $O(\mu^2m^2\log(m))$.

Proof.

Consider the 2P-EAD applied to a complete bipartite graph $G=((L,R),E)$ under the condition $\mu\geq|L|-|R|$. Define the potential function $X_t$ as in the previous theorem:

$$X_t\coloneqq\text{div}_{\text{opt}}-\text{div}(t).$$

In this algorithm, diversity is increased by unmatching and then rematching only two vertices at a time. This targets the vertices in $R$ that can be rematched to different vertices in $L$ so that diversity increases.

Let $\overline{e}$ be the number of edges that are shared by different matchings. Considering the selection and rematching of only two vertices, the expected drift of $X_t$ per step is bounded by:

$$E[X_t-X_{t+1}\mid X_t]\geq\frac{\overline{e}}{\mu n^2n^2}\left(1-\frac{1}{n}\right)^{n-2}\geq\frac{X_t}{\mu^2n^4e},$$

where the factor $\frac{1}{\mu n^2}$ accounts for the probability of selecting the right solution and the right pair of vertices, and $\frac{1}{n^2}$ for the probability of making a beneficial rematch. The term $\left(1-\frac{1}{n}\right)^{n-2}$ accounts for unmatching and rematching exactly these two vertices without affecting the others.

Since $\binom{\mu}{2}2|R|$ is the maximum diversity, attained when all edges are pairwise distinct, we have $X_0\leq\binom{\mu}{2}2|R|\leq\mu^2|R|\leq|R|^3\leq m^{1.5}\leq n^3$. Applying the multiplicative drift theorem yields an expected runtime of $O(\mu^2n^4\log(n))$ to achieve maximum diversity. Since $|L|-|R|\leq\mu<\frac{|R|}{2}$, we have $|R|<|L|<1.5|R|$, which implies $|L|=\Theta(|R|)$. As $n=|L|+|R|$ by definition, $n^2=O(|L||R|)=O(m)$ and $\log(n)=O(\log(m))$, so we obtain a bound of $O(\mu^2m^2\log(m))$. ∎
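The step $n^2=O(m)$ can be made explicit. With $|R|<|L|<1.5|R|$ we get $n^2=(|L|+|R|)^2<(2.5|R|)^2=6.25|R|^2<6.25\,m$, because $m=|L||R|>|R|^2$. The short exhaustive check below confirms $4n^2<25m$ in integer arithmetic for all admissible small parameters. The search range and the function name are arbitrary.

```python
def check_n_squared_vs_m(max_r=100):
    """Check 4 n^2 < 25 m (i.e. n^2 < 6.25 m) for all (mu, |L|, |R|) with |R| <= max_r,
    |R| < |L|, mu < |R|/2 and mu >= |L| - |R|."""
    checked = 0
    for n_right in range(1, max_r + 1):
        for mu in range(1, (n_right + 1) // 2):                  # 2 * mu < |R|
            for n_left in range(n_right + 1, n_right + mu + 1):  # 1 <= |L| - |R| <= mu
                n, m = n_left + n_right, n_left * n_right
                assert 4 * n * n < 25 * m
                checked += 1
    return checked


print(check_n_squared_vs_m())
```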

4 Runtime Analysis for paths

This section presents our theoretical results for paths. We first introduce notation that simplifies the subsequent proofs. We then present a series of theorems that bound the expected runtime to achieve optimal diversity. These theorems compare the $(\mu+1)$-EAD and the 2P-EAD and provide a quantitative basis for assessing their efficacy.

On a path with an even number of edges, such as $m=6$, there are multiple maximum matchings. For $m=6$, each maximum matching contains exactly three edges, no two of which share a vertex. We use the notation $E^iO^j$ to represent these matchings, where $i$ and $j$ denote the numbers of edges with even and odd indices in the matching, respectively. The following lemma states this precisely and gives a detailed proof.
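To make the $E^iO^j$ notation concrete, the brute-force sketch below enumerates all maximum matchings of a path with edges $e_0,\ldots,e_{m-1}$, where $e_i$ joins consecutive vertices. It reports the type $(i,j)$ of each matching. Two edges of a path share a vertex exactly when their indices differ by one. The function names are illustrative. For small $m$, the output agrees with the counts stated in Lemma 4.1.

```python
from itertools import combinations


def maximum_matchings(m):
    """All maximum matchings of a path with edges 0, ..., m-1 (edges i, i+1 are adjacent)."""
    for size in range(m, 0, -1):
        found = [s for s in combinations(range(m), size)
                 if all(b - a >= 2 for a, b in zip(s, s[1:]))]
        if found:
            return found
    return [()]


def signature(matching):
    """(i, j) such that the matching has type E^i O^j."""
    even = sum(1 for e in matching if e % 2 == 0)
    return even, len(matching) - even


for m in range(1, 9):
    sols = maximum_matchings(m)
    print(m, len(sols), sorted(signature(s) for s in sols))
```

For $m=6$, this lists the four matchings $\{e_0,e_2,e_4\}$, $\{e_0,e_2,e_5\}$, $\{e_0,e_3,e_5\}$ and $\{e_1,e_3,e_5\}$, of types $E^3O^0$, $E^2O^1$, $E^1O^2$ and $E^0O^3$.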

Lemma 4.1.

The number of different maximum matchings on a path with $m$ edges is $\frac{m}{2}+1$ for even $m$ and $1$ for odd $m$, and each of them has size $\lceil\frac{m}{2}\rceil$. Moreover, for even $m$ each maximum matching can be described as $E^iO^{\frac{m}{2}-i}$ for some $0 \le i \le \frac{m}{2}$. For odd $m$ the unique maximum matching has the form $E^{\lceil\frac{m}{2}\rceil}O^0$.

Proof.

We prove the lemma by induction on $m$.
Base Case ($m=1$, $m=2$):
For $m=1$ there is only one solution, consisting of the single edge with index $0$, so the unique maximum matching is $E^1O^0$. For $m=2$ only one of the two edges of the path can be part of a matching, so the maximum matchings are $E^1O^0$ and $E^0O^1$.
Inductive Step:
Consider a path with $m+1$ edges. A maximum matching must contain either the last or the second to last edge of the path, since otherwise we could increase its size by adding the last edge.
Case 1: $m+1$ even
Then the last edge has odd index $m$. If the last edge is part of the matching, then the first $m-1$ edges must also form a maximum matching, since none of them shares a vertex with the last edge. By the induction hypothesis, the path formed by the first $m-1$ edges, an even number, has the maximum matchings $E^iO^{\frac{m-1}{2}-i}$, and extending each of them by the last edge yields $E^iO^{\frac{m+1}{2}-i}$ for $0 \le i \le \frac{m-1}{2}$. If we instead include the second to last edge of the path, then the last and the third to last edge cannot be part of the matching, while the remaining first $m-2$ edges are unaffected by this choice and must thus also form a maximum matching. This remaining path has an odd number of edges and thus, by the induction hypothesis, the unique maximum matching $E^{\frac{m-1}{2}}O^0$. Since the second to last edge has even index $m-1$, the only maximum matching of this form is $E^{\frac{m+1}{2}}O^0$. Together, these are the $\frac{m+1}{2}+1$ matchings $E^iO^{\frac{m+1}{2}-i}$ with $0 \le i \le \frac{m+1}{2}$.
Case 2: $m+1$ odd
Then the last edge has even index $m$. If the last edge is part of the matching, then by the induction hypothesis the maximum matching on the remaining first $m-1$ edges, an odd number, is unique and equal to $E^{\frac{m}{2}}O^0$, so the maximum matching including the last edge is $E^{\lceil\frac{m+1}{2}\rceil}O^0$.
If we instead include the second to last edge of the path, then the last and the third to last edge cannot be part of the matching, while the remaining first $m-2$ edges are unaffected by this choice and must thus also form a maximum matching. This remaining path has an even number of edges, so by the induction hypothesis each such matching has $\frac{m-2}{2}+1 = \frac{m}{2}$ edges in total. This is not maximum, since including the last edge instead yields a matching of size $\frac{m}{2}+1$. ∎
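As a sanity check of Lemma 4.1, one can enumerate all matchings of short paths by brute force. The following Python sketch is our own illustration, not the authors' code, and its function names are hypothetical. It represents a set of edges as a bitmask, uses the fact that edges $i$ and $i+1$ share a vertex, and confirms the number of maximum matchings, their size and their $E^iO^j$ form for $m = 1,\dots,12$.

```python
def maximum_matchings(m):
    """All maximum matchings of a path with edges 0..m-1, as sorted edge lists.

    Edges i and i+1 share a vertex, so a bitmask is a matching
    iff it has no two adjacent set bits.
    """
    best, result = 0, []
    for mask in range(1 << m):
        if mask & (mask >> 1):
            continue  # two consecutive edges share a vertex
        size = bin(mask).count("1")
        if size > best:
            best, result = size, []
        if size == best:
            result.append([i for i in range(m) if mask >> i & 1])
    return result


def ei_oj_form(edges):
    """Return (i, j) if the edges read as E^i O^j (all even before all odd), else None."""
    parities = [e % 2 for e in edges]
    i = parities.count(0)
    if parities != [0] * i + [1] * (len(edges) - i):
        return None
    return i, len(edges) - i


for m in range(1, 13):
    mms = maximum_matchings(m)
    forms = [ei_oj_form(e) for e in mms]
    assert all(len(e) == (m + 1) // 2 for e in mms)  # size ceil(m/2)
    assert None not in forms
    if m % 2 == 0:
        assert sorted(forms) == [(i, m // 2 - i) for i in range(m // 2 + 1)]
    else:
        assert forms == [((m + 1) // 2, 0)]
print("Lemma 4.1 confirmed for m = 1..12")
```

Exhaustive enumeration is exponential in $m$, so this only serves as a check for small paths; it does not replace the inductive proof.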

With an even number of edges, such as $m=6$ (edges $0,\dots,5$), there are the following maximum matchings:

Matching $E^3O^0$: edges 0, 2, 4

Matching $E^2O^1$: edges 0, 2, 5

Matching $E^1O^2$: edges 0, 3, 5

Matching $E^0O^3$: edges 1, 3, 5

With an odd number of edges, such as $m=5$ (edges $0,\dots,4$), there is only one maximum matching:

Matching $E^3O^0$: edges 0, 2, 4

In each case, every vertex is incident to at most one matching edge, and the notation $E^iO^j$ describes the composition of the matching in terms of even- and odd-indexed edges.
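The binary string encoding of matchings can be made concrete for paths. The sketch below is an illustration of ours, not the authors' code, and the helper names are hypothetical. It builds the bit string of $E^iO^{\frac{m}{2}-i}$ (bit $e$ is set iff edge $e$ is matched) and prints the four maximum matchings for $m=6$. It also checks, for $m=10$, that each string is a matching of size $\frac{m}{2}$ and that $E^aO^{\frac{m}{2}-a}$ and $E^bO^{\frac{m}{2}-b}$ differ in exactly $2|a-b|$ bits.

```python
def e_o_matching(m, i):
    """Bit string of E^i O^(m/2 - i) on a path with an even number m of edges.

    Bit e is 1 iff edge e is matched: even edges 0, 2, ..., 2i-2,
    then odd edges 2i+1, 2i+3, ..., m-1.
    """
    assert m % 2 == 0 and 0 <= i <= m // 2
    bits = [0] * m
    for e in range(0, 2 * i, 2):
        bits[e] = 1
    for e in range(2 * i + 1, m, 2):
        bits[e] = 1
    return bits


def hamming(x, y):
    """Number of positions in which two bit strings differ."""
    return sum(a != b for a, b in zip(x, y))


for i in range(3, -1, -1):
    print(f"E^{i}O^{3 - i}:", "".join(map(str, e_o_matching(6, i))))
# E^3O^0: 101010
# E^2O^1: 101001
# E^1O^2: 100101
# E^0O^3: 010101

# Each string is a matching of size m/2, and two of them differ in 2|a - b| bits.
m = 10
strings = [e_o_matching(m, a) for a in range(m // 2 + 1)]
assert all(sum(s) == m // 2 and "11" not in "".join(map(str, s)) for s in strings)
assert all(hamming(strings[a], strings[b]) == 2 * abs(a - b)
           for a in range(m // 2 + 1) for b in range(m // 2 + 1))
```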

The following lemma uses this notation to characterize the populations of maximum diversity.

Lemma 4.2 (Diversity of a Population).

The population with optimum diversity for even $\mu$ contains, for each $j$ from $0$ to $\lfloor\frac{\mu}{2}\rfloor-1$, the individuals $E^jO^{\frac{m}{2}-j}$ and $E^{\frac{m}{2}-j}O^j$. For odd $\mu$, it contains these individuals and, in addition, any one individual of the form $E^kO^{\frac{m}{2}-k}$ with $\lfloor\frac{\mu}{2}\rfloor \le k \le \frac{m}{2}-\lfloor\frac{\mu}{2}\rfloor$.

Proof.

We prove the lemma by induction on $\mu$.
Base Case ($\mu=1$, $\mu=2$):
For $\mu=1$ every population has diversity $0$, so any solution is optimal. For $\mu=2$, the population consisting of $E^0O^{\frac{m}{2}}$ and $E^{\frac{m}{2}}O^0$ attains the maximum possible diversity of $m$. Suppose that another population of two maximum matchings also has diversity $m$. If the first matching is $M$, then the second matching must be its complement $E\setminus M$. As soon as $M$ contains edges with both even and odd indices, either $M$ or $E\setminus M$ does not have the form $E^iO^{\frac{m}{2}-i}$ and thus cannot be a maximum matching by Lemma 4.1.
Inductive Step:
Suppose, for the sake of contradiction, that $E^0O^{\frac{m}{2}}$ is not part of the population. Then there exists an $i \ge 1$ such that in every solution of the population the first $i$ matched edges have even index, and in at least one solution the $(i+1)$-th matched edge has odd index. By changing the $i$-th edge of this solution to also have odd index, we would increase the diversity, which contradicts the assumption of maximum diversity. Analogously, $E^{\frac{m}{2}}O^0$ is part of the population. Since all individuals are distinct, all other solutions must start with an even edge and end with an odd edge. We can then apply the induction hypothesis to the remaining $\mu-2$ individuals restricted to the inner $m-4$ edges. Hence, for even $\mu$ the population further contains, for each $j$ from $0$ to $\lfloor\frac{\mu}{2}\rfloor-2$, the individuals

$$EE^jO^{\frac{m-4}{2}-j}O = E^{j+1}O^{\frac{m}{2}-j-1}$$

and

$$EE^{\frac{m-4}{2}-j}O^jO = E^{\frac{m}{2}-j-1}O^{j+1}.$$

For odd $\mu$, it further contains any one individual of the form $E^{k+1}O^{\frac{m}{2}-k-1}$ with $\lfloor\frac{\mu-2}{2}\rfloor \le k \le \frac{m-4}{2}-\lfloor\frac{\mu-2}{2}\rfloor$, that is, $E^{k'}O^{\frac{m}{2}-k'}$ with $\lfloor\frac{\mu}{2}\rfloor \le k' \le \frac{m}{2}-\lfloor\frac{\mu}{2}\rfloor$. ∎
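The characterization in Lemma 4.2 can be checked by exhaustive search on small instances. The sketch below makes two assumptions that are ours: the diversity of a population is the sum of pairwise Hamming distances (consistent with the bound $X_0 \le m\mu^2$ used in the next proof), and populations consist of pairwise distinct maximum matchings, as in the inductive step above. By Lemma 4.1 it suffices to search over the matchings $E^iO^{\frac{m}{2}-i}$, each identified by its index $i$. The function names are hypothetical.

```python
from itertools import combinations


def e_o_matching(m, i):
    """Edge set of E^i O^(m/2 - i): even edges 0..2i-2, then odd edges 2i+1..m-1."""
    return frozenset(range(0, 2 * i, 2)) | frozenset(range(2 * i + 1, m, 2))


def diversity(population):
    """Assumed measure: sum of pairwise Hamming distances.

    For edge sets, the Hamming distance is the size of the symmetric difference.
    """
    return sum(len(x ^ y) for x, y in combinations(population, 2))


def optimal_populations(m, mu):
    """Best diversity and all index sets {i} of mu pairwise distinct maximum matchings attaining it."""
    candidates = list(combinations(range(m // 2 + 1), mu))
    value = {c: diversity([e_o_matching(m, i) for i in c]) for c in candidates}
    best = max(value.values())
    return best, [set(c) for c in candidates if value[c] == best]


print(optimal_populations(8, 4))  # (28, [{0, 1, 3, 4}])
print(optimal_populations(8, 3))  # (16, [{0, 1, 4}, {0, 2, 4}, {0, 3, 4}])
```

For $m=8$, the unique best population for $\mu=4$ is $\{E^0O^4, E^1O^3, E^3O^1, E^4O^0\}$, and for $\mu=3$ the third individual can be any $E^kO^{4-k}$ with $1 \le k \le 3$, in agreement with the lemma.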

Building on this, the following theorem shows that whenever the diversity is not yet maximal, there is a local improvement requiring only two bit flips.

Theorem 4.3.

In the $(\mu+1)$-EAD applied to a path with $m$ edges, the expected time until the diversity is maximized is $O(\mu^3 m^3)$.

Proof.

We consider a path graph with an even number $m$ of edges, where multiple maximum matchings exist. When $m$ is odd, the maximum matching is unique, so the maximum diversity is trivially obtained in that case. Therefore, our analysis focuses on even $m$.

Suppose the population contains duplicates. By Lemma 4.2, there then exists at least one individual whose first $i \ge 0$ matched edges have even indices while no other individual has its first $i+1$ matched edges with even indices, or an individual whose last $i \ge 0$ matched edges have odd indices while no other individual has its last $i+1$ matched edges with odd indices.

Since the total number of distinct maximum matchings on a path with $m$ edges exceeds $\mu$, the probability of choosing such an individual from the current population and flipping exactly the two edges that increase diversity is at least $\frac{1}{\mu}\cdot\frac{1}{m^2}\left(1-\frac{1}{m}\right)^{m-2}$. With at least this probability, the diversity increases by at least $1$.

If the population has not reached maximum diversity but consists of pairwise distinct maximum matchings, then there must exist a maximal $0 \le j \le \lfloor\frac{\mu}{2}\rfloor-1$ such that $E^jO^{\frac{m}{2}-j}$ or $E^{\frac{m}{2}-j}O^j$ is not present in the population. Without loss of generality, let this be $E^jO^{\frac{m}{2}-j}$. We focus on the individual $E^kO^{\frac{m}{2}-k}$ with $k>j$ that has the most odd edges. By applying a 2-bit flip we obtain $E^{k-1}O^{\frac{m}{2}-k+1}$. Since the offspring differs from its parent only in one deactivated even edge and one newly activated odd edge, the diversity change caused by replacing the parent is determined by these two edges alone.
This new odd edge is already used by $j$ matchings, since $j$ is maximal, and only by those, since otherwise $E^kO^{\frac{m}{2}-k}$ would not have the most odd edges among the remaining population. By symmetry, the deactivated even edge is used in $\mu-j$ solutions (excluding the parent). Thus the change in diversity from replacing the parent would be $\mu-j-j=\mu-2j$, which is strictly positive by the choice of $j$. Since replacing the parent is possible, the diversity increase is at least of this size. Let $X_t$ denote the difference between the optimal diversity and the current diversity at time $t$. The possibility of increasing diversity via a two-bit flip yields a drift of

$$E[X_t - X_{t+1} \mid X_t] \ge \frac{1}{\mu m^2}\left(1-\frac{1}{m}\right)^{m-2} \ge \frac{1}{e\mu m^2}.$$

Since the initial diversity deficit $X_0$ is at most $m\mu^2$ (each pair of solutions has a Hamming distance of at most $m$), applying the additive drift theorem yields an expected runtime of $O(\mu^3 m^3)$. ∎
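Spelled out, with $T$ denoting the number of iterations until maximum diversity is reached, the additive drift theorem combines the drift bound $\delta = \frac{1}{e\mu m^2}$ (which uses $(1-\frac{1}{m})^{m-2} \ge (1-\frac{1}{m})^{m-1} \ge \frac{1}{e}$) with $X_0 \le m\mu^2$ as follows:

```latex
E[T] \le \frac{X_0}{\delta} \le \frac{m\mu^2}{1/(e\mu m^2)} = e\,\mu^3 m^3 = O(\mu^3 m^3).
```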

Theorem 4.4.

In the 2P-EAD applied to a path with $m$ edges, the expected time until the diversity is maximized is $O(\mu^3 m^2)$.

Proof.

Since the proof closely follows the arguments of Theorem 4.3, we focus only on the drift bound, which is the main difference.

Any maximum matching $E^jO^{\frac{m}{2}-j}$ with $j>0$ can be chosen with probability $\frac{1}{\mu}$ and mutated into $E^{j-1}O^{\frac{m}{2}-j+1}$ by unmatching the $j$-th vertex and, with probability $\frac{1}{2}$, rematching it to its unmatched left neighbour. Since all previous edges have even index, this neighbour must be unmatched. Analogously, $E^jO^{\frac{m}{2}-j}$ with $j<\frac{m}{2}$ can be mutated into $E^{j+1}O^{\frac{m}{2}-j-1}$. In both cases of Theorem 4.3, a population with duplicates and a population of distinct but non-optimal matchings, we make use of such a local edge swap. The drift is therefore given by

$$E[X_t - X_{t+1} \mid X_t] \ge \frac{1}{2\mu n}\left(1-\frac{1}{n}\right)^{n-1} \ge \frac{1}{2e\mu n}.$$

Here, $\left(1-\frac{1}{n}\right)^{n-1}$ is the probability of not rematching any other vertex. Given that the initial diversity deficit $X_0$ is at most $m\mu^2$ (each pair of solutions has a Hamming distance of at most $m$), the additive drift theorem gives an upper bound of $O(\mu^3 m^2)$ on the expected runtime, since $m = n-1$. ∎
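To compare the two path bounds numerically, the following sketch evaluates $X_0/\delta$ with the drift lower bounds of Theorems 4.3 and 4.4 before their simplification via $e$. It evaluates the theoretical bounds only and does not simulate either algorithm; the function names are ours.

```python
import math


def ead_path_bound(mu, m):
    """X_0 / delta for the (mu+1)-EAD on a path (Theorem 4.3)."""
    x0 = m * mu**2                                  # initial deficit bound
    delta = (1 - 1 / m) ** (m - 2) / (mu * m**2)    # drift lower bound
    return x0 / delta


def two_phase_path_bound(mu, m):
    """X_0 / delta for the 2P-EAD on a path (Theorem 4.4); n = m + 1 vertices."""
    n = m + 1
    x0 = m * mu**2
    delta = (1 - 1 / n) ** (n - 1) / (2 * mu * n)
    return x0 / delta


mu = 8
for m in (20, 100, 500):
    ead, tp = ead_path_bound(mu, m), two_phase_path_bound(mu, m)
    # The closed forms e*mu^3*m^3 and 2e*mu^3*m*(m+1) bound these values from above.
    assert ead <= math.e * mu**3 * m**3 and tp <= 2 * math.e * mu**3 * m * (m + 1)
    print(f"m={m}: EAD bound {ead:.3g}, 2P bound {tp:.3g}, ratio {ead / tp:.1f}")
```

The ratio grows roughly like $\frac{m}{2}$, reflecting the extra factor of $m$ in the bound for the $(\mu+1)$-EAD.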

(a) EAD and 2P with $\mu=8$ in complete bipartite graphs
(b) 2P with fixed $\mu=8$ in complete bipartite graphs
(c) EAD and 2P with fixed $m$ in complete bipartite graphs
(d) 2P with fixed $m$ in complete bipartite graphs
Figure 1: Experimental results on complete bipartite graphs

5 Empirical Analysis

In this section, we present our empirical findings on the performance of the evolutionary diversity algorithms on complete bipartite graphs and paths. Our experiments test the theoretical predictions of the previous sections, focusing on the number of iterations the algorithms require to achieve optimal diversity.

5.1 Experimental Setup

We study the algorithms under two conditions: with the population size $\mu$ held constant and with the number of edges $m$ held constant.

Complete Bipartite Graphs

For complete bipartite graphs, the initial population is homogeneous: it consists of the maximum matching in which $r_i \in R$ is matched to $l_i \in L$ for each $0 \le i \le |R|-1$. In the constant-$\mu$ scenario, we increase the sizes of both $L$ and $R$ by one from one instance to the next, which keeps the difference $|L|-|R|$ fixed and allows a controlled analysis of the algorithms' scalability. In the constant-$m$ scenario, we simply increase $\mu$ by one from one instance to the next.

Paths

For paths, the initial population consists of maximum matchings containing all even-indexed edges. With fixed $\mu$, the number of edges is increased by ten from one instance to the next, in order to cover a wide range of problem sizes while keeping the experiments feasible. In the constant-$m$ case, also for feasibility reasons, we simply increase $\mu$ by one from one instance to the next.
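Both initial populations can be written down directly in the binary string encoding. In this sketch, the bit order for complete bipartite graphs (bit $i\cdot|L| + j$ for edge $\{r_i, l_j\}$) is a hypothetical choice of ours, since the encoding order is not specified here. We also read "homogeneous" as $\mu$ identical copies of the initial matching. The function names are illustrative only.

```python
def bipartite_initial_population(n_left, n_right, mu):
    """mu copies of the matching {r_i, l_i}, 0 <= i < |R|, on K_{|L|,|R|}.

    Hypothetical encoding: bit i * n_left + j represents edge {r_i, l_j}.
    """
    assert n_left >= n_right
    bits = [0] * (n_left * n_right)
    for i in range(n_right):
        bits[i * n_left + i] = 1  # match r_i with l_i
    return [list(bits) for _ in range(mu)]


def path_initial_population(m, mu):
    """mu copies of the maximum matching of all even-indexed edges of a path with m edges."""
    bits = [1 if e % 2 == 0 else 0 for e in range(m)]
    return [list(bits) for _ in range(mu)]


print(path_initial_population(6, 2))  # [[1, 0, 1, 0, 1, 0], [1, 0, 1, 0, 1, 0]]
```

For even $m$, the path population consists of copies of $E^{\frac{m}{2}}O^0$; for odd $m$, it is the unique maximum matching.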

5.2 Methodology

Each experiment was repeated 30 times to determine the average number of iterations and its standard deviation, from which we estimate the algorithms' asymptotic runtime both for a fixed population size $\mu$ and for a fixed number of edges $m$. For complete bipartite graphs with fixed $m$, we chose $|L|=24$ and $|R|=23$ for the small gap case and $|L|=34$ and $|R|=23$ for the big gap case, so that the numbers of edges, $m=782$ for the small gap case and $m=756$ for the big gap case, are comparable in size.
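The aggregation over repeated runs can be sketched as follows. Here `run_once` is a hypothetical stand-in for a single run of either algorithm, and a toy geometric waiting time takes its place so that the sketch runs on its own. The text does not say whether the sample or the population standard deviation is reported; this sketch assumes the sample version (`statistics.stdev`). Seeding each run separately makes the 30 runs reproducible.

```python
import random
import statistics


def summarize(run_once, runs=30, base_seed=0):
    """Mean and sample standard deviation of iteration counts over independent runs.

    run_once(rng) must return the number of iterations until optimal diversity is reached.
    """
    counts = [run_once(random.Random(base_seed + r)) for r in range(runs)]
    return statistics.mean(counts), statistics.stdev(counts)


def toy_run(rng, p=0.01):
    """Placeholder for an EA run: iterations until an event of probability p first occurs."""
    t = 1
    while rng.random() >= p:
        t += 1
    return t


mean, std = summarize(toy_run)
print(f"mean {mean:.1f}, std {std:.1f}")
```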

(a) EAD and 2P with fixed $\mu=8$ in paths
(b) 2P with fixed $\mu=8$ in paths
(c) EAD and 2P with fixed $m=100$ in paths
(d) 2P with fixed $m=100$ in paths
Figure 2: Experimental results on paths

5.3 Complete Bipartite Graphs

This subsection focuses on the performance of evolutionary diversity algorithms on complete bipartite graphs, specifically examining the $(\mu+1)$-EAD and 2P-EAD algorithms.

$(\mu+1)$-EAD

In Figure 1(a), we show the average number of iterations for a fixed population size of $\mu=8$ and different values of $|L|-|R|$. Specifically, we examine the cases $|L|-|R|=1$, referred to as the 'small gap' scenario, and $|L|-|R|=\mu+1$, the 'big gap' scenario. For the big gap case, the number of iterations of the $(\mu+1)$-EAD grows quadratically in $m$ and is empirically close to $\mu m^2$, suggesting that the theoretical bound of $O(\mu^2 m^2\log m)$ overestimates the runtime by a factor of approximately $\mu\log m$. For the small gap case, we empirically estimate the runtime as $\mu m^{2.5}$, suggesting an even larger gap, by a factor of approximately $\mu m^{1.5}\log m$, to the theoretical bound of $O(\mu^2 m^4\log m)$.

In Figure 1(c), we display the average number of iterations for a constant number of edges $m$, considering the same small and big gap scenarios. These findings echo the trends observed in Figure 1(a), indicating that the algorithm behaves consistently when the population size, rather than the graph size, is varied.
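The text does not state how empirical exponents such as $\mu m^2$ and $\mu m^{2.5}$ were obtained. One common approach, shown here as an assumption rather than the authors' procedure, is a least-squares fit of $\log(\text{iterations})$ against $\log m$ for fixed $\mu$, whose slope estimates the exponent. The helper name is hypothetical, and the data below is synthetic, not taken from the experiments.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. the exponent k in y ~ c * x**k."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


ms = [100, 200, 400, 800]
print(loglog_slope(ms, [8 * m**2 for m in ms]))  # synthetic data; slope close to 2
```

A logarithmic factor inflates the fitted slope slightly: data following $m^2\log m$ yields a slope a little above 2, so distinguishing $m^2$ from $m^2\log m$ requires a wide range of $m$.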

2P-EAD

In Figure 1(b) for fixed $\mu$ and Figure 1(d) for fixed $m$, we zoom in on the results for the 2P-EAD. In both the small and the big gap case, the number of iterations of the 2P-EAD increased linearly in $m$ when $\mu$ was held constant, and linearly in $\mu$ when $m$ was held constant. Empirically, the runtime of the 2P-EAD was close to $\mu m$, notably below the predicted $O(\mu^2 m\log m)$. Table 1 summarizes these observations. They indicate that the 2P-EAD is considerably faster in practice than the $(\mu+1)$-EAD, and suggest that our theoretical bounds may be refined to predict the empirical outcomes more closely.

5.4 Paths

This subsection focuses on the performance of evolutionary diversity algorithms on paths, specifically examining the $(\mu+1)$-EAD and 2P-EAD algorithms.

$(\mu+1)$-EAD

In Figure 2(a), we present the average number of iterations when the population size $\mu$ is fixed at 8, showing how the number of iterations required to reach maximum diversity changes as the number of edges $m$ of the path increases. Figure 2(c) shows the average number of iterations for a fixed number of edges $m=100$ and varying population size $\mu$. For the $(\mu+1)$-EAD, the number of iterations grows polynomially in the problem size. When $\mu$ is fixed at 8, the empirical runtime grows in line with $\mu m^3$, which could indicate that the theoretical upper bound of $O(\mu^3 m^3)$ overestimates the runtime by a factor of $\mu^2$.

2P-EAD

The 2P-EAD, shown in Figure 2(b) for fixed $\mu$ and in Figure 2(d) for fixed $m$, exhibits a similar pattern. Its empirical runtime is consistently around $\mu m^2$, which again possibly deviates by a factor of $\mu^2$ from the theoretical $O(\mu^3 m^2)$ bound. Table 2 summarizes these observations. As for complete bipartite graphs, the 2P-EAD is faster in practice than the $(\mu+1)$-EAD, and the gap between the empirical and theoretical results suggests that our bounds may be refined.

Table 1: Summary of results for complete bipartite graphs

| Algorithm | Empirical, $\lvert L\rvert-\lvert R\rvert>\mu$ | Theor. UB, $\lvert L\rvert-\lvert R\rvert>\mu$ | Empirical, $\lvert L\rvert-\lvert R\rvert\le\mu$ | Theor. UB, $\lvert L\rvert-\lvert R\rvert\le\mu$ |
|---|---|---|---|---|
| $(\mu+1)$-EAD | $\sim\mu m^2$ | $O(\mu^2 m^2\log m)$ | $\sim\mu m^{2.5}$ | $O(\mu^2 m^4\log m)$ |
| 2P-EAD | $\sim\mu m$ | $O(\mu^2 n^2\log n)$ | $\sim\mu m$ | $O(\mu^2 m^2\log m)$ |

Table 2: Summary of results for paths

| Algorithm | Empirical | Theor. UB |
|---|---|---|
| $(\mu+1)$-EAD | $\sim\mu m^3$ | $O(\mu^3 m^3)$ |
| 2P-EAD | $\sim\mu m^2$ | $O(\mu^3 m^2)$ |

6 Conclusions

In this study, we explored evolutionary algorithms (EAs) for maximizing the diversity of solutions to the maximum matching problem in complete bipartite graphs and paths. Our methodology consisted of two phases: a rigorous theoretical analysis followed by comprehensive empirical evaluations. We specifically studied the $(\mu+1)$-EAD and the Two-Phase Matching Evolutionary Algorithm (2P-EAD), showing that both achieve maximum diversity in expected polynomial time, with the 2P-EAD being faster in all considered scenarios. Our findings underscore the utility of EAs for combinatorial diversity problems and open up avenues for further research. A significant future direction is to refine the theoretical upper bounds on the runtime of these algorithms. Additionally, applying these insights to other graph problems and exploring real-world applications could provide practical benefits.

Acknowledgements

This work has been supported by the Australian Research Council through grant DP190103894.

References

  • [1] Alvarez, A., Dahlskog, S., Font, J.M., Togelius, J.: Empowering quality diversity in dungeon design with interactive constrained MAP-Elites. In: IEEE Conference on Games, CoG 2019. pp. 1–8. IEEE (2019). https://doi.org/10.1109/CIG.2019.8848022
  • [2] Bossek, J., Neumann, A., Neumann, F.: Breeding diverse packings for the knapsack problem by means of diversity-tailored evolutionary algorithms. In: Chicano, F., Krawiec, K. (eds.) GECCO ’21: Genetic and Evolutionary Computation Conference, Lille, France, July 10-14, 2021. pp. 556–564. ACM (2021). https://doi.org/10.1145/3449639.3459364
  • [3] Bossek, J., Neumann, F.: Evolutionary diversity optimization and the minimum spanning tree problem. In: Chicano, F., Krawiec, K. (eds.) GECCO ’21: Genetic and Evolutionary Computation Conference, Lille, France, July 10-14, 2021. pp. 198–206. ACM (2021). https://doi.org/10.1145/3449639.3459363
  • [4] Bossens, D.M., Tarapore, D.: QED: using quality-environment-diversity to evolve resilient robot swarms. IEEE Trans. Evol. Comput. 25(2), 346–357 (2021). https://doi.org/10.1109/TEVC.2020.3036578
  • [5] Cully, A., Demiris, Y.: Quality and diversity optimization: A unifying modular framework. IEEE Trans. Evol. Comput. 22(2), 245–259 (2018). https://doi.org/10.1109/TEVC.2017.2704781
  • [6] Do, A.V., Bossek, J., Neumann, A., Neumann, F.: Evolving diverse sets of tours for the travelling salesperson problem. In: Coello, C.A.C. (ed.) GECCO ’20: Genetic and Evolutionary Computation Conference, Cancún Mexico, July 8-12, 2020. pp. 681–689. ACM (2020). https://doi.org/10.1145/3377930.3389844
  • [7] Do, A.V., Guo, M., Neumann, A., Neumann, F.: Analysis of evolutionary diversity optimization for permutation problems. ACM Trans. Evol. Learn. Optim. 2(3), 11:1–11:27 (2022). https://doi.org/10.1145/3561974
  • [8] Do, A.V., Guo, M., Neumann, A., Neumann, F.: Diverse approximations for monotone submodular maximization problems with a matroid constraint. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023. pp. 5558–5566. ijcai.org (2023). https://doi.org/10.24963/IJCAI.2023/617
  • [9] Doerr, B., Johannsen, D., Winzen, C.: Multiplicative drift analysis. Algorithmica 64(4), 673–697 (2012). https://doi.org/10.1007/S00453-012-9622-X
  • [10] Friedrich, T., Horoba, C., Neumann, F.: Illustration of fairness in evolutionary multi-objective optimization. Theor. Comput. Sci. 412(17), 1546–1556 (2011). https://doi.org/10.1016/J.TCS.2010.09.023
  • [11] Friedrich, T., Oliveto, P.S., Sudholt, D., Witt, C.: Analysis of diversity-preserving mechanisms for global exploration. Evol. Comput. 17(4), 455–476 (2009). https://doi.org/10.1162/EVCO.2009.17.4.17401
  • [12] Gao, W., Nallaperuma, S., Neumann, F.: Feature-based diversity optimization for problem instance classification. Evol. Comput. 29(1), 107–128 (2021). https://doi.org/10.1162/EVCO_A_00274
  • [13] Gao, W., Pourhassan, M., Neumann, F.: Runtime analysis of evolutionary diversity optimization and the vertex cover problem. In: Silva, S., Esparcia-Alcázar, A.I. (eds.) Genetic and Evolutionary Computation Conference, GECCO 2015, Companion Material Proceedings. pp. 1395–1396. ACM (2015). https://doi.org/10.1145/2739482.2764668
  • [14] Giel, O., Wegener, I.: Evolutionary algorithms and the maximum matching problem. In: Alt, H., Habib, M. (eds.) STACS 2003, 20th Annual Symposium on Theoretical Aspects of Computer Science. Lecture Notes in Computer Science, vol. 2607, pp. 415–426. Springer (2003). https://doi.org/10.1007/3-540-36494-3_37
  • [15] Gounder, S., Neumann, F., Neumann, A.: Evolutionary diversity optimisation for sparse directed communication networks. In: Genetic and Evolutionary Computation Conference, GECCO 2024. ACM (2024), to appear
  • [16] Gravina, D., Khalifa, A., Liapis, A., Togelius, J., Yannakakis, G.N.: Procedural content generation through quality diversity. In: IEEE Conference on Games, CoG 2019, London, United Kingdom, August 20-23, 2019. pp. 1–8. IEEE (2019). https://doi.org/10.1109/CIG.2019.8848053
  • [17] He, J., Yao, X.: A study of drift analysis for estimating computation time of evolutionary algorithms. Nat. Comput. 3(1), 21–35 (2004). https://doi.org/10.1023/B:NACO.0000023417.31393.C7
  • [18] Mouret, J.B., Clune, J.: Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909 (2015)
  • [19] Neumann, A., Antipov, D., Neumann, F.: Coevolutionary Pareto diversity optimization. In: GECCO ’22: Genetic and Evolutionary Computation Conference. pp. 832–839. ACM (2022). https://doi.org/10.1145/3512290.3528755
  • [20] Neumann, A., Bossek, J., Neumann, F.: Diversifying greedy sampling and evolutionary diversity optimisation for constrained monotone submodular functions. In: GECCO ’21: Genetic and Evolutionary Computation Conference. pp. 261–269. ACM (2021). https://doi.org/10.1145/3449639.3459385
  • [21] Neumann, A., Gao, W., Doerr, C., Neumann, F., Wagner, M.: Discrepancy-based evolutionary diversity optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference. pp. 991–998. ACM (2018). https://doi.org/10.1145/3205455.3205532
  • [22] Neumann, A., Gao, W., Wagner, M., Neumann, F.: Evolutionary diversity optimization using multi-objective indicators. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2019. pp. 837–845. ACM (2019). https://doi.org/10.1145/3321707.3321796
  • [23] Neumann, A., Gounder, S., Yan, X., Sherman, G., Campbell, B., Guo, M., Neumann, F.: Diversity optimization for the detection and concealment of spatially defined communication networks. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2023. pp. 1436–1444. ACM (2023). https://doi.org/10.1145/3583131.3590405
  • [24] Neumann, F., Witt, C.: Bioinspired computation in combinatorial optimization: algorithms and their computational complexity. In: Blum, C., Alba, E. (eds.) Genetic and Evolutionary Computation Conference, GECCO ’13. pp. 567–590. ACM (2013). https://doi.org/10.1145/2464576.2466738
  • [25] Nikfarjam, A., Bossek, J., Neumann, A., Neumann, F.: Computing diverse sets of high quality TSP tours by EAX-based evolutionary diversity optimisation. In: FOGA ’21: Foundations of Genetic Algorithms XVI. pp. 9:1–9:11. ACM (2021). https://doi.org/10.1145/3450218.3477310
  • [26] Nikfarjam, A., Bossek, J., Neumann, A., Neumann, F.: Entropy-based evolutionary diversity optimisation for the traveling salesperson problem. In: GECCO ’21: Genetic and Evolutionary Computation Conference. pp. 600–608. ACM (2021). https://doi.org/10.1145/3449639.3459384
  • [27] Nikfarjam, A., Neumann, A., Neumann, F.: Evolutionary diversity optimisation for the traveling thief problem. In: GECCO ’22: Genetic and Evolutionary Computation Conference. pp. 749–756. ACM (2022). https://doi.org/10.1145/3512290.3528862
  • [28] Pugh, J.K., Soros, L.B., Stanley, K.O.: Quality diversity: A new frontier for evolutionary computation. Frontiers Robotics AI 3, 40 (2016). https://doi.org/10.3389/FROBT.2016.00040
  • [29] Ulrich, T., Bader, J., Thiele, L.: Defining and optimizing indicator-based diversity measures in multiobjective search. In: Schaefer, R., Cotta, C., Kolodziej, J., Rudolph, G. (eds.) Parallel Problem Solving from Nature - PPSN XI, 11th International Conference 2010, Proceedings, Part I. Lecture Notes in Computer Science, vol. 6238, pp. 707–717. Springer (2010). https://doi.org/10.1007/978-3-642-15844-5_71
  • [30] Ulrich, T., Thiele, L.: Maximizing population diversity in single-objective optimization. In: Krasnogor, N., Lanzi, P.L. (eds.) 13th Annual Genetic and Evolutionary Computation Conference, GECCO 2011, Proceedings, Dublin, Ireland, July 12-16, 2011. pp. 641–648. ACM (2011). https://doi.org/10.1145/2001576.2001665
  • [31] Vassiliades, V., Chatzilygeroudis, K., Mouret, J.B.: Using centroidal Voronoi tessellations to scale up the multidimensional archive of phenotypic elites algorithm. IEEE Trans. Evol. Comput. 22(4), 623–630 (2017)
  • [32] Zhang, H., Chen, Q., Xue, B., Banzhaf, W., Zhang, M.: MAP-Elites for genetic programming-based ensemble learning: An interactive approach [AI-eXplained]. IEEE Comput. Intell. Mag. 18(4), 62–63 (2023). https://doi.org/10.1109/MCI.2023.3304085