
BiqBin: A Parallel Branch-and-bound Solver for Binary Quadratic Problems with Linear Constraints

Published: 19 July 2022

Abstract

We present BiqBin, an exact solver for linearly constrained binary quadratic problems. Our approach is based on an exact penalty method to first efficiently transform the original problem into an instance of Max-Cut, and then to solve the Max-Cut problem by a branch-and-bound algorithm. All the main ingredients are carefully developed using new semidefinite programming relaxations obtained by strengthening the existing relaxations with a set of hypermetric inequalities, applying the bundle method as the bounding routine and using new strategies for exploring the branch-and-bound tree.
Furthermore, an efficient C implementation of a sequential and a parallel branch-and-bound algorithm is presented. The latter is based on a load coordinator-worker scheme using MPI for multi-node parallelization and is evaluated on a high-performance computer.
The new solver is benchmarked against BiqCrunch, GUROBI, and SCIP on four families of (linearly constrained) binary quadratic problems. Numerical results demonstrate that BiqBin is a highly competitive solver. The serial version outperforms the other three solvers on the majority of the benchmark instances. We also evaluate the parallel solver and show that it has good scaling properties. It is available to the general audience as an on-line service at http://www.biqbin.eu.

1 Introduction

1.1 Motivation

With the advent of the data-driven economy, close cooperation between data science and mathematical optimization has become a crucial driver for developing (i) new data science methods capable of revealing knowledge hidden in the data and (ii) new algorithms for implementing these methods.
Grouping data instances according to their inner similarity, i.e., data clustering analysis [21, Ch. 10.3], is one of the oldest and most studied problems in data science. When we know that the data should be decomposed into a fixed number of groups, say K, and we want to find these K groups, we have the K-clustering problem, which is NP-hard [12].
Another interesting problem from data science, similar to the K-clustering problem, is the problem where we want to find a set of k vertices in a simple undirected weighted graph G such that the sum of the weights on the edges connecting these k vertices is maximum. This problem is called the densest k-subgraph problem [4] and can be formulated as an optimization problem with a quadratic objective function, one linear constraint, and binary ( \(0/1\) ) decision variables:
\begin{align} \max ~~ & \frac{1}{2}\mathbf {x}^\top W \mathbf {x}\nonumber \\ \mathrm{s.t.}~~ & \mathbf {e}^\top \mathbf {x}= k \\ & \mathbf {x}\in \lbrace 0,1\rbrace ^{n}\nonumber \end{align}
(DkS)
Here, W is the weighted adjacency matrix of the underlying graph and \(\mathbf {e}\) is the all-ones vector.
The densest k-subgraph problem can be seen as a generalization of the Max-Clique problem [41] and also as a special case of the quadratic knapsack problem [34]. Even though the problem is easy to state and appears simple at first sight, it is not. In fact, it is one of the NP-hard problems for which even the approximability is not well understood: there is a huge gap between the best approximation algorithm and the known inapproximability results [3].
The problems mentioned above are all special instances of binary quadratic problems with linear constraints, which are formally defined in Section 1.3. In practice, we typically solve such problems only approximately using heuristic algorithms. For example, in [6, 40, 42] one can find a vast number of such heuristics for the case of data clustering. However, to evaluate the performance of the heuristic algorithms, we still need the ground truth, i.e., the optimum solutions of the original problems. Therefore, it is highly desirable to solve (to optimality) constrained binary quadratic problems, like (DkS), on problem instances that are as large as possible.
Many other optimization problems with clear real-life applications can be represented in a similar way, as non-convex optimization problems in binary variables subject to linear constraints, e.g., the quadratic assignment problem, the stable set problem, and the graph coloring problem (see, e.g., [37] for definitions). Again, appropriate (meta)heuristic algorithms are used to approximately solve large instances, while optimum solutions for test instances of small or medium size are still needed to evaluate these heuristics.

1.2 Notation

We will use the following notation: \(\mathcal {S}_n\) denotes the space of symmetric \(n\times n\) matrices, \({\mathbb {R}}^n\) is the n-dimensional space of all n-tuples of real numbers and \({\mathbb {R}}^{m\times n}\) is the space of all \(m\times n\) real matrices. By \(\mathbf {e}_n\) we denote the vector of length n with all entries equal to one. Usually, its dimension is clear from the context, so we write only \(\mathbf {e}\) . Similarly, \(\mathbf {0}\) denotes the zero vector or the zero matrix. Given \(\mathbf {x}\in {\mathbb {R}}^n\) , \(\operatorname{Diag}(\mathbf {x})\) is the \(n \times n\) diagonal matrix with \(\mathbf {x}\) on its diagonal; and \(\operatorname{diag}(X)\) is the vector with the diagonal elements of matrix X. By \(\operatorname{rk}(X)\) we denote the rank of matrix X.
For an optimization problem \(\bf {P}\) , we refer to its optimum value by \({\mathrm{OPT}}_{\bf {P}}\) . For the SDP relaxations of (Max-Cut) we use slightly different notations for their optimum values. For example, the optimum value of the SDP relaxation (MCSDP) is denoted by \({\mathrm{OPT}}_{\bf {SDP}}\) .

1.3 Linearly Constrained Binary Quadratic Problem and the Max-Cut Problem

The central problem that we consider in the paper is the linearly constrained binary quadratic problem (BQP), which can be formulated as follows:
\begin{align} \begin{split} \min ~~ &\mathbf {x}^\top F \mathbf {x}+ \mathbf {c}^\top \mathbf {x}\\ \mathrm{s.t.}~~ &A\mathbf {x}= \mathbf {b},~~ \mathbf {x}\in \lbrace 0,1\rbrace ^{n}, \end{split} \end{align}
(BQP)
with given data \(F \in \mathcal {S}_n\) , \(\mathbf {c}\in {\mathbb {R}}^{n}\) , \(A\in {\mathbb {R}}^{m\times n}\) , and \(\mathbf {b}\in {\mathbb {R}}^{m}\) . We assume that all linear constraints are equalities; an inequality can be handled by adding a slack variable and decomposing it into a weighted sum of binary variables. This problem encompasses the densest k-subgraph problem and is thus NP-hard.
The mathematical optimization community is interested in BQP as a problem per se. In this context, the main challenge is developing new algorithms for it by (i) exploring and exploiting new properties of the problem, (ii) using new results from other (non-optimization) areas (such as algebraic geometry), (iii) making new combinations of existing algorithms with best practical or theoretical performance, and (iv) exploiting best available high-performance hardware and software.
Lasserre [27] proved that any instance of (BQP) can be transformed into an instance of the Max-Cut problem, an NP-hard optimization problem [12, 22] on graphs. Max-Cut is among the most studied combinatorial optimization problems; it has connections to various fields of discrete mathematics and models a wide range of applications. The transformation is based on an exact penalty approach, which was further explored and advanced in a recent paper by two of the co-authors of this paper [16].
The Max-Cut problem can be defined as follows. Suppose a weighted undirected graph \(G=(V,E)\) is given, where V is the set of vertices, E is the set of edges, and each edge \(e\in E\) has the weight \(w_e\in {\mathbb {R}}\) . The Max-Cut problem asks to find a partition of V into two parts \((S, V\backslash S)\) such that the sum of the weights of the edges having one endpoint in S and the other one in \(V\backslash S\) is maximized.
If \(W=(w_{ij})\) is the weighted adjacency matrix of G, i.e., \(w_{ij} = w_{ji} = w_e\) for \(e=\lbrace i,j\rbrace \in E\) , and the Laplacian matrix of the graph associated with W is
\begin{equation*} L = \operatorname{Diag}(W\mathbf {e}) - W, \end{equation*}
then computing the maximum cut amounts to solving the following binary quadratic problem in variables from \(\lbrace - 1,1\rbrace\) :
\begin{equation*} {\mathrm{OPT}}_{\bf {Max-Cut}}~=~\max \left\lbrace \frac{1}{4}\mathbf {x}^\top L\mathbf {x}\mid \mathbf {x}\in \lbrace - 1,1\rbrace ^{|{V}|} \right\rbrace . \end{equation*}
(Max-Cut)
However, it is straightforward to transform it into a \(0/1\) problem by using the following simple linear substitution \(\mathbf {x}= 2\mathbf {z}- \mathbf {e}\) and by observing that
\begin{equation*} \mathbf {x}^\top L\mathbf {x}= 4\mathbf {z}^\top L\mathbf {z}-4\mathbf {z}^\top L\mathbf {e}+\mathbf {e}^\top L\mathbf {e}= 4\mathbf {z}^\top L\mathbf {z}, \end{equation*}
since \(L\mathbf {e}=0\) . Hence, we obtain the equivalent formulation of the Max-Cut, which has a structure of (BQP) (with no linear constraints):
\begin{equation*} {\mathrm{OPT}}_{\bf {Max-Cut}}~=~\max \left\lbrace \mathbf {z}^\top L\mathbf {z}\mid \mathbf {z}\in \lbrace 0,1\rbrace ^{|{V}|} \right\rbrace . \end{equation*}
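To make the objective concrete, the following minimal C sketch (our own illustration, not code from BiqBin) builds the Laplacian \(L = \operatorname{Diag}(W\mathbf {e}) - W\) for a small hypothetical graph and evaluates \(\frac{1}{4}\mathbf {x}^\top L\mathbf {x}\) for a cut encoded as \(\mathbf {x}\in \lbrace -1,1\rbrace ^n\) :

```c
#include <stdio.h>

#define N 4

int main(void) {
    /* hypothetical weighted adjacency matrix of a small graph */
    double W[N][N] = {{0, 1, 0, 2},
                      {1, 0, 3, 0},
                      {0, 3, 0, 1},
                      {2, 0, 1, 0}};
    double L[N][N];

    /* Laplacian: L = Diag(W e) - W */
    for (int i = 0; i < N; i++) {
        double rowsum = 0.0;
        for (int j = 0; j < N; j++) rowsum += W[i][j];
        for (int j = 0; j < N; j++) L[i][j] = (i == j ? rowsum : 0.0) - W[i][j];
    }

    /* the cut S = {0,1} encoded as x in {-1,1}^N */
    int x[N] = {1, 1, -1, -1};

    /* cut value = (1/4) x^T L x; here the crossing edges {1,2} and {0,3}
       have weights 3 and 2, so the value is 5 */
    double val = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) val += 0.25 * x[i] * L[i][j] * x[j];

    printf("cut value = %g\n", val);
    return 0;
}
```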
The community is interested in reformulating (BQP) as (Max-Cut) because a wide range of approximate and exact methods and solvers exists for the latter problem, and we want to employ them in solving the former. We are particularly interested in methods computing global optima of both problems.

1.4 Our Contribution

The main contribution of this paper is the BiqBin solver for (BQP), which outperforms the existing solvers on several special cases of (BQP). The core of the BiqBin solver is the exact penalty algorithm EXPEDIS [16]: we first transform every instance of (BQP) into a corresponding instance of (Max-Cut) and then solve the resulting Max-Cut instance. BiqBin is coded in C and non-trivially improves on existing solvers by
introducing a strengthened bounding routine based on hypermetric inequalities;
implementing a parallel branch-and-bound algorithm to solve (Max-Cut) instances using a Message Passing Interface library (MPI);
providing a web-based BiqBin service1 enabling researchers to submit their instances of (BQP) to one of the Slovenian Tier-2 supercomputers to be solved.
Additionally, we demonstrate the practical efficiency of BiqBin by extensive benchmarking against BiqCrunch [25], GUROBI [14], and SCIP [11] on four special cases of (BQP): the Max-Cut problem, the unconstrained binary quadratic problem, the densest k-subgraph problem, and randomly generated binary quadratic problems with linear constraints.
We observe that BiqBin performs very well on instances with a small number of linear constraints, while on instances with many linear constraints, solvers that directly exploit the structure of the constraints are very competitive. Additionally, as we show on the benchmark instances, BiqBin is also highly scalable.

2 Related Work

Starting in the previous century, tackling the Max-Cut problem computationally attracted the attention of many researchers. A number of ideas for solving the problem have been proposed in the literature; fast heuristic algorithms on one side and exact algorithms on the other have been developed over the last decades. We mention here only a few of the exact methods. The first successes in solving (Max-Cut) instances to optimality appeared in the eighties, when linear programming based methods were implemented [1] and further developed, in particular in the context of problems arising in physics [28]. These methods are especially successful when the underlying graphs are sparse.
Other methods use a preprocessing phase where they try to fix some variables (based on the gradient) [32, 33] or to construct a convex problem having the same optimal solution. For such convex problems efficient solvers exist, the most prominent ones being the commercial solvers Gurobi [14] and CPLEX [20].
At the beginning of this century, methods based on semidefinite programming and the branch-and-bound (B&B) algorithm were proposed [17, 24, 25, 35]. One of them, the BiqMac solver [35], was the starting point for our research, since it is one of the best performing solvers for (Max-Cut) instances and, by Lasserre’s result [27], can also be employed to solve (BQP) instances. Another one, the BiqCrunch solver [25], we use to benchmark our results.
By setting \(F=0\) in (BQP), we obtain a linear optimization problem with binary variables. Such problems have been investigated for many decades, leading to enormous progress in their practical solution. According to Hans Mittelmann’s web page,2 the state-of-the-art solvers (the web page contains results for the solvers CBC, GLPK, LP_SOLVE, MATLAB, SAS-OR, and (F)SCIP) can solve such problems to optimality within a few hours if the number of binary variables is up to 50,000. Moreover, they can also handle problems with several hundred thousand binary variables if sufficient structure is provided (see also the list of “easy” cases on the MIPLIB benchmark page [30]).
In the case of a (non-convex) quadratic objective function, the state-of-the-art optimization techniques perform worse than those designed for linear problems. There are many solvers that can solve (BQP) instances: Mittelmann included in his decision tree for optimization software3 several solvers which use various optimization techniques to compute optimum solutions, such as branch-and-cut, branch-and-bound, lift-and-project, and convex reformulation combined with first and second order methods. He performed an extensive benchmarking of several solvers (BARON, (F)SCIP, ANTIGONE, MINOTAUR, OCTERACT, GUROBI) on instances from QPLIB4,5 [10]. Mittelmann showed that these solvers can solve within one hour between 5% (MINOTAUR) and 63% (GUROBI) of the 128 benchmark instances. These benchmark problems have small to medium size: they mostly contain up to a few hundred binary variables, with the largest instance having 8,904 binary variables. Large instances typically share some important structural properties, which make them solvable at least for some solvers.
The solvers mentioned above are available under various software licences. Most of them are freely available to all researchers for academic purposes upon registration and verification. If one does not have a strong local machine or does not want to bother with local installations, one can submit the problem instance to a web portal where some of the solvers are installed; such services are available, e.g., for NEOS [9], BiqMac,6 and BiqCrunch.7 While NEOS partially runs on fast supercomputers, BiqMac and BiqCrunch run on a small cluster and a strong single machine with multicore processors, respectively. The latter two solvers utilize B&B algorithms, but they do not perform any parallelization (a first attempt to parallelize BiqMac was made in [38]). They are available on-line (more precisely, users can submit their problem instances on-line), but they run on hardware that cannot be compared to state-of-the-art supercomputers.

3 Semidefinite Programming Relaxations for the Max-Cut

In the subsequent sections, we make use of tools from semidefinite programming. In order to make this paper self-contained, we recall here some definitions and algorithms.
As mentioned in Section 1.3, the Max-Cut problem can be formulated as
\begin{equation*} \max \left\lbrace \frac{1}{4}\mathbf {x}^\top L\mathbf {x}\mid \mathbf {x}\in \lbrace - 1,1\rbrace ^{|{V}|} \right\rbrace . \end{equation*}
Observe that for any \(\mathbf {x}\in \lbrace - 1,1\rbrace ^{|{V}|}\) , the matrix \(X = \mathbf {x}\mathbf {x}^\top\) is positive semidefinite and its diagonal is equal to the vector of all ones. Using this transformation and the property \(\mathbf {x}^\top L\mathbf {x}= \langle L, \mathbf {x}\mathbf {x}^\top \rangle\) , we can re-write (Max-Cut) as
\begin{equation*} \max \left\lbrace \frac{1}{4}\langle L, X \rangle \mid \operatorname{diag}(X) = \mathbf {e}, \ X \succcurlyeq 0, \ \operatorname{rk}(X) = 1 \right\rbrace . \end{equation*}
By dropping the rank-one constraint, we obtain the basic SDP relaxation
\begin{equation} {\mathrm{OPT}}_{\bf {SDP}}~=~\max \left\lbrace \frac{1}{4}\langle L, X \rangle \mid \ X \succcurlyeq 0, \ \operatorname{diag}(X) = \mathbf {e}\right\rbrace . \end{equation}
(MCSDP)
It is well-known that the bound \({\mathrm{OPT}}_{\bf {SDP}}\) is not strong enough to be successfully used within the B&B framework even for solving the Max-Cut problem to optimality on graphs with only 50 nodes. We overcome this problem by adding additional equality or inequality constraints, known as cutting planes, to strengthen the bound and consequently decrease the size of the B&B tree.
As in BiqMac and BiqCrunch, we use triangle inequalities. Furthermore, we strengthen the bound by using higher order k-gonal inequalities, which belong to the family of hypermetric inequalities [7]. They can be introduced as follows. Suppose that \(\mathbf {b}\) is an integer vector for which \(\mathbf {e}^\top \mathbf {b}\) is odd. This implies that \(\vert \mathbf {x}^\top \mathbf {b}\vert \ge 1 \textrm { for all } \mathbf {x}\in \lbrace - 1,1\rbrace ^n\) and therefore \(\langle \mathbf {b}\mathbf {b}^\top , \mathbf {x}\mathbf {x}^\top \rangle \ge 1.\) The hypermetric inequalities are the following set of linear inequalities, which can be applied to any symmetric matrix X of order n and are valid for any X from the convex hull of the rank-one matrices \(\mathbf {x}\mathbf {x}^\top\) , \(\mathbf {x}\in \lbrace - 1,1\rbrace ^n\) :
\begin{equation*} \lbrace \langle \mathbf {b}\mathbf {b}^\top , X \rangle \ge 1\ \mid \ \mathbf {e}^\top \mathbf {b}\textrm { odd}, \ \mathbf {b}\textrm { integer}\rbrace . \end{equation*}
In this paper, we consider the subclasses of hypermetric inequalities generated by choosing \(\mathbf {b}\) with \(b_i \in \lbrace - 1,0,1\rbrace\) and by fixing the number of non-zero entries in \(\mathbf {b}\) to 3, 5, or 7. In these cases, we obtain triangle, pentagonal, and heptagonal inequalities, respectively. The latter two are also called 5-clique and 7-clique inequalities, respectively. More specifically, the triangle inequalities are defined as
\begin{align*} -X_{ij}-X_{ik}-X_{jk} \le 1, \\ -X_{ij}+X_{ik}+X_{jk} \le 1, \\ X_{ij}-X_{ik}+X_{jk} \le 1, \\ X_{ij}+X_{ik}-X_{jk} \le 1, \end{align*}
\(\forall\) distinct i, j, k. The first inequality is actually \(\mathbf {b}^\top X\mathbf {b} = X_{ii} + X_{jj} + X_{kk} + 2\left(X_{ij} + X_{ik}+ X_{jk}\right) \ge 1\) , for the case \(b_i = b_j = b_k = 1\) , using the implicit constraint \(\operatorname{diag}(X) = \mathbf {e}\) . The other three inequalities above, for fixed \(i,j,k\) , follow by considering all other combinations of signs of the entries \(b_i, b_j, b_k\) (trivially, \(\mathbf {b}\) and \(-\mathbf {b}\) yield the same inequality). By introducing appropriate linear operators on the vector space of symmetric matrices and by using the fact that the matrices under consideration have diagonal entries equal to one, we write these inequalities as \(\mathcal {A}_3(X)\le \mathbf {e}\) , \(\mathcal {A}_5(X)\le \mathbf {e}\) , and \(\mathcal {A}_7(X)\le \mathbf {e}\) , respectively.
Similarly, we denote by \(\mathcal {A}_{\bf {HYP}}(X) \le \mathbf {e}\) a set containing triangle, pentagonal, and heptagonal inequalities and call it (by a slight abuse of notation) hypermetric inequalities. We infer the following strengthening of (MCSDP):
\begin{equation} {\mathrm{OPT}}_{\bf {HYP}}~=~\max \left\lbrace \langle L, X \rangle \mid \ X \succcurlyeq 0, \ \operatorname{diag}(X) = \mathbf {e}, \ \mathcal {A}_{\bf {HYP}}(X)\le \mathbf {e}\right\rbrace . \end{equation}
(MCHYP)
We now describe the routine for separating the proposed cutting planes. There are \(4{n \choose 3}\) triangle inequalities; for problem sizes that we are interested in, we can enumerate them and identify the most violated ones.
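As an illustration of this enumeration, the following self-contained C sketch (the dense row-major storage of X and all names are our assumptions; BiqBin's actual routine may differ) scans all triples and reports the most violated of the four triangle inequalities:

```c
#include <stdio.h>

/* evaluate the four triangle inequalities for a triple (i,j,k);
 * each has the form lhs <= 1, so a return value > 0 means "violated" */
static double tri_violation(const double *X, int n, int i, int j, int k,
                            int *type) {
    double xij = X[i*n + j], xik = X[i*n + k], xjk = X[j*n + k];
    double lhs[4] = { -xij - xik - xjk,
                      -xij + xik + xjk,
                       xij - xik + xjk,
                       xij + xik - xjk };
    double worst = lhs[0];
    *type = 0;
    for (int t = 1; t < 4; t++)
        if (lhs[t] > worst) { worst = lhs[t]; *type = t; }
    return worst - 1.0;
}

int main(void) {
    /* a tiny matrix with diag(X) = e that violates a triangle inequality */
    int n = 3, type;
    double X[9] = { 1.0, -0.8, -0.8,
                   -0.8,  1.0, -0.8,
                   -0.8, -0.8,  1.0 };
    double best = 0.0;
    int bi = -1, bj = -1, bk = -1, bt = -1;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            for (int k = j + 1; k < n; k++) {
                double v = tri_violation(X, n, i, j, k, &type);
                if (v > best) { best = v; bi = i; bj = j; bk = k; bt = type; }
            }
    if (bi >= 0)
        printf("most violated: type %d on (%d,%d,%d), amount %.2f\n",
               bt, bi, bj, bk, best);
    return 0;
}
```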
Due to the large number of higher k-gonal inequalities, the separation of pentagonal and heptagonal inequalities is done heuristically. Let \(\mathbf {e}= (1, 1, 1, 1, 1)^\top\) and define \(H_1 = \mathbf {e}\mathbf {e}^\top\) . Suppose we are searching for a pentagonal inequality (of the type where all nonzero entries of \(\mathbf {b}\) are ones) with large violation, i.e., for a given matrix X, we are looking for a 5-permutation p of the n vertices such that the value \(\langle H_1, X(p,p) \rangle\) is minimal. By \(X(p,p)\) we denote the submatrix obtained by taking the rows and columns contained in the permutation p. These indices are the nonzero entries of the vector \(\mathbf {b}\) determining the pentagonal inequality. Let H be the \(n \times n\) matrix having \(H_1\) as its leading principal submatrix of order 5 and all other elements equal to zero. Then the problem can be reformulated as a quadratic assignment problem of the form
\begin{align*} \min \quad & \langle H, PXP^\top \rangle \\ \mathrm{s.t.}\quad & P \in \Pi , \end{align*}
where \(\Pi\) is the set of all \(n \times n\) permutation matrices. This problem is approximately solved using simulated annealing to obtain a pentagonal inequality with potentially large violation. By replacing the matrix \(H_1\) with the rank-one matrices \(H_2 = \widehat{\mathbf {e}}\widehat{\mathbf {e}}^\top\) or \(H_3 = \widetilde{\mathbf {e}}\widetilde{\mathbf {e}}^\top\) , where \(\widehat{\mathbf {e}} = (- 1, 1, 1, 1, 1)^\top\) and \(\widetilde{\mathbf {e}} = (- 1, - 1, 1, 1, 1)^\top\) , different types of pentagonal inequalities are found. The same idea is applied for separating strongly violated heptagonal inequalities.
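The following simplified C sketch illustrates the heuristic for the all-ones case \(H_1\): it minimizes \(\langle H_1, X(p,p) \rangle\) , i.e., the sum of the entries of X over a 5-subset, by a basic annealing move that swaps one selected index for an outside vertex; a final value below 1 certifies a violated pentagonal inequality. The starting subset, move, and cooling schedule are our own illustrative choices, not BiqBin's:

```c
#include <stdlib.h>
#include <math.h>

/* <H1, X(p,p)> for the all-ones H1: the sum of X over the 5-subset p */
static double subset_value(const double *X, int n, const int p[5]) {
    double s = 0.0;
    for (int a = 0; a < 5; a++)
        for (int b = 0; b < 5; b++) s += X[p[a]*n + p[b]];
    return s;
}

/* simulated annealing (requires n >= 6); on return, p holds the best
 * 5-subset found; a returned value below 1 means that the pentagonal
 * inequality <bb^T, X> >= 1 with b supported on p is violated */
double anneal_pentagonal(const double *X, int n, int p[5], int iters) {
    int cur[5];
    for (int t = 0; t < 5; t++) cur[t] = p[t] = t;   /* simple start */
    double fcur = subset_value(X, n, cur), fbest = fcur, T = 1.0;

    for (int it = 0; it < iters; it++, T *= 0.999) {
        /* propose: replace one chosen index by a random outside vertex */
        int a = rand() % 5, v = rand() % n, clash = 0;
        for (int t = 0; t < 5; t++) if (cur[t] == v) clash = 1;
        if (clash) continue;
        int old = cur[a];
        cur[a] = v;
        double fnew = subset_value(X, n, cur);
        /* Metropolis acceptance: always downhill, sometimes uphill */
        if (fnew <= fcur ||
            exp((fcur - fnew) / T) > rand() / (double)RAND_MAX) {
            fcur = fnew;
            if (fcur < fbest) {
                fbest = fcur;
                for (int t = 0; t < 5; t++) p[t] = cur[t];
            }
        } else {
            cur[a] = old;                            /* reject: undo */
        }
    }
    return fbest;
}
```

Running the same scheme with \(H_2\) or \(H_3\) in place of \(H_1\), and with 7-subsets, would yield the other pentagonal types and the heptagonal inequalities.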
To sum up, in BiqBin we first iteratively identify a subset of promising cutting planes. Using a current approximate solution X, we find the promising hypermetric inequalities \(\mathcal {A}^{\prime }_{\bf {HYP}}(X)\le \mathbf {e}\) and solve (MCHYP) using only these inequalities. Specifically, during the separation routine, we add \(10\cdot n\) triangle inequalities, 300 pentagonal inequalities, and 200 heptagonal inequalities.
Note that for convenience, we sometimes use instead of the cut vector \(\mathbf {x}\in \lbrace - 1,1\rbrace ^n\) the vector \(\begin{pmatrix} 1 \\ \mathbf {x}\end{pmatrix} \in \lbrace - 1,1\rbrace ^{n+1}\) to derive the SDP relaxation. This has the advantage that the values of the vector \(\mathbf {x}\) are given in the first row and column of the matrix
\begin{equation*} \begin{pmatrix} 1 \\ \mathbf {x}\end{pmatrix} \begin{pmatrix} 1\\ \mathbf {x}\end{pmatrix}^\top . \end{equation*}

4 From Binary Quadratic Problems with Linear Constraints to the Max-Cut Problem

In this section, we recall how to transform binary quadratic problems with linear constraints into the Max-Cut problem.
Exact penalty methods for solving constrained optimization problems construct a function whose (unconstrained) minimizers are also optimal solutions of the constrained problem; see [8] for an overview of classical results on this topic. Gusmeroli and Wiegele [16] introduced an exact penalty algorithm over discrete sets called EXPEDIS. Their work follows and improves the idea of Lasserre [27] and reformulates a linearly constrained binary quadratic problem as a Max-Cut instance.
The input of the EXPEDIS algorithm is an instance of (BQP). We consider its version with the binary variables being from \(\lbrace -1,1\rbrace\) .
In order to simplify notation, we define the sets of feasible and infeasible binary vectors as \(\Delta\) and \(\Delta ^c\) , respectively; i.e.,
\begin{equation*} \Delta = \left\lbrace \mathbf {x}\in \lbrace -1,1\rbrace ^{n} | A\mathbf {x}=\mathbf {b}\right\rbrace \quad \textrm {and} \quad \Delta ^c = \lbrace -1,1\rbrace ^{n} \setminus \Delta . \end{equation*}
Given a sufficiently large penalty parameter, denoted \(\sigma\) , we add a quadratic term to the objective function and obtain
\begin{equation*} h(\mathbf {x}) = \mathbf {x}^\top F\mathbf {x}+ \mathbf {c}^\top \mathbf {x}+ \sigma \Vert A\mathbf {x}-\mathbf {b}\Vert ^{2}. \end{equation*}
By defining the matrix
\begin{equation*} Q = \begin{bmatrix} \sigma \mathbf {b}^\top \mathbf {b} & \left(\mathbf {c} - 2\sigma A^\top \mathbf {b} \right)^\top /2 \\ \left(\mathbf {c} - 2\sigma A^\top \mathbf {b}\right)/2 & F + \sigma A^\top A \end{bmatrix} \end{equation*}
the function \(h(\mathbf {x})\) can alternatively be written as \(\bar{\mathbf {x}}^\top Q \bar{\mathbf {x}}\) , in which case we consider the following unconstrained binary optimization problem
\begin{align*} \begin{split} h^* = \min ~~ & \bar{\mathbf {x}}^\top Q \bar{\mathbf {x}} \\ \mathrm{s.t.}~~ & \bar{\mathbf {x}} \in \left\lbrace - 1,1 \right\rbrace ^{n+1} \\ & \bar{\mathbf {x}}_0 = 1, \end{split} \end{align*}
which is a Max-Cut problem on a graph with \(n+1\) vertices, i.e., \(V = \lbrace 0,1,\ldots , n\rbrace\) ; see [16] for more details.
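For concreteness, the following C sketch assembles Q from \((F, \mathbf {c}, A, \mathbf {b})\) and \(\sigma\) exactly as in the formula above; the dense row-major layout and the function name are our assumptions:

```c
/* Q is (n+1) x (n+1), F is n x n, A is m x n; all row-major dense */
void build_Q(double *Q, const double *F, const double *c,
             const double *A, const double *b,
             int n, int m, double sigma) {
    int N = n + 1;

    /* Q[0][0] = sigma * b^T b */
    double btb = 0.0;
    for (int i = 0; i < m; i++) btb += b[i] * b[i];
    Q[0] = sigma * btb;

    /* first row and column: (c - 2 sigma A^T b) / 2 */
    for (int j = 0; j < n; j++) {
        double Atb = 0.0;
        for (int i = 0; i < m; i++) Atb += A[i*n + j] * b[i];
        double v = (c[j] - 2.0 * sigma * Atb) / 2.0;
        Q[j + 1] = v;               /* Q[0][j+1] */
        Q[(j + 1) * N] = v;         /* Q[j+1][0] */
    }

    /* lower-right block: F + sigma * A^T A */
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++) {
            double AtA = 0.0;
            for (int i = 0; i < m; i++) AtA += A[i*n + j] * A[i*n + k];
            Q[(j + 1) * N + (k + 1)] = F[j*n + k] + sigma * AtA;
        }
}
```

With \(\bar{\mathbf {x}} = (1, \mathbf {x})^\top\) , expanding \(\bar{\mathbf {x}}^\top Q \bar{\mathbf {x}}\) recovers \(\mathbf {x}^\top F\mathbf {x}+ \mathbf {c}^\top \mathbf {x}+ \sigma \Vert A\mathbf {x}-\mathbf {b}\Vert ^{2} = h(\mathbf {x})\) .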
Theorem 1 ([16, Theorem 2]).
Consider an instance of (BQP) with optimal value \(f^*\) . Let \(\rho\) be a threshold parameter and let \(\sigma\) be a penalty parameter such that
(1)
(BQP) has no feasible solution with value greater than \(\rho\) ; and
(2)
for any \(\mathbf {x}\) in the set \(\Delta ^c\) , we have \(h(\mathbf {x}) \gt \rho\) .
If \(f^* \lt \infty\) then the optimal values of the constrained and the unconstrained problem coincide, i.e., \(h^* = f^*\) . Moreover, this instance is infeasible if and only if \(h^* \gt \rho\) .
The choice of parameters used in [16] is
\begin{align*} \rho &= \tilde{u}\\ \sigma &= \tilde{u} - \tilde{\ell } + \epsilon , \end{align*}
where
\begin{align*} \tilde{\ell } &= \min \left\lbrace c^\top \mathbf {x}+ \langle F, X \rangle \mid \operatorname{diag}(X)=\mathbf {e}, \ X-\mathbf {x}\mathbf {x}^\top \succcurlyeq 0, \mathcal {A}_{3}(X)\le \mathbf {e}, \mathcal {A}_{5}(X)\le \mathbf {e}\right\rbrace \\ \tilde{u} &= \max \left\lbrace c^\top \mathbf {x}+ \langle F, X \rangle \mid \operatorname{diag}(X)=\mathbf {e}, \ X-\mathbf {x}\mathbf {x}^\top \succcurlyeq 0, \ [b, - A ] \cdot \begin{bmatrix} 1 & \mathbf {x}^\top \\ \mathbf {x}& X \end{bmatrix} = \mathbf {0}\right\rbrace . \end{align*}
In this way, \(\rho\) and \(\sigma\) fulfill the assumptions of Theorem 1, but are kept “sufficiently” small in order to avoid numerical difficulties when computing the maximum cut. For a more detailed study on the choice of the parameters \(\rho\) and \(\sigma\) we refer to [16].
In [16], further enhancements of the choice of the penalty parameter are discussed, e.g., an update is made as soon as a feasible solution of (BQP) is found, or an early stopping condition is added when infeasibility of (BQP) is detected.

5 BiqBin Solver

5.1 (Sequential) Branch-and-bound Algorithm

The BiqBin solver is a Max-Cut based solver for (BQP) instances: it reformulates them as (Max-Cut) instances and solves these by a B&B algorithm.
The main ingredients of BiqBin are:
(1)
the procedure for the exact penalty reformulation of (BQP) instances into (Max-Cut) instances;
(2)
the bounding procedure, which provides for each instance of a problem (also for smaller subproblems obtained via branching) an upper bound on the optimum value;
(3)
the branching procedure, which splits the current problem into more problems of smaller dimensions by fixing some variables;
(4)
a heuristic for generating feasible solutions providing a lower bound.
The overall performance of the algorithm is determined by the quality of the lower and upper bounds computed in each B&B node, by the computational efficiency of computing these bounds, and by the strategy for exploring the B&B tree. The main reason why we cannot solve large instances on personal computers is that the B&B tree grows too big, i.e., the pruning is too slow and a single processor is not capable of exploring all the generated nodes in the tree.
Exact penalty reformulation. The procedure which reformulates every instance of (BQP) into an instance of (Max-Cut) is described in Section 4.
Bounding procedure. The starting point of the algorithm is the strengthened SDP relaxation (MCHYP). Since the number of inequalities is too large to solve this SDP by standard solvers, we use a bundle method to find an approximate solution of the partial Lagrangian dual. By dualizing only the inequality constraints, we obtain the nonsmooth convex partial dual function
\begin{equation*} f(\gamma) = \max _{\operatorname{diag}(X) = \mathbf {e},\ X \succcurlyeq 0} \mathcal {L}(X,\gamma) = \mathbf {e}^\top \gamma + \max _{\operatorname{diag}(X) = \mathbf {e},\ X \succcurlyeq 0}\langle L - \mathcal {A}_{\bf {HYP}}^\top (\gamma), X \rangle , \end{equation*}
where \(\gamma\) are the nonnegative dual variables associated with the constraints \(\mathcal {A}_{\bf {HYP}}(X)\le \mathbf {e}\) . Evaluating the dual function \(f(\gamma)\) and computing the subgradient amounts to solving an SDP of the form (MCSDP), which can be efficiently computed using an interior-point method tailored for this problem. It provides us with the matching pair \((X_{\gamma }, \gamma)\) such that \(f(\gamma) = \mathcal {L}(X_{\gamma },\gamma)\) . Moreover, the subgradient of f at \(\gamma\) is given by \(\partial f(\gamma) = \mathbf {e}- \mathcal {A}_{\bf {HYP}}(X_{\gamma }).\) For obtaining an approximate minimizer of the dual problem
\begin{align} \begin{split} \min \quad & f(\gamma) \\ \mathrm{s.t.}\quad & \gamma \ge 0, \end{split} \end{align}
(1)
we use the bundle method [23, 35]. Let the current iterate be \(\hat{\gamma }\) . Suppose we have evaluated f at \(k \ge 1\) points \(\gamma _1,\ldots ,\gamma _k\) with matching pairs \(X_1,\ldots ,X_k\) and subgradients \(\mathbf {e}- \mathcal {A}_{\bf {HYP}}(X_i)\) for \(i\in \lbrace 1,\ldots ,k\rbrace\) . The bundle method combines the following two ideas:
(1)
the function \(f(\gamma)\) is approximated by
\begin{align*} f_{\mathrm{appr}}(\gamma) &= \max \lbrace \mathcal {L}(X,\gamma) \mid X \in \mathrm{conv}(X_1, \ldots , X_k)\rbrace \\ &= \max _{\lambda \ge 0, \ \mathbf {e}^\top \lambda = 1} \mathbf {e}^\top \gamma + \langle L-\mathcal {A}_{\bf {HYP}}^\top (\gamma),\mathcal {X}\lambda \rangle , \end{align*}
where the bundle of matrices \(\mathcal {X} = (X_1,\ldots ,X_k)\) is used to construct a minorant \(f_{\mathrm{appr}}\) and \(\mathcal {X}\lambda =\sum _i \lambda _i X_i\) ;
(2)
the proximal point idea, which penalizes the displacement from the current best point \(\hat{\gamma }\) with a quadratic regularization term proportional to \(\Vert \gamma - \hat{\gamma }\Vert ^2\) .
In summary, the bundle method finds a new trial point by minimizing, for some prescribed parameter \(t \gt 0\) , the function
\begin{equation*} f_{\mathrm{appr}}(\gamma) + \frac{1}{2t}\Vert \gamma - \hat{\gamma } \Vert ^2 \end{equation*}
over the nonnegative orthant. As the method terminates, we obtain the approximate minimizer of the dual function, as well as the convex weights \(\lambda \ge 0,\sum _i \lambda _i=1\) , determining the matrix \(\mathcal {X}\lambda\) .
In order to obtain a tight upper bound for an instance of (Max-Cut), we use a cutting-plane approach, in which multiple k-gonal inequalities are added and purged in the course of running the bundle algorithm. First, the optimum solution of the basic semidefinite relaxation (MCSDP) is computed using an interior-point method, followed by separating a set of triangle inequalities. After a few bundle iterations with this set of constraints, we purge all inactive constraints. Next, invoking the separation routine described in Section 3, new violated k-gonal inequalities are added. The problem with the new set of constraints is solved, and the process is iterated as long as there is a significant decrease of the upper bound.
Since the primal solution matrix X is not available, we use the same idea as in BiqMac to purge some inequality constraints. We look at the values of the corresponding dual multipliers in the vector \(\gamma\) . If the value of some dual multiplier is close to zero, this indicates that the corresponding constraint is not active and we remove it.
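A minimal sketch of this purging step (the identifier array and the tolerance value are assumptions for illustration):

```c
/* keep only inequalities whose dual multiplier is bounded away from zero;
 * compacts gamma and the inequality identifiers in place and returns the
 * new number of inequalities */
int purge_inactive(double *gamma, int *ineq_id, int m, double tol) {
    int kept = 0;
    for (int i = 0; i < m; i++) {
        if (gamma[i] > tol) {       /* multiplier not ~0: likely active */
            gamma[kept] = gamma[i];
            ineq_id[kept] = ineq_id[i];
            kept++;
        }
    }
    return kept;                    /* e.g., call with tol = 1e-5 */
}
```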
After each iteration of computing an upper bound and separating triangle inequalities, higher order k-gonal inequalities are added in order to further decrease the bound. We monitor the maximum violation of triangle inequalities \(r_{\text{tri}}=\max (\mathcal {A}_3(\mathcal {X} \lambda) - \mathbf {e})\) and as soon as the number is sufficiently small, the heuristic from Section 3 is used to add some strongly violated pentagonal inequalities to the relaxation. Similarly, as the maximum violation of pentagonal inequalities \(r_{\text{pent}}\) drops below some threshold, new heptagonal inequalities are separated and added to the relaxation. In our numerical tests, we have used the thresholds \(r_{\text{tri}} \lt 0.2\) and \(r_{\text{pent}}\lt 0.4\) .
In order to improve the performance of BiqBin, we can stop the bounding routine when we detect that we will not be able to prune the current node in the B&B tree. We again borrow an idea from BiqMac. After some cutting plane iterations, we make a linear (and hence optimistic) forecast to decide whether it is worth doing more iterations. If the gap cannot be closed, we terminate the bounding routine, branch the current node, and start evaluating new subproblems. This is especially important in the parallel solver, since its efficiency depends on how quickly idle workers receive subproblems.
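The early-stopping test can be sketched as follows; this is a hedged reconstruction of the idea, and the actual rule in BiqBin may use more history than one cutting-plane round:

```c
/* ub_prev and ub_cur are the upper bounds after two consecutive
 * cutting-plane rounds, lb is the best known lower bound (integer
 * optimal values assumed); returns 1 if bounding should continue,
 * 0 if the node should be branched right away */
int continue_bounding(double ub_prev, double ub_cur, double lb,
                      int rounds_left) {
    double decrease = ub_prev - ub_cur;                /* last progress  */
    double forecast = ub_cur - decrease * rounds_left; /* optimistic     */
    return forecast < lb + 1.0;   /* only then could the node be pruned */
}
```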
Furthermore, at the beginning of the algorithm, the number of bundle iterations should be small. We start with three iterations and increase this number after each separation of new cutting planes, until a limit is reached; in the BiqBin solver, this limit is set to 15. This is motivated by the fact that, in the beginning, it takes a while to identify the right cutting planes, so we should not waste time trying hard to decrease the bound while the current set of hypermetric inequalities does not allow much progress.
Branching strategies. BiqBin uses two branching strategies, both based on the bundle of matrices obtained from the bounding routine. Once the bundle method terminates, the last column \(\mathbf {x}\) of the matrix \(\mathcal {X}\lambda\) is extracted. Due to the diagonal and positive semidefiniteness constraints on the feasible matrices of (MCHYP), all entries of \(\mathbf {x}\) lie in the interval \([-1,1]\) . Similarly to BiqCrunch, the decision on which variable \(x_i\) to branch is based on one of the following two strategies (a C sketch of both rules is given below):
(1)
difficult first: we branch on the vertex i for which the variable \(x_i\) is closest to 0;
(2)
easy first: we branch on the vertex i for which the variable \(x_i\) is furthest from 0.
In the \(0/1\) formulation, these rules are usually referred to as most-fractional and least-fractional rules, respectively. The difficult first rule is set as the default strategy for the BiqBin solver.
When branching, two new subproblems are created, in which the chosen branching variable is fixed accordingly, and the corresponding nodes are added to the B&B tree, i.e., to the priority queue of unexplored problems. Priority is based on the upper bound obtained from the bundle method: when selecting the next subproblem, a node with the worst upper bound is evaluated first.
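Both rules reduce to a one-pass scan over the fractional vector \(\mathbf {x}\in [-1,1]^n\); a sketch (the function name is ours):

```c
#include <math.h>

/* x is the fractional vector in [-1,1]^n extracted from the bundle
 * matrix; difficult_first = 1 picks the entry closest to 0 (the BiqBin
 * default), difficult_first = 0 picks the entry furthest from 0 */
int select_branching_variable(const double *x, int n, int difficult_first) {
    int best = 0;
    for (int i = 1; i < n; i++) {
        double a = fabs(x[i]), b = fabs(x[best]);
        if (difficult_first ? (a < b) : (a > b)) best = i;
    }
    return best;
}
```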
Rounding heuristic. For generating high quality feasible solutions of the Max-Cut problem, we apply the Goemans-Williamson rounding hyperplane technique [13]. Let X be an optimal solution of some SDP relaxation of Max-Cut. By computing the Cholesky factorization \(X = V^\top V\) with column vectors \(v_i\) of V and selecting some random vector r, the cut \((S,V\backslash S)\) is obtained by setting \(S = \lbrace i \mid v_i^\top r \ge 0 \rbrace .\) Since in our case we are working with the partial Lagrangian, the information about the primal X is not available. Instead, we use a convex combination of bundle matrices \(\mathcal {X}\lambda\) as the input. The cut vector x obtained from this heuristic is then further improved by flipping the vertices and using a convex combination of \(\mathcal {X}\lambda\) and the cut matrix \(xx^\top\) . To summarize, for generating good cuts, we use the following iterative scheme:
(1)
use the Goemans-Williamson rounding hyperplane technique to generate cut vector x from \(\mathcal {X}\lambda\) ;
(2)
the cut x is locally improved by checking all possible moves of a single vertex to the opposite partition block;
(3)
by using a convex combination of \(\mathcal {X}\lambda\) and \(xx^\top\) , we bring the rounding matrix towards a good cut. With this new matrix, go to step (1) and repeat as long as one finds better cuts.
Interestingly, in most of our numerical experiments, this heuristic finds the optimum solution already in the root node.
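A compact C sketch of steps (1) and (2), assuming a factorization \(X = V^\top V\) of the rounding matrix is already available (in BiqBin, the bundle combination \(\mathcal {X}\lambda\) plays the role of X); the convex-combination step (3) and the Cholesky computation are omitted, and the uniform random direction is a simplification of the classical Gaussian choice:

```c
#include <stdlib.h>

/* V is k x n (row-major), column i of V is the vector v_i, so X = V^T V;
 * L is the n x n Laplacian; on return, x in {-1,1}^n encodes the cut */
void gw_round(const double *V, int k, const double *L, int n, int *x) {
    /* (1) random hyperplane; uniform entries for simplicity */
    double *r = malloc(k * sizeof *r);
    for (int t = 0; t < k; t++) r[t] = 2.0 * rand() / RAND_MAX - 1.0;
    for (int i = 0; i < n; i++) {
        double dot = 0.0;
        for (int t = 0; t < k; t++) dot += V[t*n + i] * r[t];
        x[i] = (dot >= 0.0) ? 1 : -1;
    }
    free(r);

    /* (2) local improvement: moving vertex i to the other block changes
     * the cut value (1/4) x^T L x by exactly -x_i * sum_{j != i} L_ij x_j,
     * so flip whenever this gain is positive */
    int improved = 1;
    while (improved) {
        improved = 0;
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int j = 0; j < n; j++)
                if (j != i) s += L[i*n + j] * x[j];
            if (-x[i] * s > 0.0) { x[i] = -x[i]; improved = 1; }
        }
    }
}
```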
Strategy for faster enumeration of the B&B tree. As described in Section 3, simulated annealing is used to heuristically separate k-gonal inequalities. Adding these k-gonal inequalities to the model in each B&B node is not necessary, especially if the node cannot be pruned. In that case, all the invested work and time are wasted, since for bigger graphs we usually need to reach a certain depth in the B&B tree before nodes can be pruned, even if the bounding routine produces tight bounds. It is therefore beneficial to check, before including cutting planes when processing a node, whether there is hope to prune it or whether it is better to branch and produce smaller subproblems. We propose the following strategy.
In the root node we compute: (i) the bound \({\mathrm{OPT}}_{\bf {SDP}}\) of the basic SDP relaxation (MCSDP), which is not strong but is quick to compute; and (ii) the bound \({\mathrm{OPT}}_{\bf {HYP}}\) by iteratively including violated triangle, pentagonal, and heptagonal inequalities. Let
\begin{equation} {\mathrm{diff}} = {\mathrm{OPT}}_{\bf {SDP}} - {\mathrm{OPT}}_{\bf {HYP}}, \end{equation}
(2)
and let \({\mathrm{LB}}\) denote the current lower bound. Then, at all other nodes, we first compute only the basic SDP bound \({\mathrm{OPT}}_{\bf {SDP}}\) . If the condition
\begin{equation} {\mathrm{OPT}}_{\bf {SDP}} \le {\mathrm{LB}} + {\mathrm{diff}} + 1 \end{equation}
(3)
is satisfied, we are already close to the lower bound, so we add cutting planes to compute the tighter bound \({\mathrm{OPT}}_{\bf {HYP}}\) in order to increase the probability of pruning the node. With this idea, we can efficiently traverse the B&B tree and invest time in the bounding routine only when needed. Numerical results show that overall this strategy produces more B&B nodes than strictly necessary, but the performance of the algorithm improves; in particular, it has a positive impact on the parallel version described in Section 5.2.
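The resulting per-node logic can be sketched as follows; the two bound-computing routines are hypothetical stand-ins returning demo values:

```c
#include <stdio.h>

/* hypothetical stand-ins for the real bounding routines */
static double compute_basic_sdp_bound(void) { return 105.0; }
static double compute_hyp_bound(void)       { return 101.4; }

/* returns 1 if the node can be pruned, 0 if it must be branched */
static int process_node(double lb, double diff) {
    double ub = compute_basic_sdp_bound();   /* cheap bound first */
    if (ub <= lb + diff + 1.0)               /* condition (3) */
        ub = compute_hyp_bound();            /* invest in cutting planes */
    return ub < lb + 1.0;                    /* pruning test */
}

int main(void) {
    /* demo values: lb = 100 and diff = 5 from the root node */
    printf("prune = %d\n", process_node(100.0, 5.0));
    return 0;
}
```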
Numerical illustration of the bounding procedure. When using a B&B algorithm, one has two options regarding the quality of the upper bound. On the one hand, a strong upper bound can be computed by iteratively adding and purging multiple cutting planes. In this way, more work is done within a node, but overall this approach produces fewer B&B nodes. On the other hand, one can efficiently compute slightly weaker upper bounds; the whole tree then grows larger, but if the time spent evaluating each node is small, it can be traversed faster. In BiqBin we use the first approach.
We take the Beasley bqp250.8 problem from the BiqMac library and plot the convergence curves of the bounding routines of BiqBin and BiqCrunch in the root nodes of their B&B trees. Figure 1 depicts the decrease of the dual function values in the course of the bound computation. Note that by adding higher order k-gonal inequalities, our bounding routine attains a tighter bound than BiqCrunch. Consequently, BiqBin creates a smaller B&B tree, consisting of 81 nodes, while BiqCrunch terminates after traversing 325 nodes.
Fig. 1. Comparison of the bounding procedures of BiqCrunch and BiqBin in the root node on the Beasley bqp250.8 problem. The upper bound computed with BiqCrunch is approx. \(36\,472\) , whereas with BiqBin we obtain approx. \(36\,287\) .

5.2 Parallelization of Branch-and-bound

In this section, we describe how the algorithmic ingredients of the sequential B&B algorithm are combined into a parallel solver that utilizes distributed-memory parallelism. This is done in a similar fashion as in the parallel B&B solver for the stable set problem [18], introduced by some of the authors of this paper.
The load coordinator–worker paradigm with distributed work pools is applied: the rank 0 process becomes the master process, carefully managing the status of each worker (idle or busy), while the workers concurrently explore different branches of the B&B tree. Each worker has its own local queue of subproblems, and work is shared when one of the workers becomes idle. The master node knows the status of each worker and acts as a load coordinator, receiving messages and, based on their content, replying in an appropriate manner.
At the beginning of the algorithm, the load coordinator reads and broadcasts the original graph to the workers and initializes the solution. It is important that every process has knowledge of the original graph, since the construction of subproblems via branching and the encoding of MPI messages are based on this information. All the data about a B&B node is encoded as an MPI structure, which is used in the communication between workers to efficiently exchange and reconstruct subproblems.
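A sketch of how such an MPI structure might be declared and registered (the BBNode fields and the bound MAXN are our assumptions, not BiqBin's actual layout):

```c
#include <mpi.h>

#define MAXN 512   /* assumed upper bound on the number of vertices */

typedef struct {
    int    fixed[MAXN];   /* which variables are fixed in this node   */
    int    value[MAXN];   /* the values they are fixed to             */
    int    depth;         /* depth in the B&B tree                    */
    double upper_bound;   /* bound inherited from the parent          */
} BBNode;

MPI_Datatype create_bbnode_type(void) {
    MPI_Datatype type;
    int          blocklens[4] = { MAXN, MAXN, 1, 1 };
    MPI_Datatype types[4] = { MPI_INT, MPI_INT, MPI_INT, MPI_DOUBLE };
    MPI_Aint     displs[4], base;
    BBNode       probe;

    MPI_Get_address(&probe, &base);
    MPI_Get_address(&probe.fixed, &displs[0]);
    MPI_Get_address(&probe.value, &displs[1]);
    MPI_Get_address(&probe.depth, &displs[2]);
    MPI_Get_address(&probe.upper_bound, &displs[3]);
    for (int i = 0; i < 4; i++) displs[i] -= base;

    MPI_Type_create_struct(4, blocklens, displs, types, &type);
    MPI_Type_commit(&type);
    return type;
}
```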
Next, the master process evaluates the root node and distributes the best lower bound. After the bounding step, two new subproblems are generated and sent to the first two idle processes. Afterwards, the master’s job is restricted to monitoring the status of the workers, counting the number of B&B nodes, and distributing the best solution found so far. In order to solve multiple SDP relaxations in parallel, the workers need to send and receive subproblems and keep the load balanced. In this way, the work is shared and the whole dynamic B&B tree is enumerated faster.
After the initialization phase, the master process waits for three types of messages sent by the workers. Firstly, if a worker’s local queue is empty, a message is sent informing the master that the process is idle and can receive further work.
Secondly, the master process receives messages regarding load balance and the sharing of work. Throughout the algorithm, the master node maintains a Boolean array containing the status of each worker, specifying whether the process is active or idle. The load coordinator uses this information to reply to requests received from workers during the branching step, in which a process wants to share one of the newly generated subproblems with some other free worker. The load coordinator sends a message specifying which workers are free and the value of the best lower bound.
Thirdly, during the execution of the algorithm, the worker processes compute multiple candidates for the optimum solution. The master node keeps track of the currently best value and the corresponding solution. When a new solution is received, its value is compared, updated if necessary, and distributed back during the communication phase.
After a worker computes lower and upper bounds, it compares these values to decide whether this branch of the enumeration tree can be safely pruned or whether further branching is needed and new subproblems must be constructed. In the latter case, a request message is sent to the load coordinator, asking for idle processes with which to share one of the newly generated subproblems or subproblems left in the queue from previous branching steps. If no idle worker is available, the generated subproblems are placed in the worker’s queue and the work continues locally. Otherwise, subproblems are encoded and sent to available idle workers. This is also where the exchange of the best lower bound happens.
When all the workers become idle, the master process sends a message to finish and the algorithm terminates. The algorithms for the load coordinator and the workers are summarized in Algorithms 1 and 2.
Lastly, we explain how the parallel version benefits from the strategy using the variable “ \({\mathrm{diff}}\) ” (see Equation (2)). If the number of available workers is large, we need to reach a certain depth in the B&B tree before all processes receive work; until this happens, many workers are idle. To fully exploit all available HPC resources, we need a strategy that lets the worker processes start evaluating nodes as soon as possible. This is where the idea from Section 5.1 helps. After the load coordinator evaluates the root node, the best lower bound found and the variable \({\mathrm{diff}}\) are distributed to all workers. When the first two idle processes evaluate the generated subproblems, the value of the basic SDP relaxation is typically such that condition (3) is not satisfied. Hence, on the first few levels of the B&B tree, the workers compute only the basic SDP bound; branching then takes place quickly and idle workers soon receive the generated subproblems. Once the bounds are such that condition (3) holds, hypermetric inequalities are added to compute the tighter bounds \({\mathrm{OPT}}_{\bf {HYP}}\) . This implies that more nodes are pruned, the size of the B&B tree decreases, and the algorithm terminates faster.
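To illustrate the protocol, here is a condensed, hypothetical worker loop reusing the BBNode type and MPI datatype from the sketch above; the tag names and the reduction of bounding and branching to a comment are ours:

```c
#include <mpi.h>

/* message tags; the names are ours, not BiqBin's */
enum { TAG_IDLE = 1, TAG_NEWSOL = 2, TAG_FINISH = 3 };

/* minimal worker loop; bounding, branching, and the local queue are
 * reduced to a comment so that only the message protocol is visible */
void worker_loop(MPI_Datatype node_type) {
    BBNode node;
    MPI_Status status;
    for (;;) {
        /* report idle; wait for either a subproblem or the finish signal */
        MPI_Send(NULL, 0, MPI_BYTE, 0, TAG_IDLE, MPI_COMM_WORLD);
        MPI_Recv(&node, 1, node_type, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == TAG_FINISH) break;

        /* ... compute lower/upper bounds for `node`; if the node cannot
           be pruned, branch: keep one child in the local queue and, if
           the coordinator reports an idle peer, send the other child
           with MPI_Send(&child, 1, node_type, peer, ...); report
           improved solutions to rank 0 with TAG_NEWSOL ... */
    }
}
```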

6 BiqBin Web Application

The BiqBin solver is available as a web application. Its main purpose is to enable (registered) users to test the solver on their own problem data.
The web application contains the main information about the BiqBin project and benchmark results, which are publicly available; see Figure 2 for a schematic overview. Upon registration, a user can access the core functionalities of the application. In short, the user uploads problem data files and decides which algorithm/solver to use. The application then sends these data to an HPC, where the selected algorithm is executed on the given data; after some time, the solution (optimum or approximate) is returned to the web application, which notifies the user by email that the result is ready.
Fig. 2. A workflow diagram from users’ inputs to algorithms’ execution on HPC.
Let us describe the web application in more detail. The initial step for a user is to prepare the problem data in an appropriate format, which is described in the documentation of each function.8 Then, the user creates a new instance, which is the entity holding the problem data. The user sets the instance’s name, data (by uploading the file with all the data), and description. In this way, the user can reuse the data later, e.g., for testing the performance of other solvers on the same instance.
Next, the user creates a task, which is comprised of an instance and a solver (function) the user wants to run with the instance’s data. The user also determines the task’s name, and specifies if the result and the instance’s data can be listed publicly for benchmarking purposes. When all the properties are specified, the task can either be started automatically, or the user can decide to start it later.
Starting a task means that the system sends a request with the execution data to a specified HPC. The administrators can configure which HPC should be used for each of the available functions or even for a particular user. For each submitted task, a job is started on a specified number of cores. The execution of a solver is regularly monitored by saving intermediate results at given time intervals. After a specified maximum execution time, or after finding an optimum solution, the solver’s execution is stopped, and the result files are sent back to the web application together with the task’s metadata. The result is saved in the database, and the user is notified by email that the job is completed. If the task has been marked as publicly available, a benchmark record is also created and immediately made visible on the Benchmarks site.
The system is designed in such a way that additional solvers or algorithms can easily be added. The number of cores used for a single task is fixed, so that one user does not occupy all of the available infrastructure. However, users with special roles assigned can specify the number of cores themselves. In this way, the architecture of the system is highly extensible in all main directions. Scalability depends on the HPC facilities, but this is also considered in the web application, since the tasks for every solver can connect to a selected HPC.

7 Numerical Results

In the numerical part of the paper, we benchmark the BiqBin solver against BiqCrunch [25], Gurobi [14], and SCIP [11] on four families of test instances of (BQP), which we describe in the following paragraphs. We chose these solvers because they are well known to be among the best solvers for these types of problems and are freely available for research purposes. All the solvers received the original problem (BQP) as input; hence, only BiqBin applies the reformulation into a Max-Cut instance. This means that Gurobi and SCIP (with SoPlex as the LP solver) use all the MIP techniques that are available. We ran Gurobi, SCIP, and BiqCrunch with their default parameters. When selecting a new branching variable, the BiqCrunch solver uses the most-fractional rule by default. We also planned to include CPLEX in this study, but we were not able to obtain an academic licence for the HPC system that we were using.
All computations were performed on the HPC system at the University of Ljubljana, Faculty of Mechanical Engineering. We used the E5-2680 V3 (1008 hyper-cores) DP cluster with IB QDR interconnect, 164 TB of LUSTRE storage, and 4.6 TB of RAM, supplemented by GPU accelerators. As the Message Passing Interface library we used Open MPI, and the code is compiled against OpenBLAS and LAPACK. The BiqBin solver was run directly on the cluster, so the computational times might differ slightly from the times obtained by the BiqBin on-line solver.

7.1 Four Special Families of (BQP) Instances Used for Numerical Tests

Max-Cut instances. The first family of (BQP) benchmark instances consists of Max-Cut instances. We selected benchmark instances from the BiqMac library that have been considered as hard by other solvers (see [39] for more details). Furthermore, we used rudy, a graph generator written by Giovanni Rinaldi [36], to construct new (hard) Max-Cut instances with 180 nodes, similar to the ones in the BiqMac library.
Unconstrained BQP instances. The second family of (BQP) instances consists of two sets of unconstrained BQPs. The instances of the first set, due to Beasley [2], have density 0.1 with values uniformly chosen from \([- 100,100]\) . The second set of unconstrained BQP instances was generated by Billionnet and Elloumi [5] according to [32]. They have sizes \(n \in \lbrace 100,120,150,200\rbrace\) and different densities. The diagonal coefficients are in the range \([ - 100,100]\) , while the off-diagonal ones are from \([ - 50,50 ]\) . Similar instances with size 250 and density 0.1 were taken from the BiqMac library. For more details on these instances we refer to [39].
Densest k-subgraph instances. The third family of benchmark problems consists of instances of (DkS), described in Section 1.1. It is a binary quadratic problem with one linear constraint, the so-called cardinality constraint. These problems, also known as cardinality Boolean quadratic programming problems, have a wide range of applications in telecommunications and chemistry; see [29] for further details. We solve the benchmark instances that can be found at [26]. They have different sizes \(n \in \lbrace 120,140,160\rbrace\) , densities \(d \in \lbrace 0.25, 0.50, 0.75\rbrace\) , and values of the parameter \(k \in \lbrace n/4, n/2, 3n/4\rbrace\) . For testing the parallel version of BiqBin, we also created similar instances with sizes \(n \in \lbrace 180,200\rbrace\) ; they can be found on the BiqBin web page.9
Randomly generated instances of (BQP). The fourth family of instances consists of randomly generated instances with a varying number of linear constraints. These instances have size \(n = 100\) and up to 15 constraints, their description can be found in [16] and they are available at [15].

7.2 Comparison of Sequential Algorithms

In this section, we compare the sequential version of BiqBin with the BiqCrunch [25], GUROBI [14], and SCIP [11] solvers on the four families of problems introduced in the previous section. We note that for the first three families of benchmark problems, GUROBI and SCIP did not solve any instance within three hours; hence, we report results for these problems only for BiqBin and BiqCrunch.
We present the results of the comparisons of these two solvers on the Max-Cut instances in Tables 1 and 2, the results for other unconstrained BQPs are in Tables 3 and 4, and the results for the densest k-subgraph problem are in Tables 5 and 6.
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
g05_100101000.5011697191.818.7951.2196.10.3%1.1%0.7%
pm1d_100101000.9921839282.440.0681.4231.12.4%8.1%4.9%
pm1s_100101000.101155.80.822.58.10.7%3.3%1.5%
pw01_100101000.101134.21.531.412.50.0%0.5%0.1%
pw05_100101000.5013289112.253.5503.9204.20.1%0.9%0.6%
pw09_100101000.9039201114.467.9349.4187.70.2%0.5%0.4%
w01_100101000.101132.41.720.75.30.1%1.4%0.3%
w05_100101000.50919978.037.4371.7167.10.9%5.8%3.2%
w09_100101000.9031243257.018.11114.1289.40.1%6.6%3.7%
Table 1. Numerical Results Obtained with Sequential BiqBin on Rudy Instances for the Max-Cut Problem
For an explanation of the columns see page 17.
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
g05_100101000.50331751367.044.31933.5408.80.4%1.2%0.7%
pm1d_100101000.99391073415.457.21302.3519.63.2%8.9%5.7%
pm1s_100101000.1013912.60.951.917.90.7%4.0%2.0%
pw01_100101000.1014511.01.386.922.90.0%0.9%0.4%
pw05_100101000.5049999346.0101.21155.5453.00.4%1.2%0.8%
pw09_100101000.90113589280.8170.4824.2399.20.4%0.6%0.5%
w01_100101000.101233.81.139.07.00.1%2.5%0.4%
w05_100101000.5031401210.458.9564.0304.22.3%7.5%4.7%
w09_100101000.9071667423.217.32447.4626.70.9%8.1%5.1%
Table 2. Numerical Results Obtained with BiqCrunch on Rudy Instances for the Max-Cut Problem
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
be100101001.00192.27.481.726.20.0%0.3%0.1%
be120.3101200.30111.05.631.511.70.0%0.0%0.0%
be120.8101200.8013510.417.9267.4123.20.0%1.0%0.3%
be150.3101500.30119130.816.3832.9286.20.0%1.8%0.4%
be150.8101500.80522368.8221.81303.6656.60.2%1.2%0.7%
be200.3102000.301959168.480.130241.96085.50.0%2.3%0.9%
be200.8102000.8071095331.6769.537449.912280.00.1%2.2%1.2%
be250102500.101153.859.6904.6388.80.0%0.3%0.0%
bqp100101000.10111.01.210.63.00.0%0.0%0.0%
bqp250102500.101819.881.79137.01147.70.0%1.3%0.1%
Table 3. Numerical Results Obtained with Sequential BiqBin on Instances for Unconstrained BQP
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
be100101001.001113.62.254.617.20.0%0.7%0.2%
be120.3101200.30131.41.826.36.60.0%0.1%0.0%
be120.8101200.8035717.215.2422.1126.00.0%1.6%0.7%
be150.3101500.3017925.64.5841.0295.40.0%2.3%0.7%
be150.8101500.801914162.8226.21801.5795.10.7%1.7%1.1%
be200.3102000.3032253409.070.247534.19038.70.0%2.8%1.4%
be200.8102000.80172803823.0424.566334.420303.70.2%2.7%1.5%
be250102500.101134.618.3529.9185.70.0%0.4%0.1%
bqp100101000.10111.00.83.31.30.0%0.0%0.0%
bqp250102500.10132535.820.912812.71413.50.0%1.9%0.3%
Table 4. Numerical Results Obtained with BiqCrunch on Instances for Unconstrained BQP
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
120_30_0.2551200.25195730.6105.9196.6143.71.9%2.7%2.2%
120_30_0.551200.503512362.6134.8617.7300.01.6%2.1%1.8%
120_30_0.7551200.7513289111.447.61241.3473.00.7%1.9%1.3%
120_60_0.2551200.2512511.813.7113.262.60.2%0.6%0.4%
120_60_0.551200.50152919.073.1146.3104.90.3%0.4%0.4%
120_60_0.7551200.751137.88.577.838.00.1%0.2%0.2%
120_90_0.2551200.25111.03.612.17.90.1%0.1%0.1%
120_90_0.551200.503199.838.5184.5103.70.0%0.2%0.1%
120_90_0.7551200.75111.04.814.810.50.0%0.0%0.0%
140_35_0.2551400.2525261125.4134.11426.4753.81.6%2.9%2.3%
140_35_0.551400.50831079383.4479.07274.12337.91.6%2.7%2.0%
140_35_0.7551400.751591089530.2987.86486.03157.01.2%1.7%1.5%
140_70_0.2551400.25313145.438.0634.3245.10.3%0.7%0.5%
140_70_0.551400.5017313168.2116.72098.51264.90.3%0.6%0.5%
140_70_0.7551400.7534919.435.5153.487.30.1%0.2%0.2%
140_105_0.2551400.25111.07.018.314.60.1%0.1%0.1%
140_105_0.551400.501237.016.5274.795.00.0%0.1%0.1%
140_105_0.7551400.751217.810.0208.478.10.0%0.1%0.0%
160_40_0.2551600.2549773298.6421.48247.52885.21.6%3.0%2.3%
160_40_0.551600.50369193074472.63510.2170137.739019.61.6%3.0%2.1%
160_40_0.7551600.7524193533213.02600.971863.727203.11.0%1.8%1.5%
160_80_0.2551600.2513255108.6123.92091.8885.80.4%0.7%0.5%
160_80_0.551600.5029627252.2272.35303.62080.10.2%0.5%0.4%
160_80_0.7551600.75338411045.044.927446.07769.60.1%0.5%0.3%
160_120_0.2551600.2516721.424.9824.4275.30.0%0.2%0.1%
160_120_0.551600.5015118.638.7611.2248.80.0%0.1%0.1%
160_120_0.7551600.7533919.060.6475.6264.20.0%0.1%0.1%
Table 5. Numerical Results Obtained with Sequential BiqBin on Instances for the Densest k-subgraph Problem
Table 6.
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
120_30_0.25 | 5 | 120 | 0.25 | 43 | 111 | 62.2 | 70.5 | 240.9 | 132.0 | 2.0% | 2.9% | 2.3%
120_30_0.5 | 5 | 120 | 0.50 | 55 | 185 | 108.6 | 82.5 | 332.3 | 182.7 | 1.7% | 2.2% | 2.0%
120_30_0.75 | 5 | 120 | 0.75 | 15 | 763 | 240.6 | 24.7 | 941.0 | 312.3 | 0.8% | 2.0% | 1.4%
120_60_0.25 | 5 | 120 | 0.25 | 3 | 61 | 27.0 | 7.2 | 128.2 | 60.2 | 0.3% | 0.7% | 0.5%
120_60_0.5 | 5 | 120 | 0.50 | 31 | 63 | 44.2 | 63.5 | 139.0 | 99.5 | 0.3% | 0.4% | 0.4%
120_60_0.75 | 5 | 120 | 0.75 | 7 | 31 | 19.4 | 12.3 | 81.3 | 47.3 | 0.1% | 0.3% | 0.2%
120_90_0.25 | 5 | 120 | 0.25 | 1 | 1 | 1.0 | 3.6 | 5.6 | 4.4 | 0.0% | 0.1% | 0.1%
120_90_0.5 | 5 | 120 | 0.50 | 3 | 15 | 7.8 | 11.6 | 69.9 | 34.9 | 0.0% | 0.1% | 0.1%
120_90_0.75 | 5 | 120 | 0.75 | 1 | 1 | 1.0 | 2.0 | 3.0 | 2.5 | 0.0% | 0.0% | 0.0%
140_35_0.25 | 5 | 140 | 0.25 | 39 | 447 | 225.4 | 101.2 | 1137.6 | 574.7 | 1.8% | 3.1% | 2.5%
140_35_0.5 | 5 | 140 | 0.50 | 139 | 1813 | 613.8 | 308.1 | 3714.1 | 1297.0 | 1.8% | 3.0% | 2.2%
140_35_0.75 | 5 | 140 | 0.75 | 247 | 2063 | 985.0 | 610.8 | 3433.2 | 1748.5 | 1.3% | 1.8% | 1.6%
140_70_0.25 | 5 | 140 | 0.25 | 7 | 179 | 69.8 | 24.7 | 486.0 | 200.6 | 0.3% | 0.8% | 0.6%
140_70_0.5 | 5 | 140 | 0.50 | 37 | 877 | 415.0 | 108.7 | 2322.6 | 1203.5 | 0.3% | 0.7% | 0.5%
140_70_0.75 | 5 | 140 | 0.75 | 7 | 59 | 27.0 | 31.6 | 152.2 | 76.3 | 0.1% | 0.2% | 0.2%
140_105_0.25 | 5 | 140 | 0.25 | 1 | 1 | 1.0 | 4.1 | 5.9 | 5.0 | 0.0% | 0.1% | 0.1%
140_105_0.5 | 5 | 140 | 0.50 | 1 | 11 | 4.2 | 4.4 | 67.5 | 24.6 | 0.0% | 0.1% | 0.0%
140_105_0.75 | 5 | 140 | 0.75 | 1 | 7 | 2.6 | 3.6 | 42.6 | 15.1 | 0.0% | 0.1% | 0.0%
160_40_0.25 | 5 | 160 | 0.25 | 73 | 1215 | 490.2 | 256.3 | 4317.5 | 1697.0 | 1.7% | 3.1% | 2.4%
160_40_0.5 | 5 | 160 | 0.50 | 509 | 25625 | 5825.8 | 1535.2 | 68554.5 | 15918.8 | 1.7% | 3.0% | 2.2%
160_40_0.75 | 5 | 160 | 0.75 | 327 | 12841 | 4636.6 | 1185.9 | 28700.9 | 11046.9 | 0.9% | 1.9% | 1.5%
160_80_0.25 | 5 | 160 | 0.25 | 31 | 485 | 213.4 | 126.1 | 1875.8 | 817.2 | 0.4% | 0.8% | 0.6%
160_80_0.5 | 5 | 160 | 0.50 | 33 | 1705 | 552.6 | 156.8 | 6592.5 | 2150.1 | 0.2% | 0.6% | 0.4%
160_80_0.75 | 5 | 160 | 0.75 | 3 | 8025 | 2191.8 | 17.6 | 29413.2 | 8229.8 | 0.0% | 0.5% | 0.3%
160_120_0.25 | 5 | 160 | 0.25 | 1 | 31 | 10.2 | 8.5 | 237.0 | 81.3 | 0.0% | 0.2% | 0.1%
160_120_0.5 | 5 | 160 | 0.50 | 1 | 21 | 7.0 | 7.1 | 186.3 | 62.4 | 0.0% | 0.1% | 0.0%
160_120_0.75 | 5 | 160 | 0.75 | 1 | 9 | 4.6 | 7.3 | 83.5 | 40.2 | 0.0% | 0.0% | 0.0%
Table 6. Numerical Results Obtained with (Tailored Version of) BiqCrunch on Instances for the Densest k-subgraph Problem
Before discussing the tables in detail, we summarize the results by the performance profile diagrams in Figure 3. These diagrams were created from the same raw data that is summarized in Tables 1–7. There were large time spans for each solver and each data set, so the diagrams are quite coarse-grained. Nevertheless, we can see that on the Max-Cut, unconstrained BQP, and randomly generated BQP instances our solver BiqBin outperforms the other solvers. On the instances of the densest k-subgraph problem, BiqCrunch performs slightly better than BiqBin.
Fig. 3.
Fig. 3. Performance profile for BiqBin and BiqCrunch on the Max-Cut, unconstrained Binary Quadratic Problem and densest k-subgraph data (Figures a, b, and c), and for solvers BiqBin, Gurobi and Scip on random data (Figure d).
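For readers less familiar with such diagrams, we briefly recall the standard construction of a performance profile (the notation below is ours and slightly simplified): for a set S of solvers and a set I of instances with running times \(t_{i,s}\), the profile of solver \(s \in S\) is the function
\begin{equation*} \rho _s(\tau)=\frac{1}{|I|}\left|\left\lbrace i\in I \,:\, t_{i,s}\le \tau \cdot \min _{s^{\prime }\in S} t_{i,s^{\prime }}\right\rbrace \right| , \end{equation*}
i.e., the fraction of instances that solver s finishes within a factor \(\tau\) of the fastest solver on each instance; instances not solved within the time limit are treated as having infinite running time.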
Table 7.
instance group | # inst. | n | density | solved (BiqBin) | solved (Gurobi) | solved (SCIP)
100_A_0_1_b_10_F_-10_10 | 15 | 100 | 0.95 | 15 | 2 | 0
100_A_0_1_b_10_F_-5_5 | 15 | 100 | 0.91 | 14 | 1 | 0
100_A_0_1_b_10_F_0_10 | 15 | 100 | 0.91 | 15 | 15 | 2
100_A_0_1_b_10_F_0_5 | 15 | 100 | 0.83 | 15 | 14 | 6
100_A_0_1_b_15_F_-10_10 | 15 | 100 | 0.95 | 15 | 1 | 0
100_A_0_1_b_15_F_-5_5 | 15 | 100 | 0.91 | 15 | 0 | 0
100_A_0_1_b_15_F_0_10 | 15 | 100 | 0.91 | 15 | 15 | 4
100_A_0_1_b_15_F_0_5 | 15 | 100 | 0.83 | 15 | 15 | 2
100_A_0_1_b_20_F_-10_10 | 15 | 100 | 0.95 | 15 | 2 | 0
100_A_0_1_b_20_F_-5_5 | 15 | 100 | 0.91 | 15 | 0 | 0
100_A_0_1_b_20_F_0_10 | 15 | 100 | 0.91 | 15 | 15 | 1
100_A_0_1_b_20_F_0_5 | 15 | 100 | 0.83 | 15 | 15 | 3
100_A_0_3_b_10_F_-10_10 | 15 | 100 | 0.95 | 7 | 15 | 1
100_A_0_3_b_10_F_-5_5 | 15 | 100 | 0.91 | 7 | 15 | 6
100_A_0_3_b_10_F_0_10 | 15 | 100 | 0.91 | 2 | 15 | 15
100_A_0_3_b_10_F_0_5 | 15 | 100 | 0.83 | 4 | 15 | 15
100_A_0_3_b_15_F_-10_10 | 15 | 100 | 0.95 | 6 | 7 | 0
100_A_0_3_b_15_F_-5_5 | 15 | 100 | 0.91 | 6 | 8 | 0
100_A_0_3_b_15_F_0_10 | 15 | 100 | 0.91 | 2 | 15 | 7
100_A_0_3_b_15_F_0_5 | 15 | 100 | 0.83 | 4 | 15 | 6
100_A_0_3_b_20_F_-10_10 | 15 | 100 | 0.95 | 5 | 0 | 0
100_A_0_3_b_20_F_-5_5 | 15 | 100 | 0.91 | 8 | 1 | 0
100_A_0_3_b_20_F_0_10 | 15 | 100 | 0.91 | 3 | 8 | 5
100_A_0_3_b_20_F_0_5 | 15 | 100 | 0.83 | 5 | 8 | 6
100_A_-1_1_b_0_F_-1_1 | 15 | 100 | 0.66 | 13 | 3 | 0
100_A_-1_1_b_0_F_-3_3 | 15 | 100 | 0.86 | 11 | 2 | 0
100_A_-1_1_b_0_F_-7_7 | 15 | 100 | 0.93 | 11 | 6 | 0
100_A_-3_3_b_0_F_-1_1 | 15 | 100 | 0.66 | 6 | 2 | 0
100_A_-3_3_b_0_F_-3_3 | 15 | 100 | 0.86 | 6 | 1 | 0
100_A_-3_3_b_0_F_-7_7 | 15 | 100 | 0.93 | 4 | 2 | 0
100_A_-7_7_b_0_F_-1_1 | 15 | 100 | 0.66 | 3 | 2 | 0
100_A_-7_7_b_0_F_-3_3 | 15 | 100 | 0.86 | 3 | 1 | 0
100_A_-7_7_b_0_F_-7_7 | 15 | 100 | 0.93 | 3 | 0 | 0
total | 495 | | | 298 | 236 | 79
Table 7. Comparison of Sequential BiqBin, Gurobi and SCIP on Randomly Generated Instances of (BQP)
To highlight the difference between the solvers BiqBin and BiqCrunch, we also provide a relative performance profile in Figure 4. It shows, for each factor \(\alpha \in \lbrace 2^{-3},2^{-2},\ldots ,2^3\rbrace\) , the percentage of instances from the Max-Cut, unconstrained BQP and densest k-subgraph families of instances, respectively, for which the computation time required by BiqCrunch was shorter than \(\alpha\) times the computation time required by BiqBin. For the Max-Cut instances, this plot reveals that BiqCrunch does not require less than 1/2 of the time needed by BiqBin for any instance (i.e., it is never more than two times faster than BiqBin), but it is faster than BiqBin on about 13% of the Max-Cut instances, i.e., it is slower than BiqBin for about 87% of the Max-Cut instances. For the densest k-subgraph instances the situation is reversed: for approx. 84% of the instances, BiqCrunch is faster than BiqBin, while for the unconstrained BQP instances, both solvers perform similarly.
Fig. 4.
Fig. 4. This relative performance profile complements the performance profiles from Figures 3(a)–3(c). For each value i on the x-axis it shows the percentages of the instances from the datasets Max-Cut, unconstrained BQP and densest k-subgraph instances, respectively, on which the computation time needed by BiqCrunch was shorter than \(\alpha = 2^i\) times the computation time needed by BiqBin.
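The percentages shown in Figure 4 are straightforward to compute from the raw timings. The following minimal C sketch (illustrative only, not taken from the BiqBin code base; the arrays are hypothetical placeholders) prints, for each \(\alpha = 2^{-3},\ldots ,2^{3}\), the share of instances on which BiqCrunch is faster than \(\alpha\) times BiqBin:

    #include <stdio.h>
    #include <math.h>

    /* For each factor alpha = 2^i, count the instances on which
     * BiqCrunch's wall-time is below alpha times BiqBin's wall-time. */
    static void relative_profile(const double *t_bc, const double *t_bb, int m)
    {
        for (int i = -3; i <= 3; ++i) {
            double alpha = pow(2.0, (double) i);
            int count = 0;
            for (int j = 0; j < m; ++j)
                if (t_bc[j] < alpha * t_bb[j])
                    ++count;
            printf("alpha = 2^%+d: %5.1f%%\n", i, 100.0 * count / m);
        }
    }

    int main(void)
    {
        /* hypothetical timings (seconds) for three instances */
        double t_biqcrunch[] = { 100.0, 250.0,  40.0 };
        double t_biqbin[]    = {  90.0, 300.0,  35.0 };
        relative_profile(t_biqcrunch, t_biqbin, 3);
        return 0;
    }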
The first columns in each of these tables contain the data about the instances. For each group we used several instances: for the Max-Cut problem and for the unconstrained BQP, we used 10 instances for every combination of size n and density d, and for the densest k-subgraph problem we used five instances. The second set of columns in each table contains the minimum, the maximum, and the average number of nodes in the B&B trees generated by BiqBin and BiqCrunch, respectively. Similarly, the third set of columns reports the minimum, the maximum, and the average computing times (in seconds), and the last set of columns contains the minimum, the maximum, and the average initial gaps (in %).
For the Max-Cut problem, we can see in Tables 1 and 2 that sequential BiqBin is consistently about two times faster than BiqCrunch and produces much fewer nodes in the B&B tree. One reason for this lies in the fact that the initial gap in the root node (the difference between the upper and the lower bound, divided by the lower bound) is smaller for BiqBin.
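In other words, the gap columns report the following quantity (shown here as a small C helper for concreteness; this is an illustration, not the actual reporting code of either solver):

    /* Relative initial gap in percent for a maximization problem:
     * ub is the upper bound and lb the lower bound in the root node;
     * lb is assumed to be positive. */
    double initial_gap_percent(double ub, double lb)
    {
        return 100.0 * (ub - lb) / lb;
    }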
For the unconstrained BQP (Tables 3 and 4), we can see that BiqBin and BiqCrunch perform very similarly. However, the B&B trees of BiqBin are much smaller, so there is potential for improving the computing times by increasing the computational efficiency in each B&B node.
For the densest k-subgraph problem (Tables 5 and 6), we can observe that BiqCrunch is in most cases slightly better (sometimes also much better) than BiqBin. The reason for this is that BiqCrunch has a specific version adapted to solving this problem, in which the basic semidefinite relaxation is reinforced with triangle and product constraints.
Turning to the randomly generated instances, GUROBI and SCIP performed well, while the BiqCrunch solver showed very poor performance. Thus, we report in Table 7 only the results for BiqBin, GUROBI, and SCIP. The randomly generated instances have different intervals from which the entries of A, b, and F are selected; for every combination, we have 15 instances with an increasing number of constraints, from 1 to 15. The first column of Table 7 contains the name of the family of instances, and the next three columns contain the number of instances (always 15), the size of the problem n (always 100), and the density of the data (the number of non-zero elements in F divided by \(n^2\)). For each of the three solvers, we report the number of instances solved within the time limit of three hours. These instances are also considered in the performance profile in Figure 3(d).
From Table 7 and from the performance profile in Figure 3(d) it can be observed that, overall, BiqBin outperforms all the other solvers. For each family it solves at least two out of the 15 instances, and on average it solves 60% of the instances, while GUROBI and SCIP solve 48% and 16% of the instances, respectively.
We can see that BiqBin performs worst on the instances whose names start with 100_A_0_3_, in the middle part of the table. We do not have a definitive explanation for this, but the reason is very likely that the linear constraints of these instances yield efficient cutting planes, which enable GUROBI and SCIP to prune the B&B tree more effectively and therefore to finish within the given time limit more often. Some of the 100_A_0_3 instances are also infeasible, while the instances with _b_0_ in the middle of the name are always feasible (they have zero right-hand sides, i.e., \(\mathbf {b}= \mathbf {0}\), hence \(\mathbf {x}= \mathbf {0}\) is always a feasible solution). GUROBI and SCIP solve the problems in the (BQP) form, while BiqBin first reformulates the instances into the (Max-Cut) format. This suggests improving the performance of BiqBin by adding an early stopping condition for the case of infeasibility, using Theorem 1.

7.3 Scaling Properties of BiqBin

In this section, we demonstrate how BiqBin scales on the high-performance computer that we used, within the MPI framework. We measured the wall-times needed to solve each instance using the sequential solver and the parallel solver with \(3, 6,\ldots ,48\) CPU cores (i.e., MPI processes). Note that one of the CPU cores was always reserved for the coordinator's tasks, while the others were used as workers.
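The following sketch shows the basic rank layout of such a coordinator-worker scheme in MPI (a minimal illustration under our own simplifications, not the actual BiqBin source; run_coordinator and run_worker stand in for the real routines):

    #include <mpi.h>
    #include <stdio.h>

    /* Placeholder routines: in the real solver, the coordinator distributes
     * open B&B nodes to workers and collects results, while each worker
     * repeatedly evaluates subproblems. */
    static void run_coordinator(int num_workers)
    {
        printf("coordinator managing %d workers\n", num_workers);
    }

    static void run_worker(int rank)
    {
        printf("worker %d processing B&B nodes\n", rank);
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
            run_coordinator(size - 1);   /* one core reserved for coordination */
        else
            run_worker(rank);            /* the remaining size-1 cores are workers */

        MPI_Finalize();
        return 0;
    }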
In the "sequential" columns of Table 8, we report the computing times needed by the sequential algorithm and the times needed for the computations in the root node of the B&B tree. The former are used to compute the speed-up factors, while the latter are considered as the times for the non-parallelizable part of BiqBin and are used to compute the upper bounds for the speed-up factors.
Table 8.
instance name | n | density | B&B (seq) | time (seq) | root time | B&B (3 cores) | time (3 cores) | B&B (6 cores) | time (6 cores) | B&B (12 cores) | time (12 cores) | B&B (24 cores) | time (24 cores) | B&B (48 cores) | time (48 cores)
g05_100.1 | 100 | 0.50 | 697 | 951.18 | 2.68 | 769 | 539.40 | 787 | 287.20 | 879 | 132.70 | 769 | 66.00 | 711 | 36.50
g05_100.3 | 100 | 0.50 | 603 | 320.52 | 2.61 | 585 | 182.60 | 447 | 80.40 | 677 | 40.80 | 417 | 28.60 | 403 | 15.50
pm1d_100.0 | 100 | 0.99 | 395 | 366.87 | 3.61 | 433 | 204.50 | 507 | 95.00 | 367 | 54.60 | 353 | 29.50 | 357 | 20.80
pm1d_100.1 | 100 | 0.99 | 839 | 681.38 | 2.74 | 545 | 363.60 | 681 | 159.60 | 951 | 86.90 | 867 | 41.10 | 1045 | 31.70
pm1d_100.2 | 100 | 0.99 | 551 | 372.83 | 3.70 | 493 | 220.80 | 621 | 105.70 | 443 | 56.40 | 525 | 32.10 | 369 | 16.70
pm1d_100.4 | 100 | 0.99 | 557 | 357.24 | 3.72 | 439 | 228.60 | 445 | 104.20 | 571 | 53.20 | 435 | 31.00 | 371 | 23.80
pw05_100.0 | 100 | 0.50 | 289 | 487.86 | 8.13 | 375 | 247.50 | 379 | 138.30 | 405 | 80.10 | 391 | 47.50 | 279 | 43.20
pw05_100.6 | 100 | 0.50 | 263 | 503.85 | 6.71 | 183 | 279.80 | 183 | 113.00 | 215 | 59.40 | 313 | 47.10 | 279 | 30.00
pw09_100.1 | 100 | 0.90 | 201 | 349.38 | 6.00 | 165 | 183.90 | 171 | 110.30 | 221 | 47.40 | 217 | 31.70 | 197 | 24.60
pw09_100.5 | 100 | 0.90 | 137 | 260.52 | 6.65 | 107 | 181.20 | 133 | 66.40 | 133 | 51.70 | 121 | 38.10 | 99 | 43.60
pw09_100.7 | 100 | 0.90 | 177 | 312.68 | 7.70 | 285 | 189.30 | 203 | 94.90 | 223 | 52.40 | 235 | 36.50 | 225 | 36.40
w05_100.4 | 100 | 0.50 | 199 | 371.73 | 8.41 | 161 | 192.10 | 181 | 111.10 | 197 | 55.50 | 127 | 36.20 | 137 | 29.60
w05_100.5 | 100 | 0.50 | 117 | 310.38 | 6.23 | 119 | 145.20 | 155 | 73.50 | 119 | 52.30 | 103 | 35.80 | 131 | 35.70
w05_100.8 | 100 | 0.50 | 151 | 293.63 | 8.35 | 119 | 185.30 | 139 | 102.20 | 153 | 47.80 | 195 | 35.60 | 185 | 29.50
w09_100.1 | 100 | 0.90 | 1243 | 1114.08 | 8.53 | 501 | 610.70 | 771 | 299.00 | 641 | 159.20 | 561 | 84.00 | 711 | 60.30
w09_100.2 | 100 | 0.90 | 325 | 357.42 | 8.49 | 255 | 179.00 | 293 | 90.40 | 253 | 50.20 | 207 | 27.40 | 229 | 35.20
w09_100.3 | 100 | 0.90 | 575 | 444.20 | 8.39 | 587 | 245.30 | 269 | 111.80 | 371 | 66.10 | 375 | 41.50 | 347 | 33.30
w09_100.4 | 100 | 0.90 | 115 | 297.50 | 8.03 | 101 | 202.00 | 105 | 83.60 | 99 | 54.40 | 127 | 52.10 | 99 | 43.50
Table 8. Numerical Results Obtained with Parallel BiqBin for Instances of the Max-Cut Problem
Pairs of columns denoted by "3 cores", "6 cores", etc. contain the sizes of the B&B trees and the wall-times of parallel BiqBin using \(3, 6,\ldots\) CPU cores, respectively. For each instance inst from Table 8, we compute a vector of speed-up factors \({{\tt {speed-up} }_{{\tt inst}}}\), defined as (see, e.g., [31, Equation (3.4)])
\begin{equation*} {{\tt {speed-up} }_{{\tt inst}}}(i)=\frac{{\tt time}_1}{{\tt time}_i}, \end{equation*}
where \({\tt time}_1\) is the time needed by the sequential solver, while \({\tt time}_i\) denotes the time needed by the parallel solver using i CPU cores. These factors are aggregated in Table 9 for each family of instances (the factor reported for a family and a given number of cores is the total sequential time of the family divided by its total parallel time; note that Table 9 additionally contains the factors obtained with 96 CPU cores) and are depicted in Figure 5.
Fig. 5.
Fig. 5. This plot shows how the code scales compared to upper-bounds \(\mbox{UB}(n)\) , represented by the green line. The red lines represent the scaling factors from Table 9, while the blue line shows the average scaling factors (column-wise mean values of Table 9).
Table 9.
CPU cores | 1 | 3 | 6 | 12 | 24 | 48 | 96
g05 | 1.00 | 1.76 | 3.46 | 7.33 | 13.44 | 24.46 | 34.28
pm1d | 1.00 | 1.75 | 3.83 | 7.08 | 13.30 | 19.12 | 24.70
pw05 | 1.00 | 1.88 | 3.95 | 7.11 | 10.48 | 13.55 | 19.26
pw09 | 1.00 | 1.66 | 3.40 | 6.09 | 8.68 | 8.82 | 11.87
w05 | 1.00 | 1.87 | 3.40 | 6.27 | 9.07 | 10.29 | 12.48
w09 | 1.00 | 1.79 | 3.78 | 6.71 | 10.80 | 12.85 | 17.59
Table 9. Aggregated Scaling Factors
The g05 instances have the best average scaling factors, while the pw09 and w05 instances scale worst.
The times from the "root time" column of Table 8 are used to estimate the proportion of the BiqBin solver that is not parallelizable. We denote this estimate by s and compute it as the sum of the "root time" column divided by the sum of the sequential "time" column, which gives \(s=0.0136\). The upper bounds for the speed-up factors are computed according to Amdahl's law by the formula ([31, Equation (3.6)])
\begin{equation*} \mbox{UB}(n)=\frac{1}{s+(1-s)/n}, \end{equation*}
where n corresponds to the number of available CPU cores. These upper bounds are depicted by the green curve in Figure 5. We can see that the speed-up factors are close to the theoretical bound for the g05 problems and deviate considerably for the pw09 and w05 problems.
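As a small worked example using only quantities stated above, for \(s = 0.0136\) the bound evaluates to \(\mbox{UB}(48) \approx 29.3\) and \(\mbox{UB}(96) \approx 41.9\), which the g05 factors 24.46 and 34.28 from Table 9 approach most closely. The values on the green curve can be tabulated with a few lines of C:

    #include <stdio.h>

    /* Amdahl's upper bound UB(n) = 1/(s + (1-s)/n) for the estimated
     * sequential fraction s = 0.0136 reported in the text. */
    int main(void)
    {
        const double s = 0.0136;
        const int cores[] = { 1, 3, 6, 12, 24, 48, 96 };
        for (int i = 0; i < 7; ++i) {
            double ub = 1.0 / (s + (1.0 - s) / cores[i]);
            printf("UB(%2d) = %5.2f\n", cores[i], ub);
        }
        return 0;
    }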
Table 8 also contains the numbers of nodes in the B&B trees generated by the parallel solver with different numbers of CPU cores. We can see that these numbers differ from those of the sequential solver and vary with the number of CPU cores. The main reason is that the parallel computations on different CPU cores are no longer deterministic. Generating cutting planes in the process of solving \((\mathrm{MC}_{\mathrm{HYP}})\) involves random numbers. We fix the seed for the random number generator at the beginning of the computation on each CPU core; however, when we vary the number of CPU cores, the amount and the order of the computational work in the cores change. Consequently, different cutting planes might be applied to the same B&B node, which results in slightly different bounds \({\mathrm{OPT}}_{\bf {HYP}}\) and finally in different sizes of the B&B trees.
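The seeding strategy mentioned above can be pictured as follows (a sketch under our own assumptions, not the actual BiqBin code; the base seed is hypothetical):

    #include <stdlib.h>
    #include <mpi.h>

    /* Each MPI process fixes its own random seed at start-up.  A fixed
     * seed makes a single process reproducible, but with a different
     * number of processes the amount and order of work per core changes,
     * so different cutting planes may be separated at the same B&B node. */
    static void seed_rng_per_rank(void)
    {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        srand(12345u + (unsigned) rank);   /* hypothetical base seed */
    }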

7.4 Solving Large Instances with Parallel BiqBin

In this section, we report numerical results obtained by parallel BiqBin on Max-Cut, unconstrained BQP, and densest k-subgraph instances that, to the best of our knowledge, have not been solved before. We used 200 CPU cores, and the results are collected in Tables 10–12. We can see that the hardest families of instances are again g05 and w09. To solve them, we need to compute more than one million B&B nodes, which took at least six hours of wall-time.
Table 10.
instance group | # inst. | n | density | B&B (avg, 200 cores) | time (avg, 200 cores)
g05_180 | 10 | 180 | 0.50 | 1712858.6 | 20700.5
pm1d_180 | 10 | 180 | 0.99 | 522354.4 | 10056.6
pm1s_180 | 10 | 180 | 0.10 | 60004.2 | 1163.1
pw01_180 | 10 | 180 | 0.10 | 4935.0 | 249.1
pw05_180 | 10 | 180 | 0.50 | 887735.0 | 20602.4
pw09_180 | 10 | 180 | 0.90 | 750231.8 | 18468.2
w01_180 | 10 | 180 | 0.10 | 36913.2 | 637.9
w05_180 | 10 | 180 | 0.50 | 674946.4 | 16216.1
w09_180 | 10 | 180 | 0.90 | 2061425.2 | 35320.7
Table 10. Numerical Results Obtained with Parallel BiqBin for Large Instances of the Max-Cut Problem
Table 11.
instance group | # inst. | n | density | B&B (avg, 200 cores) | time (avg, 200 cores)
be250.3 | 10 | 250 | 0.30 | 1367.4 | 1133.0
be250.8 | 10 | 250 | 0.80 | 7838.8 | 3583.2
be300.3 | 10 | 300 | 0.30 | 5462.8 | 4207.5
be300.8 | 10 | 300 | 0.80 | 110806.8 | 74765.2
Table 11. Numerical Results Obtained with Parallel BiqBin for Large Instances of the Unconstrained BQP
Table 12.
instance group | # inst. | n | density | B&B (avg, 200 cores) | time (avg, 200 cores)
120_30_0.25 | 5 | 120 | 0.25 | 33.8 | 32.1
120_30_0.5 | 5 | 120 | 0.50 | 70.2 | 33.9
120_30_0.75 | 5 | 120 | 0.75 | 117.0 | 29.0
120_60_0.25 | 5 | 120 | 0.25 | 17.4 | 27.8
120_60_0.5 | 5 | 120 | 0.50 | 21.4 | 34.7
120_60_0.75 | 5 | 120 | 0.75 | 10.2 | 20.2
120_90_0.25 | 5 | 120 | 0.25 | 1.0 | 9.8
120_90_0.5 | 5 | 120 | 0.50 | 12.6 | 45.0
120_90_0.75 | 5 | 120 | 0.75 | 1.0 | 12.5
140_35_0.25 | 5 | 140 | 0.25 | 146.2 | 51.5
140_35_0.5 | 5 | 140 | 0.50 | 395.8 | 59.7
140_35_0.75 | 5 | 140 | 0.75 | 479.4 | 59.4
140_70_0.25 | 5 | 140 | 0.25 | 58.2 | 37.9
140_70_0.5 | 5 | 140 | 0.50 | 173.8 | 56.0
140_70_0.75 | 5 | 140 | 0.75 | 12.6 | 29.3
140_105_0.25 | 5 | 140 | 0.25 | 1.0 | 17.9
140_105_0.5 | 5 | 140 | 0.50 | 9.8 | 50.3
140_105_0.75 | 5 | 140 | 0.75 | 7.4 | 37.6
160_40_0.25 | 5 | 160 | 0.25 | 395.0 | 84.5
160_40_0.5 | 5 | 160 | 0.50 | 4817.8 | 329.9
160_40_0.75 | 5 | 160 | 0.75 | 3398.6 | 234.5
160_80_0.25 | 5 | 160 | 0.25 | 113.4 | 62.1
160_80_0.5 | 5 | 160 | 0.50 | 253.8 | 76.8
160_80_0.75 | 5 | 160 | 0.75 | 1071.4 | 92.1
160_120_0.25 | 5 | 160 | 0.25 | 25.4 | 84.6
160_120_0.5 | 5 | 160 | 0.50 | 29.4 | 72.7
160_120_0.75 | 5 | 160 | 0.75 | 17.8 | 76.5
180_45_0.25 | 5 | 180 | 0.25 | 1548.2 | 188.6
180_45_0.5 | 5 | 180 | 0.50 | 2109.4 | 265.0
180_45_0.75 | 5 | 180 | 0.75 | 29243.4 | 2089.7
180_90_0.25 | 5 | 180 | 0.25 | 923.0 | 144.9
180_90_0.5 | 5 | 180 | 0.50 | 1167.4 | 165.8
180_90_0.75 | 5 | 180 | 0.75 | 2623.8 | 246.9
180_135_0.25 | 5 | 180 | 0.25 | 11.8 | 79.5
180_135_0.5 | 5 | 180 | 0.50 | 29.4 | 133.2
180_135_0.75 | 5 | 180 | 0.75 | 29.4 | 113.2
200_50_0.25 | 5 | 200 | 0.25 | 7400.6 | 849.4
200_50_0.5 | 5 | 200 | 0.50 | 24145.4 | 2578.5
200_50_0.75 | 5 | 200 | 0.75 | 53016.2 | 4980.4
200_100_0.25 | 5 | 200 | 0.25 | 1392.2 | 254.9
200_100_0.5 | 5 | 200 | 0.50 | 3801.8 | 501.3
200_100_0.75 | 5 | 200 | 0.75 | 3067.4 | 358.6
200_150_0.25 | 5 | 200 | 0.25 | 43.4 | 145.1
200_150_0.5 | 5 | 200 | 0.50 | 129.0 | 177.1
200_150_0.75 | 5 | 200 | 0.75 | 621.0 | 260.3
Table 12. Numerical Results Obtained with Parallel BiqBin for Large Instances of the Densest k-subgraph Problem

8 Conclusions

In this paper, we describe BiqBin, a solver for linearly constrained binary quadratic problems, which is capable of solving to optimality instances that are, due to their size, unsolvable by other existing methods and tools. The main idea underlying this solver is the exact penalty reformulation of a (BQP) instance into an instance of (Max-Cut), introduced by Lasserre and enhanced by two co-authors of this paper in [16].
We provide the necessary theoretical results needed to explain the workflow of the problem reformulations and relaxations, and finally the details related to the C implementation of BiqBin as an efficient parallel solver. The solver is also available as a web service, which is connected to the high-performance computer at the University of Ljubljana, Faculty of Mechanical Engineering.
We present extensive numerical results, where BiqBin is benchmarked against BiqCrunch, GUROBI and SCIP. It can be concluded that BiqBin outperforms other solvers on the Max-Cut instances, that it is competitive with BiqCrunch on the instances of unconstrained binary quadratic problems, and that it is slightly worse than BiqCrunch on the instances of the densest k-subgraph problem. The latter is expected since BiqCrunch is specially adapted to solve problems of this type. On these three families of instances, GUROBI and SCIP are non-competitive.
However, when the number of linear constraints increases slightly, as happens with the fourth family of benchmark instances (randomly generated instances of (BQP)), GUROBI and SCIP become solvers of interest. They solve the problems in the original formulation, and the initial linear constraints are important for generating new cutting planes. Therefore, a larger number of linear constraints usually results in a better performance of these solvers, while BiqBin reformulates the problem into an instance of Max-Cut, hence the feasible set always consists of all possible binary vectors. Nevertheless, on these benchmark instances, BiqBin is still the best-performing solver, while BiqCrunch demonstrates very weak performance and was therefore eliminated from the reported numerical results.
We showed that the BiqBin solver scales very well; hence, high-performance computers are the infrastructure of choice to solve instances that are out of reach for other (sequential) state-of-the-art solvers.
As part of our future work, we plan to merge BiqBin, which uses the bundle method as its computational core, with MADAM introduced in [19], where an alternating direction method of multipliers is developed for solving hard semidefinite relaxations. Additionally, the MPI communication among the processes can be further simplified to enable efficient scaling over larger HPC systems. This can be achieved by employing one-sided MPI communication. We will also further improve the performance of BiqBin by enhancing an early stopping condition for the case of infeasibility of (BQP).
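To make the last point concrete, a one-sided scheme could, for instance, let every worker atomically claim work from a globally visible counter instead of exchanging messages with the coordinator. The sketch below is our illustration only, not a design taken from BiqBin; it assumes an MPI window on rank 0 holding the number of open B&B nodes:

    #include <mpi.h>

    /* A worker atomically decrements the shared counter of open B&B nodes
     * stored in a window on rank 0 and learns how many nodes were open
     * before its request -- no explicit coordinator message is needed. */
    static long take_open_node(MPI_Win win)
    {
        const long minus_one = -1;
        long nodes_before;
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        MPI_Fetch_and_op(&minus_one, &nodes_before, MPI_LONG, 0, 0, MPI_SUM, win);
        MPI_Win_unlock(0, win);
        return nodes_before;   /* <= 0 means there was no open node to take */
    }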
However, to go much further and solve much larger instances of (BQP) or even more general classes of discrete optimization problems, we need to enhance the existing approach with new advances from polynomial optimization, artificial intelligence and problem specific theoretical findings.

Acknowledgments

Fruitful discussions with Leon Kos, Franz Rendl, and Alen Vegi Kalamar helped a lot in resolving several challenges during the project. Finally, we thank the two anonymous referees, whose comments improved an earlier version of this work.

References

[1]
Francisco Barahona, Michael Jünger, and Gerhard Reinelt. 1989. Experiments in quadratic 0-1 programming. Math. Program. Ser. A 44, 2 (1989), 127–137.
[2]
John E. Beasley. 1998. Heuristic algorithms for the unconstrained binary quadratic programming problem. Technical Report. The Management School, Imperial College, London, England.
[3]
Aditya Bhaskara, Moses Charikar, Aravindan Vijayaraghavan, Venkatesan Guruswami, and Yuan Zhou. 2012. Polynomial integrality gaps for strong SDP relaxations of densest k-subgraph. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’12). SIAM, 388–405. http://dl.acm.org/citation.cfm?id=2095116.2095150
[4]
Alain Billionnet. 2005. Different formulations for solving the heaviest k-subgraph problem. INFOR Inf. Syst. Oper. Res. 43, 3 (2005), 171–186.
[5]
Alain Billionnet and Sourour Elloumi. 2007. Using a mixed integer quadratic programming solver for the unconstrained quadratic \(0-1\) problem. Math. Program. Ser. A 109, 1 (2007), 55–68.
[6]
Swagatam Das, Ajith Abraham, and Amit Konar. 2009. Metaheuristic pattern clustering–an overview. In Metaheuristic Clustering. Springer, 1–62.
[7]
Michel Deza and Monique Laurent. 1994. Applications of cut polyhedra. I, II. J. Comput. Appl. Math. 55, 2 (1994), 191–216, 217–247.
[8]
Gianni Di Pillo. 1994. Exact penalty methods. In Algorithms for Continuous Optimization. Springer, 209–253.
[9]
Elizabeth D. Dolan. 2001. The NEOS Server 4.0 Administrative Guide. Technical Memorandum ANL/MCS-TM-250. Mathematics and Computer Science Division, Argonne National Laboratory.
[10]
Fabio Furini, Emiliano Traversi, Pietro Belotti, Antonio Frangioni, Ambros Gleixner, Nick Gould, Leo Liberti, Andrea Lodi, Ruth Misener, Hans Mittelmann, Nikolaos Sahinidis, Stefan Vigerske, and Angelika Wiegele. 2019. QPLIB: A library of quadratic programming instances. Math. Program. Comput. 11, 2 (2019), 237–265.
[11]
Gerald Gamrath, Daniel Anderson, Ksenia Bestuzheva, Wei-Kun Chen, Leon Eifler, Maxime Gasse, Patrick Gemander, Ambros Gleixner, Leona Gottwald, Katrin Halbig, Gregor Hendel, Christopher Hojny, Thorsten Koch, Pierre Le Bodic, Stephen J. Maher, Frederic Matter, Matthias Miltenberger, Erik Mühmer, Benjamin Müller, Marc E. Pfetsch, Franziska Schlösser, Felipe Serrano, Yuji Shinano, Christine Tawfik, Stefan Vigerske, Fabian Wegscheider, Dieter Weninger, and Jakob Witzig. 2020. The SCIP Optimization Suite 7.0. Technical Report. Optimization Online. http://www.optimization-online.org/DB_HTML/2020/03/7705.html.
[12]
Michael R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. A Series of Books in the Mathematical Sciences. W. H. Freeman and Co., San Francisco, Calif. x+338 pages.
[13]
Michel X. Goemans and David P. Williamson. 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach. 42, 6 (1995), 1115–1145.
[14]
Gurobi Optimization, LLC. 2022. Gurobi Optimizer Reference Manual. https://www.gurobi.com.
[15]
Nicolò Gusmeroli. 2019. EXPEDIS: Randomly Generated Instances. https://www.aau.at/en/mathematics/publications/software/. (2019).
[16]
Nicolò Gusmeroli and Angelika Wiegele. 2021. EXPEDIS: An exact penalty method over discrete sets. Discrete Optimization (2021), 100622.
[17]
Christoph Helmberg and Franz Rendl. 1998. Solving quadratic \((0,1)\) -problems by semidefinite programs and cutting planes. Math. Program. Ser. A 82, 3 (1998), 291–315.
[18]
Timotej Hrga, Borut Lužar, Janez Povh, and Angelika Wiegele. 2020. BiqBin: Moving boundaries for NP-hard problems by HPC. In Advances in High Performance Computing, Proceedings of HPC2019 Conference (HPC’19). 388–405.
[19]
Timotej Hrga and Janez Povh. 2020. MADAM: A parallel exact solver for Max-Cut based on semidefinite programming and ADMM. arXiv preprint arXiv:2010.07839 (2020).
[20]
[21]
Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts Statist., Vol. 103. Springer, New York. xiv+426 pages.
[22]
Richard M. Karp. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations (Proc. Sympos., IBM Thomas J. Watson Res. Center, Yorktown Heights, N.Y., 1972). Plenum, New York, 85–103.
[23]
Krzysztof C. Kiwiel. 1989. A survey of bundle methods for nondifferentiable optimization. In Mathematical Programming (Tokyo, 1988). Math. Appl. (Japanese Ser.), Vol. 6. SCIPRESS, Tokyo, 263–282.
[24]
Nathan Krislock, Jérôme Malick, and Frédéric Roupin. 2014. Improved semidefinite bounding procedure for solving max-cut problems to optimality. Math. Program. Ser. A 143, 1-2 (2014), 61–86.
[25]
Nathan Krislock, Jérôme Malick, and Frédéric Roupin. 2017. BiqCrunch: A semidefinite branch-and-bound method for solving binary quadratic problems. ACM Trans. Math. Software (TOMS) 43, 4 (2017), 1–23.
[26]
Amélie Lambert. 2018. Densest k-subgraph Problem, Benchmark Instances. http://cedric.cnam.fr/~lamberta/Library/k-cluster.html. (2018). Accessed: 2018-02-27.
[27]
Jean B. Lasserre. 2016. A MAX-CUT formulation of 0/1 programs. Oper. Res. Lett. 44, 2 (2016), 158–164.
[28]
Frauke Liers, Michael Jünger, Gerhard Reinelt, and Giovanni Rinaldi. 2004. Computing exact ground states of hard ising spin glass problems by branch-and-cut. In New Optimization Algorithms in Physics, Alexander Hartmann and Heiko Rieger (Eds.). Wiley, 47–68.
[29]
Ricardo M. Lima and Ignacio E. Grossmann. 2017. On the solution of nonconvex cardinality Boolean quadratic programming problems: A computational study. Comput. Optim. Appl. 66, 1 (2017), 1–37.
[30]
MIPLIB 2017. (2018). miplib.zib.de.
[31]
Cristobal A. Navarro, Nancy Hitschfeld-Kahler, and Luis Mateu. 2014. A survey on parallel computing and its applications in data-parallel problems using GPU architectures. Commun. Comput. Phys. 15, 2 (2014), 285–329.
[32]
Panos M. Pardalos and Gregory P. Rodgers. 1990. Computational aspects of a branch and bound algorithm for quadratic zero-one programming. Computing 45, 2 (1990), 131–144.
[33]
Panos M. Pardalos and Gregory P. Rodgers. 1990. Parallel branch and bound algorithms for quadratic zero-one programs on the hypercube architecture. Ann. Oper. Res. 22, 1-4 (1990), 271–292.
[34]
David Pisinger. 2007. The quadratic knapsack problem–a survey. Discrete Appl. Math. 155, 5 (2007), 623–648.
[35]
Franz Rendl, Giovanni Rinaldi, and Angelika Wiegele. 2010. Solving max-cut to optimality by intersecting semidefinite and polyhedral relaxations. Math. Program. Ser. A 121, 2 (2010), 307–335.
[36]
Giovanni Rinaldi. 1998. Rudy. http://www-user.tu-chemnitz.de/~helmberg/rudy.tar.gz. (1998).
[37]
Alexander Schrijver. 2003. Combinatorial Optimization: Disjoint Paths, Hypergraphs. Springer. https://books.google.si/books?id=HNcpAQAAMAAJ
[38]
Alen Vegi Kalamar, Drago Bokal, and Janez Povh. 2019. Parallelization of BiqMac solver. In SOR’19 Proceedings. Slovenian Society Informatika, Section for Operational Research, 161–166.
[39]
Angelika Wiegele. 2007. BiqMac Library. http://biqmac.aau.at/biqmaclib.html. (2007).
[40]
Christian Wiwie, Jan Baumbach, and Richard Röttger. 2015. Comparing the performance of biomedical clustering methods. Nat. Methods 12, 11 (2015), 1033.
[41]
Qinghua Wu and Jin-Kao Hao. 2015. A review on algorithms for maximum clique problems. European J. Oper. Res. 242, 3 (2015), 693–709.
[42]
Rui Xu and Donald Wunsch II. 2005. Survey of clustering algorithms. IEEE Trans. Neural Netw. 16, 3 (2005), 645–678.
