
BiqBin: A Parallel Branch-and-bound Solver for Binary Quadratic Problems with Linear Constraints

Published: 19 July 2022

Abstract

We present BiqBin, an exact solver for linearly constrained binary quadratic problems. Our approach is based on an exact penalty method to first efficiently transform the original problem into an instance of Max-Cut, and then to solve the Max-Cut problem by a branch-and-bound algorithm. All the main ingredients are carefully developed using new semidefinite programming relaxations obtained by strengthening the existing relaxations with a set of hypermetric inequalities, applying the bundle method as the bounding routine and using new strategies for exploring the branch-and-bound tree.
Furthermore, an efficient C implementation of a sequential and a parallel branch-and-bound algorithm is presented. The latter is based on a load coordinator-worker scheme using MPI for multi-node parallelization and is evaluated on a high-performance computer.
The new solver is benchmarked against BiqCrunch, GUROBI, and SCIP on four families of (linearly constrained) binary quadratic problems. Numerical results demonstrate that BiqBin is a highly competitive solver. The serial version outperforms the other three solvers on the majority of the benchmark instances. We also evaluate the parallel solver and show that it has good scaling properties. It is available to the general audience as an on-line service at http://www.biqbin.eu.

1 Introduction

1.1 Motivation

With the advent of the data-driven economy, close cooperation between data science and mathematical optimization has become a crucial driver for developing (i) new data science methods capable of revealing knowledge hidden in the data and (ii) new algorithms for implementing these methods.
Grouping data instances according to their inner similarity, i.e., data clustering analysis [21, Ch. 10.3], is one of the oldest and most studied problems in data science. When we know that the data should be decomposed into a fixed number of groups, say K, and we want to find these K groups, we have the K-clustering problem, which is NP-hard [12].
Another interesting problem from data science, similar to the K-clustering problem, is the problem where we want to find a set of k vertices in a simple undirected weighted graph G such that the sum of the weights on the edges connecting these k vertices is maximum. This problem is called the densest k-subgraph problem [4] and can be formulated as an optimization problem with a quadratic objective function, one linear constraint, and binary ( \(0/1\) ) decision variables:
\begin{align} \max ~~ & \frac{1}{2}\mathbf {x}^\top W \mathbf {x}\nonumber \\ \mathrm{s.t.}~~ & \mathbf {e}^\top \mathbf {x}= k \\ & \mathbf {x}\in \lbrace 0,1\rbrace ^{n}\nonumber \end{align}
(DkS)
Here, W is the weighted adjacency matrix of the underlying graph and \(\mathbf {e}\) is the all-ones vector.
The densest k-subgraph problem can be seen as a generalization of the Max-Clique problem [41] and also as a special case of the quadratic knapsack problem [34]. Even though the problem is easy to state and appears simple at first sight, it is not. In fact, it is one of the NP-hard problems for which even the approximability is not well understood: there is a huge gap between the best approximation algorithm and the known inapproximability results [3].
The problems mentioned above are all special instances of binary quadratic problems with linear constraints, which are formally defined in Section 1.3. In practice, we typically solve such problems only approximately using heuristic algorithms. For example, in [6, 40, 42] one can find a vast number of such heuristics for the case of data clustering. However, to evaluate the performance of the heuristic algorithms, we still need the ground truth, i.e., the optimum solutions of the original problems. Therefore, it is highly desirable to solve (to optimality) constrained binary quadratic problems, like (DkS), on problem instances that are as large as possible.
Many other optimization problems with clear real-life applications can be represented in a similar way, as non-convex optimization problems in binary variables subject to linear constraints, e.g., the quadratic assignment problem, the stable set problem, and the graph coloring problem (see, e.g., [37] for definitions). Again, appropriate (meta)heuristic algorithms are used to approximately solve large instances, while optimum solutions for test instances of small or medium size are still needed to evaluate these heuristics.

1.2 Notation

We will use the following notation: \(\mathcal {S}_n\) denotes the space of symmetric \(n\times n\) matrices, \({\mathbb {R}}^n\) is the n-dimensional space of all n-tuples of real numbers and \({\mathbb {R}}^{m\times n}\) is the space of all \(m\times n\) real matrices. By \(\mathbf {e}_n\) we denote the vector of length n with all entries equal to one. Usually, its dimension is clear from the context, so we write only \(\mathbf {e}\) . Similarly, \(\mathbf {0}\) denotes the zero vector or the zero matrix. Given \(\mathbf {x}\in {\mathbb {R}}^n\) , \(\operatorname{Diag}(\mathbf {x})\) is the \(n \times n\) diagonal matrix with \(\mathbf {x}\) on its diagonal; and \(\operatorname{diag}(X)\) is the vector with the diagonal elements of matrix X. By \(\operatorname{rk}(X)\) we denote the rank of matrix X.
For an optimization problem \(\bf {P}\) , we refer to its optimum value by \({\mathrm{OPT}}_{\bf {P}}\) . For the SDP relaxations of (Max-Cut) we use slightly different notations for their optimum values. For example, the optimum value of the SDP relaxation (MCSDP) is denoted by \({\mathrm{OPT}}_{\bf {SDP}}\) .

1.3 Linearly Constrained Binary Quadratic Problem and the Max-Cut Problem

The central problem that we consider in the paper is the linearly constrained binary quadratic problem (BQP), which can be formulated as follows:
\begin{align} \begin{split} \min ~~ &\mathbf {x}^\top F \mathbf {x}+ \mathbf {c}^\top \mathbf {x}\\ \mathrm{s.t.}~~ &A\mathbf {x}= \mathbf {b},~~ \mathbf {x}\in \lbrace 0,1\rbrace ^{n}, \end{split} \end{align}
(BQP)
with given data \(F \in \mathcal {S}_n\) , \(\mathbf {c}\in {\mathbb {R}}^{n}\) , \(A\in {\mathbb {R}}^{m\times n}\) , and \(\mathbf {b}\in {\mathbb {R}}^{m}\) . We assume that all linear constraints are equalities; an inequality can be handled by adding a slack variable and decomposing it into a weighted sum of binary variables. This problem encompasses the densest k-subgraph problem and is thus NP-hard.
The mathematical optimization community is interested in BQP as a problem per se. In this context, the main challenge is developing new algorithms for it by (i) exploring and exploiting new properties of the problem, (ii) using new results from other (non-optimization) areas (such as algebraic geometry), (iii) making new combinations of existing algorithms with best practical or theoretical performance, and (iv) exploiting best available high-performance hardware and software.
Lasserre [27] proved that any instance of (BQP) can be transformed into an instance of the Max-Cut problem, an NP-hard optimization problem [12, 22] on graphs. Max-Cut is among the most studied combinatorial optimization problems; it has connections to various fields of discrete mathematics and models a wide range of applications. The transformation is based on an exact penalty approach, which was further explored and advanced in a recent paper by two of the co-authors of this paper [16].
The Max-Cut problem can be defined as follows. Suppose a weighted undirected graph \(G=(V,E)\) is given, where V is the set of vertices, E is the set of edges, and each edge \(e\in E\) has the weight \(w_e\in {\mathbb {R}}\) . The Max-Cut problem asks to find a partition of V into two parts \((S, V\backslash S)\) such that the sum of the weights of the edges having one endpoint in S and the other one in \(V\backslash S\) is maximized.
If \(W=(w_{ij})\) is the weighted adjacency matrix of G, i.e., \(w_{ij} = w_{ji} = w_e\) for \(e=\lbrace i,j\rbrace \in E\) , and the Laplacian matrix of the graph associated with W is
\begin{equation*} L = \operatorname{Diag}(W\mathbf {e}) - W, \end{equation*}
then computing the maximum cut amounts to solving the following binary quadratic problem in variables from \(\lbrace - 1,1\rbrace\) :
\begin{equation*} {\mathrm{OPT}}_{\bf {Max-Cut}}~=~\max \left\lbrace \frac{1}{4}\mathbf {x}^\top L\mathbf {x}\mid \mathbf {x}\in \lbrace - 1,1\rbrace ^{|{V}|} \right\rbrace . \end{equation*}
(Max-Cut)
However, it is straightforward to transform it into a \(0/1\) problem by using the following simple linear substitution \(\mathbf {x}= 2\mathbf {z}- \mathbf {e}\) and by observing that
\begin{equation*} \mathbf {x}^\top L\mathbf {x}= 4\mathbf {z}^\top L\mathbf {z}-4\mathbf {z}^\top L\mathbf {e}+\mathbf {e}^\top L\mathbf {e}= 4\mathbf {z}^\top L\mathbf {z}, \end{equation*}
since \(L\mathbf {e}=0\) . Hence, we obtain the equivalent formulation of the Max-Cut, which has a structure of (BQP) (with no linear constraints):
\begin{equation*} {\mathrm{OPT}}_{\bf {Max-Cut}}~=~\max \left\lbrace \mathbf {z}^\top L\mathbf {z}\mid \mathbf {z}\in \lbrace 0,1\rbrace ^{|{V}|} \right\rbrace . \end{equation*}
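To make the objective concrete, the following minimal C sketch (our own illustration, not code from BiqBin) builds the Laplacian \(L = \operatorname{Diag}(W\mathbf {e}) - W\) for a small hypothetical graph and evaluates \(\frac{1}{4}\mathbf {x}^\top L\mathbf {x}\) for a cut encoded as \(\mathbf {x}\in \lbrace -1,1\rbrace ^n\) :

```c
#include <stdio.h>

#define N 4

int main(void) {
    /* hypothetical weighted adjacency matrix of a small graph */
    double W[N][N] = {{0, 1, 0, 2},
                      {1, 0, 3, 0},
                      {0, 3, 0, 1},
                      {2, 0, 1, 0}};
    double L[N][N];

    /* Laplacian: L = Diag(W e) - W */
    for (int i = 0; i < N; i++) {
        double rowsum = 0.0;
        for (int j = 0; j < N; j++) rowsum += W[i][j];
        for (int j = 0; j < N; j++) L[i][j] = (i == j ? rowsum : 0.0) - W[i][j];
    }

    /* the cut S = {0,1} encoded as x in {-1,1}^N */
    int x[N] = {1, 1, -1, -1};

    /* cut value = (1/4) x^T L x; here the crossing edges {1,2} and {0,3}
       have weights 3 and 2, so the value is 5 */
    double val = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) val += 0.25 * x[i] * L[i][j] * x[j];

    printf("cut value = %g\n", val);
    return 0;
}
```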
The community is interested in reformulating (BQP) as (Max-Cut) because a wide range of approximate and exact methods and solvers exists for the latter problem, and we want to employ them in solving the former. We are particularly interested in methods computing global optima of both problems.

1.4 Our Contribution

The main contribution of this paper is the BiqBin solver for (BQP), which outperforms the existing solvers on several special cases of (BQP). The core of the BiqBin solver is the exact penalty algorithm EXPEDIS [16]: we first transform every instance of (BQP) into a corresponding instance of (Max-Cut) and then solve the resulting Max-Cut instance. BiqBin is coded in C and non-trivially improves on existing solvers by
introducing a strengthened bounding routine based on hypermetric inequalities;
implementing a parallel branch-and-bound algorithm to solve (Max-Cut) instances using a Message Passing Interface library (MPI);
providing a web-based BiqBin service1 enabling researchers to submit their instances of (BQP) to one of the Slovenian Tier-2 supercomputers to be solved.
Additionally, we demonstrate the practical efficiency of BiqBin by extensive benchmarking against BiqCrunch [25], GUROBI [14], and SCIP [11] on four special cases of (BQP): the Max-Cut problem, the unconstrained binary quadratic problem, the densest k-subgraph problem, and randomly generated binary quadratic problems with linear constraints.
We observe that BiqBin performs very well on instances with a small number of linear constraints, while on instances with many linear constraints, solvers that directly exploit the structure of the constraints are very competitive. Additionally, as we show on the benchmark instances, BiqBin is also highly scalable.

2 Related Work

Starting in the previous century, tackling the Max-Cut problem computationally attracted the attention of many researchers. A number of ideas for solving the problem have been proposed in the literature; fast heuristic algorithms on one side and exact algorithms on the other have been developed over the last decades. We mention here only a few of the exact methods. The first successes in solving (Max-Cut) instances to optimality appeared in the eighties, when linear programming based methods were implemented [1] and further developed, in particular in the context of problems arising in physics [28]. These methods are especially successful when the underlying graphs are sparse.
Other methods use a preprocessing phase where they try to fix some variables (based on the gradient) [32, 33] or to construct a convex problem having the same optimal solution. For such convex problems efficient solvers exist, the most prominent ones being the commercial solvers Gurobi [14] and CPLEX [20].
At the beginning of this century, methods based on semidefinite programming and the branch-and-bound (B&B) algorithm were proposed [17, 24, 25, 35]. One of them, the BiqMac solver [35], was the starting point for our research, since it is one of the best performing solvers for (Max-Cut) instances and, by Lasserre’s result [27], can also be employed to solve (BQP) instances. Another one, the BiqCrunch solver [25], we use to benchmark our results.
By setting \(F=0\) in (BQP), we obtain a linear optimization problem with binary variables. Such problems have been investigated for many decades, leading to enormous progress in their practical solution. According to Hans Mittelmann’s web page,2 the state-of-the-art solvers (the web page contains results for the solvers CBC, GLPK, LP_SOLVE, MATLAB, SAS-OR, and (F)SCIP) can solve such problems to optimality within a few hours if the number of binary variables is up to 50,000. Moreover, they can also handle problems with several hundred thousand binary variables if sufficient structure is provided (see also the list of “easy” cases on the MIPLIB benchmark page [30]).
In the case of a (non-convex) quadratic objective function, the state-of-the-art optimization techniques perform worse than those designed for linear problems. There are many solvers that can solve (BQP) instances: Mittelmann included in his decision tree for optimization software3 several solvers which use various optimization techniques to compute optimum solutions, such as branch-and-cut, branch-and-bound, lift-and-project, and convex reformulation combined with first and second order methods. He performed an extensive benchmarking of several solvers (BARON, (F)SCIP, ANTIGONE, MINOTAUR, OCTERACT, GUROBI) on instances from QPLIB4,5 [10]. Mittelmann showed that these solvers can solve within one hour between 5% (MINOTAUR) and 63% (GUROBI) of the 128 benchmark instances. These benchmark problems have small to medium size: they mostly contain up to a few hundred binary variables, with the largest instance having 8,904 binary variables. Large instances typically share some important structural properties, which make them solvable at least for some solvers.
The solvers mentioned above are available under various software licences. Most of them are freely available to all researchers for academic purposes upon registration and verification. If one does not have a strong local machine or does not want to bother with local installations, one can submit the problem instance to a web portal where some of the solvers are installed; such services are available, e.g., for NEOS [9], BiqMac,6 and BiqCrunch.7 While NEOS partially runs on fast supercomputers, BiqMac and BiqCrunch run on a small cluster and a strong single machine with multicore processors, respectively. The latter two solvers utilize B&B algorithms, but they do not perform any parallelization (a first attempt to parallelize BiqMac was made in [38]). They are available on-line (more precisely, users can submit their problem instances on-line), but they run on hardware that cannot be compared to state-of-the-art supercomputers.

3 Semidefinite Programming Relaxations for the Max-Cut

In the subsequent sections, we make use of tools from semidefinite programming. In order to make this paper self-contained, we recall here some definitions and algorithms.
As mentioned in Section 1.3, the Max-Cut problem can be formulated as
\begin{equation*} \max \left\lbrace \frac{1}{4}\mathbf {x}^\top L\mathbf {x}\mid \mathbf {x}\in \lbrace - 1,1\rbrace ^{|{V}|} \right\rbrace . \end{equation*}
Observe that for any \(\mathbf {x}\in \lbrace - 1,1\rbrace ^{|{V}|}\) , the matrix \(X = \mathbf {x}\mathbf {x}^\top\) is positive semidefinite and its diagonal is equal to the vector of all ones. Using this transformation and the property \(\mathbf {x}^\top L\mathbf {x}= \langle L, \mathbf {x}\mathbf {x}^\top \rangle\) , we can re-write (Max-Cut) as
\begin{equation*} \max \left\lbrace \frac{1}{4}\langle L, X \rangle \mid \operatorname{diag}(X) = \mathbf {e}, \ X \succcurlyeq 0, \ \operatorname{rk}(X) = 1 \right\rbrace . \end{equation*}
By dropping the rank-one constraint, we obtain the basic SDP relaxation
\begin{equation} {\mathrm{OPT}}_{\bf {SDP}}~=~\max \left\lbrace \frac{1}{4}\langle L, X \rangle \mid \ X \succcurlyeq 0, \ \operatorname{diag}(X) = \mathbf {e}\right\rbrace . \end{equation}
(MCSDP)
It is well-known that the bound \({\mathrm{OPT}}_{\bf {SDP}}\) is not strong enough to be successfully used within the B&B framework even for solving the Max-Cut problem to optimality on graphs with only 50 nodes. We overcome this problem by adding additional equality or inequality constraints, known as cutting planes, to strengthen the bound and consequently decrease the size of the B&B tree.
As in BiqMac and BiqCrunch, we use triangle inequalities. Furthermore, we strengthen the bound by using higher order k-gonal inequalities, which belong to the family of hypermetric inequalities [7]. They can be introduced as follows. Suppose that \(\mathbf {b}\) is an integer vector for which \(\mathbf {e}^\top \mathbf {b}\) is odd. This implies that \(\vert \mathbf {x}^\top \mathbf {b}\vert \ge 1 \textrm { for all } \mathbf {x}\in \lbrace - 1,1\rbrace ^n\) and therefore \(\langle \mathbf {b}\mathbf {b}^\top , \mathbf {x}\mathbf {x}^\top \rangle \ge 1.\) The hypermetric inequalities are the following set of linear inequalities, which can be applied to any symmetric matrix X of order n and are valid for any X from the convex hull of the rank-one matrices \(\mathbf {x}\mathbf {x}^\top\) , \(\mathbf {x}\in \lbrace - 1,1\rbrace ^n\) :
\begin{equation*} \lbrace \langle \mathbf {b}\mathbf {b}^\top , X \rangle \ge 1\ \mid \ \mathbf {e}^\top \mathbf {b}\textrm { odd}, \ \mathbf {b}\textrm { integer}\rbrace . \end{equation*}
In this paper, we consider the subclasses of hypermetric inequalities generated by choosing \(\mathbf {b}\) with \(b_i \in \lbrace - 1,0,1\rbrace\) and by fixing the number of non-zero entries in \(\mathbf {b}\) to 3, 5, or 7. In these cases, we obtain triangle, pentagonal, and heptagonal inequalities, respectively. The latter two are also called 5-clique and 7-clique inequalities, respectively. More specifically, the triangle inequalities are defined as
\begin{align*} -X_{ij}-X_{ik}-X_{jk} \le 1, \\ -X_{ij}+X_{ik}+X_{jk} \le 1, \\ X_{ij}-X_{ik}+X_{jk} \le 1, \\ X_{ij}+X_{ik}-X_{jk} \le 1, \end{align*}
\(\forall\) distinct i, j, k. The first inequality is actually \(\mathbf {b}^\top X\mathbf {b} = X_{ii} + X_{jj} + X_{kk} + 2\left(X_{ij} + X_{ik}+ X_{jk}\right) \ge 1\) , for the case \(b_i = b_j = b_k = 1\) , using the implicit constraint \(\operatorname{diag}(X) = \mathbf {e}\) . The other three inequalities above, for fixed \(i,j,k\) , follow by considering all other combinations of signs of the entries \(b_i, b_j, b_k\) (trivially, \(\mathbf {b}\) and \(-\mathbf {b}\) yield the same inequality). By introducing appropriate linear operators on the vector space of symmetric matrices and by using the fact that the matrices under consideration have diagonal entries equal to one, we write these inequalities as \(\mathcal {A}_3(X)\le \mathbf {e}\) , \(\mathcal {A}_5(X)\le \mathbf {e}\) , and \(\mathcal {A}_7(X)\le \mathbf {e}\) , respectively.
Similarly, we denote by \(\mathcal {A}_{\bf {HYP}}(X) \le \mathbf {e}\) a set containing triangle, pentagonal, and heptagonal inequalities and call it (by a slight abuse of notation) hypermetric inequalities. We infer the following strengthening of (MCSDP):
\begin{equation} {\mathrm{OPT}}_{\bf {HYP}}~=~\max \left\lbrace \langle L, X \rangle \mid \ X \succcurlyeq 0, \ \operatorname{diag}(X) = \mathbf {e}, \ \mathcal {A}_{\bf {HYP}}(X)\le \mathbf {e}\right\rbrace . \end{equation}
(MCHYP)
We now describe the routine for separating the proposed cutting planes. There are \(4{n \choose 3}\) triangle inequalities; for problem sizes that we are interested in, we can enumerate them and identify the most violated ones.
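As an illustration of this enumeration, the following self-contained C sketch (the dense row-major storage of X and all names are our assumptions; BiqBin's actual routine may differ) scans all triples and reports the most violated of the four triangle inequalities:

```c
#include <stdio.h>

/* evaluate the four triangle inequalities for a triple (i,j,k);
 * each has the form lhs <= 1, so a return value > 0 means "violated" */
static double tri_violation(const double *X, int n, int i, int j, int k,
                            int *type) {
    double xij = X[i*n + j], xik = X[i*n + k], xjk = X[j*n + k];
    double lhs[4] = { -xij - xik - xjk,
                      -xij + xik + xjk,
                       xij - xik + xjk,
                       xij + xik - xjk };
    double worst = lhs[0];
    *type = 0;
    for (int t = 1; t < 4; t++)
        if (lhs[t] > worst) { worst = lhs[t]; *type = t; }
    return worst - 1.0;
}

int main(void) {
    /* a tiny matrix with diag(X) = e that violates a triangle inequality */
    int n = 3, type;
    double X[9] = { 1.0, -0.8, -0.8,
                   -0.8,  1.0, -0.8,
                   -0.8, -0.8,  1.0 };
    double best = 0.0;
    int bi = -1, bj = -1, bk = -1, bt = -1;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            for (int k = j + 1; k < n; k++) {
                double v = tri_violation(X, n, i, j, k, &type);
                if (v > best) { best = v; bi = i; bj = j; bk = k; bt = type; }
            }
    if (bi >= 0)
        printf("most violated: type %d on (%d,%d,%d), amount %.2f\n",
               bt, bi, bj, bk, best);
    return 0;
}
```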
Due to the large number of higher k-gonal inequalities, the separation of pentagonal and heptagonal inequalities is done heuristically. Let \(\mathbf {e}= (1, 1, 1, 1, 1)^\top\) and define \(H_1 = \mathbf {e}\mathbf {e}^\top\) . Suppose we are searching for a pentagonal inequality (of the type where all nonzero entries of \(\mathbf {b}\) are ones) with large violation, i.e., for a given matrix X, we are looking for a 5-permutation p of the n vertices such that the value \(\langle H_1, X(p,p) \rangle\) is minimal. By \(X(p,p)\) we denote the submatrix obtained by taking the rows and columns contained in the permutation p. These indices are the nonzero entries of the vector \(\mathbf {b}\) determining the pentagonal inequality. Let H be the \(n \times n\) matrix having \(H_1\) as its leading principal submatrix of order 5 and all other elements equal to zero. Then the problem can be reformulated as a quadratic assignment problem of the form
\begin{align*} \min \quad & \langle H, PXP^\top \rangle \\ \mathrm{s.t.}\quad & P \in \Pi , \end{align*}
where \(\Pi\) is the set of all \(n \times n\) permutation matrices. This problem is approximately solved using simulated annealing to obtain a pentagonal inequality with potentially large violation. By replacing the matrix \(H_1\) with the rank-one matrices \(H_2 = \widehat{\mathbf {e}}\widehat{\mathbf {e}}^\top\) or \(H_3 = \widetilde{\mathbf {e}}\widetilde{\mathbf {e}}^\top\) , where \(\widehat{\mathbf {e}} = (- 1, 1, 1, 1, 1)^\top\) and \(\widetilde{\mathbf {e}} = (- 1, - 1, 1, 1, 1)^\top\) , different types of pentagonal inequalities are found. The same idea is applied for separating strongly violated heptagonal inequalities.
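The following simplified C sketch illustrates the heuristic for the all-ones case \(H_1\): it minimizes \(\langle H_1, X(p,p) \rangle\) , i.e., the sum of the entries of X over a 5-subset, by a basic annealing move that swaps one selected index for an outside vertex; a final value below 1 certifies a violated pentagonal inequality. The starting subset, move, and cooling schedule are our own illustrative choices, not BiqBin's:

```c
#include <stdlib.h>
#include <math.h>

/* <H1, X(p,p)> for the all-ones H1: the sum of X over the 5-subset p */
static double subset_value(const double *X, int n, const int p[5]) {
    double s = 0.0;
    for (int a = 0; a < 5; a++)
        for (int b = 0; b < 5; b++) s += X[p[a]*n + p[b]];
    return s;
}

/* simulated annealing (requires n >= 6); on return, p holds the best
 * 5-subset found; a returned value below 1 means that the pentagonal
 * inequality <bb^T, X> >= 1 with b supported on p is violated */
double anneal_pentagonal(const double *X, int n, int p[5], int iters) {
    int cur[5];
    for (int t = 0; t < 5; t++) cur[t] = p[t] = t;   /* simple start */
    double fcur = subset_value(X, n, cur), fbest = fcur, T = 1.0;

    for (int it = 0; it < iters; it++, T *= 0.999) {
        /* propose: replace one chosen index by a random outside vertex */
        int a = rand() % 5, v = rand() % n, clash = 0;
        for (int t = 0; t < 5; t++) if (cur[t] == v) clash = 1;
        if (clash) continue;
        int old = cur[a];
        cur[a] = v;
        double fnew = subset_value(X, n, cur);
        /* Metropolis acceptance: always downhill, sometimes uphill */
        if (fnew <= fcur ||
            exp((fcur - fnew) / T) > rand() / (double)RAND_MAX) {
            fcur = fnew;
            if (fcur < fbest) {
                fbest = fcur;
                for (int t = 0; t < 5; t++) p[t] = cur[t];
            }
        } else {
            cur[a] = old;                            /* reject: undo */
        }
    }
    return fbest;
}
```

Running the same scheme with \(H_2\) or \(H_3\) in place of \(H_1\), and with 7-subsets, would yield the other pentagonal types and the heptagonal inequalities.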
To sum up, in BiqBin we first iteratively identify a subset of promising cutting planes. Using a current approximate solution X, we find the promising hypermetric inequalities \(\mathcal {A}^{\prime }_{\bf {HYP}}(X)\le \mathbf {e}\) and solve (MCHYP) using only these inequalities. Specifically, during the separation routine, we add \(10\cdot n\) triangle inequalities, 300 pentagonal inequalities, and 200 heptagonal inequalities.
Note that for convenience, we sometimes use instead of the cut vector \(\mathbf {x}\in \lbrace - 1,1\rbrace ^n\) the vector \(\begin{pmatrix} 1 \\ \mathbf {x}\end{pmatrix} \in \lbrace - 1,1\rbrace ^{n+1}\) to derive the SDP relaxation. This has the advantage that the values of the vector \(\mathbf {x}\) are given in the first row and column of the matrix
\begin{equation*} \begin{pmatrix} 1 \\ \mathbf {x}\end{pmatrix} \begin{pmatrix} 1\\ \mathbf {x}\end{pmatrix}^\top . \end{equation*}

4 From Binary Quadratic Problems with Linear Constraints to the Max-Cut Problem

In this section, we recall how to transform binary quadratic problems with linear constraints into the Max-Cut problem.
Exact penalty methods for solving constrained optimization problems construct a function whose (unconstrained) minimizers are also optimal solutions of the constrained problem; see [8] for an overview of classical results on this topic. Gusmeroli and Wiegele [16] introduced an exact penalty algorithm over discrete sets called EXPEDIS. Their work follows and improves the idea of Lasserre [27] and reformulates a linearly constrained binary quadratic problem as a Max-Cut instance.
The input of the EXPEDIS algorithm is an instance of (BQP). We consider its version with the binary variables being from \(\lbrace -1,1\rbrace\) .
In order to simplify notation, we define the sets of feasible and infeasible binary vectors as \(\Delta\) and \(\Delta ^c\) , respectively; i.e.,
\begin{equation*} \Delta = \left\lbrace \mathbf {x}\in \lbrace -1,1\rbrace ^{n} | A\mathbf {x}=\mathbf {b}\right\rbrace \quad \textrm {and} \quad \Delta ^c = \lbrace -1,1\rbrace ^{n} \setminus \Delta . \end{equation*}
Given a sufficiently large penalty parameter, denoted \(\sigma\) , we add a quadratic term to the objective function and obtain
\begin{equation*} h(\mathbf {x}) = \mathbf {x}^\top F\mathbf {x}+ \mathbf {c}^\top \mathbf {x}+ \sigma \Vert A\mathbf {x}-\mathbf {b}\Vert ^{2}. \end{equation*}
By defining the matrix
\begin{equation*} Q = \begin{bmatrix} \sigma \mathbf {b}^\top \mathbf {b} & \left(\mathbf {c} - 2\sigma A^\top \mathbf {b} \right)^\top /2 \\ \left(\mathbf {c} - 2\sigma A^\top \mathbf {b}\right)/2 & F + \sigma A^\top A \end{bmatrix} \end{equation*}
the function \(h(\mathbf {x})\) can alternatively be written as \(\bar{\mathbf {x}}^\top Q \bar{\mathbf {x}}\) , in which case we consider the following unconstrained binary optimization problem
\begin{align*} \begin{split} h^* = \min ~~ & \bar{\mathbf {x}}^\top Q \bar{\mathbf {x}} \\ \mathrm{s.t.}~~ & \bar{\mathbf {x}} \in \left\lbrace - 1,1 \right\rbrace ^{n+1} \\ & \bar{\mathbf {x}}_0 = 1, \end{split} \end{align*}
which is a Max-Cut problem on a graph with \(n+1\) vertices, i.e., \(V = \lbrace 0,1,\ldots , n\rbrace\) ; see [16] for more details.
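For concreteness, the following C sketch assembles Q from \((F, \mathbf {c}, A, \mathbf {b})\) and \(\sigma\) exactly as in the formula above; the dense row-major layout and the function name are our assumptions:

```c
/* Q is (n+1) x (n+1), F is n x n, A is m x n; all row-major dense */
void build_Q(double *Q, const double *F, const double *c,
             const double *A, const double *b,
             int n, int m, double sigma) {
    int N = n + 1;

    /* Q[0][0] = sigma * b^T b */
    double btb = 0.0;
    for (int i = 0; i < m; i++) btb += b[i] * b[i];
    Q[0] = sigma * btb;

    /* first row and column: (c - 2 sigma A^T b) / 2 */
    for (int j = 0; j < n; j++) {
        double Atb = 0.0;
        for (int i = 0; i < m; i++) Atb += A[i*n + j] * b[i];
        double v = (c[j] - 2.0 * sigma * Atb) / 2.0;
        Q[j + 1] = v;               /* Q[0][j+1] */
        Q[(j + 1) * N] = v;         /* Q[j+1][0] */
    }

    /* lower-right block: F + sigma * A^T A */
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++) {
            double AtA = 0.0;
            for (int i = 0; i < m; i++) AtA += A[i*n + j] * A[i*n + k];
            Q[(j + 1) * N + (k + 1)] = F[j*n + k] + sigma * AtA;
        }
}
```

With \(\bar{\mathbf {x}} = (1, \mathbf {x})^\top\) , expanding \(\bar{\mathbf {x}}^\top Q \bar{\mathbf {x}}\) recovers \(\mathbf {x}^\top F\mathbf {x}+ \mathbf {c}^\top \mathbf {x}+ \sigma \Vert A\mathbf {x}-\mathbf {b}\Vert ^{2} = h(\mathbf {x})\) .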
Theorem 1 ([16, Theorem 2]).
Consider an instance of (BQP) with optimal value \(f^*\) . Let \(\rho\) be a threshold parameter and let \(\sigma\) be a penalty parameter such that
(1)
(BQP) has no feasible solution with value greater than \(\rho\) ; and
(2)
for any \(\mathbf {x}\) in the set \(\Delta ^c\) , we have \(h(\mathbf {x}) \gt \rho\) .
If \(f^* \lt \infty\) then the optimal values of the constrained and the unconstrained problem coincide, i.e., \(h^* = f^*\) . Moreover, this instance is infeasible if and only if \(h^* \gt \rho\) .
The choice of parameters used in [16] is
\begin{align*} \rho &= \tilde{u}\\ \sigma &= \tilde{u} - \tilde{\ell } + \epsilon , \end{align*}
where
\begin{align*} \tilde{\ell } &= \min \left\lbrace c^\top \mathbf {x}+ \langle F, X \rangle \mid \operatorname{diag}(X)=\mathbf {e}, \ X-\mathbf {x}\mathbf {x}^\top \succcurlyeq 0, \mathcal {A}_{3}(X)\le \mathbf {e}, \mathcal {A}_{5}(X)\le \mathbf {e}\right\rbrace \\ \tilde{u} &= \max \left\lbrace c^\top \mathbf {x}+ \langle F, X \rangle \mid \operatorname{diag}(X)=\mathbf {e}, \ X-\mathbf {x}\mathbf {x}^\top \succcurlyeq 0, \ [b, - A ] \cdot \begin{bmatrix} 1 & \mathbf {x}^\top \\ \mathbf {x}& X \end{bmatrix} = \mathbf {0}\right\rbrace . \end{align*}
In this way, \(\rho\) and \(\sigma\) fulfill the assumptions of Theorem 1, but are kept “sufficiently” small in order to avoid numerical difficulties when computing the maximum cut. For a more detailed study on the choice of the parameters \(\rho\) and \(\sigma\) we refer to [16].
In [16], further enhancements of the choice of the penalty parameter are discussed, e.g., an update is made as soon as a feasible solution of (BQP) is found, or an early stopping condition is added when infeasibility of (BQP) is detected.

5 BiqBin Solver

5.1 (Sequential) Branch-and-bound Algorithm

The BiqBin solver is a Max-Cut based solver for (BQP) instances: it reformulates them as (Max-Cut) instances and solves these by a B&B algorithm.
The main ingredients of BiqBin are:
(1)
the procedure for the exact penalty reformulation of (BQP) instances into (Max-Cut) instances;
(2)
the bounding procedure, which provides for each instance of a problem (also for smaller subproblems obtained via branching) an upper bound on the optimum value;
(3)
the branching procedure, which splits the current problem into more problems of smaller dimensions by fixing some variables;
(4)
a heuristic for generating feasible solutions providing a lower bound.
The overall performance of the algorithm is determined by the quality of the lower and upper bounds computed in each B&B node, by the computational efficiency of computing these bounds, and by the strategy for exploring the B&B tree. The main reason why we cannot solve large instances on personal computers is that the B&B tree grows too big, i.e., the pruning is too slow and a single processor is not capable of exploring all the generated nodes in the tree.
Exact penalty reformulation. The procedure which reformulates every instance of (BQP) into an instance of (Max-Cut) is described in Section 4.
Bounding procedure. The starting point of the algorithm is the strengthened SDP relaxation (MCHYP). Since the number of inequalities is too large to solve this SDP by standard solvers, we use a bundle method to find an approximate solution of the partial Lagrangian dual. By dualizing only the inequality constraints, we obtain the nonsmooth convex partial dual function
\begin{equation*} f(\gamma) = \max _{\operatorname{diag}(X) = \mathbf {e},\ X \succcurlyeq 0} \mathcal {L}(X,\gamma) = \mathbf {e}^\top \gamma + \max _{\operatorname{diag}(X) = \mathbf {e},\ X \succcurlyeq 0}\langle L - \mathcal {A}_{\bf {HYP}}^\top (\gamma), X \rangle , \end{equation*}
where \(\gamma\) are the nonnegative dual variables associated with the constraints \(\mathcal {A}_{\bf {HYP}}(X)\le \mathbf {e}\) . Evaluating the dual function \(f(\gamma)\) and computing the subgradient amounts to solving an SDP of the form (MCSDP), which can be efficiently computed using an interior-point method tailored for this problem. It provides us with the matching pair \((X_{\gamma }, \gamma)\) such that \(f(\gamma) = \mathcal {L}(X_{\gamma },\gamma)\) . Moreover, the subgradient of f at \(\gamma\) is given by \(\partial f(\gamma) = \mathbf {e}- \mathcal {A}_{\bf {HYP}}(X_{\gamma }).\) For obtaining an approximate minimizer of the dual problem
\begin{align} \begin{split} \min \quad & f(\gamma) \\ \mathrm{s.t.}\quad & \gamma \ge 0, \end{split} \end{align}
(1)
we use the bundle method [23, 35]. Let the current iterate be \(\hat{\gamma }\) . Suppose we have evaluated f at \(k \ge 1\) points \(\gamma _1,\ldots ,\gamma _k\) with matching pairs \(X_1,\ldots ,X_k\) and subgradients \(\mathbf {e}- \mathcal {A}_{\bf {HYP}}(X_i)\) for \(i\in \lbrace 1,\ldots ,k\rbrace\) . The bundle method combines the following two ideas:
(1)
the function \(f(\gamma)\) is approximated by
\begin{align*} f_{\mathrm{appr}}(\gamma) &= \max \lbrace \mathcal {L}(X,\gamma) \mid X \in \mathrm{conv}(X_1, \ldots , X_k)\rbrace \\ &= \max _{\lambda \ge 0, \ \mathbf {e}^\top \lambda = 1} \mathbf {e}^\top \gamma + \langle L-\mathcal {A}_{\bf {HYP}}^\top (\gamma),\mathcal {X}\lambda \rangle , \end{align*}
where the bundle of matrices \(\mathcal {X} = (X_1,\ldots ,X_k)\) is used to construct a minorant \(f_{\mathrm{appr}}\) and \(\mathcal {X}\lambda =\sum _i \lambda _i X_i\) ;
(2)
the proximal point idea, which penalizes the displacement from the current best point \(\hat{\gamma }\) with a quadratic regularization term proportional to \(\Vert \gamma - \hat{\gamma }\Vert ^2\) .
In summary, the bundle method finds a new trial point by minimizing, for some prescribed parameter \(t \gt 0\) , the function
\begin{equation*} f_{\mathrm{appr}}(\gamma) + \frac{1}{2t}\Vert \gamma - \hat{\gamma } \Vert ^2 \end{equation*}
over the nonnegative orthant. As the method terminates, we obtain the approximate minimizer of the dual function, as well as the convex weights \(\lambda \ge 0,\sum _i \lambda _i=1\) , determining the matrix \(\mathcal {X}\lambda\) .
In order to obtain a tight upper bound for an instance of (Max-Cut), we use a cutting-plane approach, in which multiple k-gonal inequalities are added and purged in the course of running the bundle algorithm. First, the optimum solution of the basic semidefinite relaxation (MCSDP) is computed using an interior-point method, followed by separating a set of triangle inequalities. After a few bundle iterations with this set of constraints, we purge all inactive constraints. Next, invoking the separation routine described in Section 3, new violated k-gonal inequalities are added. The problem with the new set of constraints is solved, and the process is iterated as long as there is a significant decrease of the upper bound.
Since the primal solution matrix X is not available, we use the same idea as in BiqMac to purge some inequality constraints. We look at the values of the corresponding dual multipliers in the vector \(\gamma\) . If the value of some dual multiplier is close to zero, this indicates that the corresponding constraint is not active and we remove it.
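A minimal sketch of this purging step (the identifier array and the tolerance value are assumptions for illustration):

```c
/* keep only inequalities whose dual multiplier is bounded away from zero;
 * compacts gamma and the inequality identifiers in place and returns the
 * new number of inequalities */
int purge_inactive(double *gamma, int *ineq_id, int m, double tol) {
    int kept = 0;
    for (int i = 0; i < m; i++) {
        if (gamma[i] > tol) {       /* multiplier not ~0: likely active */
            gamma[kept] = gamma[i];
            ineq_id[kept] = ineq_id[i];
            kept++;
        }
    }
    return kept;                    /* e.g., call with tol = 1e-5 */
}
```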
After each iteration of computing an upper bound and separating triangle inequalities, higher order k-gonal inequalities are added in order to further decrease the bound. We monitor the maximum violation of triangle inequalities \(r_{\text{tri}}=\max (\mathcal {A}_3(\mathcal {X} \lambda) - \mathbf {e})\) and as soon as the number is sufficiently small, the heuristic from Section 3 is used to add some strongly violated pentagonal inequalities to the relaxation. Similarly, as the maximum violation of pentagonal inequalities \(r_{\text{pent}}\) drops below some threshold, new heptagonal inequalities are separated and added to the relaxation. In our numerical tests, we have used the thresholds \(r_{\text{tri}} \lt 0.2\) and \(r_{\text{pent}}\lt 0.4\) .
In order to improve the performance of BiqBin, we can stop the bounding routine when we detect that we will not be able to prune the current node in the B&B tree. We again borrow an idea from BiqMac. After some cutting plane iterations, we make a linear (and hence optimistic) forecast to decide whether it is worth doing more iterations. If the gap cannot be closed, we terminate the bounding routine, branch the current node, and start evaluating new subproblems. This is especially important in the parallel solver, since its efficiency depends on how quickly idle workers receive subproblems.
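The early-stopping test can be sketched as follows; this is a hedged reconstruction of the idea, and the actual rule in BiqBin may use more history than one cutting-plane round:

```c
/* ub_prev and ub_cur are the upper bounds after two consecutive
 * cutting-plane rounds, lb is the best known lower bound (integer
 * optimal values assumed); returns 1 if bounding should continue,
 * 0 if the node should be branched right away */
int continue_bounding(double ub_prev, double ub_cur, double lb,
                      int rounds_left) {
    double decrease = ub_prev - ub_cur;                /* last progress  */
    double forecast = ub_cur - decrease * rounds_left; /* optimistic     */
    return forecast < lb + 1.0;   /* only then could the node be pruned */
}
```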
Furthermore, at the beginning of the algorithm, the number of bundle iterations should be small. We start with three iterations and increase this number after each separation of new cutting planes, until a limit is reached; in the BiqBin solver, this limit is set to 15. This is motivated by the fact that, in the beginning, it takes a while to identify the right cutting planes, so we should not waste time trying hard to decrease the bound while the current set of hypermetric inequalities does not allow much progress.
Branching strategies. BiqBin uses two branching strategies, both based on the bundle of matrices obtained from the bounding routine. Once the bundle method terminates, the last column \(\mathbf {x}\) of the matrix \(\mathcal {X}\lambda\) is extracted. Due to the diagonal and positive semidefiniteness constraints on the feasible matrices of (MCHYP), all entries of \(\mathbf {x}\) lie in the interval \([-1,1]\) . Similarly to BiqCrunch, the decision on which variable \(x_i\) to branch is based on one of the following two strategies (a C sketch of both rules is given below):
(1)
difficult first: we branch on the vertex i for which the variable \(x_i\) is closest to 0;
(2)
easy first: we branch on the vertex i for which the variable \(x_i\) is furthest from 0.
In the \(0/1\) formulation, these rules are usually referred to as most-fractional and least-fractional rules, respectively. The difficult first rule is set as the default strategy for the BiqBin solver.
When branching, two new subproblems are created, in which the chosen branching variable is fixed accordingly, and the corresponding nodes are added to the B&B tree, i.e., to the priority queue of unexplored problems. Priority is based on the upper bound obtained from the bundle method: when selecting the next subproblem, a node with the worst upper bound is evaluated first.
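Both rules reduce to a one-pass scan over the fractional vector \(\mathbf {x}\in [-1,1]^n\); a sketch (the function name is ours):

```c
#include <math.h>

/* x is the fractional vector in [-1,1]^n extracted from the bundle
 * matrix; difficult_first = 1 picks the entry closest to 0 (the BiqBin
 * default), difficult_first = 0 picks the entry furthest from 0 */
int select_branching_variable(const double *x, int n, int difficult_first) {
    int best = 0;
    for (int i = 1; i < n; i++) {
        double a = fabs(x[i]), b = fabs(x[best]);
        if (difficult_first ? (a < b) : (a > b)) best = i;
    }
    return best;
}
```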
Rounding heuristic. For generating high quality feasible solutions of the Max-Cut problem, we apply the Goemans-Williamson rounding hyperplane technique [13]. Let X be an optimal solution of some SDP relaxation of Max-Cut. By computing the Cholesky factorization \(X = V^\top V\) with column vectors \(v_i\) of V and selecting some random vector r, the cut \((S,V\backslash S)\) is obtained by setting \(S = \lbrace i \mid v_i^\top r \ge 0 \rbrace .\) Since in our case we are working with the partial Lagrangian, the information about the primal X is not available. Instead, we use a convex combination of bundle matrices \(\mathcal {X}\lambda\) as the input. The cut vector x obtained from this heuristic is then further improved by flipping the vertices and using a convex combination of \(\mathcal {X}\lambda\) and the cut matrix \(xx^\top\) . To summarize, for generating good cuts, we use the following iterative scheme:
(1)
use the Goemans-Williamson rounding hyperplane technique to generate cut vector x from \(\mathcal {X}\lambda\) ;
(2)
the cut x is locally improved by checking all possible moves of a single vertex to the opposite partition block;
(3)
by using a convex combination of \(\mathcal {X}\lambda\) and \(xx^\top\) , we bring the rounding matrix towards a good cut. With this new matrix, go to step (1) and repeat as long as one finds better cuts.
Interestingly, in most of our numerical experiments, this heuristic finds the optimum solution already in the root node.
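A compact C sketch of steps (1) and (2), assuming a factorization \(X = V^\top V\) of the rounding matrix is already available (in BiqBin, the bundle combination \(\mathcal {X}\lambda\) plays the role of X); the convex-combination step (3) and the Cholesky computation are omitted, and the uniform random direction is a simplification of the classical Gaussian choice:

```c
#include <stdlib.h>

/* V is k x n (row-major), column i of V is the vector v_i, so X = V^T V;
 * L is the n x n Laplacian; on return, x in {-1,1}^n encodes the cut */
void gw_round(const double *V, int k, const double *L, int n, int *x) {
    /* (1) random hyperplane; uniform entries for simplicity */
    double *r = malloc(k * sizeof *r);
    for (int t = 0; t < k; t++) r[t] = 2.0 * rand() / RAND_MAX - 1.0;
    for (int i = 0; i < n; i++) {
        double dot = 0.0;
        for (int t = 0; t < k; t++) dot += V[t*n + i] * r[t];
        x[i] = (dot >= 0.0) ? 1 : -1;
    }
    free(r);

    /* (2) local improvement: moving vertex i to the other block changes
     * the cut value (1/4) x^T L x by exactly -x_i * sum_{j != i} L_ij x_j,
     * so flip whenever this gain is positive */
    int improved = 1;
    while (improved) {
        improved = 0;
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int j = 0; j < n; j++)
                if (j != i) s += L[i*n + j] * x[j];
            if (-x[i] * s > 0.0) { x[i] = -x[i]; improved = 1; }
        }
    }
}
```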
Strategy for faster enumeration of the B&B tree. As described in Section 3, simulated annealing is used to heuristically separate k-gonal inequalities. Adding these k-gonal inequalities to the model in each B&B node is not necessary, especially if the node cannot be pruned. In that case, all the invested work and time are wasted, since for bigger graphs we usually need to reach a certain depth in the B&B tree before nodes can be pruned, even if the bounding routine produces tight bounds. It is therefore beneficial to check, before including cutting planes when processing a node, whether there is hope to prune it or whether it is better to branch and produce smaller subproblems. We propose the following strategy.
In the root node we compute: (i) the bound \({\mathrm{OPT}}_{\bf {SDP}}\) of the basic SDP relaxation (MCSDP), which is not strong but is quick to compute; and (ii) the bound \({\mathrm{OPT}}_{\bf {HYP}}\) by iteratively including violated triangle, pentagonal, and heptagonal inequalities. Let
\begin{equation} {\mathrm{diff}} = {\mathrm{OPT}}_{\bf {SDP}} - {\mathrm{OPT}}_{\bf {HYP}}, \end{equation}
(2)
and let \({\mathrm{LB}}\) denote the current lower bound. Then, at all other nodes, we first compute only the basic SDP bound \({\mathrm{OPT}}_{\bf {SDP}}\) . If the condition
\begin{equation} {\mathrm{OPT}}_{\bf {SDP}} \le {\mathrm{LB}} + {\mathrm{diff}} + 1 \end{equation}
(3)
is satisfied, we are already close to the lower bound, so we add cutting planes to compute the tighter bound \({\mathrm{OPT}}_{\bf {HYP}}\) in order to increase the probability of pruning the node. With this idea, we can efficiently traverse the B&B tree and invest time in the bounding routine only when needed. Numerical results show that overall this strategy produces more B&B nodes than strictly necessary, but the performance of the algorithm improves; in particular, it has a positive impact on the parallel version described in Section 5.2.
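The resulting per-node logic can be sketched as follows; the two bound-computing routines are hypothetical stand-ins returning demo values:

```c
#include <stdio.h>

/* hypothetical stand-ins for the real bounding routines */
static double compute_basic_sdp_bound(void) { return 105.0; }
static double compute_hyp_bound(void)       { return 101.4; }

/* returns 1 if the node can be pruned, 0 if it must be branched */
static int process_node(double lb, double diff) {
    double ub = compute_basic_sdp_bound();   /* cheap bound first */
    if (ub <= lb + diff + 1.0)               /* condition (3) */
        ub = compute_hyp_bound();            /* invest in cutting planes */
    return ub < lb + 1.0;                    /* pruning test */
}

int main(void) {
    /* demo values: lb = 100 and diff = 5 from the root node */
    printf("prune = %d\n", process_node(100.0, 5.0));
    return 0;
}
```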
Numerical illustration of the bounding procedure. When using a B&B algorithm, one has two options regarding the quality of the upper bound. On the one hand, a strong upper bound can be computed by iteratively adding and purging multiple cutting planes. In this way, more work is done within a node, but overall this approach produces fewer B&B nodes. On the other hand, one can efficiently compute slightly weaker upper bounds; the whole tree then grows larger, but if the time spent evaluating each node is small, it can be traversed faster. In BiqBin we use the first approach.
We take the Beasley bqp250.8 problem from the BiqMac library and plot the convergence curves of the bounding routines of BiqBin and BiqCrunch in the root nodes of their B&B trees. Figure 1 depicts the decrease of the dual function values in the course of the bound computation. Note that by adding higher order k-gonal inequalities, our bounding routine attains a tighter bound than BiqCrunch. Consequently, BiqBin creates a smaller B&B tree, consisting of 81 nodes, while BiqCrunch terminates after traversing 325 nodes.
Fig. 1. Comparison of the bounding procedures of BiqCrunch and BiqBin in the root node on the Beasley bqp250.8 problem. The upper bound computed with BiqCrunch is approx. \(36\,472\) , whereas with BiqBin we obtain approx. \(36\,287\) .

5.2 Parallelization of Branch-and-bound

In this section, we describe how the algorithmic ingredients of the sequential B&B algorithm are combined into a parallel solver that utilizes distributed-memory parallelism. This is done in a similar fashion as in the parallel B&B solver for the stable set problem [18], introduced by some of the authors of this paper.
The load coordinator–worker paradigm with distributed work pools is applied: the rank 0 process becomes the master process, carefully managing the status of each worker (idle or busy), while the workers concurrently explore different branches of the B&B tree. Each worker has its own local queue of subproblems, and work is shared when one of the workers becomes idle. The master node knows the status of each worker and acts as a load coordinator, receiving messages and, based on their content, replying in an appropriate manner.
At the beginning of the algorithm, the load coordinator reads and broadcasts the original graph to the workers and initializes the solution. It is important that every process has knowledge of the original graph, since the construction of subproblems via branching and the encoding of MPI messages are based on this information. All the data about a B&B node is encoded as an MPI structure, which is used in the communication between workers to efficiently exchange and reconstruct subproblems.
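A sketch of how such an MPI structure might be declared and registered (the BBNode fields and the bound MAXN are our assumptions, not BiqBin's actual layout):

```c
#include <mpi.h>

#define MAXN 512   /* assumed upper bound on the number of vertices */

typedef struct {
    int    fixed[MAXN];   /* which variables are fixed in this node   */
    int    value[MAXN];   /* the values they are fixed to             */
    int    depth;         /* depth in the B&B tree                    */
    double upper_bound;   /* bound inherited from the parent          */
} BBNode;

MPI_Datatype create_bbnode_type(void) {
    MPI_Datatype type;
    int          blocklens[4] = { MAXN, MAXN, 1, 1 };
    MPI_Datatype types[4] = { MPI_INT, MPI_INT, MPI_INT, MPI_DOUBLE };
    MPI_Aint     displs[4], base;
    BBNode       probe;

    MPI_Get_address(&probe, &base);
    MPI_Get_address(&probe.fixed, &displs[0]);
    MPI_Get_address(&probe.value, &displs[1]);
    MPI_Get_address(&probe.depth, &displs[2]);
    MPI_Get_address(&probe.upper_bound, &displs[3]);
    for (int i = 0; i < 4; i++) displs[i] -= base;

    MPI_Type_create_struct(4, blocklens, displs, types, &type);
    MPI_Type_commit(&type);
    return type;
}
```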
Next, the master process evaluates the root node and distributes the best lower bound. After the bounding step, two new subproblems are generated and sent to the first two idle processes. Afterwards, the master’s job is restricted to monitoring the status of the workers, counting the number of B&B nodes, and distributing the best solution found so far. In order to solve multiple SDP relaxations in parallel, the workers need to send and receive subproblems and keep the load balanced. In this way, the work is shared and the whole dynamic B&B tree is enumerated faster.
After the initialization phase, the master process waits for three types of messages sent by the workers. Firstly, if a worker’s local queue is empty, a message is sent informing the master that the process is idle and can receive further work.
Secondly, the master process receives messages regarding load balance and the sharing of work. Throughout the algorithm, the master node maintains a Boolean array containing the status of each worker, specifying whether the process is active or idle. The load coordinator uses this information to reply to requests received from workers during the branching step, in which a process wants to share one of the newly generated subproblems with some other free worker. The load coordinator sends a message specifying which workers are free and the value of the best lower bound.
Thirdly, during the execution of the algorithm, the worker processes compute multiple candidates for the optimum solution. The master node keeps track of the currently best value and the corresponding solution. When a new solution is received, its value is compared, updated if necessary, and distributed back during the communication phase.
After a worker computes lower and upper bounds, it compares these values to decide whether this branch of the enumeration tree can be safely pruned or whether further branching is needed and new subproblems must be constructed. In the latter case, a request message is sent to the load coordinator, asking for idle processes with which to share one of the newly generated subproblems or subproblems left in the queue from previous branching steps. If no idle worker is available, the generated subproblems are placed in the worker’s queue and the work continues locally. Otherwise, subproblems are encoded and sent to available idle workers. This is also where the exchange of the best lower bound happens.
When all the workers become idle, the master process sends a message to finish and the algorithm terminates. The algorithms for the load coordinator and the workers are summarized in Algorithms 1 and 2.
Lastly, we explain how the parallel version benefits from the strategy using the variable “ \({\mathrm{diff}}\) ” (see Equation (2)). If the number of available workers is large, we need to reach a certain depth in the B&B tree before all processes receive work; until this happens, many workers are idle. To fully exploit all available HPC resources, we need a strategy that lets the worker processes start evaluating nodes as soon as possible. This is where the idea from Section 5.1 helps. After the load coordinator evaluates the root node, the best lower bound found and the variable \({\mathrm{diff}}\) are distributed to all workers. When the first two idle processes evaluate the generated subproblems, the value of the basic SDP relaxation is typically such that condition (3) is not satisfied. Hence, on the first few levels of the B&B tree, the workers compute only the basic SDP bound; branching then takes place quickly and idle workers soon receive the generated subproblems. Once the bounds are such that condition (3) holds, hypermetric inequalities are added to compute the tighter bounds \({\mathrm{OPT}}_{\bf {HYP}}\) . This implies that more nodes are pruned, the size of the B&B tree decreases, and the algorithm terminates faster.
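To illustrate the protocol, here is a condensed, hypothetical worker loop reusing the BBNode type and MPI datatype from the sketch above; the tag names and the reduction of bounding and branching to a comment are ours:

```c
#include <mpi.h>

/* message tags; the names are ours, not BiqBin's */
enum { TAG_IDLE = 1, TAG_NEWSOL = 2, TAG_FINISH = 3 };

/* minimal worker loop; bounding, branching, and the local queue are
 * reduced to a comment so that only the message protocol is visible */
void worker_loop(MPI_Datatype node_type) {
    BBNode node;
    MPI_Status status;
    for (;;) {
        /* report idle; wait for either a subproblem or the finish signal */
        MPI_Send(NULL, 0, MPI_BYTE, 0, TAG_IDLE, MPI_COMM_WORLD);
        MPI_Recv(&node, 1, node_type, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == TAG_FINISH) break;

        /* ... compute lower/upper bounds for `node`; if the node cannot
           be pruned, branch: keep one child in the local queue and, if
           the coordinator reports an idle peer, send the other child
           with MPI_Send(&child, 1, node_type, peer, ...); report
           improved solutions to rank 0 with TAG_NEWSOL ... */
    }
}
```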

6 BiqBin Web Application

The BiqBin solver is available as a web application. Its main purpose is to enable (registered) users to test the solver on their own problem data.
The web application contains the main information about the BiqBin project and benchmark results, which are publicly available; see Figure 2 for a schematic overview. Upon registration, a user can access the core functionalities of the application. In short, the user uploads problem data files and decides which algorithm/solver to use. The application then sends these data to an HPC, where the selected algorithm is executed on the given data; after some time, the solution (optimum or approximate) is returned to the web application, which notifies the user by email that the result is ready.
Fig. 2. A workflow diagram from users’ inputs to algorithms’ execution on HPC.
Let us describe the web application in more detail. The initial step for a user is to prepare the problem data in an appropriate format, which is described in the documentation of each function.8 Then, the user creates a new instance, which is the entity holding the problem data. The user sets the instance’s name, data (by uploading the file with all the data), and description. In this way, the user can reuse the data later, e.g., for testing the performance of other solvers on the same instance.
Next, the user creates a task, which is comprised of an instance and a solver (function) the user wants to run with the instance’s data. The user also determines the task’s name, and specifies if the result and the instance’s data can be listed publicly for benchmarking purposes. When all the properties are specified, the task can either be started automatically, or the user can decide to start it later.
Starting a task means that the system sends a request with the execution data to a specified HPC. The administrators can configure which HPC should be used for each of the available functions or even for a particular user. For each submitted task, a job is started on a specified number of cores. The execution of a solver is regularly monitored by saving intermediate results at given time intervals. After a specified maximum execution time, or after finding an optimum solution, the solver’s execution is stopped, and the result files are sent back to the web application together with the task’s metadata. The result is saved in the database, and the user is notified by email that the job is completed. If the task has been marked as publicly available, a benchmark record is also created and immediately made visible on the Benchmarks site.
The system is designed in such a way that additional solvers or algorithms can easily be added. The number of cores used for a single task is fixed, so that one user does not occupy all of the available infrastructure. However, users with special roles assigned can specify the number of cores themselves. In this way, the architecture of the system is highly extensible in all main directions. Scalability depends on the HPC facilities, but this is also considered in the web application, since the tasks for every solver can connect to a selected HPC.

7 Numerical Results

In the numerical part of the paper, we benchmark the BiqBin solver against BiqCrunch [25], Gurobi [14], and SCIP [11] on four families of test instances of (BQP), which we describe in the following paragraphs. We chose these solvers because they are well known to be among the best solvers for these types of problems and are freely available for research purposes. All the solvers received the original problem (BQP) as input; hence, only BiqBin applies the reformulation into a Max-Cut instance. This means that Gurobi and SCIP (with SoPlex as the LP solver) use all the MIP techniques that are available. We ran Gurobi, SCIP, and BiqCrunch with their default parameters. When selecting a new branching variable, the BiqCrunch solver uses the most-fractional rule by default. We also planned to include CPLEX in this study, but we were not able to obtain an academic licence for the HPC system that we were using.
All computations were performed on the HPC system at the University of Ljubljana, Faculty of Mechanical Engineering. We used the E5-2680 V3 (1008 hyper-cores) DP cluster with IB QDR interconnect, 164 TB of LUSTRE storage, and 4.6 TB of RAM, supplemented by GPU accelerators. As the Message Passing Interface library we used Open MPI, and the code is compiled against OpenBLAS and LAPACK. The BiqBin solver was run directly on the cluster, so the computational times might differ slightly from the times obtained by the BiqBin on-line solver.

7.1 Four Special Families of (BQP) Instances Used for Numerical Tests

Max-Cut instances. The first family of (BQP) benchmark instances consists of Max-Cut instances. We selected benchmark instances from the BiqMac library that have been considered as hard by other solvers (see [39] for more details). Furthermore, we used rudy, a graph generator written by Giovanni Rinaldi [36], to construct new (hard) Max-Cut instances with 180 nodes, similar to the ones in the BiqMac library.
Unconstrained BQP instances. The second family of (BQP) instances consists of two sets of unconstrained BQPs. The instances of the first set, due to Beasley [2], have density 0.1 with values uniformly chosen from \([- 100,100]\) . The second set of unconstrained BQP instances was generated by Billionnet and Elloumi [5] according to [32]. They have sizes \(n \in \lbrace 100,120,150,200\rbrace\) and different densities. The diagonal coefficients are in the range \([ - 100,100]\) , while the off-diagonal ones are from \([ - 50,50 ]\) . Similar instances with size 250 and density 0.1 were taken from the BiqMac library. For more details on these instances we refer to [39].
Densest k-subgraph instances. The third family of benchmark problems consists of instances of (DkS), described in Section 1.1. It is a binary quadratic problem with one linear constraint, the so-called cardinality constraint. These problems, also known as cardinality Boolean quadratic programming problems, have a wide range of applications in telecommunications and chemistry; see [29] for further details. We solve the benchmark instances that can be found at [26]. They have different sizes \(n \in \lbrace 120,140,160\rbrace\) , densities \(d \in \lbrace 0.25, 0.50, 0.75\rbrace\) , and values of the parameter \(k \in \lbrace n/4, n/2, 3n/4\rbrace\) . For testing the parallel version of BiqBin, we also created similar instances with sizes \(n \in \lbrace 180,200\rbrace\) ; they can be found on the BiqBin web page.9
Randomly generated instances of (BQP). The fourth family of instances consists of randomly generated instances with a varying number of linear constraints. These instances have size \(n = 100\) and up to 15 constraints, their description can be found in [16] and they are available at [15].

7.2 Comparison of Sequential Algorithms

In this section, we compare the sequential version of BiqBin with the BiqCrunch [25], GUROBI [14], and SCIP [11] solvers on the four families of problems introduced in the previous section. We note that for the first three families of benchmark problems, GUROBI and SCIP did not solve any instance within three hours; hence, we report results for these problems only for BiqBin and BiqCrunch.
We present the results of the comparisons of these two solvers on the Max-Cut instances in Tables 1 and 2, the results for other unconstrained BQPs are in Tables 3 and 4, and the results for the densest k-subgraph problem are in Tables 5 and 6.
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
g05_100101000.5011697191.818.7951.2196.10.3%1.1%0.7%
pm1d_100101000.9921839282.440.0681.4231.12.4%8.1%4.9%
pm1s_100101000.101155.80.822.58.10.7%3.3%1.5%
pw01_100101000.101134.21.531.412.50.0%0.5%0.1%
pw05_100101000.5013289112.253.5503.9204.20.1%0.9%0.6%
pw09_100101000.9039201114.467.9349.4187.70.2%0.5%0.4%
w01_100101000.101132.41.720.75.30.1%1.4%0.3%
w05_100101000.50919978.037.4371.7167.10.9%5.8%3.2%
w09_100101000.9031243257.018.11114.1289.40.1%6.6%3.7%
Table 1. Numerical Results Obtained with Sequential BiqBin on Rudy Instances for the Max-Cut Problem
For an explanation of the columns see page 17.
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
g05_100101000.50331751367.044.31933.5408.80.4%1.2%0.7%
pm1d_100101000.99391073415.457.21302.3519.63.2%8.9%5.7%
pm1s_100101000.1013912.60.951.917.90.7%4.0%2.0%
pw01_100101000.1014511.01.386.922.90.0%0.9%0.4%
pw05_100101000.5049999346.0101.21155.5453.00.4%1.2%0.8%
pw09_100101000.90113589280.8170.4824.2399.20.4%0.6%0.5%
w01_100101000.101233.81.139.07.00.1%2.5%0.4%
w05_100101000.5031401210.458.9564.0304.22.3%7.5%4.7%
w09_100101000.9071667423.217.32447.4626.70.9%8.1%5.1%
Table 2. Numerical Results Obtained with BiqCrunch on Rudy Instances for the Max-Cut Problem
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
be100101001.00192.27.481.726.20.0%0.3%0.1%
be120.3101200.30111.05.631.511.70.0%0.0%0.0%
be120.8101200.8013510.417.9267.4123.20.0%1.0%0.3%
be150.3101500.30119130.816.3832.9286.20.0%1.8%0.4%
be150.8101500.80522368.8221.81303.6656.60.2%1.2%0.7%
be200.3102000.301959168.480.130241.96085.50.0%2.3%0.9%
be200.8102000.8071095331.6769.537449.912280.00.1%2.2%1.2%
be250102500.101153.859.6904.6388.80.0%0.3%0.0%
bqp100101000.10111.01.210.63.00.0%0.0%0.0%
bqp250102500.101819.881.79137.01147.70.0%1.3%0.1%
Table 3. Numerical Results Obtained with Sequential BiqBin on Instances for Unconstrained BQP
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
be100101001.001113.62.254.617.20.0%0.7%0.2%
be120.3101200.30131.41.826.36.60.0%0.1%0.0%
be120.8101200.8035717.215.2422.1126.00.0%1.6%0.7%
be150.3101500.3017925.64.5841.0295.40.0%2.3%0.7%
be150.8101500.801914162.8226.21801.5795.10.7%1.7%1.1%
be200.3102000.3032253409.070.247534.19038.70.0%2.8%1.4%
be200.8102000.80172803823.0424.566334.420303.70.2%2.7%1.5%
be250102500.101134.618.3529.9185.70.0%0.4%0.1%
bqp100101000.10111.00.83.31.30.0%0.0%0.0%
bqp250102500.10132535.820.912812.71413.50.0%1.9%0.3%
Table 4. Numerical Results Obtained with BiqCrunch on Instances for Unconstrained BQP
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
120_30_0.2551200.25195730.6105.9196.6143.71.9%2.7%2.2%
120_30_0.551200.503512362.6134.8617.7300.01.6%2.1%1.8%
120_30_0.7551200.7513289111.447.61241.3473.00.7%1.9%1.3%
120_60_0.2551200.2512511.813.7113.262.60.2%0.6%0.4%
120_60_0.551200.50152919.073.1146.3104.90.3%0.4%0.4%
120_60_0.7551200.751137.88.577.838.00.1%0.2%0.2%
120_90_0.2551200.25111.03.612.17.90.1%0.1%0.1%
120_90_0.551200.503199.838.5184.5103.70.0%0.2%0.1%
120_90_0.7551200.75111.04.814.810.50.0%0.0%0.0%
140_35_0.2551400.2525261125.4134.11426.4753.81.6%2.9%2.3%
140_35_0.551400.50831079383.4479.07274.12337.91.6%2.7%2.0%
140_35_0.7551400.751591089530.2987.86486.03157.01.2%1.7%1.5%
140_70_0.2551400.25313145.438.0634.3245.10.3%0.7%0.5%
140_70_0.551400.5017313168.2116.72098.51264.90.3%0.6%0.5%
140_70_0.7551400.7534919.435.5153.487.30.1%0.2%0.2%
140_105_0.2551400.25111.07.018.314.60.1%0.1%0.1%
140_105_0.551400.501237.016.5274.795.00.0%0.1%0.1%
140_105_0.7551400.751217.810.0208.478.10.0%0.1%0.0%
160_40_0.2551600.2549773298.6421.48247.52885.21.6%3.0%2.3%
160_40_0.551600.50369193074472.63510.2170137.739019.61.6%3.0%2.1%
160_40_0.7551600.7524193533213.02600.971863.727203.11.0%1.8%1.5%
160_80_0.2551600.2513255108.6123.92091.8885.80.4%0.7%0.5%
160_80_0.551600.5029627252.2272.35303.62080.10.2%0.5%0.4%
160_80_0.7551600.75338411045.044.927446.07769.60.1%0.5%0.3%
160_120_0.2551600.2516721.424.9824.4275.30.0%0.2%0.1%
160_120_0.551600.5015118.638.7611.2248.80.0%0.1%0.1%
160_120_0.7551600.7533919.060.6475.6264.20.0%0.1%0.1%
Table 5. Numerical Results Obtained with Sequential BiqBin on Instances for the Densest k-subgraph Problem
Table 6.
instance group | # inst. | n | density | B&B (min) | B&B (max) | B&B (avg) | time (min) | time (max) | time (avg) | init. gap (min) | init. gap (max) | init. gap (avg)
120_30_0.25 | 5 | 120 | 0.25 | 43 | 111 | 62.2 | 70.5 | 240.9 | 132.0 | 2.0% | 2.9% | 2.3%
120_30_0.5 | 5 | 120 | 0.50 | 55 | 185 | 108.6 | 82.5 | 332.3 | 182.7 | 1.7% | 2.2% | 2.0%
120_30_0.75 | 5 | 120 | 0.75 | 15 | 763 | 240.6 | 24.7 | 941.0 | 312.3 | 0.8% | 2.0% | 1.4%
120_60_0.25 | 5 | 120 | 0.25 | 3 | 61 | 27.0 | 7.2 | 128.2 | 60.2 | 0.3% | 0.7% | 0.5%
120_60_0.5 | 5 | 120 | 0.50 | 31 | 63 | 44.2 | 63.5 | 139.0 | 99.5 | 0.3% | 0.4% | 0.4%
120_60_0.75 | 5 | 120 | 0.75 | 7 | 31 | 19.4 | 12.3 | 81.3 | 47.3 | 0.1% | 0.3% | 0.2%
120_90_0.25 | 5 | 120 | 0.25 | 1 | 1 | 1.0 | 3.6 | 5.6 | 4.4 | 0.0% | 0.1% | 0.1%
120_90_0.5 | 5 | 120 | 0.50 | 3 | 15 | 7.8 | 11.6 | 69.9 | 34.9 | 0.0% | 0.1% | 0.1%
120_90_0.75 | 5 | 120 | 0.75 | 1 | 1 | 1.0 | 2.0 | 3.0 | 2.5 | 0.0% | 0.0% | 0.0%
140_35_0.25 | 5 | 140 | 0.25 | 39 | 447 | 225.4 | 101.2 | 1137.6 | 574.7 | 1.8% | 3.1% | 2.5%
140_35_0.5 | 5 | 140 | 0.50 | 139 | 1813 | 613.8 | 308.1 | 3714.1 | 1297.0 | 1.8% | 3.0% | 2.2%
140_35_0.75 | 5 | 140 | 0.75 | 247 | 2063 | 985.0 | 610.8 | 3433.2 | 1748.5 | 1.3% | 1.8% | 1.6%
140_70_0.25 | 5 | 140 | 0.25 | 7 | 179 | 69.8 | 24.7 | 486.0 | 200.6 | 0.3% | 0.8% | 0.6%
140_70_0.5 | 5 | 140 | 0.50 | 37 | 877 | 415.0 | 108.7 | 2322.6 | 1203.5 | 0.3% | 0.7% | 0.5%
140_70_0.75 | 5 | 140 | 0.75 | 7 | 59 | 27.0 | 31.6 | 152.2 | 76.3 | 0.1% | 0.2% | 0.2%
140_105_0.25 | 5 | 140 | 0.25 | 1 | 1 | 1.0 | 4.1 | 5.9 | 5.0 | 0.0% | 0.1% | 0.1%
140_105_0.5 | 5 | 140 | 0.50 | 1 | 11 | 4.2 | 4.4 | 67.5 | 24.6 | 0.0% | 0.1% | 0.0%
140_105_0.75 | 5 | 140 | 0.75 | 1 | 7 | 2.6 | 3.6 | 42.6 | 15.1 | 0.0% | 0.1% | 0.0%
160_40_0.25 | 5 | 160 | 0.25 | 73 | 1215 | 490.2 | 256.3 | 4317.5 | 1697.0 | 1.7% | 3.1% | 2.4%
160_40_0.5 | 5 | 160 | 0.50 | 509 | 25625 | 5825.8 | 1535.2 | 68554.5 | 15918.8 | 1.7% | 3.0% | 2.2%
160_40_0.75 | 5 | 160 | 0.75 | 327 | 12841 | 4636.6 | 1185.9 | 28700.9 | 11046.9 | 0.9% | 1.9% | 1.5%
160_80_0.25 | 5 | 160 | 0.25 | 31 | 485 | 213.4 | 126.1 | 1875.8 | 817.2 | 0.4% | 0.8% | 0.6%
160_80_0.5 | 5 | 160 | 0.50 | 33 | 1705 | 552.6 | 156.8 | 6592.5 | 2150.1 | 0.2% | 0.6% | 0.4%
160_80_0.75 | 5 | 160 | 0.75 | 3 | 8025 | 2191.8 | 17.6 | 29413.2 | 8229.8 | 0.0% | 0.5% | 0.3%
160_120_0.25 | 5 | 160 | 0.25 | 1 | 31 | 10.2 | 8.5 | 237.0 | 81.3 | 0.0% | 0.2% | 0.1%
160_120_0.5 | 5 | 160 | 0.50 | 1 | 21 | 7.0 | 7.1 | 186.3 | 62.4 | 0.0% | 0.1% | 0.0%
160_120_0.75 | 5 | 160 | 0.75 | 1 | 9 | 4.6 | 7.3 | 83.5 | 40.2 | 0.0% | 0.0% | 0.0%
Table 6. Numerical Results Obtained with (Tailored Version of) BiqCrunch on Instances for the Densest k-subgraph Problem
Before discussing the tables in detail, we summarize the results by the performance profile diagrams in Figure 3. These diagrams were created from the same raw data that is summarized in Tables 1–7. There were large time spans for each solver and each data set, so the diagrams are quite coarse-grained. Nevertheless, we can see that on the Max-Cut, unconstrained BQP, and randomly generated BQP instances our solver BiqBin outperforms the other solvers. On the instances of the densest k-subgraph problem, BiqCrunch performs slightly better than BiqBin.
Fig. 3.
Fig. 3. Performance profile for BiqBin and BiqCrunch on the Max-Cut, unconstrained Binary Quadratic Problem and densest k-subgraph data (Figures a, b, and c), and for solvers BiqBin, Gurobi and Scip on random data (Figure d).
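For readers less familiar with such diagrams, we briefly recall the standard construction of a performance profile (the notation below is ours and slightly simplified): for a set S of solvers and a set I of instances with running times \(t_{i,s}\), the profile of solver \(s \in S\) is the function
\begin{equation*} \rho _s(\tau)=\frac{1}{|I|}\left|\left\lbrace i\in I \,:\, t_{i,s}\le \tau \cdot \min _{s^{\prime }\in S} t_{i,s^{\prime }}\right\rbrace \right| , \end{equation*}
i.e., the fraction of instances that solver s finishes within a factor \(\tau\) of the fastest solver on each instance; instances not solved within the time limit are treated as having infinite running time.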
Table 7.
instance group | # inst. | n | density | solved (BiqBin) | solved (Gurobi) | solved (SCIP)
100_A_0_1_b_10_F_-10_10 | 15 | 100 | 0.95 | 15 | 2 | 0
100_A_0_1_b_10_F_-5_5 | 15 | 100 | 0.91 | 14 | 1 | 0
100_A_0_1_b_10_F_0_10 | 15 | 100 | 0.91 | 15 | 15 | 2
100_A_0_1_b_10_F_0_5 | 15 | 100 | 0.83 | 15 | 14 | 6
100_A_0_1_b_15_F_-10_10 | 15 | 100 | 0.95 | 15 | 1 | 0
100_A_0_1_b_15_F_-5_5 | 15 | 100 | 0.91 | 15 | 0 | 0
100_A_0_1_b_15_F_0_10 | 15 | 100 | 0.91 | 15 | 15 | 4
100_A_0_1_b_15_F_0_5 | 15 | 100 | 0.83 | 15 | 15 | 2
100_A_0_1_b_20_F_-10_10 | 15 | 100 | 0.95 | 15 | 2 | 0
100_A_0_1_b_20_F_-5_5 | 15 | 100 | 0.91 | 15 | 0 | 0
100_A_0_1_b_20_F_0_10 | 15 | 100 | 0.91 | 15 | 15 | 1
100_A_0_1_b_20_F_0_5 | 15 | 100 | 0.83 | 15 | 15 | 3
100_A_0_3_b_10_F_-10_10 | 15 | 100 | 0.95 | 7 | 15 | 1
100_A_0_3_b_10_F_-5_5 | 15 | 100 | 0.91 | 7 | 15 | 6
100_A_0_3_b_10_F_0_10 | 15 | 100 | 0.91 | 2 | 15 | 15
100_A_0_3_b_10_F_0_5 | 15 | 100 | 0.83 | 4 | 15 | 15
100_A_0_3_b_15_F_-10_10 | 15 | 100 | 0.95 | 6 | 7 | 0
100_A_0_3_b_15_F_-5_5 | 15 | 100 | 0.91 | 6 | 8 | 0
100_A_0_3_b_15_F_0_10 | 15 | 100 | 0.91 | 2 | 15 | 7
100_A_0_3_b_15_F_0_5 | 15 | 100 | 0.83 | 4 | 15 | 6
100_A_0_3_b_20_F_-10_10 | 15 | 100 | 0.95 | 5 | 0 | 0
100_A_0_3_b_20_F_-5_5 | 15 | 100 | 0.91 | 8 | 1 | 0
100_A_0_3_b_20_F_0_10 | 15 | 100 | 0.91 | 3 | 8 | 5
100_A_0_3_b_20_F_0_5 | 15 | 100 | 0.83 | 5 | 8 | 6
100_A_-1_1_b_0_F_-1_1 | 15 | 100 | 0.66 | 13 | 3 | 0
100_A_-1_1_b_0_F_-3_3 | 15 | 100 | 0.86 | 11 | 2 | 0
100_A_-1_1_b_0_F_-7_7 | 15 | 100 | 0.93 | 11 | 6 | 0
100_A_-3_3_b_0_F_-1_1 | 15 | 100 | 0.66 | 6 | 2 | 0
100_A_-3_3_b_0_F_-3_3 | 15 | 100 | 0.86 | 6 | 1 | 0
100_A_-3_3_b_0_F_-7_7 | 15 | 100 | 0.93 | 4 | 2 | 0
100_A_-7_7_b_0_F_-1_1 | 15 | 100 | 0.66 | 3 | 2 | 0
100_A_-7_7_b_0_F_-3_3 | 15 | 100 | 0.86 | 3 | 1 | 0
100_A_-7_7_b_0_F_-7_7 | 15 | 100 | 0.93 | 3 | 0 | 0
total | 495 | | | 298 | 236 | 79
Table 7. Comparison of Sequential BiqBin, Gurobi and SCIP on Randomly Generated Instances of (BQP)
To highlight the difference between the solvers BiqBin and BiqCrunch, we also provide a relative performance profile in Figure 4. It shows, for each factor \(\alpha \in \lbrace 2^{-3},2^{-2},\ldots ,2^3\rbrace\) , the percentage of instances from the Max-Cut, unconstrained BQP and densest k-subgraph families of instances, respectively, for which the computation time required by BiqCrunch was shorter than \(\alpha\) times the computation time required by BiqBin. For the Max-Cut instances, this plot reveals that BiqCrunch does not require less than 1/2 of the time needed by BiqBin for any instance (i.e., it is never more than two times faster than BiqBin), but it is faster than BiqBin on about 13% of the Max-Cut instances, i.e., it is slower than BiqBin for about 87% of the Max-Cut instances. For the densest k-subgraph instances the situation is reversed: for approx. 84% of the instances, BiqCrunch is faster than BiqBin, while for the unconstrained BQP instances, both solvers perform similarly.
Fig. 4.
Fig. 4. This relative performance profile complements the performance profiles from Figures 3(a)–3(c). For each value i on the x-axis it shows the percentages of the instances from the datasets Max-Cut, unconstrained BQP and densest k-subgraph instances, respectively, on which the computation time needed by BiqCrunch was shorter than \(\alpha = 2^i\) times the computation time needed by BiqBin.
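The percentages shown in Figure 4 are straightforward to compute from the raw timings. The following minimal C sketch (illustrative only, not taken from the BiqBin code base; the arrays are hypothetical placeholders) prints, for each \(\alpha = 2^{-3},\ldots ,2^{3}\), the share of instances on which BiqCrunch is faster than \(\alpha\) times BiqBin:

    #include <stdio.h>
    #include <math.h>

    /* For each factor alpha = 2^i, count the instances on which
     * BiqCrunch's wall-time is below alpha times BiqBin's wall-time. */
    static void relative_profile(const double *t_bc, const double *t_bb, int m)
    {
        for (int i = -3; i <= 3; ++i) {
            double alpha = pow(2.0, (double) i);
            int count = 0;
            for (int j = 0; j < m; ++j)
                if (t_bc[j] < alpha * t_bb[j])
                    ++count;
            printf("alpha = 2^%+d: %5.1f%%\n", i, 100.0 * count / m);
        }
    }

    int main(void)
    {
        /* hypothetical timings (seconds) for three instances */
        double t_biqcrunch[] = { 100.0, 250.0,  40.0 };
        double t_biqbin[]    = {  90.0, 300.0,  35.0 };
        relative_profile(t_biqcrunch, t_biqbin, 3);
        return 0;
    }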
The first columns in each of these tables contain the data about the instances. For each group we used several instances: for the Max-Cut problem and for the unconstrained BQP, we used 10 instances for every combination of size n and density d, and for the densest k-subgraph problem we used five instances. The second set of columns in each table contains the minimum, the maximum, and the average number of nodes in the B&B trees generated by BiqBin and BiqCrunch, respectively. Similarly, the third set of columns reports the minimum, the maximum, and the average computing times (in seconds), and the last set of columns contains the minimum, the maximum, and the average initial gaps (in %).
For the Max-Cut problem, we can see in Tables 1 and 2 that sequential BiqBin is consistently about two times faster than BiqCrunch and produces much fewer nodes in the B&B tree. One reason for this lies in the fact that the initial gap in the root node (the difference between the upper and the lower bound, divided by the lower bound) is smaller for BiqBin.
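In other words, the gap columns report the following quantity (shown here as a small C helper for concreteness; this is an illustration, not the actual reporting code of either solver):

    /* Relative initial gap in percent for a maximization problem:
     * ub is the upper bound and lb the lower bound in the root node;
     * lb is assumed to be positive. */
    double initial_gap_percent(double ub, double lb)
    {
        return 100.0 * (ub - lb) / lb;
    }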
For the unconstrained BQP (Tables 3 and 4), we can see that BiqBin and BiqCrunch perform very similarly. However, the B&B trees of BiqBin are much smaller, so there is potential for improving the computing times by increasing the computational efficiency in each B&B node.
For the densest k-subgraph problem (Tables 5 and 6), we can observe that BiqCrunch is in most cases slightly better (sometimes also much better) than BiqBin. The reason for this is that BiqCrunch has a specific version adapted to solving this problem, in which the basic semidefinite relaxation is reinforced with triangle and product constraints.
Turning to the randomly generated instances, GUROBI and SCIP performed well, while the BiqCrunch solver showed very poor performance. Thus, we report in Table 7 only the results for BiqBin, GUROBI, and SCIP. The randomly generated instances have different intervals from which the entries of A, b, and F are selected; for every combination, we have 15 instances with an increasing number of constraints, from 1 to 15. The first column of Table 7 contains the name of the family of instances, and the next three columns contain the number of instances (always 15), the size of the problem n (always 100), and the density of the data (the number of non-zero elements in F divided by \(n^2\)). For each of the three solvers, we report the number of instances solved within the time limit of three hours. These instances are also considered in the performance profile in Figure 3(d).
From Table 7 and from the performance profile in Figure 3(d) it can be observed that, overall, BiqBin outperforms all the other solvers. For each family it solves at least two out of the 15 instances, and on average it solves 60% of the instances, while GUROBI and SCIP solve 48% and 16% of the instances, respectively.
We can see that BiqBin performs worst on the instances whose names start with 100_A_0_3_, in the middle part of the table. We do not have a definitive explanation for this, but the reason is very likely that the linear constraints of these instances yield efficient cutting planes, which enable GUROBI and SCIP to prune the B&B tree more effectively and therefore to finish within the given time limit more often. Some of the 100_A_0_3 instances are also infeasible, while the instances with _b_0_ in the middle of the name are always feasible (they have zero right-hand sides, i.e., \(\mathbf {b}= \mathbf {0}\), hence \(\mathbf {x}= \mathbf {0}\) is always a feasible solution). GUROBI and SCIP solve the problems in the (BQP) form, while BiqBin first reformulates the instances into the (Max-Cut) format. This suggests improving the performance of BiqBin by adding an early stopping condition for the case of infeasibility, using Theorem 1.

7.3 Scaling Properties of BiqBin

In this section, we demonstrate how BiqBin scales on the high-performance computer that we used, within the MPI framework. We measured the wall-times needed to solve each instance using the sequential solver and the parallel solver with \(3, 6,\ldots ,48\) CPU cores (i.e., MPI processes). Note that one of the CPU cores was always reserved for the coordinator's tasks, while the others were used as workers.
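The following sketch shows the basic rank layout of such a coordinator-worker scheme in MPI (a minimal illustration under our own simplifications, not the actual BiqBin source; run_coordinator and run_worker stand in for the real routines):

    #include <mpi.h>
    #include <stdio.h>

    /* Placeholder routines: in the real solver, the coordinator distributes
     * open B&B nodes to workers and collects results, while each worker
     * repeatedly evaluates subproblems. */
    static void run_coordinator(int num_workers)
    {
        printf("coordinator managing %d workers\n", num_workers);
    }

    static void run_worker(int rank)
    {
        printf("worker %d processing B&B nodes\n", rank);
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
            run_coordinator(size - 1);   /* one core reserved for coordination */
        else
            run_worker(rank);            /* the remaining size-1 cores are workers */

        MPI_Finalize();
        return 0;
    }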
In the "sequential" columns of Table 8, we report the computing times needed by the sequential algorithm and the times needed for the computations in the root node of the B&B tree. The former are used to compute the speed-up factors, while the latter are considered as the times for the non-parallelizable part of BiqBin and are used to compute the upper bounds for the speed-up factors.
Table 8.
instance name | n | density | B&B (seq) | time (seq) | root time | B&B (3 cores) | time (3 cores) | B&B (6 cores) | time (6 cores) | B&B (12 cores) | time (12 cores) | B&B (24 cores) | time (24 cores) | B&B (48 cores) | time (48 cores)
g05_100.1 | 100 | 0.50 | 697 | 951.18 | 2.68 | 769 | 539.40 | 787 | 287.20 | 879 | 132.70 | 769 | 66.00 | 711 | 36.50
g05_100.3 | 100 | 0.50 | 603 | 320.52 | 2.61 | 585 | 182.60 | 447 | 80.40 | 677 | 40.80 | 417 | 28.60 | 403 | 15.50
pm1d_100.0 | 100 | 0.99 | 395 | 366.87 | 3.61 | 433 | 204.50 | 507 | 95.00 | 367 | 54.60 | 353 | 29.50 | 357 | 20.80
pm1d_100.1 | 100 | 0.99 | 839 | 681.38 | 2.74 | 545 | 363.60 | 681 | 159.60 | 951 | 86.90 | 867 | 41.10 | 1045 | 31.70
pm1d_100.2 | 100 | 0.99 | 551 | 372.83 | 3.70 | 493 | 220.80 | 621 | 105.70 | 443 | 56.40 | 525 | 32.10 | 369 | 16.70
pm1d_100.4 | 100 | 0.99 | 557 | 357.24 | 3.72 | 439 | 228.60 | 445 | 104.20 | 571 | 53.20 | 435 | 31.00 | 371 | 23.80
pw05_100.0 | 100 | 0.50 | 289 | 487.86 | 8.13 | 375 | 247.50 | 379 | 138.30 | 405 | 80.10 | 391 | 47.50 | 279 | 43.20
pw05_100.6 | 100 | 0.50 | 263 | 503.85 | 6.71 | 183 | 279.80 | 183 | 113.00 | 215 | 59.40 | 313 | 47.10 | 279 | 30.00
pw09_100.1 | 100 | 0.90 | 201 | 349.38 | 6.00 | 165 | 183.90 | 171 | 110.30 | 221 | 47.40 | 217 | 31.70 | 197 | 24.60
pw09_100.5 | 100 | 0.90 | 137 | 260.52 | 6.65 | 107 | 181.20 | 133 | 66.40 | 133 | 51.70 | 121 | 38.10 | 99 | 43.60
pw09_100.7 | 100 | 0.90 | 177 | 312.68 | 7.70 | 285 | 189.30 | 203 | 94.90 | 223 | 52.40 | 235 | 36.50 | 225 | 36.40
w05_100.4 | 100 | 0.50 | 199 | 371.73 | 8.41 | 161 | 192.10 | 181 | 111.10 | 197 | 55.50 | 127 | 36.20 | 137 | 29.60
w05_100.5 | 100 | 0.50 | 117 | 310.38 | 6.23 | 119 | 145.20 | 155 | 73.50 | 119 | 52.30 | 103 | 35.80 | 131 | 35.70
w05_100.8 | 100 | 0.50 | 151 | 293.63 | 8.35 | 119 | 185.30 | 139 | 102.20 | 153 | 47.80 | 195 | 35.60 | 185 | 29.50
w09_100.1 | 100 | 0.90 | 1243 | 1114.08 | 8.53 | 501 | 610.70 | 771 | 299.00 | 641 | 159.20 | 561 | 84.00 | 711 | 60.30
w09_100.2 | 100 | 0.90 | 325 | 357.42 | 8.49 | 255 | 179.00 | 293 | 90.40 | 253 | 50.20 | 207 | 27.40 | 229 | 35.20
w09_100.3 | 100 | 0.90 | 575 | 444.20 | 8.39 | 587 | 245.30 | 269 | 111.80 | 371 | 66.10 | 375 | 41.50 | 347 | 33.30
w09_100.4 | 100 | 0.90 | 115 | 297.50 | 8.03 | 101 | 202.00 | 105 | 83.60 | 99 | 54.40 | 127 | 52.10 | 99 | 43.50
Table 8. Numerical Results Obtained with Parallel BiqBin for Instances of the Max-Cut Problem
Pairs of columns denoted by "3 cores", "6 cores", etc. contain the sizes of the B&B trees and the wall-times of parallel BiqBin using \(3, 6,\ldots\) CPU cores, respectively. For each instance inst from Table 8, we compute a vector of speed-up factors \({{\tt {speed-up} }_{{\tt inst}}}\), defined as (see, e.g., [31, Equation (3.4)])
\begin{equation*} {{\tt {speed-up} }_{{\tt inst}}}(i)=\frac{{\tt time}_1}{{\tt time}_i}, \end{equation*}
where \({\tt time}_1\) is the time needed by the sequential solver, while \({\tt time}_i\) denotes the time needed by the parallel solver using i CPU cores. These factors are aggregated in Table 9 for each family of instances (the factor reported for a family and a given number of cores is the total sequential time of the family divided by its total parallel time; note that Table 9 additionally contains the factors obtained with 96 CPU cores) and are depicted in Figure 5.
Fig. 5.
Fig. 5. This plot shows how the code scales compared to upper-bounds \(\mbox{UB}(n)\) , represented by the green line. The red lines represent the scaling factors from Table 9, while the blue line shows the average scaling factors (column-wise mean values of Table 9).
Table 9.
CPU cores | 1 | 3 | 6 | 12 | 24 | 48 | 96
g05 | 1.00 | 1.76 | 3.46 | 7.33 | 13.44 | 24.46 | 34.28
pm1d | 1.00 | 1.75 | 3.83 | 7.08 | 13.30 | 19.12 | 24.70
pw05 | 1.00 | 1.88 | 3.95 | 7.11 | 10.48 | 13.55 | 19.26
pw09 | 1.00 | 1.66 | 3.40 | 6.09 | 8.68 | 8.82 | 11.87
w05 | 1.00 | 1.87 | 3.40 | 6.27 | 9.07 | 10.29 | 12.48
w09 | 1.00 | 1.79 | 3.78 | 6.71 | 10.80 | 12.85 | 17.59
Table 9. Aggregated Scaling Factors
The g05 instances have the best average scaling factors, while the pw09 and w05 instances scale worst.
The times from the "root time" column of Table 8 are used to estimate the proportion of the BiqBin solver that is not parallelizable. We denote this estimate by s and compute it as the sum of the "root time" column divided by the sum of the sequential "time" column, which gives \(s=0.0136\). The upper bounds for the speed-up factors are computed according to Amdahl's law by the formula ([31, Equation (3.6)])
\begin{equation*} \mbox{UB}(n)=\frac{1}{s+(1-s)/n}, \end{equation*}
where n corresponds to the number of available CPU cores. These upper bounds are depicted by the green curve in Figure 5. We can see that the speed-up factors are close to the theoretical bound for the g05 problems and deviate considerably for the pw09 and w05 problems.
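As a small worked example using only quantities stated above, for \(s = 0.0136\) the bound evaluates to \(\mbox{UB}(48) \approx 29.3\) and \(\mbox{UB}(96) \approx 41.9\), which the g05 factors 24.46 and 34.28 from Table 9 approach most closely. The values on the green curve can be tabulated with a few lines of C:

    #include <stdio.h>

    /* Amdahl's upper bound UB(n) = 1/(s + (1-s)/n) for the estimated
     * sequential fraction s = 0.0136 reported in the text. */
    int main(void)
    {
        const double s = 0.0136;
        const int cores[] = { 1, 3, 6, 12, 24, 48, 96 };
        for (int i = 0; i < 7; ++i) {
            double ub = 1.0 / (s + (1.0 - s) / cores[i]);
            printf("UB(%2d) = %5.2f\n", cores[i], ub);
        }
        return 0;
    }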
Table 8 also contains the numbers of nodes in the B&B trees generated by the parallel solver with different numbers of CPU cores. We can see that these numbers differ from those of the sequential solver and vary with the number of CPU cores. The main reason is that the parallel computations on different CPU cores are no longer deterministic. Generating cutting planes in the process of solving \((\mathrm{MC}_{\mathrm{HYP}})\) involves random numbers. We fix the seed for the random number generator at the beginning of the computation on each CPU core; however, when we vary the number of CPU cores, the amount and the order of the computational work in the cores change. Consequently, different cutting planes might be applied to the same B&B node, which results in slightly different bounds \({\mathrm{OPT}}_{\bf {HYP}}\) and finally in different sizes of the B&B trees.
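The seeding strategy mentioned above can be pictured as follows (a sketch under our own assumptions, not the actual BiqBin code; the base seed is hypothetical):

    #include <stdlib.h>
    #include <mpi.h>

    /* Each MPI process fixes its own random seed at start-up.  A fixed
     * seed makes a single process reproducible, but with a different
     * number of processes the amount and order of work per core changes,
     * so different cutting planes may be separated at the same B&B node. */
    static void seed_rng_per_rank(void)
    {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        srand(12345u + (unsigned) rank);   /* hypothetical base seed */
    }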

7.4 Solving Large Instances with Parallel BiqBin

In this section, we report numerical results obtained by parallel BiqBin on Max-Cut, unconstrained BQP, and densest k-subgraph instances that, to the best of our knowledge, have not been solved before. We used 200 CPU cores, and the results are collected in Tables 10–12. We can see that the hardest families of instances are again g05 and w09. To solve them, we need to compute more than one million B&B nodes, which took at least six hours of wall-time.
Table 10.
instance group | # inst. | n | density | B&B (avg, 200 cores) | time (avg, 200 cores)
g05_180 | 10 | 180 | 0.50 | 1712858.6 | 20700.5
pm1d_180 | 10 | 180 | 0.99 | 522354.4 | 10056.6
pm1s_180 | 10 | 180 | 0.10 | 60004.2 | 1163.1
pw01_180 | 10 | 180 | 0.10 | 4935.0 | 249.1
pw05_180 | 10 | 180 | 0.50 | 887735.0 | 20602.4
pw09_180 | 10 | 180 | 0.90 | 750231.8 | 18468.2
w01_180 | 10 | 180 | 0.10 | 36913.2 | 637.9
w05_180 | 10 | 180 | 0.50 | 674946.4 | 16216.1
w09_180 | 10 | 180 | 0.90 | 2061425.2 | 35320.7
Table 10. Numerical Results Obtained with Parallel BiqBin for Large Instances of the Max-Cut Problem
Table 11.
instance group | # inst. | n | density | B&B (avg, 200 cores) | time (avg, 200 cores)
be250.3 | 10 | 250 | 0.30 | 1367.4 | 1133.0
be250.8 | 10 | 250 | 0.80 | 7838.8 | 3583.2
be300.3 | 10 | 300 | 0.30 | 5462.8 | 4207.5
be300.8 | 10 | 300 | 0.80 | 110806.8 | 74765.2
Table 11. Numerical Results Obtained with Parallel BiqBin for Large Instances of the Unconstrained BQP
Table 12.
instance group | # inst. | n | density | B&B (avg, 200 cores) | time (avg, 200 cores)
120_30_0.25 | 5 | 120 | 0.25 | 33.8 | 32.1
120_30_0.5 | 5 | 120 | 0.50 | 70.2 | 33.9
120_30_0.75 | 5 | 120 | 0.75 | 117.0 | 29.0
120_60_0.25 | 5 | 120 | 0.25 | 17.4 | 27.8
120_60_0.5 | 5 | 120 | 0.50 | 21.4 | 34.7
120_60_0.75 | 5 | 120 | 0.75 | 10.2 | 20.2
120_90_0.25 | 5 | 120 | 0.25 | 1.0 | 9.8
120_90_0.5 | 5 | 120 | 0.50 | 12.6 | 45.0
120_90_0.75 | 5 | 120 | 0.75 | 1.0 | 12.5
140_35_0.25 | 5 | 140 | 0.25 | 146.2 | 51.5
140_35_0.5 | 5 | 140 | 0.50 | 395.8 | 59.7
140_35_0.75 | 5 | 140 | 0.75 | 479.4 | 59.4
140_70_0.25 | 5 | 140 | 0.25 | 58.2 | 37.9
140_70_0.5 | 5 | 140 | 0.50 | 173.8 | 56.0
140_70_0.75 | 5 | 140 | 0.75 | 12.6 | 29.3
140_105_0.25 | 5 | 140 | 0.25 | 1.0 | 17.9
140_105_0.5 | 5 | 140 | 0.50 | 9.8 | 50.3
140_105_0.75 | 5 | 140 | 0.75 | 7.4 | 37.6
160_40_0.25 | 5 | 160 | 0.25 | 395.0 | 84.5
160_40_0.5 | 5 | 160 | 0.50 | 4817.8 | 329.9
160_40_0.75 | 5 | 160 | 0.75 | 3398.6 | 234.5
160_80_0.25 | 5 | 160 | 0.25 | 113.4 | 62.1
160_80_0.5 | 5 | 160 | 0.50 | 253.8 | 76.8
160_80_0.75 | 5 | 160 | 0.75 | 1071.4 | 92.1
160_120_0.25 | 5 | 160 | 0.25 | 25.4 | 84.6
160_120_0.5 | 5 | 160 | 0.50 | 29.4 | 72.7
160_120_0.75 | 5 | 160 | 0.75 | 17.8 | 76.5
180_45_0.25 | 5 | 180 | 0.25 | 1548.2 | 188.6
180_45_0.5 | 5 | 180 | 0.50 | 2109.4 | 265.0
180_45_0.75 | 5 | 180 | 0.75 | 29243.4 | 2089.7
180_90_0.25 | 5 | 180 | 0.25 | 923.0 | 144.9
180_90_0.5 | 5 | 180 | 0.50 | 1167.4 | 165.8
180_90_0.75 | 5 | 180 | 0.75 | 2623.8 | 246.9
180_135_0.25 | 5 | 180 | 0.25 | 11.8 | 79.5
180_135_0.5 | 5 | 180 | 0.50 | 29.4 | 133.2
180_135_0.75 | 5 | 180 | 0.75 | 29.4 | 113.2
200_50_0.25 | 5 | 200 | 0.25 | 7400.6 | 849.4
200_50_0.5 | 5 | 200 | 0.50 | 24145.4 | 2578.5
200_50_0.75 | 5 | 200 | 0.75 | 53016.2 | 4980.4
200_100_0.25 | 5 | 200 | 0.25 | 1392.2 | 254.9
200_100_0.5 | 5 | 200 | 0.50 | 3801.8 | 501.3
200_100_0.75 | 5 | 200 | 0.75 | 3067.4 | 358.6
200_150_0.25 | 5 | 200 | 0.25 | 43.4 | 145.1
200_150_0.5 | 5 | 200 | 0.50 | 129.0 | 177.1
200_150_0.75 | 5 | 200 | 0.75 | 621.0 | 260.3
Table 12. Numerical Results Obtained with Parallel BiqBin for Large Instances of the Densest k-subgraph Problem

8 Conclusions

In this paper, we describe BiqBin, a solver for linearly constrained binary quadratic problems, which is capable of solving to optimality instances that are, due to their size, unsolvable by other existing methods and tools. The main idea underlying this solver is the exact penalty reformulation of a (BQP) instance into an instance of (Max-Cut), introduced by Lasserre and enhanced by two co-authors of this paper in [16].
We provide the necessary theoretical results needed to explain the workflow of the problem reformulations and relaxations, and finally the details related to the C implementation of BiqBin as an efficient parallel solver. The solver is also available as a web service, which is connected to the high-performance computer at the University of Ljubljana, Faculty of Mechanical Engineering.
We present extensive numerical results, where BiqBin is benchmarked against BiqCrunch, GUROBI and SCIP. It can be concluded that BiqBin outperforms other solvers on the Max-Cut instances, that it is competitive with BiqCrunch on the instances of unconstrained binary quadratic problems, and that it is slightly worse than BiqCrunch on the instances of the densest k-subgraph problem. The latter is expected since BiqCrunch is specially adapted to solve problems of this type. On these three families of instances, GUROBI and SCIP are non-competitive.
However, when the number of linear constraints increases slightly, as happens with the fourth family of benchmark instances (randomly generated instances of (BQP)), GUROBI and SCIP become solvers of interest. They solve the problems in the original formulation, and the initial linear constraints are important for generating new cutting planes. Therefore, a larger number of linear constraints usually results in a better performance of these solvers, while BiqBin reformulates the problem into an instance of Max-Cut, hence the feasible set always consists of all possible binary vectors. Nevertheless, on these benchmark instances, BiqBin is still the best-performing solver, while BiqCrunch demonstrates very weak performance and was therefore eliminated from the reported numerical results.
We showed that the BiqBin solver scales very well; hence, high-performance computers are the infrastructure of choice to solve instances that are out of reach for other (sequential) state-of-the-art solvers.
As part of our future work, we plan to merge BiqBin, which uses the bundle method as its computational core, with MADAM introduced in [19], where an alternating direction method of multipliers is developed for solving hard semidefinite relaxations. Additionally, the MPI communication among the processes can be further simplified to enable efficient scaling over larger HPC systems. This can be achieved by employing one-sided MPI communication. We will also further improve the performance of BiqBin by enhancing an early stopping condition for the case of infeasibility of (BQP).
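To make the last point concrete, a one-sided scheme could, for instance, let every worker atomically claim work from a globally visible counter instead of exchanging messages with the coordinator. The sketch below is our illustration only, not a design taken from BiqBin; it assumes an MPI window on rank 0 holding the number of open B&B nodes:

    #include <mpi.h>

    /* A worker atomically decrements the shared counter of open B&B nodes
     * stored in a window on rank 0 and learns how many nodes were open
     * before its request -- no explicit coordinator message is needed. */
    static long take_open_node(MPI_Win win)
    {
        const long minus_one = -1;
        long nodes_before;
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        MPI_Fetch_and_op(&minus_one, &nodes_before, MPI_LONG, 0, 0, MPI_SUM, win);
        MPI_Win_unlock(0, win);
        return nodes_before;   /* <= 0 means there was no open node to take */
    }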
However, to go much further and solve much larger instances of (BQP) or even more general classes of discrete optimization problems, we need to enhance the existing approach with new advances from polynomial optimization, artificial intelligence and problem specific theoretical findings.

Acknowledgments

Fruitful discussions with Leon Kos, Franz Rendl, and Alen Vegi Kalamar helped a lot in resolving several challenges during the project. Finally, we thank the two anonymous referees, whose comments improved an earlier version of this work.

References

[1]
Francisco Barahona, Michael Jünger, and Gerhard Reinelt. 1989. Experiments in quadratic 0-1 programming. Math. Program. Ser. A 44, 2 (1989), 127–137.
[2]
John E. Beasley. 1998. Heuristic algorithms for the unconstrained binary quadratic programming problem. Technical Report. The Management School, Imperial College, London, England.
[3]
Aditya Bhaskara, Moses Charikar, Aravindan Vijayaraghavan, Venkatesan Guruswami, and Yuan Zhou. 2012. Polynomial integrality gaps for strong SDP relaxations of densest k-subgraph. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’12). SIAM, 388–405. http://dl.acm.org/citation.cfm?id=2095116.2095150
[4]
Alain Billionnet. 2005. Different formulations for solving the heaviest k-subgraph problem. INFOR Inf. Syst. Oper. Res. 43, 3 (2005), 171–186.
[5]
Alain Billionnet and Sourour Elloumi. 2007. Using a mixed integer quadratic programming solver for the unconstrained quadratic \(0-1\) problem. Math. Program. Ser. A 109, 1 (2007), 55–68.
[6]
Swagatam Das, Ajith Abraham, and Amit Konar. 2009. Metaheuristic pattern clustering–an overview. In Metaheuristic Clustering. Springer, 1–62.
[7]
Michel Deza and Monique Laurent. 1994. Applications of cut polyhedra. I, II. J. Comput. Appl. Math. 55, 2 (1994), 191–216, 217–247.
[8]
Gianni Di Pillo. 1994. Exact penalty methods. In Algorithms for Continuous Optimization. Springer, 209–253.
[9]
Elizabeth D. Dolan. 2001. The NEOS Server 4.0 Administrative Guide. Technical Memorandum ANL/MCS-TM-250. Mathematics and Computer Science Division, Argonne National Laboratory.
[10]
Fabio Furini, Emiliano Traversi, Pietro Belotti, Antonio Frangioni, Ambros Gleixner, Nick Gould, Leo Liberti, Andrea Lodi, Ruth Misener, Hans Mittelmann, Nikolaos Sahinidis, Stefan Vigerske, and Angelika Wiegele. 2019. QPLIB: A library of quadratic programming instances. Math. Program. Comput. 11, 2 (2019), 237–265.
[11]
Gerald Gamrath, Daniel Anderson, Ksenia Bestuzheva, Wei-Kun Chen, Leon Eifler, Maxime Gasse, Patrick Gemander, Ambros Gleixner, Leona Gottwald, Katrin Halbig, Gregor Hendel, Christopher Hojny, Thorsten Koch, Pierre Le Bodic, Stephen J. Maher, Frederic Matter, Matthias Miltenberger, Erik Mühmer, Benjamin Müller, Marc E. Pfetsch, Franziska Schlösser, Felipe Serrano, Yuji Shinano, Christine Tawfik, Stefan Vigerske, Fabian Wegscheider, Dieter Weninger, and Jakob Witzig. 2020. The SCIP Optimization Suite 7.0. Technical Report. Optimization Online. http://www.optimization-online.org/DB_HTML/2020/03/7705.html.
[12]
Michael R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. A Series of Books in the Mathematical Sciences. W. H. Freeman and Co., San Francisco, Calif. x+338 pages.
[13]
Michel X. Goemans and David P. Williamson. 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach. 42, 6 (1995), 1115–1145.
[14]
Gurobi Optimization, LLC. 2022. Gurobi Optimizer Reference Manual. https://www.gurobi.com.
[15]
Nicolò Gusmeroli. 2019. EXPEDIS: Randomly Generated Instances. https://www.aau.at/en/mathematics/publications/software/. (2019).
[16]
Nicolò Gusmeroli and Angelika Wiegele. 2021. EXPEDIS: An exact penalty method over discrete sets. Discrete Optimization (2021), 100622.
[17]
Christoph Helmberg and Franz Rendl. 1998. Solving quadratic \((0,1)\) -problems by semidefinite programs and cutting planes. Math. Program. Ser. A 82, 3 (1998), 291–315.
[18]
Timotej Hrga, Borut Lužar, Janez Povh, and Angelika Wiegele. 2020. BiqBin: Moving boundaries for NP-hard problems by HPC. In Advances in High Performance Computing, Proceedings of HPC2019 Conference (HPC’19). 388–405.
[19]
Timotej Hrga and Janez Povh. 2020. MADAM: A parallel exact solver for Max-Cut based on semidefinite programming and ADMM. arXiv preprint arXiv:2010.07839 (2020).
[20]
[21]
Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts Statist., Vol. 103. Springer, New York. xiv+426 pages.
[22]
Richard M. Karp. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations (Proc. Sympos., IBM Thomas J. Watson Res. Center, Yorktown Heights, N.Y., 1972). Plenum, New York, 85–103.
[23]
Krzysztof C. Kiwiel. 1989. A survey of bundle methods for nondifferentiable optimization. In Mathematical Programming (Tokyo, 1988). Math. Appl. (Japanese Ser.), Vol. 6. SCIPRESS, Tokyo, 263–282.
[24]
Nathan Krislock, Jérôme Malick, and Frédéric Roupin. 2014. Improved semidefinite bounding procedure for solving max-cut problems to optimality. Math. Program. Ser. A 143, 1-2 (2014), 61–86.
[25]
Nathan Krislock, Jérôme Malick, and Frédéric Roupin. 2017. BiqCrunch: A semidefinite branch-and-bound method for solving binary quadratic problems. ACM Trans. Math. Software (TOMS) 43, 4 (2017), 1–23.
[26]
Amélie Lambert. 2018. Densest k-subgraph Problem, Benchmark Instances. http://cedric.cnam.fr/~lamberta/Library/k-cluster.html. (2018). Accessed: 2018-02-27.
[27]
Jean B. Lasserre. 2016. A MAX-CUT formulation of 0/1 programs. Oper. Res. Lett. 44, 2 (2016), 158–164.
[28]
Frauke Liers, Michael Jünger, Gerhard Reinelt, and Giovanni Rinaldi. 2004. Computing exact ground states of hard ising spin glass problems by branch-and-cut. In New Optimization Algorithms in Physics, Alexander Hartmann and Heiko Rieger (Eds.). Wiley, 47–68.
[29]
Ricardo M. Lima and Ignacio E. Grossmann. 2017. On the solution of nonconvex cardinality Boolean quadratic programming problems: A computational study. Comput. Optim. Appl. 66, 1 (2017), 1–37.
[30]
MIPLIB 2017. (2018). miplib.zib.de.
[31]
Cristobal A. Navarro, Nancy Hitschfeld-Kahler, and Luis Mateu. 2014. A survey on parallel computing and its applications in data-parallel problems using GPU architectures. Commun. Comput. Phys. 15, 2 (2014), 285–329.
[32]
Panos M. Pardalos and Gregory P. Rodgers. 1990. Computational aspects of a branch and bound algorithm for quadratic zero-one programming. Computing 45, 2 (1990), 131–144.
[33]
Panos M. Pardalos and Gregory P. Rodgers. 1990. Parallel branch and bound algorithms for quadratic zero-one programs on the hypercube architecture. Ann. Oper. Res. 22, 1-4 (1990), 271–292.
[34]
David Pisinger. 2007. The quadratic knapsack problem–a survey. Discrete Appl. Math. 155, 5 (2007), 623–648.
[35]
Franz Rendl, Giovanni Rinaldi, and Angelika Wiegele. 2010. Solving max-cut to optimality by intersecting semidefinite and polyhedral relaxations. Math. Program. Ser. A 121, 2 (2010), 307–335.
[36]
Giovanni Rinaldi. 1998. Rudy. http://www-user.tu-chemnitz.de/~helmberg/rudy.tar.gz. (1998).
[37]
Alexander Schrijver. 2003. Combinatorial Optimization: Disjoint Paths, Hypergraphs. Springer. https://books.google.si/books?id=HNcpAQAAMAAJ
[38]
Alen Vegi Kalamar, Drago Bokal, and Janez Povh. 2019. Parallelization of BiqMac solver. In SOR’19 Proceedings. Slovenian Society Informatika, Section for Operational Research, 161–166.
[39]
Angelika Wiegele. 2007. BiqMac Library. http://biqmac.aau.at/biqmaclib.html. (2007).
[40]
Christian Wiwie, Jan Baumbach, and Richard Röttger. 2015. Comparing the performance of biomedical clustering methods. Nat. Methods 12, 11 (2015), 1033.
[41]
Qinghua Wu and Jin-Kao Hao. 2015. A review on algorithms for maximum clique problems. European J. Oper. Res. 242, 3 (2015), 693–709.
[42]
Rui Xu and Donald Wunsch II. 2005. Survey of clustering algorithms. IEEE Trans. Neural Netw. 16, 3 (2005), 645–678.
