1. Introduction
Quantum annealing is an emerging technology with the potential to provide high-quality solutions to NP-hard problems. In this work, we focus on the devices built by D-Wave Systems, Inc., specifically the D-Wave 2000Q annealer, designed to minimize functions of the following form,

H(x_1, …, x_n) = Σ_{i=1}^{n} a_i x_i + Σ_{i<j} a_{ij} x_i x_j,  (1)

where x_1, …, x_n are unknown binary variables. The linear weights a_i and the quadratic coupler weights a_{ij} are chosen by the user and define the problem under investigation. If x_i ∈ {0, 1}, Equation (1) is called a quadratic unconstrained binary optimization (QUBO) problem, and if x_i ∈ {−1, +1}, it is called an Ising model. Both QUBO and Ising model formulations are equivalent [1]. Many important NP-hard problems can be expressed as the minimization of a function of the form of Equation (1), see [2]. In the remainder of the article, we focus on the QUBO formulation.
The D-Wave 2000Q device aims to minimize a function of the form of Equation (
1) via a process called quantum annealing. For this, Equation (
1) is mapped to a physical system, consisting of hardware qubits which are situated on the D-Wave chip. Those are initialized in a superposition (enforced by a so-called transverse field), with no information about the problem (Equation (
1)) at the start. During annealing, the system transitions from the initial superposition of all qubits to the actual problem Hamiltonian of Equation (
1), where it is hoped that a slow transition gives the system enough time to arrange itself in an energy minimum along the path. After annealing, a solution of Equation (
1) is read off from the hardware qubits. Although solutions returned by the D-Wave 2000Q annealer for minimization problems of the type of Equation (
1) are typically of very high quality, they are not guaranteed to be optimal. In fact, solutions returned by a D-Wave device come with no accuracy guarantee at all.
In this article, we aim to identify those types of QUBOs that are solvable on D-Wave 2000Q with given hardware parameters, without actually using the annealer. This could help decide on what resources (annealing time, number of reads) should be allocated to some problem of interest in advance.
We focus on the maximum clique (MC) problem, and explore three different prediction tasks. First, we train a binary machine learning classification model on basic problem characteristics (such as the density of the input graph) to discriminate between solvable and unsolvable instances of MC. The resulting model also allows us to rank the features used for training by importance. Classification is performed using the open source machine learning package
scikit-learn in Python [
3]. Second, we show that the decision problem of whether an MC instance will be solved optimally by D-Wave can be predicted with high accuracy by a simple decision tree on the same basic problem characteristics. Third, we train a machine learning regression model to predict the clique size returned by the annealer.
Let G = (V, E) be a graph (referred to as the original graph) consisting of a vertex set V and an edge set E. A clique in G is any subgraph C of G that is complete, i.e., there is an edge between each pair of vertices of C. The MC problem asks us to find a clique of maximum size, called a maximum clique; this is an NP-hard problem with many applications. To solve an instance of MC on the D-Wave annealer, we proceed as follows:
The task of finding a maximum clique has to be mapped to a formulation of the type of Equation (1). As shown in [1,4], the QUBO function

H = −A Σ_{v ∈ V} x_v + B Σ_{(u,v) ∉ E} x_u x_v,  x_v ∈ {0, 1} for all v ∈ V,

achieves its minimum if and only if the vertices v for which x_v = 1 form a maximum clique, where the two constants can be chosen as A = 1, B = 2 [4]. It is noteworthy that any function of the type of Equation (1) can be represented as a graph P itself, which is used to embed the QUBO problem into the quantum processor. In this representation, each of the n variables x_i becomes a vertex with vertex weight a_i, and each edge between vertices i and j is assigned an edge weight a_ij. We call P the unembedded graph.
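The MC-to-QUBO mapping of [1,4] can be sketched in standalone Python (the helper name and the plain edge-set representation are illustrative): each vertex contributes a reward −A, and each non-edge a penalty B, with A = 1 and B = 2.

```python
def mc_qubo(vertices, edges, A=1, B=2):
    """Build the maximum-clique QUBO: reward -A per selected vertex,
    penalty B on each selected non-edge. `edges` is a set of frozensets."""
    linear = {v: -A for v in vertices}
    quadratic = {}
    vs = sorted(vertices)
    for i, u in enumerate(vs):
        for v in vs[i + 1:]:
            if frozenset((u, v)) not in edges:  # penalize only non-edges
                quadratic[(u, v)] = B
    return linear, quadratic
```

For a complete graph, the quadratic part is empty and the minimum selects every vertex, as expected for a clique covering the whole graph.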
To minimize Equation (
1) on D-Wave, its corresponding unembedded graph has to be mapped onto the physical hardware qubits on the chip of the D-Wave 2000Q. Since the connectivity of the hardware qubits, called the Chimera graph (see
Figure 1), is limited (in particular, all pairwise connections are not possible), a
minor embedding of
P onto the Chimera graph has to be computed. In a minor embedding, each vertex of
P (referred to as a
logical qubit) is mapped onto a connected set of hardware qubits, called a
chain. Note that in Equation (
1), all pairwise connections require the specification of a weight, and, as such, a coupler weight, called the
chain strength, is also required for each pair of connected chain qubits. If chosen appropriately, the chain strength ensures that hardware qubits representing the same logical qubit take the same value after annealing is complete. The minor embedding of
P onto the Chimera graph is a subgraph of the Chimera graph and, thus, a new graph, which we call the embedded graph. Roughly speaking, the closer the topology of P resembles that of the Chimera graph, the easier it is to embed P onto the D-Wave 2000Q hardware, and the larger the embedded problems P can be.
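A quick way to sanity-check a candidate minor embedding, under the definition above, is to verify that every chain induces a connected subgraph of the hardware graph; a sketch (the adjacency structure below is a toy stand-in, not the actual Chimera graph, and the function name is ours):

```python
from collections import deque

def chains_are_connected(embedding, hardware_adj):
    """Verify that every chain (the set of hardware qubits representing one
    logical qubit) induces a connected subgraph of the hardware graph."""
    for chain in embedding.values():
        chain = set(chain)
        start = next(iter(chain))
        seen = {start}
        frontier = deque([start])
        while frontier:  # breadth-first search restricted to the chain
            q = frontier.popleft()
            for neighbor in hardware_adj.get(q, ()):
                if neighbor in chain and neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append(neighbor)
        if seen != chain:
            return False
    return True
```

In practice, embeddings produced by D-Wave's tooling satisfy this property by construction; the check is useful mainly when manipulating embeddings by hand.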
After annealing, it is not guaranteed that chained qubits on the Chimera graph take the same value, even though they represent the same logical qubit. This phenomenon is called a
broken chain. To obtain an interpretable solution of logical qubits, definite values have to be assigned to each logical qubit occurring in Equation (
1). Although there is no unique way to accomplish this task, D-Wave offers several default methods to
unembed chains. These include
majority vote and
minimize energy. Majority vote sets the value of a chained qubit to the value (zero or one) taken most often among the hardware qubits which represent the logical qubit. In case of a draw, an unbiased coin flip is used. The minimize energy option determines the final value of all logical qubits via a post-processing step using some form of gradient descent. We employ majority vote in the remainder of the article.
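Majority vote as just described can be sketched as follows (the function name is ours):

```python
import random

def majority_vote(chain_values, rng=random):
    """Collapse the hardware-qubit values of one chain into a single logical
    value; ties are broken by an unbiased coin flip."""
    ones = sum(chain_values)
    zeros = len(chain_values) - ones
    if ones > zeros:
        return 1
    if zeros > ones:
        return 0
    return rng.randint(0, 1)  # draw: unbiased coin flip
```

Applying this to every chain of an anneal yields one candidate assignment of the logical qubits in Equation (1).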
Applying machine learning to predict features of a quantum device is a timely area of research. Existing work mostly focuses on gate quantum computing. For instance, machine learning is employed to help tune the behavior of quantum systems, such as computing the overlap of two quantum states [
5], quantum circuit optimization [
6,
7], or readout of states with higher assignment fidelities under noise conditions [
8]. Other works address parameter tuning of variational quantum algorithms [
9] or the quantum approximate optimization algorithm (QAOA) of [
10], see [
11]. The closest to our approach is [
12], wherein the authors use machine learning techniques to identify graph problems that are easy to solve using QAOA. However, they use the quantum gate model and their specific objective is a bit different—to decide whether QAOA or the classical Goemans-Williamson algorithm will perform better on instances of the maximum cut problem. In contrast, we compare the quality of the solution found with the quantum algorithm against an optimal solution, and our optimization problem is the maximum clique problem.
Somewhat related, in [
13,
14] the authors use a differential evolution optimizer to tune three parameters of D-Wave 2000Q, namely spin reversal, anneal offsets, and chain weights. However, machine learning methods are not used, and the objective is to optimize parameters, rather than predict performance.
Finally, some authors have addressed the difficulty of solving NP-hard problems to optimality by developing classical approximation algorithms for them or by proving hardness-of-approximation results. For the MC problem, it has been shown in [15] that for every ε > 0, no polynomial-time algorithm exists that approximates the maximum clique size to within a factor of n^{1−ε}, unless P = NP, which indicates that the MC problem is hard to solve even approximately. There have also been results establishing the fixed-parameter intractability of the MC problem [16]. The difference between those results and our work is that we do not try to determine the difficulty of any particular instance of the MC problem per se, but how difficult that problem is for the D-Wave 2000Q quantum annealer. Moreover, some of our results also take into account some tunable hardware parameters used for the annealer, in addition to the input graph parameters. No similar results currently exist in the literature.
This article is structured as follows.
Section 2 introduces the machine learning methodology we employ to predict the behavior of D-Wave 2000Q. Experimental results are given in
Section 3, where we consider the classification problem of predicting whether D-Wave 2000Q can solve a particular MC instance, the computation of a simple decision tree allowing us to classify problem instances manually, and the prediction of the clique size returned by the annealer using a machine learning regression model. The article concludes with a discussion in
Section 4.
2. Methods
We aim to predict both if an instance of MC is solvable by D-Wave 2000Q to optimality, as well as the size of the clique which will be found by the annealer.
We first generate a set of 47,000 Erdős–Rényi random graphs. For each graph, we first choose its number of vertices n uniformly at random (up to 65 vertices), and its density p uniformly at random. We then insert each of the n(n−1)/2 possible pairwise edges after an independent coin flip with probability p. Using these 47,000 Erdős–Rényi random graphs, we generate a dataset of 47,000 MC instances by applying the QUBO formulation of Equation (1). A portion of those 47,000 MC instances is used for training, and the remaining instances are used for validation. The upper bound of 65 vertices is determined by the largest size of a complete graph embeddable on D-Wave's Chimera architecture. We specifically do not use any graphs with zero-degree nodes.
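The sampling procedure can be sketched with the standard library alone (the lower bound on the vertex count is illustrative, since the text only fixes the upper bound of 65):

```python
import random

def random_graph(max_vertices=65, rng=random):
    """Sample one Erdős–Rényi instance: draw a vertex count and a density p
    at random, then keep each possible edge after an independent coin flip."""
    n = rng.randint(2, max_vertices)  # lower bound of 2 is an assumption
    p = rng.random()                  # density, uniform in [0, 1)
    edges = {frozenset((u, v))
             for u in range(n) for v in range(u + 1, n)
             if rng.random() < p}
    return n, p, edges
```

In an actual pipeline, graphs with zero-degree nodes would additionally be rejected, as described above.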
Next, we compute an embedding
E of a complete 65 node graph onto the Chimera architecture with the help of the
minorminer module of the D-Wave Ocean API [
17]. Since any graph with at most 65 nodes is a subgraph of the complete 65 node graph, we can re-use the embedding
E to map any problem instance of up to 65 vertices onto the D-Wave 2000Q annealer. Different embeddings typically use different hardware qubits, and map logical qubits to chains of different lengths, both of which can alter the performance of the D-Wave 2000Q. In order to be able to compare and interpret results across different graphs, we thus keep the embedding fixed in all experiments. Moreover, the D-Wave API provides a tool to compute the chain strengths for an embedding, called
uniform torque compensation [
18]. We always compute all chain strengths using this feature. For each graph on which MC is to be solved, we request 1000 anneals from D-Wave 2000Q after embedding. The best D-Wave result (i.e., the largest clique found) among those 1000 anneals is saved for each graph after unembedding with the majority vote option (see
Section 1).
2.1. Classification
Our task is to relate graph features to a binary indicator expressing whether the annealer solved an instance to optimality. Several avenues exist to accomplish this task, for instance (penalized) linear or logistic regression, or more sophisticated machine learning models. The following highlights some of the challenges.
First, in order to have a well-defined decision problem for finding maximum cliques, we require a ground truth determined with a classical method. We define an instance of MC to be solvable by the D-Wave 2000Q if the clique determined by D-Wave has size equal to the maximum clique size found with the function
graph_clique_number of the
NetworkX package in Python [
19,
20], which is an exact solver. In this case, the class label was set to 1 (solvable), and 0 otherwise.
Second, a couple of choices have to be made, regarding both the machine learning model for classification and the set of graph features selected for prediction. We decided to use a decision tree classifier for two main reasons: the classifier achieved good performance in the classification task we consider and, most importantly, it allows us to obtain an interpretable output in the form of a decision tree. The decision tree highlights in what order of importance the features contribute to solvability on the D-Wave 2000Q, and by tuning the decision tree for simplicity, it allows one to (manually) determine in advance whether an instance is likely solvable.
Third, an assortment of graph-related features has to be selected to serve as inputs to the machine learning model. Those features are chosen to cover a wide variety of potential metrics impacting solvability; however, the list below is neither exhaustive nor rigorously justified. Each of the following features enters the machine learning model three times (unless noted otherwise): once for the original input graph G, once for the unembedded graph P of the MC problem, and once for its embedded graph on the Chimera architecture (see
Section 1):
the graph density (Graph_Density);
the minimal, maximal, and average degree of any vertex (Graph_Min_Degree,
Graph_Max_Degree, and Graph_Mean_Degree);
the number of triangles in the input and unembedded graphs (Graph_Num_Triangles);
the number of nodes and edges (Graph_Num_Nodes and Graph_Num_Edges);
the five largest eigenvalues of the adjacency matrix (Graph_Largest_Eigenvalue,
Graph_2nd_Largest_Eigenvalue, etc., up to Graph_5th_Largest_Eigenvalue), and the spectral gap (the difference between the moduli of the two largest eigenvalues) of the adjacency matrix (Graph_Spectral_Gap). For brevity of notation, we refer with “eigenvalue of a graph” to the eigenvalue of its adjacency matrix.
Additionally, for the embedded graph on the Chimera architecture, we included the following features into the model:
the minimal, maximal, and average chain length occurring in the embedding
(Min_Chain_Length, Max_Chain_Length, and Avg_Chain_Length);
the chain strength computed with D-Wave’s
uniform torque compensation feature (
Chain_Strength), see [
18], using either a fixed UTC prefactor (
Section 3.1) or a randomly selected one (
Section 3.2);
the annealing time (in microseconds), which was selected uniformly at random (Annealing_Time).
In total, there were 46 features that were included as inputs to the machine learning model.
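Several of the embedding-related features above are simple statistics over the chain lengths of the fixed embedding; a minimal sketch, representing an embedding as a dictionary mapping each logical qubit to its list of hardware qubits (as returned, e.g., by minorminer):

```python
def chain_length_features(embedding):
    """Compute the Min/Max/Avg_Chain_Length features from an embedding
    given as {logical qubit: [hardware qubits in its chain]}."""
    lengths = [len(chain) for chain in embedding.values()]
    return {"Min_Chain_Length": min(lengths),
            "Max_Chain_Length": max(lengths),
            "Avg_Chain_Length": sum(lengths) / len(lengths)}
```

Since the embedding is fixed across all experiments, these features vary only through which logical qubits a given instance actually uses.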
Using those features of the training MC instances and their maximum clique result computed by the D-Wave 2000Q annealer, we train a machine learning classifier implemented in the
sklearn.tree.DecisionTreeClassifier() class provided in
scikit-learn [
3]. After performing a grid search across the following parameters, we selected
max_depth=5,
random_state=0, and
min_impurity_decrease=0.005. All other parameters were kept at their default values. To weigh solvable MC instances by D-Wave more heavily than unsolvable ones, the option
class_weight=’balanced’ was employed. The option of balanced class weightings was chosen as only about 11–13% of the test problems are solvable. Therefore, a model that classifies all instances as unsolvable would still be 87–89% accurate. By setting class_weight=’balanced’, we assign a weight to the solvable problems that is inversely proportional to how often they appear in the dataset, thus penalizing incorrectly classified solvable problems by a correspondingly larger factor.
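The classifier configuration described above can be sketched as follows; the feature matrix and labels are assumed to be prepared elsewhere, and only the hyperparameters stated in the text are set explicitly:

```python
from sklearn.tree import DecisionTreeClassifier

# Hyperparameters from the grid search described above; all other
# parameters remain at their scikit-learn defaults.
clf = DecisionTreeClassifier(max_depth=5,
                             random_state=0,
                             min_impurity_decrease=0.005,
                             class_weight='balanced')
```

Training and prediction then follow the usual scikit-learn pattern, e.g., `clf.fit(X_train, y_train)` followed by `clf.predict(X_val)`.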
During training, we noticed that the decision tree machine learning model appears to have trouble with problems that return a clique size of zero. This was due to a suboptimal QUBO formulation we used for MC, and the problem disappeared when using the formulation of [
1,
4]. However, similar issues might occur with other problems, highlighting that the present approach might be most suitable for problem classes that can be solved sufficiently well on D-Wave. Additionally, it is challenging to tune the decision tree in such a way as to obtain good performance and interpretable results simultaneously. Applying the decision tree classifier using default parameters usually results in very large trees with many redundant branches, which are poorly interpretable. However, this issue can be alleviated by increasing the minimum required decrease in Gini impurity (parameter
min_impurity_decrease) while simultaneously decreasing the maximal depth of the tree (parameter
max_depth).
2.2. Regression
In contrast to predicting solvability alone, we also attempt to predict the actual clique size found by the D-Wave 2000Q device from graph and annealer features alone. Thus, in this section, our aim is not the prediction of a binary decision (solvable, not solvable), but of an integer response (precisely, an integer within the range of possible clique sizes). For this task, no ground truth (via a classical solver) is needed.
As in the previous section, several regression models are suitable for this task. We chose to predict the clique size returned by D-Wave 2000Q with gradient boosting, a popular machine learning regression model. Whereas random forests build deep independent trees, gradient boosting works by constructing an ensemble of dependent shallow trees, each one improving on the previous one. Although shallow trees by themselves are usually only weak predictive models, combining (or “boosting”) them in an ensemble allows for powerful machine learning models. We employed the
GradientBoostingRegressor() class [
21] in
scikit-learn [
3]. Similarly to the classification setting, we performed a grid search across selected parameters, and set
n_estimators=200, and
random_state=0. All other parameters were kept at their default values.
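The regression setup described above can be sketched as follows; as before, the feature matrix and the clique-size targets are assumed to be prepared elsewhere:

```python
from sklearn.ensemble import GradientBoostingRegressor

# n_estimators and random_state follow the grid search described above;
# all other parameters remain at their scikit-learn defaults.
reg = GradientBoostingRegressor(n_estimators=200, random_state=0)
```

As with the classifier, `reg.fit(X_train, y_train)` trains the ensemble of shallow trees, and `reg.predict(X_val)` yields (real-valued) clique-size predictions, which can be rounded to integers if desired.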
The graph and annealer features included in the model are the same as the ones outlined in the previous section. One caveat is worth mentioning. It is not always true that the set of vertices (given by the bitstring indicating the clique vertices with a 1) returned by the D-Wave 2000Q annealer actually forms a clique. Therefore, to train the model, we verify first if the result returned by D-Wave 2000Q is a clique. If it is, we use the clique size in the regression, otherwise we use a value of zero.
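The rule for constructing the regression target can be sketched as follows (the function name and the edge-set representation are illustrative):

```python
from itertools import combinations

def returned_clique_size(bitstring, edges):
    """Regression target: size of the vertex set returned by the annealer
    if that set actually forms a clique, and 0 otherwise."""
    chosen = [v for v, bit in enumerate(bitstring) if bit == 1]
    is_clique = all(frozenset(pair) in edges
                    for pair in combinations(chosen, 2))
    return len(chosen) if is_clique else 0
```

An empty returned set is vacuously a clique and yields a target of 0 either way.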
4. Discussion
In this contribution, we aim to understand some of the factors contributing to the hardness of a problem instance sent to the D-Wave 2000Q annealer. We focus on the MC problem, and train several machine learning models on several thousand randomly generated input problems with the aim of (a) predicting whether D-Wave 2000Q will be able to solve an instance of MC to optimality, and (b) predicting the size of the clique that the D-Wave 2000Q device will find.
We show that, indeed, a relatively small number of features of the input problem, its QUBO formulation, and its embedding onto the D-Wave Chimera architecture suffices to predict solvability with high precision. Those features can be summarized in a simple decision tree, which even allows one to classify MC instances manually. A regression analysis demonstrated that the clique size the D-Wave 2000Q will find can likewise be predicted with a low root mean square error.
This article leaves scope for a variety of avenues for future work. For instance, it is unknown how these results generalize to other NP-hard problems, how well prediction will work on future D-Wave machines (or even other quantum annealers), and it remains to be investigated if the presented results can be improved by fitting more sophisticated machine learning models.