Without loss of generality, we may assume that
f has a unique minimizer
\(x^*\) in Theorems
1.1 and
1.2. To justify this statement, suppose the set of minimizers
\(K^*\) of
f satisfies assumption (⋆). Let
\(x^* \in K^*\) be the unique lexicographically minimal minimizer, i.e., every other minimizer
\(x \in K^*\) satisfies
\(x_i \gt x_i^*\) for the smallest coordinate
\(i \in [n]\) in which
\(x_i \ne x_i^*\) . Whenever
\(\mathsf {SO}\) is queried at a minimizer
\(y \in K^*\) and outputs “YES”, our algorithm continues to minimize the linear objective
\(e_i^\top x\) , where
\(i \in [n]\) is the smallest index such that the
ith standard orthonormal basis vector
\(e_i\) is not orthogonal to the current working subspace, by pretending that
\(\mathsf {SO}\) returns
the vector
\(-e_i\) (until its search set contains a single point). Equivalently, our algorithm minimizes the linear objectives
\(e_1^\top x, \ldots , e_n^\top x\) in the given order inside
\(K^*\) , and this optimization problem has the unique solution
\(x^*\). In the rest of this paper, we thus assume that
f has a unique minimizer
\(x^*\).
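As a toy illustration of this tie-breaking rule (a standalone sketch on a finite set of candidate minimizers, not the paper's oracle-based algorithm), the following code selects the lexicographically minimal point by minimizing the coordinates \(e_1^\top x, \ldots, e_n^\top x\) in order:

```python
# Select the lexicographically minimal minimizer from a finite set K_star,
# mimicking the reduction: minimize e_1^T x, ..., e_n^T x in the given order.
def lex_min(points):
    n = len(points[0])
    candidates = list(points)
    for i in range(n):
        # keep only the candidates minimizing the i-th coordinate
        m = min(p[i] for p in candidates)
        candidates = [p for p in candidates if p[i] == m]
    return candidates[0]  # unique by construction once all coordinates are fixed

K_star = [(1, 0, 2), (0, 3, 1), (0, 2, 5), (0, 2, 4)]
print(lex_min(K_star))  # (0, 2, 4)
```

Each round of the loop plays the role of pretending the oracle returns \(-e_i\): it discards every candidate that is not minimal in the current coordinate.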
At a high level, our algorithm maintains a convex search set
K that contains the integral minimizer
\(x^*\) of
f, and iteratively shrinks
K using the cutting plane method; as the volume of
K becomes small enough, our algorithm finds a hyperplane
P that contains all the integral points in
K and recurses on the lower-dimensional search set
\(K \cap P\) . The assumption that
\(x^*\) is integral guarantees that
\(x^* \in K \cap P\) . This natural idea was previously used in [
18,
19] to handle rational polytopes that are not full-dimensional and in [
29] to argue that
\(O(n^3 \log (n))\) oracle calls are information-theoretically sufficient for SFM. The main technical difficulties in efficiently implementing such an idea are two-fold: (1) efficiently finding a hyperplane P that contains all the integral points in K, and (2) ensuring that the progress made in shrinking the volume of K is not lost when the algorithm reduces the dimension.
The second difficulty is key to achieving a small oracle complexity and deserves some further explanation. To see why shrinking K arbitrarily might result in a loss of progress, it’s instructive to consider the following toy example: suppose an algorithm starts with the unit cube \(K = [0,1]^n\) and \(x^*\) lies on the hyperplane \(K_1 = \lbrace x: x_1 = 0\rbrace\) ; suppose the algorithm obtains, in its ith call to \(\mathsf {SO}\) , the halfspace \(H_i = \lbrace x: x_1 \le 2^{-i}\rbrace\) . After T calls to \(\mathsf {SO}\) , the algorithm obtains the refined search set \(K \cap H_T\) with volume \(2^{-T}\) . However, when the algorithm reduces the dimension and recurses on the hyperplane \(K_1\) , the \((n-1)\) -dimensional volume of the search set again becomes 1, and the progress made by the algorithm in shrinking the volume of K is entirely lost. In contrast, the correct algorithm can reduce the dimension after only one call to \(\mathsf {SO}\) when it’s already clear that \(x^* \in K_1\) .
1.3.1 The Grötschel-Lovász-Schrijver Approach.
For the moment, let’s take
K to be an ellipsoid. Such an ellipsoid can be obtained by Vaidya’s volumetric center cutting plane method
[
49]. One natural idea to find the hyperplane comes from the following geometric intuition: when the ellipsoid
K is “flat” enough in one direction, then all of its integral points lie on a hyperplane
P. To find such a hyperplane
P, Grötschel, Lovász and Schrijver [
18,
19] gave an elegant application of simultaneous Diophantine approximation. We explain the main ideas behind this application in the following. We refer interested readers to [
19, Chapter 6] for a more comprehensive presentation of their approach and its implications for finding exact LP solutions.
For simplicity, we assume
K is centered at 0. Let
a be the unit vector parallel to the shortest axis of
K and
\(\mu _{\min }\) be the Euclidean length of the shortest axis of
K. Approximating the vector
a using the efficient simultaneous Diophantine approximation algorithm by Lenstra, Lenstra and Lovász [
30], one obtains an integral vector
\(v \in \mathbb {Z}^n\) and a positive integer
\(q \in \mathbb {Z}\) such that
\[\Vert qa - v\Vert _\infty \le 2^{-n} \qquad \text{and} \qquad 0 \lt q \le 2^{2n^2}.\]
This implies that for any integral point
\(x \in K \cap \lbrace 0,1\rbrace ^n\),
\[|v^\top x| \le q \cdot |a^\top x| + \Vert qa - v\Vert _\infty \cdot \Vert x\Vert _1 \le 2^{2n^2} \mu _{\min } + n \cdot 2^{-n}.\]
When \(\mu _{\min } \lt 2^{-3n^2}\) , the integral inner product \(v^\top x\) has to be 0 and therefore all integral points in K lie on the hyperplane \(P = \lbrace x: v^\top x = 0\rbrace\) . An efficient algorithm immediately follows: we first run the cutting plane method until the shortest axis of K has length \(\mu _{\min } \approx 2^{-3n^2}\) , then apply the above procedure to find the hyperplane P on which we recurse.
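The following toy computation illustrates the rounding step with concrete two-dimensional numbers (chosen by hand for illustration; the actual procedure uses the LLL algorithm in higher dimension): a good rational approximation \(v/q\) of the short-axis direction a forces \(v^\top x = 0\) for integral points x in a sufficiently flat body.

```python
import math

# Toy Diophantine rounding: approximate the (irrational) short-axis direction
# a by an integral vector v over a common denominator q, then conclude that
# the integer v^T x must be 0 whenever |v^T x| < 1.
a = (1 / math.sqrt(2), 1 / math.sqrt(2))   # unit direction of the shortest axis
q, v = 99, (70, 70)                        # 70/99 is a convergent of 1/sqrt(2)
err = max(abs(q * ai - vi) for ai, vi in zip(a, v))
assert err < 0.004                         # ||q*a - v||_inf is tiny

mu_min = 1e-6                              # the body is very flat: |a^T x| <= mu_min
for x in [(0, 0), (1, -1), (-3, 3)]:       # integral points with a^T x = 0
    bound = q * mu_min + err * sum(abs(c) for c in x)
    assert bound < 1                       # |v^T x| <= bound < 1, and v^T x is an integer
    assert v[0] * x[0] + v[1] * x[1] == 0  # hence x lies on the hyperplane {v^T x = 0}
```

The convergent 70/99 plays the role of the LLL output; the two asserts inside the loop are exactly the "integer smaller than 1 must be 0" argument.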
To analyze the oracle complexity of this algorithm, one naturally uses
\(\mathsf {vol}(K)\) as the potential function. An amortized analysis using such a volume potential previously appeared, for example, in [
14] for finding maximum support solutions in the linear conic feasibility problem. Roughly speaking, each cutting plane step (corresponding to one oracle call) decreases
\(\mathsf {vol}(K)\) by a constant factor; each dimension reduction step increases
\(\mathsf {vol}(K)\) by roughly
\(1/\mu _{\min } \approx 2^{3n^2}\) . As there are
n dimension reduction steps before the problem becomes trivial, the total number of oracle calls is thus
\(O(n^3)\) . The exponential time oracle complexity bound of
\(O(n^2 \log (n))\) can be obtained similarly by using Dirichlet’s approximation theorem on simultaneous Diophantine approximation (e.g., [
6, Section 1.10]) instead.
One might wonder whether the oracle complexity upper bound for their polynomial time algorithm can be improved using a better analysis. However, there is a fundamental obstacle to such an improvement. In particular, the upper bound of
\(2^{O(n^2)}\) on
q in efficient simultaneous Diophantine approximation corresponds to the
\(2^{O(n)}\) -approximation factor of the Shortest Vector Problem in lattices, first obtained by Lenstra, Lenstra and Lovász [
30]. Despite forty years of effort, this approximation factor has only been improved slightly to
\(2^{n \log \log (n)/\log n}\) for polynomial time algorithms [
4].
1.3.2 Lattices to the Rescue: A Reduction to the Shortest Vector Problem.
To bypass the previous bottleneck and prove Theorem
1.1, we give a reduction to the Shortest Vector Problem directly. We give a new method for finding the hyperplane for dimension reduction, based on an approximately shortest vector of a certain lattice, and we analyze its oracle complexity via a novel potential function that simultaneously captures the volume of the search set
K and the density of the lattice. The change in the potential function after dimension reduction is analyzed through a high dimensional slicing lemma. The details for this algorithm and its analysis are given in Sections
4 and
5.
Finding the hyperplane. We maintain a polytope
K (which we assume to be centered at 0 for simplicity) using an efficient implementation of the center of gravity method due to Bertsimas and Vempala [
5]. The following sandwiching condition is standard in convex geometry:
\[E(\mathsf {Cov}(K)^{-1}) \subseteq K \subseteq 2n \cdot E(\mathsf {Cov}(K)^{-1}), \tag{1}\]
where \(E(A) := \lbrace x : x^\top A x \le 1\rbrace\) denotes the unit ellipsoid of A and
\(\mathsf {Cov}(K)\) is the covariance matrix of the uniform distribution over
K. Sufficiently good approximation to
\(\mathsf {Cov}(K)\) can be obtained efficiently by sampling from
K [
5], so we ignore computational issues for now.
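As a sanity check on this sampling step, the following Monte Carlo sketch estimates the covariance of a box, where the exact answer is known in closed form; the actual algorithm samples from a general polytope K via [5] rather than a box, so the box here is purely an illustrative stand-in:

```python
import random

# Estimate Cov(K) by sampling for K = [0,1] x [0,3]; the exact covariance of
# the uniform distribution on this box is diag(1/12, 9/12).
random.seed(0)
N = 100_000
pts = [(random.random(), 3 * random.random()) for _ in range(N)]
mean = [sum(p[i] for p in pts) / N for i in range(2)]
cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pts) / N
        for j in range(2)] for i in range(2)]

assert abs(cov[0][0] - 1 / 12) < 5e-3      # Var of Uniform[0,1] is 1/12
assert abs(cov[1][1] - 9 / 12) < 2e-2      # Var of Uniform[0,3] is 9/12
assert abs(cov[0][1]) < 5e-3               # coordinates are independent
```

With \(N\) samples the entries of the empirical covariance concentrate at rate \(O(1/\sqrt{N})\), which is the sense in which a "sufficiently good approximation" is cheap to obtain.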
To find a hyperplane
P that contains all integral points in
K, it suffices to preserve all the integral points in the outer ellipsoid
\(E = 2n \cdot E(\mathsf {Cov}(K)^{-1})\) on the
right-hand side (RHS) of (
1). Let
\(x \in E \cap \mathbb {Z}^n\) be an arbitrary integral point. For any vector
v,
\[|v^\top x| \le \max _{y \in E} |v^\top y| = 2n \cdot \Vert v\Vert _{\mathsf {Cov}(K)}. \tag{2}\]
As long as
\(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \lt 1/10n\) and
\(v^\top x\) is an integer, we can conclude that
\(v^\top x = 0\) and this implies that all integral points in
K lie on the hyperplane
\(P = \lbrace x: v^\top x = 0\rbrace\) . Note that by (
2), such a vector
v with small
\(\Vert v\Vert _{\mathsf {Cov}(K)}\) essentially controls the ellipsoid width
\(\mathsf {width}_E(v) := \max _{x \in E} v^\top x - \min _{x \in E} v^\top x\) .
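Concretely, assuming E is centered at 0, maximizing the linear functional \(v^\top x\) over the ellipsoid \(E = 2n \cdot E(\mathsf {Cov}(K)^{-1})\) gives the short computation

```latex
\mathsf{width}_E(v)
  = \max_{x \in E} v^\top x - \min_{x \in E} v^\top x
  = 2 \max_{x^\top \mathsf{Cov}(K)^{-1} x \,\le\, (2n)^2} v^\top x
  = 4n \sqrt{v^\top \mathsf{Cov}(K)\, v}
  = 4n \left\Vert v \right\Vert_{\mathsf{Cov}(K)},
```

so a lattice vector that is short in the \(\mathsf {Cov}(K)\)-norm certifies that E, and hence K, is flat in the direction v.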
One might attempt to guarantee that \(v^\top x\) is integral by choosing v to be an integral vector. However, this idea has a fundamental flaw: as the algorithm reduces the dimension by restricting to a subspace W, the set of integral points on W might become much sparser. As such, one would need \(\mathsf {vol}(K)\) to be very small to guarantee that \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \lt 1/10n\), and this results in a very large oracle complexity.
To avoid this issue, we take
\(v = \Pi _W(z) \ne 0\) as the projection of some integral point
\(z \in \mathbb {Z}^n\) on
W, where
W is the subspace on which
K lies. Since
\(z - v \in W^\bot\) , we have
\(v^\top x = z^\top x\) and this guarantees that
\(v^\top x\) is integral. For the general case where
K is not centered at 0, a simple rounding procedure computes the desired hyperplane. We postpone the details of constructing the hyperplane to Lemma
3.1.
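The following sketch illustrates this projection trick in \(\mathbb {R}^3\) with the hypothetical subspace \(W = \lbrace x : x_1 + x_2 + x_3 = 0\rbrace\) (chosen only for illustration): the projected vector v is generally not integral, yet \(v^\top x\) remains integral for every integral x in W because \(z - v \in W^\bot\).

```python
from fractions import Fraction as F

# Orthogonal projection onto W = {x : x_1 + x_2 + x_3 = 0}, whose orthogonal
# complement is spanned by the all-ones vector.
def proj_W(z):
    t = F(sum(z), 3)
    return tuple(F(zi) - t for zi in z)

z = (1, 0, 0)                      # an integral point z in Z^3
v = proj_W(z)                      # v = (2/3, -1/3, -1/3), not integral
for x in [(1, -1, 0), (2, 3, -5)]: # integral points lying in W
    assert sum(x) == 0             # membership in W
    dot = sum(vi * xi for vi, xi in zip(v, x))
    # v^T x equals z^T x because z - v is orthogonal to W; hence it is an integer
    assert dot == sum(zi * xi for zi, xi in zip(z, x))
```

Exact rational arithmetic via `Fraction` makes the equality \(v^\top x = z^\top x\) checkable without floating-point error.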
How do we find a vector \(v \in \Pi _W(\mathbb {Z}^n) \setminus \lbrace 0\rbrace\) that satisfies \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \lt 1/10n\)? This is where lattices come into play. In particular, since \(\Lambda = \Pi _W(\mathbb {Z}^n)\) forms a lattice, we can apply any \(\gamma\)-approximation algorithm for the Shortest Vector Problem: if the shortest non-zero vector in \(\Lambda\) has \(\mathsf {Cov}(K)\)-norm at most \(1/(10 \gamma n)\), then such an algorithm finds a non-zero vector v that satisfies \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \lt 1/10n\).
The algorithm. This new approach for finding the hyperplane immediately leads to the following algorithm: we run the approximate center of gravity method for one step to decrease the volume of the polytope K by a constant factor; then we run the \(\gamma\) -approximation algorithm for SVP to find a non-zero vector v for dimension reduction. If \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \ge 1/10n\) , then we continue to run the cutting plane method; otherwise, we use the above procedure to find a hyperplane P containing all integral points in K, update the polytope K to be \(K \cap P\) and recurse.
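A runnable toy rendition of this control flow is given below. Everything in it (the axis-aligned box representation of K, the "cut toward 0" oracle, and using the flattest axis as the candidate short direction) is a deliberate simplification for illustration, not the paper's actual algorithm:

```python
# Toy loop: alternate cutting plane steps with dimension reduction on a box.
def minimize_toy(intervals, threshold=0.1):
    """intervals[i] = (lo, hi) with lo <= 0 <= hi; the integral minimizer is 0."""
    fixed, calls = {}, 0
    while intervals:
        # Dimension reduction: an axis so flat that 0 is its only integer point
        # is snapped to the hyperplane {x_i = 0} and dropped from the search.
        flat = [i for i, (lo, hi) in intervals.items() if hi - lo < threshold]
        if flat:
            i = flat[0]
            fixed[i] = 0
            del intervals[i]
            continue
        # Cutting plane step (one "oracle call"): halve the widest axis,
        # keeping the half that contains the minimizer 0.
        calls += 1
        i = max(intervals, key=lambda j: intervals[j][1] - intervals[j][0])
        lo, hi = intervals[i]
        mid = (lo + hi) / 2
        intervals[i] = (lo, mid) if mid >= 0 else (mid, hi)
    return fixed, calls

x_star, calls = minimize_toy({0: (-1.0, 1.0), 1: (-1.0, 1.0)})
assert x_star == {0: 0, 1: 0}
```

The interleaving mirrors the real algorithm: cut until some direction is certifiably flat, then recurse one dimension lower, and repeat until the search set is a single point.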
Potential function analysis. To analyze such an algorithm, one might attempt to use \(\mathsf {vol}(K)\) as the potential function as in the Grötschel-Lovász-Schrijver approach. However, one quickly realizes that \(\mathsf {vol}(K \cap P) / \mathsf {vol}(K)\) can be as large as \(\left\Vert v\right\Vert _2 / \left\Vert v\right\Vert _{\mathsf {Cov}(K)}\). While one can expect \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)}\) not to be too small, since we frequently check for a short lattice vector, one has no control over \(\left\Vert v\right\Vert _2\) in general.
Key to our analysis is the potential function
\(\Phi = \mathsf {vol}(K) \cdot \det (\Lambda)\) that measures simultaneously the volume of
K and the covolume
\(\det (\Lambda)\) of the lattice
\(\Lambda\) . Essentially, this potential function controls the lattice width
\(\min _{v \in \Lambda \setminus \lbrace 0\rbrace } \mathsf {width}_E(v)\) of the outer ellipsoid
E. In fact, Minkowski’s first theorem (Theorem
2.4) implies that there always exists a vector
\(v \in \Lambda \setminus \lbrace 0\rbrace\) such that
\(\mathsf {width}_E(v) \le \mathsf {poly}(n) \cdot \Phi ^{1/n}\) , and thus the potential function would never get too small before dimension reduction takes place.
Continuing with the analysis via the potential function
\(\Phi\) , while
\(\mathsf {vol}(K)\) increases by a factor of
\(\left\Vert v\right\Vert _2 / \left\Vert v\right\Vert _{\mathsf {Cov}(K)}\) after the dimension reduction, a standard fact about lattice projections (Fact
2.2) shows that the covolume of the lattice decreases by a factor of
\(\left\Vert v\right\Vert _2\) . The decrease in the covolume of the lattice thus elegantly cancels out the increase in
\(\mathsf {vol}(K)\) , leading to an overall increase in the potential of at most
\(1/\left\Vert v\right\Vert _{\mathsf {Cov}(K)} = O(\gamma n)\) . It follows that the total increase in the potential over all
n dimension reduction steps is at most
\((\gamma n)^n\) . Note that each cutting plane step still decreases the potential function by a constant factor since the lattice is unchanged. Therefore, the total number of oracle calls is at most
\(O(n \log (\gamma n))\) .
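Putting the pieces together, a back-of-the-envelope version of this accounting reads as follows: if each of the T cutting plane steps multiplies \(\Phi\) by a constant \(c \lt 1\), the n dimension reduction steps jointly multiply it by at most \((\gamma n)^{O(n)}\), and the Minkowski-type argument above keeps the ratio \(\Phi _0 / \Phi _{\mathrm {end}}\) bounded by \((\gamma n)^{O(n)}\), then

```latex
c^{T} \cdot (\gamma n)^{O(n)} \cdot \Phi_0 \;\ge\; \Phi_{\mathrm{end}}
\quad\Longrightarrow\quad
T \;\le\; O\!\left(\log \frac{\Phi_0}{\Phi_{\mathrm{end}}} + n \log (\gamma n)\right)
\;=\; O\bigl(n \log (\gamma n)\bigr).
```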
High dimensional slicing lemma for consecutive dimension reduction steps. The argument above ignores a slight technical issue: while we can guarantee that \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \ge 1/\gamma n\) after cutting plane steps by checking for short non-zero lattice vectors, it’s not clear why \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)}\) cannot be too small after a sequence of dimension reduction steps. It turns out that this can happen only when \(\mathsf {Cov}(K)\) becomes much smaller (e.g., the hyperplane P is far from the centroid of K) after dimension reduction, in which case \(\mathsf {vol}(K)\) as well as the potential also become much smaller.
To formally analyze the change in the potential function after a sequence of
k consecutive dimension reduction steps, we note that the polytope
K (which we assume to be isotropic for simplicity) becomes a “slice”
\(K \cap W\) and the lattice
\(\Lambda\) becomes the projected lattice
\(\Pi _W(\Lambda)\) , where
W is a subspace. One can show using standard convex geometry tools that
\(\mathsf {vol}(K \cap W) / \mathsf {vol}(K)\) is at most
\(k^{O(k)}\) , and via Minkowski’s first theorem that
\(\det (\Pi _W(\Lambda)) / \det (\Lambda)\) is at most
\(\sqrt {k}^k / \lambda _1(\Lambda)^k\) , where
\(\lambda _1(\Lambda)\) is the Euclidean length of the shortest non-zero vector in
\(\Lambda\) . We leave the details of this high dimensional slicing lemma to Lemma
3.2. Since we know that
\(\lambda _1(\Lambda) \ge 1/\gamma n\) in the first dimension reduction step, the potential function increases by a factor of at most
\((\gamma n)^{O(k)}\) over a sequence of
k consecutive dimension reduction steps. This gives a more precise analysis of the
\(O(n \log (\gamma n))\) oracle complexity.
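Combining the two estimates, and using \(\lambda _1(\Lambda) \ge 1/\gamma n\) supplied by the first dimension reduction step, the potential changes over the k consecutive steps by a factor of at most

```latex
\frac{\mathsf{vol}(K \cap W)}{\mathsf{vol}(K)} \cdot
\frac{\det(\Pi_W(\Lambda))}{\det(\Lambda)}
\;\le\; k^{O(k)} \cdot \frac{\sqrt{k}^{\,k}}{\lambda_1(\Lambda)^{k}}
\;\le\; k^{O(k)} \cdot \sqrt{k}^{\,k} \cdot (\gamma n)^{k}
\;=\; (\gamma n)^{O(k)},
```

which is exactly the bound used in the accounting above.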