
Minimizing Convex Functions with Rational Minimizers

Published: 19 December 2022

Abstract

Given a separation oracle \(\mathsf {SO}\) for a convex function f defined on \(\mathbb {R}^n\) that has an integral minimizer inside a box with radius R, we show how to find an exact minimizer of f using at most
    \(O(n (n \log \log (n)/\log (n) + \log (R)))\) calls to \(\mathsf {SO}\) and \(\mathsf {poly}(n, \log (R))\) arithmetic operations, or
    \(O(n \log (nR))\) calls to \(\mathsf {SO}\) and \(\exp (O(n)) \cdot \mathsf {poly}(\log (R))\) arithmetic operations.
    When the set of minimizers of f has integral extreme points, our algorithm outputs an integral minimizer of f. This improves upon the previously best oracle complexity of \(O(n^2 (n + \log (R)))\) for polynomial time algorithms and \(O(n^2 \log (nR))\) for exponential time algorithms obtained by [Grötschel, Lovász and Schrijver, Prog. Comb. Opt. 1984, Springer 1988] over thirty years ago. Our improvement on Grötschel, Lovász and Schrijver’s result generalizes to the setting where the set of minimizers of f is a rational polyhedron with bounded vertex complexity.
    For the Submodular Function Minimization problem, our result immediately implies a strongly polynomial algorithm that makes at most \(O(n^3 \log \log (n)/\log (n))\) calls to an evaluation oracle, and an exponential time algorithm that makes at most \(O(n^2 \log (n))\) calls to an evaluation oracle. These improve upon the previously best \(O(n^3 \log ^2(n))\) oracle complexity for strongly polynomial algorithms given in [Lee, Sidford and Wong, FOCS 2015] and [Dadush, Végh and Zambelli, SODA 2018], and the previously best \(O(n^3 \log (n))\) oracle complexity for exponential time algorithms given in the former work.
    Our result is achieved via a reduction to the Shortest Vector Problem in lattices. We show how an approximately shortest vector of an auxiliary lattice can be used to effectively reduce the dimension of the problem. Our analysis of the oracle complexity is based on a potential function that simultaneously captures the size of the search set and the density of the lattice, which we analyze via tools from convex geometry and lattice theory.

    1 Introduction

    In this paper, we investigate the problem of minimizing a convex function f on \(\mathbb {R}^n\) accessed through a separation oracle \(\mathsf {SO}\) [17]. When queried with a point x, the oracle returns “YES” if x minimizes f; otherwise, the oracle returns a hyperplane that separates x from the minimizer of f. An algorithm is said to be strongly polynomial [19] for such a problem if it makes \(\mathsf {poly}(n)\) calls to \(\mathsf {SO}\) , uses \(\mathsf {poly}(n)\) arithmetic operations, and the size of the numbers occurring during the algorithm is polynomially bounded by n and the size of the output of the separation oracle.
Designing strongly polynomial algorithms for continuous optimization problems with certain underlying combinatorial structure is a well-studied but challenging task in general. To this day, despite tremendous effort, it remains a major open question to solve linear programming (LP) in strongly polynomial time, a problem widely known as Smale’s 9th question [46]. Despite this barrier, such algorithms are known under additional assumptions: linear systems with at most two non-zero entries per row [2, 9, 34] or per column [39, 51] in the constraint matrix, LPs with bounded entries in the constraint matrix [11, 48, 50], and LPs with 0-1 optimal solutions [7, 8].
For minimizing a general convex function f, strongly polynomial algorithms are hopeless unless f satisfies additional combinatorial properties. In this work, we study the setting where the minimizer of f is an integral point inside a box with radius \(R = 2^{\mathsf {poly}(n)}\) . The integrality assumption on the minimizer is natural and is general enough to encapsulate well-known problems such as submodular function minimization, where \(R = 1\) . Prior to our work, an elegant application of simultaneous Diophantine approximation due to Grötschel, Lovász and Schrijver [18, 19] gives a strongly polynomial algorithm that minimizes f using \(O(n^2(n + \log (R)))\) calls to the separation oracle and an exponential time algorithm that minimizes f using \(O(n^2 \log (nR))\) oracle calls.
In fact, Grötschel, Lovász and Schrijver’s approach applies to the more general setting of rational polyhedra, which they use to derive polynomial time algorithms for a wide range of combinatorial optimization problems [17, 19]. In the rational polyhedra setting, the set of minimizers of f is a polyhedron \(K^*\) inside a box with radius R, and the vertices of \(K^*\) are all rational vectors with LCM vertex complexity bounded by \(\varphi \ge 0\) (Definition 2.6). In particular, the case of integral minimizers in the previous paragraph corresponds to \(\varphi = 0\) . For the more general setting of rational polyhedra, Grötschel, Lovász and Schrijver’s approach implies a polynomial time algorithm that finds a vertex of \(K^*\) using \(O(n^2(n + \varphi + \log (R)))\) separation oracle calls, and an exponential time algorithm that uses \(O(n^2(\varphi + \log (nR)))\) oracle calls. We refer interested readers to [19, Chapter 6] for a detailed presentation of their approach. The purpose of the present paper is to design a new method that improves the number of separation oracle calls.
    A closely related problem, known as the Convex Integer Minimization problem, asks to minimize a convex function f over the set of integer points. Dadush [10, Section 7.5] gave an algorithm for this problem that takes \(n^{O(n)}\) time and exponential space. In fact, the Convex Integer Minimization problem generalizes integer linear programming and thus cannot be solved in sub-exponential time under standard complexity assumptions, so the integrality/rationality assumption on the minimizer of f is, in some sense, necessary for obtaining efficient algorithms.
The number of separation oracle calls made by an algorithm for minimizing a convex function f, known as the oracle complexity, plays a central role in black-box models of convex optimization. For weakly polynomial algorithms, it’s well-known that \(\Theta (n \log (nR/\epsilon))\) oracle calls are optimal, with \(\epsilon\) being the accuracy parameter. The first exponential time algorithm that achieves the optimal oracle complexity is the famous center of gravity method discovered independently by Levin [31] and Newman [38]. As for polynomial time algorithms, an oracle complexity of this order was first achieved over thirty years ago by the method of inscribed ellipsoids [28, 37]. In contrast, the optimal oracle complexity for strongly polynomial algorithms remains largely unknown to this day. This motivates the focus of the present paper on the oracle complexity of our algorithms.

    1.1 Our Results

    To formally state our result, we first define the notion of a separation oracle as formulated in [17].
    Definition 1.1 (Separation Oracle [17]).
    Let f be a convex function on \(\mathbb {R}^n\) and \(K^*\) be the set of minimizers of f. Then a (strong) separation oracle \(\mathsf {SO}\) for f is one that:
    (a)
    when queried with a minimizer \(x \in K^*\) , it outputs “YES”;
    (b)
    when queried with a point \(x \notin K^*\) , it outputs a non-zero vector \(c \in \mathbb {R}^n\) such that \(\min _{y \in K^*} c^\top y \gt c^\top x\) .
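    For concreteness, the following minimal Python sketch implements this interface for the toy function \(f(x) = \left\Vert x - x^*\right\Vert _2^2\) with a single (hidden) minimizer \(x^*\) ; the helper name make_separation_oracle and the convention of returning the string "YES" are our own illustration. The returned vector \(c = x^* - x\) satisfies \(c^\top x^* - c^\top x = \left\Vert x^* - x\right\Vert _2^2 \gt 0\) , so condition (b) holds.
```python
import numpy as np

def make_separation_oracle(x_star):
    """Toy separation oracle for f(x) = ||x - x_star||_2^2 (Definition 1.1)."""
    x_star = np.asarray(x_star, dtype=float)

    def SO(x):
        x = np.asarray(x, dtype=float)
        if np.array_equal(x, x_star):
            return "YES"                 # case (a): x is a minimizer
        return x_star - x                # case (b): c^T x* > c^T x

    return SO

SO = make_separation_oracle([1, 0, 2])
print(SO([0, 0, 0]))   # [1. 0. 2.]
print(SO([1, 0, 2]))   # YES
```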
    The setting of integral minimizers. The main result of the paper in this setting is the following reduction to the Shortest Vector Problem (see Section 2.2.3) given in Theorem 1.1. The seemingly strong assumption (⋆) guarantees that our algorithm finds an integral minimizer of f, which is crucial for our application to submodular function minimization. To find an arbitrary minimizer of f, we only need the much weaker assumption that f has an integral minimizer (see Remark 1.4).
    Theorem 1.1 (Main Result for Integral Minimizers).
Suppose we are given a separation oracle \(\mathsf {SO}\) for a convex function f defined on \(\mathbb {R}^n\) , and a \(\gamma\) -approximation algorithm ApproxSVP for the Shortest Vector Problem that takes \(T_{{\rm\small SVP}}\) arithmetic operations. If the set of minimizers \(K^*\) of f is contained in a box of radius R and satisfies
    (⋆)
    all extreme points of \(K^*\) are integral,
    then there is a randomized algorithm that with high probability finds an integral minimizer of f using \(O(n \log (\gamma n R))\) calls to \(\mathsf {SO}\) and \(\mathsf {poly}(n, \log (\gamma R)) \cdot T_{{\rm\small SVP}}\) arithmetic operations.
In particular, taking ApproxSVP to be the polynomial time \(2^{n \log \log (n)/\log (n)}\) -approximation algorithm in [4] or the exponential time algorithms for exact SVP [3, 4, 35] gives the following corollary.
    Corollary 1.2 (Instantiations of Main Result).
    Under the same assumptions as in Theorem 1.1, there is a randomized algorithm that with high probability finds an integral minimizer of f using
    (a)
    \(O(n (n \log \log (n)/\log (n) + \log (R)))\) calls to \(\mathsf {SO}\) and \(\mathsf {poly}(n, \log (R))\) arithmetic operations, or
    (b)
    \(O(n \log (nR))\) calls to \(\mathsf {SO}\) and \(\exp (O(n)) \cdot \mathsf {poly}(\log (R))\) arithmetic operations.
    More generally, for any integer \(r \gt 1\) , one can use the \(r^{O(n/r)}\) -approximation algorithm in \(2^{O(r)} \mathsf {poly}(n)\) time for SVP given in [4, 35] to obtain a smooth tradeoff between time and oracle complexity in Theorem 1.1, but we omit the explicit statements of these results.
Remark 1.3 (Assumption (⋆) and Lower Bound).
    Without assumption (⋆), we give a \(2^{\Omega (n)}\) information theoretic lower bound on the number of \(\mathsf {SO}\) calls needed to find an integral minimizer of f. Consider the unit cube \(K=[0,1]^n\) and let \(V(K) = \lbrace 0,1\rbrace ^n\) be the set of vertices. For each \(v \in V(K)\) , define the simplex \(\Delta (v) = \lbrace x \in K: \left\Vert x - v\right\Vert _1 \lt 0.01 \rbrace\) . Randomly pick a vertex \(u \in V(K)\) and consider the convex function
    \(\begin{align*} f_u(x) = {\left\lbrace \begin{array}{ll}0 \qquad & x \in K \setminus (\cup _{v \in V(K) \setminus \lbrace u\rbrace } \Delta (v))\\ \infty \qquad & \text{otherwise} \end{array}\right.}. \end{align*}\)
When queried with a point \(x \in \Delta (v)\) for some \(v \in V(K) \setminus \lbrace u\rbrace\) , we let \(\mathsf {SO}\) output a separating hyperplane H such that \(K \cap H \subseteq \Delta (v)\) ; when queried with \(x \notin K\) , we let \(\mathsf {SO}\) output a hyperplane that separates x from K. Notice that u is the unique integral minimizer of \(f_u\) , and to find u, one cannot do better than randomly checking vertices in \(V(K)\) , which takes \(2^{\Omega (n)}\) queries to \(\mathsf {SO}\) .
    We next argue that \(\Omega (n \log (R))\) calls to \(\mathsf {SO}\) is information theoretically necessary in Theorem 1.1. Consider f with a unique integral minimizer which is a random integral point in \(B_\infty (R) \cap \mathbb {Z}^n\) , where \(B_\infty (R)\) is the \(\ell _\infty\) ball with radius R. In this case, one cannot hope to do better than just bisecting the search space for each call to \(\mathsf {SO}\) and this strategy takes \(\Omega (n \log (R))\) calls to \(\mathsf {SO}\) to reduce the size of the search space to a constant.
    Remark 1.4 (A Weaker Assumption).
As shown in the previous remark, it is impossible in general to find an integral minimizer of f efficiently without assumption (⋆). However, one can still find a minimizer (which is not necessarily integral) of f under the much weaker assumption that f has an integral minimizer, i.e., \(K^* \cap \mathbb {Z}^n \ne \emptyset\) . In this case, one can use the same algorithm as in Theorem 1.1 until \(\mathsf {SO}\) first returns “YES” and simply output the query point. The guarantees in Theorem 1.1 also apply to this case.
    Generalization to the rational polyhedra setting. Theorem 1.1 generalizes to the setting of rational polyhedra, where the set of minimizers \(K^*\) of f is a polyhedron contained in a box of radius R, and all vertices of \(K^*\) are rational vectors with LCM vertex complexity at most \(\varphi \ge 0\) . Roughly speaking, this means that the least common multiple of the denominators in the fractional representation of each vertex is upper bounded by \(2^\varphi\) . We postpone the precise definitions of least common multiple (LCM) vertex complexity and rational polyhedra to Section 2.2.4 (Definitions 2.6 and 2.7). The proof of the following theorem (which also implies Theorem 1.1) will be given in Section 5.
    Theorem 1.2 (Main Result for Rational Polyhedra).
Suppose we are given a separation oracle \(\mathsf {SO}\) for a convex function f defined on \(\mathbb {R}^n\) , and a \(\gamma\) -approximation algorithm ApproxSVP for the Shortest Vector Problem that takes \(T_{{\rm\small SVP}}\) arithmetic operations. If the set of minimizers \(K^*\) of f is a rational polyhedron contained in a box of radius R and has LCM vertex complexity at most \(\varphi \ge 0\) , then there is a randomized algorithm that with high probability finds a vertex of \(K^*\) using \(O(n(\varphi + \log (\gamma n R)))\) calls to \(\mathsf {SO}\) and \(\mathsf {poly}(n, \varphi , \log (\gamma R)) \cdot T_{{\rm\small SVP}}\) arithmetic operations.

    1.2 Application to Submodular Function Minimization

Submodular function minimization (SFM) has been recognized as an important problem in the field of combinatorial optimization. Classical examples of submodular functions include graph cut functions, set coverage functions, and utility functions from economics. Since the seminal work by Edmonds in 1970 [15], SFM has served as a popular tool in various fields such as theoretical computer science, operations research, game theory, and machine learning. For a more comprehensive account of the rich history of SFM, we refer interested readers to the excellent surveys [21, 33].
The formulation of SFM we consider is the standard one: we are given a submodular function f defined over subsets of an n-element ground set. The values of f are integers, and are evaluated by querying an evaluation oracle that takes time \(\mathsf {EO}\) . Since the breakthrough work of Grötschel, Lovász and Schrijver [17, 19] showing that the ellipsoid method can be used to construct a strongly polynomial algorithm for SFM, there has been a vast literature on obtaining better strongly polynomial algorithms (see Table 1). These include the first combinatorial strongly polynomial algorithms, constructed by Iwata, Fleischer and Fujishige [22] and Schrijver [43]. Very recently, a major improvement was made by Lee, Sidford and Wong [29] using an improved cutting plane method. Their algorithm achieves the state-of-the-art oracle complexity of \(O(n^3 \log ^2(n))\) for strongly polynomial algorithms. A simplified variant of this algorithm achieving the same oracle complexity was given in [13].
    Table 1.
Authors | Year | Oracle Complexity | Remarks
    Grötschel, Lovász, Schrijver [17, 19] | 1981, 88 | \(\widetilde{O}(n^5)\) [33] | first strongly polynomial
    Schrijver [43] | 2000 | \(O(n^8)\) | first comb. strongly polynomial
    Iwata, Fleischer, Fujishige [22] | 2000 | \(O(n^7 \log (n))\) | first comb. strongly polynomial
    Fleischer, Iwata [16] | 2000 | \(O(n^7)\) |
    Iwata [20] | 2002 | \(O(n^6 \log (n))\) |
    Vygen [52] | 2003 | \(O(n^7)\) |
    Orlin [40] | 2007 | \(O(n^5)\) |
    Iwata, Orlin [23] | 2009 | \(O(n^5 \log (n))\) |
    Lee, Sidford, Wong [29] | 2015 | \(O(n^3 \log ^2(n))\) | current best strongly polynomial
    Lee, Sidford, Wong [29] | 2015 | \(O(n^3 \log (n))\) | exponential time
    Dadush, Végh, Zambelli [13] | 2018 | \(O(n^3 \log ^2(n))\) | current best strongly polynomial
    This paper | 2020 | \(O(n^3 \log \log (n)/\log (n))\) |
    This paper | 2020 | \(O(n^2 \log (n))\) | exponential time
    Table 1. Strongly Polynomial Algorithms for Submodular Function Minimization
The oracle complexity measures the number of calls to the evaluation oracle \(\mathsf {EO}\) . When a paper was published in both a conference and a journal, we list the earlier year.
    The authors of [29] also noted that \(O(n^3 \log (n))\) oracle calls are information theoretically sufficient for SFM ([29, Theorem 71]), but were unable to give an efficient algorithm achieving such an oracle complexity. They asked as open problems ([29, Section 16.1]):
    (a)
    whether there is a strongly polynomial algorithm achieving \(O(n^3 \log (n))\) oracle complexity;
    (b)
    whether one could further (even information theoretically) remove the extraneous \(\log (n)\) factor from the oracle complexity.
The significance of these questions stems from the authors’ belief that \(\Theta (n^3)\) is the tight oracle complexity for strongly polynomial algorithms for SFM (see [29, Section 16.1] for a more detailed discussion).
    We answer both these open questions affirmatively in the following Theorem 1.3, which follows from applying Corollary 1.2 to the Lovász extension \(\hat{f}\) of the function f, together with the standard fact that a separation oracle for \(\hat{f}\) can be implemented using n calls to the evaluation oracle ([29, Theorem 61]). We provide details on these definitions and the proof of Theorem 1.3 in Section 6.
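    To illustrate the standard fact invoked here (a sketch under our own naming, not the implementation of [29, Theorem 61]): a subgradient of the Lovász extension at a point x is given by Edmonds’ greedy rule, which sorts the coordinates of x in decreasing order and takes the marginal values of f along the resulting chain of prefix sets, using n calls to \(\mathsf {EO}\) (assuming \(f(\emptyset) = 0\) ).
```python
import numpy as np

def lovasz_subgradient(EO, x):
    """Subgradient of the Lovasz extension at x via Edmonds' greedy rule.

    EO(S) evaluates the submodular function on a subset S of {0, ..., n-1};
    we assume EO(set()) == 0. Makes n calls to EO, one per prefix set.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: -x[i])   # coordinates in decreasing order
    g, S, prev = np.zeros(n), set(), 0.0
    for i in order:
        S = S | {i}
        val = EO(S)            # one evaluation oracle call
        g[i] = val - prev      # marginal value f(S) - f(S \ {i})
        prev = val
    return g                   # -g is a valid separating direction for minimization

# Example: cut function of the path graph 0 - 1 - 2.
def EO(S):
    return sum(1 for (u, v) in [(0, 1), (1, 2)] if (u in S) != (v in S))

print(lovasz_subgradient(EO, np.array([0.9, 0.1, 0.5])))   # [ 1. -2.  1.]
```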
    Theorem 1.3 (Submodular Function Minimization).
    Given an evaluation oracle \(\mathsf {EO}\) for a submodular function f defined over subsets of an n-element ground set, there exist
    (a)
    a strongly polynomial algorithm that minimizes f using \(O(n^3 \log \log (n)/\log (n))\) calls to \(\mathsf {EO}\) , and
    (b)
    an exponential time algorithm that minimizes f using \(O(n^2 \log (n))\) calls to \(\mathsf {EO}\) .
    To the best of our knowledge, the results in Theorem 1.3 represent the first algorithms that achieve \(o(n^3)\) oracle complexity for SFM, even information theoretically. The first result in Theorem 1.3 breaks the natural \(O(n^3)\) barrier for the oracle complexity of strongly polynomial algorithms. The second result pushes the information theoretic oracle complexity for exact SFM down to nearly quadratic.
Our algorithm is conceptually simpler than the algorithms given in [13, 29]. Moreover, while most of the previous strongly polynomial algorithms for SFM heavily exploit various combinatorial structures of submodularity, our result is achieved via a very general algorithm that uses the structural properties of submodular functions in a minimal way.

    1.3 Proof Overview

Without loss of generality, we may assume that f has a unique minimizer \(x^*\) in Theorems 1.1 and 1.2. To justify this statement, suppose the set of minimizers \(K^*\) of f satisfies assumption (⋆). Let \(x^* \in K^*\) be the unique lexicographically minimal minimizer, i.e., every other minimizer \(x \in K^*\) satisfies \(x_i \gt x_i^*\) for the smallest coordinate \(i \in [n]\) in which \(x_i \ne x_i^*\) . Whenever \(\mathsf {SO}\) is queried at a minimizer \(y \in K^*\) and outputs “YES”, our algorithm continues to minimize the linear objective \(e_i^\top x\) , where \(i \in [n]\) is the smallest index such that the ith standard orthonormal basis vector \(e_i\) is not orthogonal to the current working subspace, by pretending that \(\mathsf {SO}\) returns the vector \(-e_i\) (until its search set contains a single point). Equivalently, our algorithm minimizes the linear objectives \(e_1^\top x, \ldots , e_n^\top x\) in the given order inside \(K^*\) , and this optimization problem has the unique solution \(x^*\) . We make the assumption that f has a unique minimizer \(x^*\) in the rest of this paper.
    For simplicity, we further assume in the subsequent discussions that \(x^* \in \lbrace 0,1\rbrace ^n\) , i.e., \(R = 1\) in the setting of integral minimizer, which does not change the problem inherently.
On a high level, our algorithm maintains a convex search set K that contains the integral minimizer \(x^*\) of f, and iteratively shrinks K using the cutting plane method; when the volume of K becomes small enough, our algorithm finds a hyperplane P that contains all the integral points in K and recurses on the lower-dimensional search set \(K \cap P\) . The assumption that \(x^*\) is integral guarantees that \(x^* \in K \cap P\) . This natural idea was previously used in [18, 19] to handle rational polytopes that are not full-dimensional and in [29] to argue that \(O(n^3 \log (n))\) oracle calls are information theoretically sufficient for SFM. The main technical difficulties in efficiently implementing such an idea are two-fold:
    (a)
    we need to find the hyperplane P that contains \(K \cap \mathbb {Z}^n\) ;
    (b)
we need to carefully control the amount by which \(\mathsf {vol}(K)\) is shrunk so that progress is not lost.
    The second difficulty is key to achieving a small oracle complexity and deserves some further explanation. To see why shrinking K arbitrarily might result in a loss of progress, it’s instructive to consider the following toy example: suppose an algorithm starts with the unit cube \(K = [0,1]^n\) and \(x^*\) lies on the hyperplane \(K_1 = \lbrace x: x_1 = 0\rbrace\) ; suppose the algorithm obtains, in its ith call to \(\mathsf {SO}\) , the halfspace \(H_i = \lbrace x: x_1 \le 2^{-i}\rbrace\) . After T calls to \(\mathsf {SO}\) , the algorithm obtains the refined search set \(K \cap H_T\) with volume \(2^{-T}\) . However, when the algorithm reduces the dimension and recurses on the hyperplane \(K_1\) , the \((n-1)\) -dimensional volume of the search set again becomes 1, and the progress made by the algorithm in shrinking the volume of K is entirely lost. In contrast, the correct algorithm can reduce the dimension after only one call to \(\mathsf {SO}\) when it’s already clear that \(x^* \in K_1\) .

    1.3.1 The Grötschel-Lovász-Schrijver Approach.

For the moment, let’s take K to be an ellipsoid. Such an ellipsoid can be obtained by Vaidya’s volumetric center cutting plane method [49]. One natural idea for finding the hyperplane comes from the following geometric intuition: when the ellipsoid K is “flat” enough in one direction, all of its integral points lie on a hyperplane P. To find such a hyperplane P, Grötschel, Lovász and Schrijver [18, 19] gave an elegant application of simultaneous Diophantine approximation. We explain the main ideas behind this application in the following. We refer interested readers to [19, Chapter 6] for a more comprehensive presentation of their approach and its implications for finding exact LP solutions.
    For simplicity, we assume K is centered at 0. Let a be the unit vector parallel to the shortest axis of K and \(\mu _{\min }\) be the Euclidean length of the shortest axis of K. Approximating the vector a using the efficient simultaneous Diophantine approximation algorithm by Lenstra, Lenstra and Lovász [30], one obtains an integral vector \(v \in \mathbb {Z}^n\) and a positive integer \(q \in \mathbb {Z}\) such that
    \(\begin{align*} \left\Vert q a - v\right\Vert _\infty \lt 1/3n \qquad \text{and} \qquad 0 \lt q \lt 2^{2n^2} . \end{align*}\)
    This implies that for any integral point \(x \in K \cap \lbrace 0,1\rbrace ^n\) ,
    \(\begin{align*} |v^\top x| \le |qa^\top x| + \frac{1}{3n} \cdot \left\Vert x\right\Vert _1 \le q \cdot \mu _{\min } + 1/3 . \end{align*}\)
    When \(\mu _{\min } \lt 2^{-3n^2}\) , the integral inner product \(v^\top x\) has to be 0 and therefore all integral points in K lie on the hyperplane \(P = \lbrace x: v^\top x = 0\rbrace\) . An efficient algorithm immediately follows: we first run the cutting plane method until the shortest axis of K has length \(\mu _{\min } \approx 2^{-3n^2}\) , then apply the above procedure to find the hyperplane P on which we recurse.
    To analyze the oracle complexity of this algorithm, one naturally uses \(\mathsf {vol}(K)\) as the potential function. An amortized analysis using such a volume potential previously appeared, for example, in [14] for finding maximum support solutions in the linear conic feasibility problem. Roughly speaking, each cutting plane step (corresponding to one oracle call) decreases \(\mathsf {vol}(K)\) by a constant factor; each dimension reduction step increases \(\mathsf {vol}(K)\) by roughly \(1/\mu _{\min } \approx 2^{3n^2}\) . As there are n dimension reduction steps before the problem becomes trivial, the total number of oracle calls is thus \(O(n^3)\) . The exponential time oracle complexity bound of \(O(n^2 \log (n))\) can be obtained similarly by using Dirichlet’s approximation theorem on simultaneous Diophantine approximation (e.g., [6, Section 1.10]) instead.
One might wonder if the oracle complexity upper bound for their polynomial time algorithm can be improved using a better analysis. However, there is a fundamental obstacle to such an improvement. In particular, the upper bound of \(2^{O(n^2)}\) on q in efficient simultaneous Diophantine approximation corresponds to the \(2^{O(n)}\) -approximation factor for the Shortest Vector Problem in lattices, first obtained by Lenstra, Lenstra and Lovász [30]. Despite forty years of effort, this approximation factor has only been improved slightly to \(2^{n \log \log (n)/\log n}\) for polynomial time algorithms [4].

    1.3.2 Lattices to the Rescue: A Reduction to the Shortest Vector Problem.

To bypass the previous bottleneck and prove Theorem 1.1, we give a reduction to the Shortest Vector Problem directly. We give a new method to find the hyperplane for dimension reduction based on an approximately shortest vector of a certain auxiliary lattice, and analyze its oracle complexity via a novel potential function that simultaneously captures the volume of the search set K and the density of the lattice. The change in the potential function after dimension reduction is analyzed through a high dimensional slicing lemma. The details of this algorithm and its analysis are given in Sections 4 and 5.
Finding the hyperplane. We maintain a polytope K (which we assume to be centered at 0 for simplicity) using an efficient implementation of the center of gravity method due to Bertsimas and Vempala [5]. The following sandwiching condition is standard in convex geometry:
    \(\begin{align} E(\mathsf {Cov}(K)^{-1}) \subseteq K \subseteq 2n \cdot E(\mathsf {Cov}(K)^{-1}), \end{align}\)
    (1)
where \(\mathsf {Cov}(K)\) is the covariance matrix of the uniform distribution over K. A sufficiently good approximation to \(\mathsf {Cov}(K)\) can be obtained efficiently by sampling from K [5], so we ignore computational issues for now.
    To find a hyperplane P that contains all integral points in K, it suffices to preserve all the integral points in the outer ellipsoid \(E = 2n \cdot E(\mathsf {Cov}(K)^{-1})\) on the right-hand side (RHS) of (1). Let \(x \in E \cap \mathbb {Z}^n\) be an arbitrary integral point. For any vector v,
    \(\begin{align} |v^\top x| \le \left\Vert v\right\Vert _{\mathsf {Cov}(K)} \cdot \left\Vert x\right\Vert _{\mathsf {Cov}(K)^{-1}} \le 2n \cdot \left\Vert v\right\Vert _{\mathsf {Cov}(K)}. \end{align}\)
    (2)
    As long as \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \lt 1/10n\) and \(v^\top x\) is an integer, we can conclude that \(v^\top x = 0\) and this implies that all integral points in K lie on the hyperplane \(P = \lbrace x: v^\top x = 0\rbrace\) . Note that by (2), such a vector v with small \(\Vert v\Vert _{\mathsf {Cov}(K)}\) essentially controls the ellipsoid width \(\mathsf {width}_E(v) := \max _{x \in E} v^\top x - \min _{x \in E} v^\top x\) .
One might attempt to guarantee that \(v^\top x\) is integral by choosing v to be an integral vector. However, this idea has a fundamental flaw: as the algorithm reduces the dimension by restricting to a subspace W, the set of integral points on W might become much sparser. As such, one needs \(\mathsf {vol}(K)\) to be very small to guarantee that \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \lt 1/10n\) , and this results in a very large oracle complexity.
    To avoid this issue, we take \(v = \Pi _W(z) \ne 0\) as the projection of some integral point \(z \in \mathbb {Z}^n\) on W, where W is the subspace on which K lies. Since \(z - v \in W^\bot\) , we have \(v^\top x = z^\top x\) and this guarantees that \(v^\top x\) is integral. For the general case where K is not centered at 0, a simple rounding procedure computes the desired hyperplane. We postpone the details of constructing the hyperplane to Lemma 3.1.
    How do we find a vector \(v \in \Pi _W(\mathbb {Z}^n) \setminus \lbrace 0\rbrace\) that satisfies \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \lt 1/10n\) ? This is where lattices come into play. In particular, since \(\Lambda = \Pi _W(\mathbb {Z}^n)\) forms a lattice, we can apply any \(\gamma\) -approximation algorithm for the Shortest Vector Problem. If the shortest non-zero vector in \(\Lambda\) has \(\mathsf {Cov}(K)\) -norm at most \(1/10 \gamma n\) , then we can find a non-zero vector v that satisfies \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \lt 1/10n\) .
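    The following toy sketch shows the object being computed here: a non-zero vector of \(\Pi _W(\mathbb {Z}^n)\) of small \(\mathsf {Cov}(K)\) -norm. Brute force over a small integer box stands in for a genuine SVP approximation algorithm, and the specific subspace and covariance matrix are our own illustration.
```python
import itertools
import numpy as np

def short_projected_vector(W_basis, Cov, box=2):
    """Brute-force stand-in for ApproxSVP on the lattice Pi_W(Z^n).

    W_basis: columns form an orthonormal basis of the subspace W.
    Searches z in a small integer box and returns the non-zero projection
    v = Pi_W(z) minimizing the Cov-norm sqrt(v^T Cov v).
    """
    n = W_basis.shape[0]
    P = W_basis @ W_basis.T                       # orthogonal projector onto W
    best, best_norm = None, np.inf
    for z in itertools.product(range(-box, box + 1), repeat=n):
        v = P @ np.array(z, dtype=float)
        if np.linalg.norm(v) < 1e-9:              # skip the zero lattice vector
            continue
        nrm = np.sqrt(v @ Cov @ v)
        if nrm < best_norm:
            best, best_norm = v, nrm
    return best, best_norm

w = np.array([1.0, 1.0]) / np.sqrt(2)             # W = span{(1, 1)} in R^2
Cov = 1e-4 * np.outer(w, w) + (np.eye(2) - np.outer(w, w))  # K very flat along W
v, nrm = short_projected_vector(w.reshape(2, 1), Cov)
print(v, nrm)   # v = +-(0.5, 0.5) with Cov-norm ~ 0.007 < 1/(10 n)
```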
    The algorithm. This new approach for finding the hyperplane immediately leads to the following algorithm: we run the approximate center of gravity method for one step to decrease the volume of the polytope K by a constant factor; then we run the \(\gamma\) -approximation algorithm for SVP to find a non-zero vector v for dimension reduction. If \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \ge 1/10n\) , then we continue to run the cutting plane method; otherwise, we use the above procedure to find a hyperplane P containing all integral points in K, update the polytope K to be \(K \cap P\) and recurse.
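    A minimal end-to-end simulation of one phase of this loop in two dimensions is sketched below, under heavy simplifications of our own: the toy objective is \(f(x) = \left\Vert x - x^*\right\Vert _2^2\) , rejection sampling stands in for the centroid and covariance estimation of Section 2.4, a fixed candidate list stands in for ApproxSVP, and the recursion after the first dimension reduction is omitted.
```python
import numpy as np

rng = np.random.default_rng(0)
x_star = np.array([1.0, 2.0])                    # hidden integral minimizer
SO = lambda x: x_star - x                        # separation oracle for the toy f

# K = [-4, 4]^2 as a list of constraints a^T x <= b
halfspaces = [(np.array([1.0, 0.0]), 4.0), (np.array([-1.0, 0.0]), 4.0),
              (np.array([0.0, 1.0]), 4.0), (np.array([0.0, -1.0]), 4.0)]
lo, hi = np.array([-4.0, -4.0]), np.array([4.0, 4.0])

for it in range(60):
    # sampled centroid/covariance of K (stand-in for Theorem 2.16)
    pts = rng.uniform(lo, hi, size=(100000, 2))
    inside = pts[np.all([pts @ a <= b for a, b in halfspaces], axis=0)]
    cg, Cov = inside.mean(axis=0), np.cov(inside.T)
    pad = 0.25 * (inside.max(0) - inside.min(0)) + 1e-3
    lo, hi = inside.min(0) - pad, inside.max(0) + pad   # refocus the sampling box

    # fixed candidate list as a stand-in for ApproxSVP on Z^2 in the Cov-norm
    cands = [np.array(z, float) for z in
             [(1, 0), (0, 1), (1, 1), (1, -1), (2, 1), (1, 2)]]
    v = min(cands, key=lambda u: u @ Cov @ u)
    if np.sqrt(v @ Cov @ v) < 1 / 20:            # threshold 1/(10 n) with n = 2
        print(f"iter {it}: integer points of K lie on "
              f"{{x : {v} . x = {round(float(v @ cg))}}}")
        break                                    # the full algorithm recurses here

    c = SO(cg)                                   # one cutting plane step
    halfspaces.append((-c, -float(c @ cg)))      # keep {y : c^T y >= c^T cg}
```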
Potential function analysis. To analyze such an algorithm, one might attempt to use \(\mathsf {vol}(K)\) as the potential function as in the Grötschel-Lovász-Schrijver approach. However, one quickly realizes that \(\mathsf {vol}(K \cap P) / \mathsf {vol}(K)\) can be as large as \(\left\Vert v\right\Vert _2 / \left\Vert v\right\Vert _{\mathsf {Cov}(K)}\) . While one expects that \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)}\) is not too small, since we frequently check for a short lattice vector, one has no control over \(\left\Vert v\right\Vert _2\) in general.
    Key to our analysis is the potential function \(\Phi = \mathsf {vol}(K) \cdot \det (\Lambda)\) that measures simultaneously the volume of K and the covolume \(\det (\Lambda)\) of the lattice \(\Lambda\) . Essentially, this potential function controls the lattice width \(\min _{v \in \Lambda \setminus \lbrace 0\rbrace } \mathsf {width}_E(v)\) of the outer ellipsoid E. In fact, Minkowski’s first theorem (Theorem 2.4) implies that there always exists a vector \(v \in \Lambda \setminus \lbrace 0\rbrace\) such that \(\mathsf {width}_E(v) \le \mathsf {poly}(n) \cdot \Phi ^{1/n}\) , and thus the potential function would never get too small before dimension reduction takes place.
Continuing with the analysis via the potential function \(\Phi\) : while \(\mathsf {vol}(K)\) increases by a factor of \(\left\Vert v\right\Vert _2 / \left\Vert v\right\Vert _{\mathsf {Cov}(K)}\) after the dimension reduction, a standard fact on lattice projections (Fact 2.2) shows that the covolume of the lattice decreases by a factor of \(\left\Vert v\right\Vert _2\) . The decrease in the covolume of the lattice thus elegantly cancels out the increase in \(\mathsf {vol}(K)\) , leading to an overall increase in the potential of at most \(1/\left\Vert v\right\Vert _{\mathsf {Cov}(K)} = O(\gamma n)\) . It follows that the total increase in the potential over all n dimension reduction steps is at most \((\gamma n)^n\) . Note that each cutting plane step still decreases the potential function by a constant factor since the lattice is unchanged. Therefore, the total number of oracle calls is at most \(O(n \log (\gamma n))\) .
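    In summary, the accounting runs as follows (a back-of-the-envelope version of the analysis in Section 5, with all constants suppressed). Starting from \(K = B_\infty (R)\) and \(\Lambda = \mathbb {Z}^n\) , the initial potential is \(\Phi _0 = (2R)^n\) , so after T cutting plane steps and at most n dimension reduction steps,
    \(\begin{align*} \Phi \le (2R)^n \cdot (2/3)^T \cdot (O(\gamma n))^n . \end{align*}\)
    On the other hand, as long as the algorithm keeps making cutting plane steps, every vector in \(\Lambda \setminus \lbrace 0\rbrace\) has \(\mathsf {Cov}(K)\) -norm at least \(1/10 \gamma n\) , and Minkowski’s first theorem (Theorem 2.4) then bounds \(\Phi\) from below by \((\gamma n)^{-O(n)}\) . Comparing the two bounds yields \(T = O(n \log (\gamma n R))\) , matching Theorem 1.1.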
    High dimensional slicing lemma for consecutive dimension reduction steps. The argument above ignores a slight technical issue: while we can guarantee that \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)} \ge 1/\gamma n\) after cutting plane steps by checking for short non-zero lattice vectors, it’s not clear why \(\left\Vert v\right\Vert _{\mathsf {Cov}(K)}\) cannot be too small after a sequence of dimension reduction steps. It turns out that this can happen only when \(\mathsf {Cov}(K)\) becomes much smaller (e.g., the hyperplane P is far from the centroid of K) after dimension reduction, in which case \(\mathsf {vol}(K)\) as well as the potential also become much smaller.
    To formally analyze the change in the potential function after a sequence of k consecutive dimension reduction steps, we note that the polytope K (which we assume to be isotropic for simplicity) becomes a “slice” \(K \cap W\) and the lattice \(\Lambda\) becomes the projected lattice \(\Pi _W(\Lambda)\) , where W is a subspace. One can show using standard convex geometry tools that \(\mathsf {vol}(K \cap W) / \mathsf {vol}(K)\) is at most \(k^{O(k)}\) , and via Minkowski’s first theorem that \(\det (\Pi _W(\Lambda)) / \det (\Lambda)\) is at most \(\sqrt {k}^k / \lambda _1(\Lambda)^k\) , where \(\lambda _1(\Lambda)\) is the Euclidean length of the shortest non-zero vector in \(\Lambda\) . We leave the details of this high dimensional slicing lemma to Lemma 3.2. Since we know that \(\lambda _1(\Lambda) \ge 1/\gamma n\) in the first dimension reduction step, the potential function increases by a factor of at most \((\gamma n)^{O(k)}\) over a sequence of k consecutive dimension reduction steps. This gives a more precise analysis of the \(O(n \log (\gamma n))\) oracle complexity.

    2 Preliminaries

    2.1 Notations

We use \(\mathbb {R}_+\) to denote the set of non-negative real numbers. For any positive integer n, we use \([n]\) to denote the set \(\lbrace 1, \ldots , n\rbrace\) . Given a real number \(a \in \mathbb {R}\) , the floor of a, denoted as \(\lfloor a \rfloor\) , is the largest integer that is at most a. Define the closest integer to a, denoted as \(\lceil a \rfloor\) , to be \(\lceil a \rfloor := \lfloor a + 1/2 \rfloor\) . Given an integer \(\varphi \ge 0\) and \(a \in \mathbb {R}\) , we use \(\lceil a \rfloor _\varphi\) to denote the closest rational number to a with denominator at most \(2^\varphi\) . Given integers \(a_1, \ldots , a_m\) , not all 0, we denote by \(\mathsf {gcd}(a_1, \ldots , a_m)\) their greatest common divisor. Given non-zero integers \(a_1, \ldots , a_m\) , we denote by \(\mathsf {lcm}(a_1, \ldots , a_m)\) their least common multiple.
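    For illustration, the rounding \(\lceil a \rfloor _\varphi\) can be computed with the continued-fraction-based limit_denominator method from Python’s standard fractions module (a stand-in for the procedure referenced in Section 3.1):
```python
from fractions import Fraction

def round_phi(a, phi):
    """Closest rational to a with denominator at most 2**phi, i.e. the rounding above."""
    return Fraction(a).limit_denominator(2 ** phi)

print(round_phi(0.33339, 3))   # 1/3 (denominator at most 8)
print(round_phi(0.33339, 0))   # 0   (denominator 1: the nearest integer)
```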
For any \(i \in [n]\) , we denote by \(e_i\) the ith standard orthonormal basis vector of \(\mathbb {R}^n\) . We use \(B_p(R)\) to denote the \(\ell _p\) -ball of radius R in \(\mathbb {R}^n\) and \(B_p = B_p(1)\) the unit \(\ell _p\) -ball. For any set of vectors \(V \subseteq \mathbb {R}^n\) , we use \(\mathsf {span}\lbrace V\rbrace\) to denote the linear span of vectors in V. Throughout, a subspace W is a linear subspace of \(\mathbb {R}^n\) with \(0 \in W\) ; an affine subspace W is a translation of a subspace of \(\mathbb {R}^n\) (and thus might not pass through the origin). Given a subspace W, we denote by \(W^\bot\) the orthogonal complement of W and by \(\Pi _W(\cdot)\) the orthogonal projection onto the subspace W. Given a positive semi-definite (PSD) matrix \(A \in \mathbb {R}^{n \times n}\) and a subspace \(V \subseteq \mathbb {R}^n\) , we say A has full rank on V if \(\mathrm{rank}(A)=\dim (V)\) and the eigenvectors corresponding to non-zero eigenvalues of A form an orthogonal basis of V.
    Given a subspace \(V \subseteq \mathbb {R}^n\) and a PSD matrix \(A \in \mathbb {R}^{n \times n}\) that has full rank on V, the function \(\langle \cdot , \cdot \rangle _A\) given by \(\langle x, y \rangle _A = x^\top A y\) defines an inner product on V. The inner product \(\langle \cdot , \cdot \rangle _A\) induces a norm on V, i.e., \(\left\Vert x\right\Vert _A = \sqrt {\langle x, x \rangle _A}\) for any \(x \in V\) , which we call the A-norm. Given a point \(x_0 \in \mathbb {R}^n\) and a PSD matrix \(A \in \mathbb {R}^{n \times n}\) , we use \(E(x_0, A)\) to denote the (might not be full-rank) ellipsoid given by \(E(x_0, A) := \lbrace x \in x_0 + W_A: (x - x_0)^\top A (x - x_0) \le 1\rbrace\) , where \(W_A\) is the subspace spanned by eigenvectors corresponding to non-zero eigenvalues of A. When the ellipsoid is centered at 0, we use the short-hand notation \(E(A)\) to denote \(E(0,A)\) .

    2.2 Lattices

Given a set of linearly independent vectors \(b_1, \ldots , b_k \in \mathbb {R}^n\) , denote \(\Lambda (b_1, \ldots , b_k) = \lbrace \sum _{i=1}^k \lambda _i b_i, \lambda _i \in \mathbb {Z}\rbrace\) the lattice generated by \(b_1, \ldots , b_k\) . Here, k is called the rank of the lattice. A lattice is said to be full-rank if \(k = n\) . Any set of k linearly independent vectors that generates the lattice \(\Lambda = \Lambda (b_1, \ldots , b_k)\) under integer linear combinations is called a basis of \(\Lambda\) . In particular, the set \(\lbrace b_1, \ldots , b_k\rbrace\) is a basis of \(\Lambda\) . Different bases of a full-rank lattice are related by unimodular matrices, which are integer matrices with determinant \(\pm 1\) .
    Given a basis \(B \in \mathbb {R}^{n \times k}\) , the fundamental parallelepiped of \(\Lambda = \Lambda (B)\) is the polytope \(\mathcal {P}(B):=\lbrace \sum _{i=1}^k \lambda _i b_i: \lambda _i \in [0,1), \forall i \in [k]\rbrace\) . The determinant of the lattice (also known as the covolume), denoted as \(\det (\Lambda)\) , is defined to be the volume of the fundamental parallelepiped, which is independent of the basis. We also define the notion of dual lattices below.
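    As a quick numerical check of this definition (our example): the covolume of \(\Lambda (B)\) equals \(\sqrt {\det (B^\top B)}\) for a basis matrix B, which reduces to \(|\det (B)|\) when the lattice is full-rank.
```python
import numpy as np

def covolume(B):
    """det(Lambda(B)) for a basis matrix B with one basis vector per column."""
    return np.sqrt(np.linalg.det(B.T @ B))

B = np.array([[2.0, 1.0],
              [0.0, 1.0]])      # a full-rank lattice in R^2
print(covolume(B))              # 2.0 = |det(B)|

b = np.array([[1.0], [1.0]])    # the rank-1 lattice Z * (1, 1)
print(covolume(b))              # sqrt(2): the length of the generator
```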
    Definition 2.1 (Dual Lattice).
    Given a lattice \(\Lambda \subseteq \mathbb {R}^n\) , the dual lattice \(\Lambda ^*\) is the set of all vectors \(x \in \mathsf {span}\lbrace \Lambda \rbrace\) such that \(\langle x, y \rangle \in \mathbb {Z}\) for all \(y \in \Lambda\) .
    We refer interested readers to standard textbooks (e.g., [42]) for a more comprehensive introduction to lattice theory.

    2.2.1 Lattice Projection and Intersection with Subspaces.

    The following standard facts on lattice projection follow from Gram-Schmidt orthogonalization.
    Fact 2.2 (Lattice Projection).
    Let \(\Lambda\) be a full-rank lattice in \(\mathbb {R}^n\) and W be a linear subspace such that \(\dim (\mathsf {span}\lbrace \Lambda \cap W \rbrace) = \dim (W)\) . Then we have
    \(\begin{align*} \det (\Lambda) = \det (\Lambda \cap W) \cdot \det (\Pi _{W^\bot }(\Lambda)) . \end{align*}\)
    Fact 2.3 (Dual of Lattice Projection).
    Let \(\Lambda\) be a full-rank lattice in \(\mathbb {R}^n\) and W be a linear subspace such that \(\dim (\mathsf {span}\lbrace \Lambda \cap W \rbrace) = \dim (W)\) . Then we have the following duality
    \(\begin{align*} (\Pi _W (\Lambda))^* = \Lambda ^* \cap W. \end{align*}\)
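    To illustrate both facts on a small example, take \(\Lambda = \mathbb {Z}^2\) and \(W = \mathsf {span}\lbrace (1,1)\rbrace\) . Then \(\Lambda \cap W = \mathbb {Z} \cdot (1,1)\) has determinant \(\sqrt {2}\) , while \(\Pi _{W^\bot }(\Lambda)\) is generated by \(\Pi _{W^\bot }(e_1) = (1/2, -1/2)\) and has determinant \(1/\sqrt {2}\) ; their product is \(1 = \det (\mathbb {Z}^2)\) , as Fact 2.2 predicts. Moreover, \(\Pi _W(\Lambda)\) is generated by \((1/2, 1/2)\) , and its dual inside W is \(\mathbb {Z} \cdot (1,1) = \Lambda ^* \cap W\) , consistent with Fact 2.3.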

    2.2.2 Minkowski’s First Theorem.

    Minkowski’s first theorem [36] asserts the existence of a non-zero lattice point in a symmetric convex set with large enough volume. An important consequence of it is the following upper bound on \(\lambda _1(\Lambda , A)\) , the length of the shortest non-zero vector in lattice \(\Lambda\) under A-norm.
    Theorem 2.4 (Consequence of Minkowski’s First Theorem, [36]).
    Let \(\Lambda\) be a full-rank lattice in \(\mathbb {R}^n\) and \(A \in \mathbb {R}^{n \times n}\) be a positive definite matrix. Then
    \(\begin{align*} \lambda _1(\Lambda , A) \le \sqrt {n} \cdot \det (A^{1/2})^{1/n} \cdot \det (\Lambda)^{1/n} . \end{align*}\)

    2.2.3 The Shortest Vector Problem and the Lenstra-Lenstra-Lovász Algorithm.

Given a lattice \(\Lambda\) and a PSD matrix A that has full rank on \(\mathsf {span}\lbrace \Lambda \rbrace\) , the Shortest Vector Problem (SVP) asks to find a shortest non-zero vector in \(\Lambda\) under the A-norm, whose length is denoted as \(\lambda _1(\Lambda , A)\) . SVP is one of the most fundamental computational problems in lattice theory and is known to be NP-hard. For this problem, the celebrated Lenstra-Lenstra-Lovász (LLL) algorithm [30] finds in polynomial time a \(2^{n/2}\) -approximation to \(\lambda _1(\Lambda ,A)\) . Building on top of a block-reduction algorithm by Schnorr [41], Ajtai, Kumar and Sivakumar [4] obtained the current best polynomial time approximation factor of \(2^{n \log \log (n)/\log (n)}\) for SVP.
    Theorem 2.5 ([4]).
Let \(b_1, \ldots , b_n \in \mathbb {Z}^n\) be a basis for a lattice \(\Lambda\) and let \(A \in \mathbb {Z}^{n \times n}\) be a positive definite matrix. Let \(D \in \mathbb {Z}\) be such that \(\left\Vert b_i\right\Vert _A^2 \le D\) for all \(i \in [n]\) . Then there exists an algorithm that outputs, in \(\mathsf {poly}(n, \log (D))\) arithmetic operations, a vector \(b_1^{\prime }\) such that
    \(\begin{align*} \left\Vert b_1^{\prime }\right\Vert _A \le 2^{n \log \log (n)/\log (n)} \cdot \lambda _1(\Lambda ,A) . \end{align*}\)
    Moreover, the integers occurring in the algorithm have bit sizes at most \(\mathsf {poly}(n, \log (D))\) .
    In fact, for any integer \(r \gt 1\) , [4] gave a \(2^{O(r)}\mathsf {poly}(n)\) -time \(r^{O(n/r)}\) -approximation algorithm for SVP, allowing a smooth tradeoff between time and approximation quality.
    For solving SVP exactly, the state-of-the-art is a deterministic \(\widetilde{O}(2^{2n})\) -time and \(\widetilde{O}(2^n)\) -space algorithm given by Micciancio and Voulgaris [35], and a randomized \(2^{n+o(n)}\) -time and space algorithm due to Aggarwal et al. [3]. We refer to these excellent papers and the references therein for a comprehensive account of the rich history of SVP.

    2.2.4 Rational Polyhedra.

    We start with the definition of the LCM vertex complexity of a rational vector.
    Definition 2.6 (LCM Vertex Complexity).
    Given a rational vector \(a = (p_1/q_1, \ldots , p_n/q_n)\) , where integers \(p_i\) and \(q_i \ge 1\) are coprime for all \(i \in [n]\) , we define its LCM vertex complexity to be the smallest integer \(\varphi \ge 0\) such that the 1-dimensional lattice \(L_a := \lbrace a^\top z: z \in \mathbb {Z}^n\rbrace\) is a sub-lattice of \(\mathbb {Z}/q\) for some positive integer \(q \le 2^{\varphi }\) .
    In particular, the number q above is \(\mathsf {lcm}(q_1, \ldots , q_n)\) . When \(\mathsf {gcd}(p_1, \ldots , p_n) = 1\) , by Bézout’s identity, we in fact have that \(L_a = \mathbb {Z}/q\) . We next formally define the notion of rational polyhedra with bounded LCM vertex complexity.
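    A small sketch of this computation (our illustration), using the fact that the minimal q in Definition 2.6 is \(\mathsf {lcm}(q_1, \ldots , q_n)\) :
```python
from fractions import Fraction
from math import lcm

def lcm_vertex_complexity(a):
    """Smallest phi >= 0 with lcm of the denominators at most 2**phi (Definition 2.6)."""
    q = lcm(*(Fraction(x).denominator for x in a))
    return (q - 1).bit_length()          # smallest phi with q <= 2**phi

print(lcm_vertex_complexity([Fraction(1, 2), Fraction(2, 3)]))   # lcm = 6, phi = 3
print(lcm_vertex_complexity([3, -1, 4]))                         # integral, phi = 0
```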
    Definition 2.7 (Rational Polyhedra with Bounded LCM Vertex Complexity).
    A bounded convex set \(K \subseteq \mathbb {R}^n\) is a rational polyhedron with LCM vertex complexity at most \(\varphi \ge 0\) if K is a polyhedron and the LCM vertex complexity of every vertex of K is at most \(\varphi\) .
    For convenience, we define the set of all rational vectors with bounded LCM vertex complexity.
    Definition 2.8 (Rational Vectors with Bounded LCM Vertex Complexity).
For any integer \(\varphi \ge 0\) , we define \(S_\varphi ^n\) to be the set of all rational vectors in \(\mathbb {R}^n\) with LCM vertex complexity at most \(\varphi\) .
    Remark 2.9 (Different Definitions).
    We remark that our definition of LCM vertex complexity in Definition 2.6 is different from the standard definition of vertex complexity in the literature used by Grötschel, Lovász and Schrijver [19], who defined the vertex complexity of a rational vector a to be its binary description length, i.e., bit complexity. The LCM vertex complexity of a rational vector as in Definition 2.6 is always smaller than its bit complexity, and in fact might be much smaller. The reason we deviate from Grötschel, Lovász and Schrijver’s more standard notion of vertex complexity is that Definition 2.6 allows a slightly cleaner presentation of the results and proofs in this paper. In particular, one can obtain the results and proofs in the setting of integral minimizers by taking \(\varphi = 0\) .

    2.3 Convex Geometry

A function \(g: \mathbb {R}^n \rightarrow \mathbb {R}_+\) is log-concave if its support \(\mathsf {supp}(g)\) is convex and \(\log (g)\) is concave on \(\mathsf {supp}(g)\) . An integrable function \(g: \mathbb {R}^n \rightarrow \mathbb {R}_+\) is a density function if \(\int _{\mathbb {R}^n} g(x) dx = 1\) . The centroid of a density function \(g: \mathbb {R}^n \rightarrow \mathbb {R}_+\) is defined as \(\mathsf {cg}(g) = \int _{\mathbb {R}^n} g(x) x dx\) ; the covariance matrix of the density function g is defined as \(\mathsf {Cov}(g) = \int _{\mathbb {R}^n} g(x) (x - \mathsf {cg}(g)) (x - \mathsf {cg}(g))^\top d x\) . A density function \(g: \mathbb {R}^n \rightarrow \mathbb {R}_+\) is isotropic if its centroid is 0 and its covariance matrix is the identity matrix, i.e., \(\mathsf {cg}(g) = 0\) and \(\mathsf {Cov}(g) = I\) .
    A typical example of a log-concave distribution is the uniform distribution over a convex body \(K \subseteq \mathbb {R}^n\) . Given a convex body K in \(\mathbb {R}^n\) , its volume is denoted as \(\mathsf {vol}(K)\) . The centroid (resp. covariance matrix) of K, denoted as \(\mathsf {cg}(K)\) (resp. \(\mathsf {Cov}(K)\) ), is defined to be the centroid (resp. covariance matrix) of the uniform distribution over K. A convex body K is said to be isotropic if the uniform density over it is isotropic. Any convex body can be put into its isotropic position via an affine transformation.
    Sometimes we will be working with a bounded convex set \(K \subseteq W\) , where W is an affine subspace that might not be full dimensional. For convenience, we extend the definitions above to this case by first applying a linear transformation and then restricting to W so that K becomes full-dimensional.
    Theorem 2.10 (Brunn’s Principle).
    Let K be a convex body and W be a subspace in \(\mathbb {R}^n\) . Then the function \(g_{K,W}: W^\bot \rightarrow \mathbb {R}_+\) defined as \(g_{K,W}(x) := \mathsf {vol}(K \cap (W + x))\) is log-concave on its support.
    Theorem 2.11 (Property of Log-concave Density, Theorem 5.14 of [32]).
    Let \(f: \mathbb {R}^n \rightarrow \mathbb {R}_+\) be an isotropic log-concave density function. Then we have \(f(x) \le 2^{8n} n^{n/2}\) for every x.
    We also need the following result from [25].
    Theorem 2.12 (Ellipsoidal Approximation of Convex Body, [25]).
    Let K be an isotropic convex body in \(\mathbb {R}^n\) . Then,
    \(\begin{align*} \sqrt {\frac{n+1}{n}} \cdot B_2 \subseteq K \subseteq \sqrt {n(n+1)} \cdot B_2 , \end{align*}\)
    where \(B_2\) is the unit Euclidean ball in \(\mathbb {R}^n\) .
    The following lemma is an immediate consequence of Theorem 2.12.
    Lemma 2.13 (Stability of Covariance).
Let K be a convex body in \(\mathbb {R}^n\) and let \(x \in K\) satisfy \(\left\Vert x - \mathsf {cg}(K)\right\Vert _{\mathsf {Cov}(K)^{-1}} \le 0.1\) . Let H be a halfspace such that \(x \in H\) . Then we have
    \(\begin{align*} \frac{1}{5 n^2} \cdot \mathsf {Cov}(K) \preceq \mathsf {Cov}(K \cap H) \preceq n^2 \cdot \mathsf {Cov}(K) . \end{align*}\)
    Proof.
    Without loss of generality, we may assume that K is in isotropic position, in which case the condition that \(\left\Vert x - \mathsf {cg}(K)\right\Vert _{\mathsf {Cov}(K)^{-1}} \le 0.1\) becomes \(\left\Vert x\right\Vert _2 \le 0.1\) . Theorem 2.12 then gives
    \(\begin{align*} \sqrt {\frac{n+1}{n}} \cdot B_2 \subseteq K \subseteq \sqrt {n (n+1)} \cdot B_2 . \end{align*}\)
    Let halfspace \(H_1\) be the translation of halfspace H such that x lies on its boundary hyperplane \(H_1^{\prime }\) . Note that \(K \cap H_1 \subseteq K \cap H\) . Let \(x^{\prime } := \Pi _{H_1^{\prime }}(\mathsf {cg}(K))\) be the orthogonal projection of \(\mathsf {cg}(K) = 0\) onto the hyperplane \(H_1^{\prime }\) . Then,
    \(\begin{align*} \Vert x^{\prime }\Vert _2 \le \Vert x- 0\Vert _2 \le 0.1 . \end{align*}\)
    This shows that the hyperplane \(H_1^{\prime }\) is at Euclidean distance at most 0.1 from 0. It then follows that \(\sqrt {\frac{n+1}{n}} B_2 \cap H_1\) contains a ball of radius at least
    \(\begin{align*} \frac{1}{2} \cdot \left(\sqrt {\frac{n+1}{n}} - 0.1 \right) \ge 0.45 \sqrt {\frac{n+1}{n}} \ge \sqrt {\frac{n+1}{5n}} , \end{align*}\)
    where the last inequality uses \(\sqrt {5} \times 0.45 \ge 1\) . Since we have \(\sqrt {\frac{n+1}{n}} B_2 \cap H_1 \subseteq K \cap H_1 \subseteq K \cap H\) , this implies that \(K \cap H\) contains a ball of radius \(\sqrt {\frac{n+1}{5n}}\) , and is contained in a ball of radius \(\sqrt {n(n+1)}\) . Consider the ellipsoid \(E_{K \cap H} = \lbrace y: y^\top \mathsf {Cov}(K \cap H)^{-1} y \le 1 \rbrace\) . Then Theorem 2.12 implies that
    \(\begin{align*} \mathsf {cg}(K \cap H) + \sqrt {\frac{n+1}{n}} \cdot E_{K \cap H} \subseteq K \cap H \subseteq \mathsf {cg}(K \cap H) + \sqrt {n(n+1)} \cdot E_{K \cap H} . \end{align*}\)
    We thus have \(\frac{1}{\sqrt {5} n} \cdot B_2 \subseteq E_{K \cap H} \subseteq n \cdot B_2\) , and the statement of the lemma follows immediately.□
We note that some of these convex geometry tools have previously been used, for example, to find the densest sub-lattice in arbitrary norm [12].

    2.4 Cutting Plane Methods

    Cutting plane methods optimize a convex function f by maintaining a convex set K that contains the minimizer of f, which gets refined iteratively using the separating hyperplanes returned by the separation oracle. One of the most classical cutting plane methods is the center of gravity method, discovered independently by Levin [31] and Newman [38].
    Theorem 2.14 (Center of Gravity Method [31, 38]).
Suppose we are given a separation oracle \(\mathsf {SO}\) for a convex function f defined on \(\mathbb {R}^n\) with minimizers \(K^*\) , and a convex body \(K \subseteq \mathbb {R}^n\) containing \(K^*\) . If \(\mathsf {cg}(K)\) doesn’t minimize f, then the convex body \(K^{\prime }\) returned by \(\text{CenterOfGravity}(\mathsf {SO}, K)\) , which cuts K through its centroid using the separating hyperplane returned by \(\mathsf {SO}\) , contains \(K^*\) and satisfies \(\mathsf {vol}(K^{\prime }) \le (1-1/e) \cdot \mathsf {vol}(K)\) .
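    A minimal sketch of one such step for a polytope K given by halfspace constraints, reusing the rejection-sampling centroid estimate from the sketch in Section 1.3.2 (with the exact centroid, Grünbaum’s theorem gives the stated \(1-1/e\) volume decrease; the toy oracle is our own):
```python
import numpy as np

rng = np.random.default_rng(1)

def center_of_gravity_step(SO, halfspaces, lo, hi, m=50000):
    """One CenterOfGravity(SO, K) step; K is the intersection of the halfspaces."""
    pts = rng.uniform(lo, hi, size=(m, len(lo)))   # the box [lo, hi] must contain K
    inside = pts[np.all([pts @ a <= b for a, b in halfspaces], axis=0)]
    cg = inside.mean(axis=0)                       # estimated centroid of K
    c = SO(cg)
    if c is None:                                  # "YES": cg minimizes f
        return halfspaces, cg
    # keep the side of the cut containing the minimizers: {y : c^T y >= c^T cg}
    return halfspaces + [(-c, -float(c @ cg))], cg

x_star = np.array([1.0, 2.0])
SO = lambda x: None if np.allclose(x, x_star) else x_star - x
K = [(np.array([1.0, 0.0]), 4.0), (np.array([-1.0, 0.0]), 4.0),
     (np.array([0.0, 1.0]), 4.0), (np.array([0.0, -1.0]), 4.0)]
K, cg = center_of_gravity_step(SO, K, np.array([-4.0, -4.0]), np.array([4.0, 4.0]))
print(len(K), cg)    # 5 constraints; cg is near (0, 0), the centroid of [-4, 4]^2
```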
    The center of gravity method is not efficient as it involves computing the centroid of convex bodies. Using sampling techniques to estimate \(\mathsf {cg}(K)\) and \(\mathsf {Cov}(K)\) , an efficient implementation of the center of gravity method was given in [5]. We start with the definition of \(\epsilon\) -approximate centroid and covariance.
Definition 2.15 (\(\epsilon\) -approximate Centroid and Covariance).
Let \(0 \lt \epsilon \lt 1\) be a parameter. Given a convex body \(K \subseteq \mathbb {R}^n\) , we call \(x_K \in \mathbb {R}^n\) an \(\epsilon\) -approximate centroid of K if \(\Vert x_K - \mathsf {cg}(K)\Vert _{\mathsf {Cov}(K)^{-1}} \le \epsilon\) . We call a PSD matrix \(\Sigma _K \in \mathbb {R}^{n \times n}\) an \(\epsilon\) -approximate covariance matrix of K if \((1-\epsilon) \cdot \mathsf {Cov}(K) \preceq \Sigma _K \preceq (1 + \epsilon) \cdot \mathsf {Cov}(K)\) .
    Constructing \(\epsilon\) -approximate centroids and covariance matrices via sampling for well-rounded convex bodies appeared in the works of [1, 26, 47]. The formulation of the following theorem is from [24, Lemma 2.5 and Theorem 2.7] together with the standard fact that the uniform distribution over a convex body is log-concave.
    Theorem 2.16 (Approximate Centroid and Covariance by Sampling, [1, 26, 47]).
Let \(0 \lt \epsilon \lt 1\) and \(0 \lt \delta \lt 1/2\) be parameters. Given a convex body \(K \subseteq \mathbb {R}^n\) specified by m constraints, a point \(x \in K\) and a PSD matrix \(A \in \mathbb {R}^{n \times n}\) such that the following sandwiching condition holds:
    \(\begin{align} x + E(A) \subseteq K \subseteq x + 2^{\mathsf {poly}(n)} \cdot E(A) , \end{align}\)
    (3)
there is a randomized algorithm that uses \(m \cdot \mathsf {poly}(n, 1/\epsilon , \log (1/\delta))\) arithmetic operations to compute, with probability at least \(1-\delta\) , an \(\epsilon\) -approximate centroid \(x_K\) and an \(\epsilon\) -approximate covariance matrix \(\Sigma _K\) of K.
Since an approximate centroid and an approximate covariance matrix of a convex body give a sandwiching condition as in (3), [5] obtained the following efficient implementation of the center of gravity method. The theorem below comes from directly using Theorem 2.16 in the algorithmic framework of [5].
    Theorem 2.17 (Approximate Center of Gravity Method, [5]).
Let \(0 \lt \epsilon \lt 0.01\) and \(0 \lt \delta \lt 1/2\) be parameters. Given a separation oracle \(\mathsf {SO}\) for a convex function f defined on \(\mathbb {R}^n\) with minimizers \(K^*\) , a polytope K with m constraints containing \(K^*\) , an \(\epsilon\) -approximate centroid \(x_K \notin K^*\) and an \(\epsilon\) -approximate covariance matrix \(\Sigma _K\) of K, there exists a randomized algorithm \(\textsf {RandomWalkCG}(\mathsf {SO}, K, x_K, \Sigma _K, \epsilon , \delta)\) that makes one call to \(\mathsf {SO}\) and an extra \(m \cdot \mathsf {poly}(n, 1/\epsilon , \log (1/\delta))\) arithmetic operations to return a polytope \(K^{\prime }\) , a point \(x_{K^{\prime }} \in K^{\prime }\) and a PSD matrix \(\Sigma _{K^{\prime }}\) such that the following hold with probability at least \(1-\delta\) :
    (a)
    \(K^* \subseteq K^{\prime }\) and \(K^{\prime }\) is the intersection of K with a constraint output by \(\mathsf {SO}\) at \(x_K\) ,
    (b)
    \(\mathsf {vol}(K^{\prime }) \le \frac{2}{3} \cdot \mathsf {vol}(K)\) ,
    (c)
    \(x_{K^{\prime }}\) is an \(\epsilon\) -approximate centroid of \(K^{\prime }\) , and
    (d)
    \(\Sigma _{K^{\prime }}\) is an \(\epsilon\) -approximate covariance matrix of \(K^{\prime }\) .

    3 Technical Lemmas

    In this section, we prove a few technical lemmas which are key to our result.

    3.1 Dimension Reduction that Preserves Low-Complexity Rational Points

    Recall from Definition 2.8 that \(S_\varphi ^n\) is the set of rational vectors with LCM vertex complexity at most \(\varphi \ge 0\) .
    Lemma 3.1 (Dimension Reduction That Preserves Low-complexity Rational Points).
Let \(W = x_0 + W_0\) be an affine subspace, where \(W_0\) is a linear subspace of \(\mathbb {R}^n\) and \(x_0 \in \mathbb {R}^n\) is a fixed point, and let \(E = E(x_0, A)\) be an ellipsoid that has full rank on W. Given a vector \(v \in \Pi _{W_0}(\mathbb {Z}^n) \setminus \lbrace 0\rbrace\) with \(\left\Vert v\right\Vert _{A^{-1}} \lt 1/2^{2 \varphi + 1}\) , where \(\varphi \ge 0\) is an integer, there exists a hyperplane \(P \nsupseteq W\) such that \(E \cap S_{\varphi }^n \subseteq P \cap W\) . In particular, letting \(z \in \mathbb {Z}^n\) be such that \(v = \Pi _{W_0}(z)\) , P can be taken as
\(\begin{align*} P = \lbrace x: v^\top x = (v - z)^\top x_0 + \lceil z^\top x_0 \rfloor _\varphi \rbrace . \end{align*}\)
    Proof.
    Clearly we have \(E \cap S_\varphi ^n \subseteq W\) since \(E \subseteq W\) . It therefore suffices to show that the hyperplane P given in the lemma statement satisfies \(P \nsupseteq W\) and \(E \cap S_\varphi ^n \subseteq P\) .
    Since \(v \in W_0 \setminus \lbrace 0\rbrace\) and \(W_0\) is a translation of W, we have \(P \nsupseteq W\) . If \(E \cap S_\varphi ^n = \emptyset\) , then the lemma statement trivially holds. We may therefore assume \(E \cap S_\varphi ^n \ne \emptyset\) in the following. Then for any rational vectors \(x_1,x_2 \in E \cap S_\varphi ^n\) , we have
    \(\begin{align*} |v^\top (x_1 - x_2)| & \le \left\Vert v\right\Vert _{A^{-1}} \cdot \left\Vert x_1 - x_2\right\Vert _A \\ & \lt \frac{1}{2^{2 \varphi + 1}} \cdot (\left\Vert x_1 - x_0\right\Vert _A + \left\Vert x_2 - x_0\right\Vert _A) \le \frac{1}{2^{2 \varphi }} . \end{align*}\)
Since \(x_1,x_2 \in W \cap S_\varphi ^n\) , we have \(x_1 - x_2 \in W_0 \cap S_{2 \varphi }^n\) . As \(v = \Pi _{W_0}(z)\) where \(z \in \mathbb {Z}^n\) , we have
    \(\begin{align*} v^\top (x_1 - x_2) = z^\top (x_1 - x_2) \in \mathbb {Z}/q , \end{align*}\)
    for some positive integer \(q \le 2^{2 \varphi }\) . It then follows that \(v^\top x_1 = v^\top x_2\) . Finally, we note that for any rational vector \(x_1 \in E \cap S_\varphi ^n\) , we have
    \(\begin{align*} |z^\top (x_1 - x_0)| = | v^\top (x_1 - x_0) | \le \left\Vert v\right\Vert _{A^{-1}} \cdot \left\Vert x_1 - x_0\right\Vert _A \lt \frac{1}{2^{2 \varphi + 1}} . \end{align*}\)
Since \(z^\top x_1 \in \mathbb {Z}/q^{\prime }\) for some \(q^{\prime } \le 2^{\varphi }\) , we have \(z^\top x_1 = \lceil z^\top x_0 \rfloor _\varphi\) . Therefore, we have
    \(\begin{align*} v^\top x_1 = \lceil z^\top x_0 \rfloor _{\varphi } + (v - z)^\top x_1 = \lceil z^\top x_0 \rfloor _{\varphi } + (v - z)^\top x_0 , \end{align*}\)
    where the last equality is because \(v - z \in W_0^\bot\) and \(x_1 - x_0 \in W_0\) . This finishes the proof of the lemma.□
We remark here that the rounding \(\lceil \cdot \rfloor _\varphi\) in the construction of the hyperplane P can be efficiently computed using the continued fraction method (e.g., [42, Corollary 6.3a]).
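For intuition, assuming \(\lceil t \rfloor _\varphi\) denotes a best rational approximation of t with denominator at most \(2^\varphi\) (consistent with the continued fraction reference above), a minimal sketch in Python: the standard library's `Fraction.limit_denominator` implements precisely the continued fraction method.

```python
from fractions import Fraction

def round_phi(t, phi):
    # Best rational approximation of t with denominator at most 2**phi,
    # computed by the continued fraction method; one natural reading of
    # the rounding operator used in the hyperplane P.
    return Fraction(t).limit_denominator(2 ** phi)

# Example: round 0.30000001 to a rational with denominator <= 2**4 = 16.
print(round_phi(0.30000001, 4))   # -> 3/10
```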

    3.2 High Dimensional Slicing Lemma

    Lemma 3.2 (High Dimensional Slicing Lemma).
    Let K be a convex body and L be a full-rank lattice in \(\mathbb {R}^n\) . Let W be an \((n-k)\) -dimensional linear subspace of \(\mathbb {R}^n\) such that \(\dim (L \cap W) = n-k\) . Then we have
    \(\begin{align*} \frac{\mathsf {vol}(K \cap W)}{\det (L \cap W)} \le \frac{\mathsf {vol}(K)}{\det (L)} \cdot \frac{k^{O(k)}}{\lambda _1(L^*, K)^k} , \end{align*}\)
    where \(L^*\) is the dual lattice, and \(\lambda _1(L^*,K)\) is the shortest non-zero vector in \(L^*\) under the norm \(\left\Vert \cdot \right\Vert _{\mathsf {Cov}(K)}\) .
    Proof.
    Note that \(\mathsf {vol}(K \cap W) / \det (L \cap W)\) , \(\mathsf {vol}(K)/\det (L)\) , and \(\lambda _1(L^*, K)\) are preserved when applying the same linear transformation to K and L simultaneously. We can therefore rescale K and L such that \(\mathsf {Cov}(K) = I\) . We may further assume that \(K \cap W \ne \emptyset\) as otherwise \(\mathsf {vol}(K \cap W) = 0\) and the statement trivially holds.
We first upper bound \(\mathsf {vol}(K \cap W)\) in terms of \(\mathsf {vol}(K)\) . To this end, we apply a translation to K to obtain \(K_0\) with \(\mathsf {cg}(K_0) = 0\) , i.e., \(K_0\) is in isotropic position, and it suffices to upper bound the cross-sectional volume \(\mathsf {vol}(K_0 \cap (W + x))\) for an arbitrary \(x \in W^\bot\) . By identifying \(W^\bot\) with \(\mathbb {R}^k\) , we note that the function \(f(x)\) defined as \(f(x) := \mathsf {vol}(K_0 \cap (W + x)) / \mathsf {vol}(K_0)\) is a log-concave density function on \(\mathbb {R}^k\) by Brunn’s principle (Theorem 2.10). Furthermore, \(f(x)\) is isotropic since \(K_0\) is in isotropic position. It thus follows from Theorem 2.11 that \(f(x) \le k^{O(k)}\) for any \(x \in \mathbb {R}^k\) . Since \(K = K_0 + \mathsf {cg}(K)\) , taking \(x = -\mathsf {cg}(K)\) gives
    \(\begin{align} \frac{\mathsf {vol}(K \cap W)}{\mathsf {vol}(K)} \le k^{O(k)}. \end{align}\)
    (4)
    We next upper bound \(\det (L)\) in terms of \(\det (L \cap W)\) . Note that
    \(\begin{align} \det (L) = \det (L \cap W) \cdot \det (\Pi _{W^\bot } (L)) = \frac{\det (L \cap W)}{\det (L^* \cap W^\bot)} , \end{align}\)
    (5)
    where the first equality follows from Fact 2.2, and the second equality is due to Fact 2.3. By Minkowski’s first theorem (Theorem 2.4), we have
    \(\begin{align*} \lambda _1(L^*) \le \lambda _1(L^* \cap W^\bot) \le \sqrt {k} \cdot (\det (L^* \cap W^\bot))^{1/k}. \end{align*}\)
Combining this with the earlier Equation (5) gives
\(\begin{align} \det (L) \le \frac{\det (L \cap W) \cdot \sqrt {k}^k}{\lambda _1(L^*)^k} . \end{align}\)
    (6)
    It then follows from (4) and (6) that
    \(\begin{align*} \frac{\mathsf {vol}(K \cap W)}{\mathsf {vol}(K)} \cdot \frac{\det (L)}{\det (L \cap W)} \le \frac{k^{O(k)}}{\lambda _1(L^*)^k} . \end{align*}\)
    This finishes the proof of the lemma.□
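As a concrete sanity check of the first equality in (5) (an illustration, not part of the proof), take \(L = \mathbb {Z}^2\) and the line \(W = \mathrm{span}\lbrace (1,2)\rbrace\) : then \(\det (L \cap W) = \sqrt {5}\) and \(\det (\Pi _{W^\bot }(L)) = 1/\sqrt {5}\) , whose product recovers \(\det (\mathbb {Z}^2) = 1\) .

```python
import numpy as np

# Toy check of det(L) = det(L ∩ W) · det(Π_{W⊥}(L)) from Equation (5),
# for L = Z^2 and W = span{(1, 2)} (illustration only).
w = np.array([1.0, 2.0])
det_L_cap_W = np.linalg.norm(w)               # L ∩ W is generated by (1, 2)
w_perp = np.array([2.0, -1.0]) / np.sqrt(5)   # unit vector spanning W⊥
# Π_{W⊥}(Z^2) is generated by the projection of e2, since the projection
# of e1 equals -2 times that of e2:
gen = (np.array([0.0, 1.0]) @ w_perp) * w_perp
det_proj = np.linalg.norm(gen)                # = 1 / sqrt(5)
print(det_L_cap_W * det_proj)                 # -> 1.0 = det(Z^2)
```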

    4 Meta Algorithm

    In this section, we present a simple meta algorithm (Algorithm 2) that achieves the oracle complexity in Theorem 1.2. While this meta algorithm requires computing the centroids and covariance matrices of polytopes and is therefore not efficient, its oracle complexity analysis contains most of the key insights of this paper. We give an efficient (but more complicated) implementation of this meta algorithm and prove Theorem 1.2 in Section 5.
    Theorem 4.1 (Oracle Complexity in Theorem 1.2).
Given a separation oracle \(\mathsf {SO}\) for a convex function f defined on \(\mathbb {R}^n\) , and a \(\gamma\) -approximation algorithm ApproxSVP for the shortest vector problem, if the set of minimizers \(K^*\) of f is a rational polyhedron contained in a box of radius R and has LCM vertex complexity at most \(\varphi \ge 0\) , then there is a randomized algorithm that with high probability finds a vertex of \(K^*\) using \(O(n(\varphi + \log (\gamma n R)))\) calls to \(\mathsf {SO}\) .

    4.1 The Meta Algorithm

    By the argument in the beginning of Section 1.3, we may assume without loss of generality that f has a unique minimizer \(x^* \in S_\varphi ^n\) . We therefore describe our algorithm under this assumption.
Our meta algorithm maintains an affine subspace W, a polytope \(K \subseteq W\) containing the rational minimizer \(x^*\) of f, and a lattice \(\Lambda\) . It also maintains the centroid \(x_K\) and covariance matrix \(\Sigma _K\) of the polytope K. In the beginning, the affine subspace is \(W = \mathbb {R}^n\) , the polytope is \(K = B_\infty (R)\) , and the lattice is \(\Lambda = \mathbb {Z}^n\) . In each iteration of the algorithm (i.e., each while loop), the algorithm uses the \(\gamma\) -approximation algorithm ApproxSVP to find a short non-zero vector \(v \in \Lambda\) under \(\Sigma _K\) -norm. If the vector v satisfies \(\left\Vert v\right\Vert _{\Sigma _K} \ge \frac{1}{10n2^{2 \varphi }}\) , then the algorithm runs the center of gravity method (Theorem 2.14) for one more step, and updates \(x_K\) and \(\Sigma _K\) to be the centroid and covariance matrix of the new polytope K. We remark that the criterion for performing the cutting plane step comes from the convex geometry fact that \(K \subseteq x_K + 2n \cdot E(\Sigma _K^{-1})\) (Theorem 2.12).
If, on the other hand, \(\left\Vert v\right\Vert _{\Sigma _K} \lt \frac{1}{10n2^{2 \varphi }}\) , then the algorithm uses Lemma 3.1 to find a hyperplane P that contains \(K \cap S_\varphi ^n\) , where we recall from Definition 2.8 that \(S_\varphi ^n\) is the set of all rational vectors in \(\mathbb {R}^n\) with LCM vertex complexity at most \(\varphi\) . Specifically, the hyperplane \(P = \lbrace x: v^\top x = (v - z)^\top x_K + {\lceil z^\top x_K \rfloor }_\varphi \rbrace\) for some integral vector \(z \in \mathbb {Z}^n\) such that \(v = \Pi _{W_0}(z)\) and \(W_0 = -x_K + W\) is the translation of W that passes through the origin. One may find such a vector \(z \in \mathbb {Z}^n\) efficiently by solving the closest vector problem \(\min _{z \in \mathbb {Z}^n} \left\Vert z - v\right\Vert _{P_{W_0}}\) , where \(P_{W_0}\) is the projection matrix onto the subspace \(W_0\) . As mentioned earlier, the rounding \(\lceil \cdot \rfloor _\varphi\) can also be performed efficiently using the continued fraction method. After constructing the hyperplane P, the algorithm then recurses on the lower-dimensional affine subspace \(W \cap P\) , updates K to be \(K \cap P\) , and updates \(x_K\) and \(\Sigma _K\) to be the centroid and covariance matrix of the new polytope \(K \cap P\) . The algorithm obtains a new lattice with rank reduced by one by projecting the current lattice \(\Lambda\) onto \(P_0\) , a translation of P that passes through the origin.
    The above procedure stops when \(\dim (W) = 0\) , in which case K contains a unique rational point \(x^*\) which will be the output of the algorithm. Note that when \(\dim (W) = 1\) , the algorithm reduces to a binary search on the segment \(K \subseteq W\) . A formal description of the algorithm is given in Algorithm 2.
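To illustrate the one-dimensional base case just mentioned, the following self-contained toy (with an invented separation oracle for \(f(x) = (x - 7)^2\) and \(\varphi = 0\) ) performs exactly this binary search:

```python
def toy_so(x, x_star=7):
    # Toy separation oracle for f(x) = (x - x_star)^2 on the line (invented
    # for illustration): None signals the minimizer; otherwise the returned
    # sign g satisfies g * (y - x) <= 0 for the minimizer y.
    return None if x == x_star else (1 if x > x_star else -1)

def one_dim_case(so, R):
    # Base case dim(W) = 1 with phi = 0: binary search for the integral
    # minimizer on the segment [-R, R], using O(log R) oracle calls.
    lo, hi = -R, R
    while lo < hi:
        mid = (lo + hi) // 2
        g = so(mid)
        if g is None:
            return mid
        lo, hi = (lo, mid - 1) if g > 0 else (mid + 1, hi)
    return lo

print(one_dim_case(toy_so, R=1024))  # -> 7
```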
We remark that Algorithm 2 is not efficient since it requires the computation of the centroid and covariance matrix in Lines 8 and 13. Line 8 can easily be made efficient using the approximate center of gravity method as in Theorem 2.17. However, it is not clear how to efficiently implement Line 13 since we do not know an ellipsoid satisfying condition (3) in Theorem 2.16, and thus an approximate centroid and covariance matrix might not be efficiently computable by sampling. We address this computational issue in the next section.

    4.2 Oracle Complexity Analysis

    We start by proving the correctness of Algorithm 2.
Lemma 4.2 (Correctness of MetaALG).
Assuming the conditions in Theorem 4.1 and that f has a unique minimizer \(x^* \in S_\varphi ^n\) , Algorithm 2 finds \(x^*\) .
    Proof.
Note that in the beginning of each iteration, we have \(K \subseteq W\) and \(\Lambda \subseteq W_0\) , where \(W_0\) is the translation of W that passes through the origin. We first argue that the lattice \(\Lambda\) is in fact the orthogonal projection of \(\mathbb {Z}^n\) onto the subspace \(W_0\) , i.e., \(\Lambda = \Pi _{W_0}(\mathbb {Z}^n)\) . This is required for Lemma 3.1 to be applicable. Clearly \(\Lambda = \Pi _{W_0}(\mathbb {Z}^n)\) holds in the beginning of the algorithm since \(\Lambda = \mathbb {Z}^n\) and \(W = \mathbb {R}^n\) . Notice that the CenterOfGravity procedure in Line 7 keeps \(\Lambda\) and W the same. Each time we reduce the dimension in Lines 11–15, we have
    \(\begin{align*} \Pi _{W_0 \cap P_0}(\mathbb {Z}^n) = \Pi _{W_0 \cap P_0}(\Pi _{W_0}(\mathbb {Z}^n)) = \Pi _{W_0 \cap P_0}(\Lambda) , \end{align*}\)
    where the first equality follows because \(W_0 \cap P_0\) is a subspace of \(W_0\) . Since \(\Pi _{P_0}(\Lambda) = \Pi _{W_0 \cap P_0}(\Lambda)\) as \(v \in W_0\) , this shows that the invariant \(\Lambda = \Pi _{W_0}(\mathbb {Z}^n)\) holds throughout the algorithm.
We now prove that Algorithm 2 finds the unique minimizer \(x^* \in S_\varphi ^n\) . Note that in the beginning of the algorithm, we have \(x^* \in K\) . Since CenterOfGravity in Line 7 always preserves \(x^* \in K\) , we only need to prove that dimension reduction in Lines 11–15 preserves \(x^* \in K\) . In the following, we show the stronger statement that each dimension reduction iteration in Lines 11–15 preserves all rational points in \(K \cap S_\varphi ^n\) .
    Since Algorithm 2 maintains \(x_K = \mathsf {cg}(K)\) and \(\Sigma _K = \mathsf {Cov}(K)\) in every iteration, an immediate application of Theorem 2.12 gives the following sandwiching condition:
    \(\begin{align} x_K + E\left(\Sigma _K^{-1}\right)/2 \subseteq K \subseteq x_K + 2n \cdot E\left(\Sigma _K^{-1}\right). \end{align}\)
    (7)
    Now we proceed to show that each dimension reduction iteration preserves all rational points in \(K \cap S_\varphi ^n\) . By the RHS of (7), we have \(K \cap S_\varphi ^n \subseteq (x_K + 2n \cdot E(\Sigma _K^{-1})) \cap S_\varphi ^n\) . Since \(\left\Vert v\right\Vert _{\Sigma _K} \lt \frac{1}{10n2^{2 \varphi }}\) is satisfied in a dimension reduction iteration, Lemma 3.1 shows that all rational points in \((x_K + 2n \cdot E(\Sigma _K^{-1})) \cap S_\varphi ^n\) lie on the hyperplane given by \(P = \lbrace y: v^\top y = (v - z)^\top x_K + {\lceil z^\top x_K \rfloor }_\varphi \rbrace\) . Thus we have \(K \cap S_\varphi ^n \subseteq K \cap P\) and this finishes the proof of the lemma.□
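The composition step above rests on the elementary fact that orthogonal projections onto nested subspaces compose: \(\Pi _U \circ \Pi _V = \Pi _U\) whenever \(U \subseteq V\) . A quick numerical sanity check (an illustration, not part of the proof):

```python
import numpy as np

# For nested subspaces U ⊆ V, orthogonal projections satisfy
# Π_U ∘ Π_V = Π_U, which is the composition step used in the proof.
rng = np.random.default_rng(0)
V = np.linalg.qr(rng.standard_normal((5, 3)))[0]      # orthonormal basis of a 3-dim V
u = V @ np.linalg.qr(rng.standard_normal((3, 1)))[0]  # unit vector spanning U ⊆ V
P_V = V @ V.T
P_U = u @ u.T
assert np.allclose(P_U @ P_V, P_U)                    # Π_U Π_V = Π_U
# Hence projecting a lattice basis onto V first changes nothing:
B = rng.integers(-5, 5, size=(5, 5)).astype(float)
assert np.allclose(P_U @ (P_V @ B), P_U @ B)
```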
    Next, we prove the oracle complexity upper bound of Algorithm 2 in Theorem 4.1.
Lemma 4.3 (Oracle Complexity of MetaALG).
Assuming the conditions in Theorem 4.1 and that f has a unique minimizer \(x^* \in S_\varphi ^n\) , Algorithm 2 makes at most \(O(n (\varphi + \log (\gamma n R)))\) calls to \(\mathsf {SO}\) .
    Proof.
    We note that the oracle is only called when CenterOfGravity is invoked in Line 7, and each run of CenterOfGravity makes one call to \(\mathsf {SO}\) according to Theorem 2.14. To upper bound the total number of runs of CenterOfGravity, we consider the potential function
    \(\begin{align*} \Phi = \log (\mathsf {vol}(K) \cdot \det (\Lambda)). \end{align*}\)
In the beginning, \(\Phi = \log (\mathsf {vol}(B_\infty (R)) \cdot \det (I)) = n \log (2R)\) . Each time CenterOfGravity is called in Line 7, we have from Theorem 2.14 that the volume of K decreases by at least a constant factor, so the potential function decreases by at least \(\Omega (1)\) additively.
To analyze the change in the potential function after dimension reduction, we consider a maximal sequence of consecutive dimension reduction iterations \(t_0+1, \ldots , t_0 + k\) , i.e., CenterOfGravity is invoked in iterations \(t_0\) and \(t_0 + k+1\) , while every iteration in \(t_0 + 1,\ldots , t_0 + k\) decreases the dimension by one. We shall use superscript \((i)\) to denote the corresponding notation at the beginning of iteration \(t_0 + i\) , for any integer \(i \ge 0\) . In particular, in the beginning of iteration \(t_0 + 1\) , we have a convex body \(K^{(1)} \subseteq K^{(0)} \subseteq W^{(0)} = W^{(1)}\) , and after the sequence of dimension reduction iterations, we reach a convex body \(K^{(k + 1)} = K^{(1)} \cap W^{(k + 1)} \subseteq K^{(0)} \cap W^{(k+1)}\) . The lattice changes from \(\Lambda ^{(0)} = \Lambda ^{(1)} \subseteq W_0^{(1)}\) to \(\Lambda ^{(k + 1)} = \Pi _{W_0^{(k + 1)}}(\Lambda ^{(1)}) = \Pi _{W_0^{(k + 1)}}(\Lambda ^{(0)})\) , where we recall that subspaces \(W_0^{(i)}\) are translations of the affine subspaces \(W^{(i)}\) that pass through the origin. Note that the potential at the beginning of this maximal sequence of dimension reduction iterations is
    \(\begin{align*} e^{\Phi ^{(0)}} = \mathsf {vol}(K^{(0)}) \cdot \det (\Lambda ^{(0)}) = \frac{\mathsf {vol}(K^{(0)})}{\det ((\Lambda ^{(0)})^*)} . \end{align*}\)
    The potential after this sequence of dimension reduction iterations is
    \(\begin{align*} e^{\Phi ^{(k+1)} } & = \mathsf {vol}(K^{(k+1)}) \cdot \det (\Lambda ^{(k+1)}) = \mathsf {vol}(K^{(1)} \cap W^{(k+1)}) \cdot \det (\Pi _{W_0^{(k+1)}}(\Lambda ^{(0)})) \\ & = \frac{\mathsf {vol}(K^{(1)} \cap W^{(k+1)})}{\det ((\Pi _{W_0^{(k+1)}}(\Lambda ^{(0)}))^*)} = \frac{\mathsf {vol}(K^{(1)} \cap W^{(k+1)})}{\det ((\Lambda ^{(0)})^* \cap W_0^{(k+1)})} \le \frac{\mathsf {vol}(K^{(0)} \cap W^{(k+1)})}{\det ((\Lambda ^{(0)})^* \cap W_0^{(k+1)})}, \end{align*}\)
    where the last equality follows from the duality \((\Pi _{W_0^{(k+1)}}(\Lambda ^{(0)}))^* = (\Lambda ^{(0)})^* \cap W_0^{(k+1)}\) in Fact 2.3. Since \(W^{(k+1)}\) is a translation of the subspace \(W_0^{(k+1)}\) , we can apply Lemma 3.2 by taking \(L = (\Lambda ^{(0)})^*\) to obtain
    \(\begin{align} e^{\Phi ^{(k+1)} } \le e^{\Phi ^{(0)}} \cdot \frac{k^{O(k)}}{\lambda _1(\Lambda ^{(0)}, K^{(0)})^k}, \end{align}\)
    (8)
where \(\lambda _1(\Lambda ^{(0)}, K^{(0)})\) is the shortest non-zero vector in \(\Lambda ^{(0)}\) under the norm \(\left\Vert \cdot \right\Vert _{\mathsf {Cov}(K^{(0)})}\) . As CenterOfGravity is invoked in iteration \(t_0\) , we have \(\textstyle \Vert v^{(0)}\Vert _{\Sigma _K^{(0)}} \ge \frac{1}{10n 2^{2 \varphi }}\) for the output vector \(v^{(0)} \in \Lambda ^{(0)} \setminus \lbrace 0\rbrace\) . Since the ApproxSVP procedure is a \(\gamma\) -approximation and \(\Sigma _K^{(0)} = \mathsf {Cov}(K^{(0)})\) , this implies that \(\lambda _1(\Lambda ^{(0)}, K^{(0)}) \ge \frac{\Omega (1)}{\gamma n 2^{2 \varphi }}\) . It then follows that
    \(\begin{align*} e^{\Phi ^{(k+1)} } \le e^{\Phi ^{(0)}} \cdot (\gamma n 2^{\varphi })^{O(k)}. \end{align*}\)
    This shows that after a sequence of k dimension reduction iterations, the potential increases additively by at most \(O(k \log (\gamma n 2^{\varphi }))\) . As there are at most n dimension reduction iterations, the total amount of potential increase due to dimension reduction iterations is thus at most \(O(n \log (\gamma n 2^{\varphi }))\) .
    Finally, we note that whenever the potential becomes smaller than \(-10 n \log (20 n \gamma 2^{2 \varphi })\) , Minkowski’s first theorem (Theorem 2.4) shows the existence of a non-zero vector \(v \in \Lambda\) with \(\left\Vert v\right\Vert _{\Sigma _K} \lt \frac{1}{20n \gamma 2^{2 \varphi }}\) . This implies that the \(\gamma\) -approximation algorithm ApproxSVP for the shortest vector problem will find a non-zero vector \(v^{\prime } \in \Lambda\) that satisfies \(\left\Vert v^{\prime }\right\Vert _{\Sigma _K} \lt \frac{1}{20n 2^{2 \varphi }}\) , and thus such an iteration will not invoke CenterOfGravity. Therefore, Algorithm 2 runs CenterOfGravity at most \(O(n \log (\gamma n 2^{\varphi }) + n \log (R)) = O(n(\varphi + \log (\gamma n R)))\) times. Since each run of CenterOfGravity makes one call to \(\mathsf {SO}\) , the total number of calls to \(\mathsf {SO}\) made by Algorithm 2 is thus \(O(n(\varphi + \log (\gamma n R)))\) . This finishes the proof of the lemma.□
    Proof of Theorem 4.1
By the argument in the beginning of Section 1.3, we may assume without loss of generality that f has a unique minimizer \(x^* \in S_\varphi ^n\) . The correctness of Algorithm 2 is given in Lemma 4.2, and its oracle complexity is upper bounded in Lemma 4.3. Together, these finish the proof of the theorem.□

    5 Efficient Implementation of the Meta Algorithm

    In this section, we give an efficient implementation of Algorithm 2 from the previous section and prove Theorem 1.2 which we restate below for convenience.

    5.1 The Efficient Implementation

    By the argument in the beginning of Section 1.3, we may assume without loss of generality that f has a unique minimizer \(x^* \in S_\varphi ^n\) . For simplicity, we present our algorithm under this assumption.
    As mentioned in the last paragraph of Section 4.1, we can efficiently implement Line 8 of Algorithm 2 by using the approximate center of gravity method in Theorem 2.17. We now address the issue of efficiently implementing Line 13 of Algorithm 2 in the following.
To obtain an approximate centroid and covariance matrix of the polytope K after dimension reduction, our efficient algorithm maintains two polytopes \(K_{\mathsf {SO}} \subseteq K_{\mathsf {free}}\) . The polytope \(K_{\mathsf {SO}}\) plays the same role as K in Algorithm 2, and is the polytope formed by the separating hyperplanes from \(\mathsf {SO}\) ; \(K_{\mathsf {free}}\) is a simple polytope for which we always know an approximate centroid \(x_K\) and covariance matrix \(\Sigma _K\) . Our algorithm explicitly maintains the lists of constraints for the polytopes \(K_{\mathsf {SO}}\) and \(K_{\mathsf {free}}\) to efficiently perform computations on them. In particular, our algorithm can efficiently certify\(^{8}\) that \(K_{\mathsf {free}} = K_{\mathsf {SO}}\) when all the constraints for \(K_{\mathsf {SO}}\) appear in the list of constraints for \(K_{\mathsf {free}}\) , since it is always maintained that \(K_{\mathsf {SO}} \subseteq K_{\mathsf {free}}\) .
In the beginning of the algorithm, \(K_{\mathsf {free}} = K_{\mathsf {SO}}\) and we run RandomWalkCG for both polytopes at the same time. When dimension reduction happens in Lines 16–21, \(K_{\mathsf {SO}}\) is updated to be \(K^{\mathsf {new}}_{\mathsf {SO}} = K_{\mathsf {SO}} \cap P\) and we no longer have approximations to \(\mathsf {cg}(K^{\mathsf {new}}_{\mathsf {SO}})\) and \(\mathsf {Cov}(K^{\mathsf {new}}_{\mathsf {SO}})\) . To bypass this difficulty, our strategy is to update \(K_{\mathsf {free}}\) to be a simple polytope \(K^{\mathsf {new}}_{\mathsf {free}}\) containing \(K^{\mathsf {new}}_{\mathsf {SO}}\) for which we know \(\mathsf {cg}(K^{\mathsf {new}}_{\mathsf {free}})\) and \(\mathsf {Cov}(K^{\mathsf {new}}_{\mathsf {free}})\) , and “learn” \(\mathsf {cg}(K^{\mathsf {new}}_{\mathsf {SO}})\) and \(\mathsf {Cov}(K^{\mathsf {new}}_{\mathsf {SO}})\) by shrinking \(K^{\mathsf {new}}_{\mathsf {free}}\) via RandomWalkCG until it coincides with \(K^{\mathsf {new}}_{\mathsf {SO}}\) . Whenever \(K^{\mathsf {new}}_{\mathsf {free}} = K^{\mathsf {new}}_{\mathsf {SO}}\) happens again (in the aforementioned sense that the constraints for \(K^{\mathsf {new}}_{\mathsf {SO}}\) all appear in the list of constraints for \(K^{\mathsf {new}}_{\mathsf {free}}\) ), we have successfully learned an approximate centroid and covariance matrix of \(K^{\mathsf {new}}_{\mathsf {SO}}\) , and can continue to shrink \(K^{\mathsf {new}}_{\mathsf {SO}}\) using RandomWalkCG as before.
Now we specify our choice of \(K^{\mathsf {new}}_{\mathsf {free}}\) in the strategy above. Note that \(K^\mathsf {new}_{\mathsf {SO}} \subseteq P \cap (x_K + 2n \cdot E(\Sigma _K^{-1}))\) . Writing \(E(w, A) = P \cap (x_K + 2n \cdot E(\Sigma _K^{-1}))\) for this ellipsoid, we can simply choose \(K^{\mathsf {new}}_{\mathsf {free}}\) to be the smallest hyperrectangle containing \(E(w, A)\) , i.e., \(K^{\mathsf {new}}_{\mathsf {free}} = w + A^{-1/2} B_\infty\) , for which it is easy to compute an exact centroid and covariance matrix.
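Concretely, working within the current affine subspace (where A has full rank), the moments of this hyperrectangle are available in closed form: the centroid is w and the covariance matrix is \(\frac{1}{3} A^{-1}\) , since the coordinates of \(B_\infty\) are independent and uniform on \([-1, 1]\) with variance \(1/3\) . A quick Monte Carlo check of this closed form (an illustration, not part of the algorithm):

```python
import numpy as np

# Monte Carlo check (illustration): the box w + A^{-1/2} B_inf has
# centroid w and covariance A^{-1} / 3.
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)                       # a positive definite A
eigval, eigvec = np.linalg.eigh(A)
A_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
w = np.array([1.0, -2.0, 0.5])
U = rng.uniform(-1.0, 1.0, size=(200_000, 3))     # samples from B_inf
X = w + U @ A_inv_sqrt                            # A_inv_sqrt is symmetric
print(np.allclose(X.mean(axis=0), w, atol=1e-2))                              # -> True
print(np.allclose(np.cov(X, rowvar=False), np.linalg.inv(A) / 3, atol=1e-2))  # -> True
```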
This choice of \(K^{\mathsf {new}}_{\mathsf {free}}\) blows up the volume of the outer ellipsoid \(P \cap (x_K + 2n \cdot E(\Sigma _K^{-1}))\) by a factor of \(n^{O(n)}\) , and thus shrinking \(K^{\mathsf {new}}_{\mathsf {free}}\) seems to require many more \(\mathsf {SO}\) calls. The crucial observation here is that when we shrink the volume of \(K^{\mathsf {new}}_{\mathsf {free}}\) , we do not need to make calls to \(\mathsf {SO}\) since we already know the polytope \(K^{\mathsf {new}}_{\mathsf {SO}} \subseteq K^{\mathsf {new}}_{\mathsf {free}}\) . Instead, we simulate the separation oracle using the smaller polytope \(K^{\mathsf {new}}_{\mathsf {SO}}\) via the procedure FreeCG (see Algorithm 4) until we have \(K^{\mathsf {new}}_{\mathsf {free}} = K^{\mathsf {new}}_{\mathsf {SO}}\) again, at which point we regain approximations to \(\mathsf {cg}(K^{\mathsf {new}}_{\mathsf {SO}})\) and \(\mathsf {Cov}(K^{\mathsf {new}}_{\mathsf {SO}})\) . If we are ever able to find a hyperplane \(P^{\mathsf {new}}\) containing \(K^{\mathsf {new}}_{\mathsf {free}} \cap S_\varphi ^n\) even before reaching the point \(K^{\mathsf {new}}_{\mathsf {free}} = K^{\mathsf {new}}_{\mathsf {SO}}\) , we can further reduce the dimension. A formal description of the efficient implementation is given in Algorithm 3.

    5.2 Proof of Main Result

    By the argument in the beginning of Section 1.3, we can assume w.l.o.g. that f has a unique minimizer \(x^* \in S_\varphi ^n\) . We first prove the correctness and oracle complexity of Algorithm 3. These proofs are very similar to the proofs of Lemmas 4.2 and 4.3 from the previous section, so we only highlight the differences.
Lemma 5.1 (Correctness of Main).
Assuming the conditions in Theorem 1.2 and that f has a unique minimizer \(x^* \in S_\varphi ^n\) , Algorithm 3 finds \(x^*\) .
    Proof.
As in the proof of Lemma 4.2, we only need to verify that \(x^* \in K_{\mathsf {SO}}\) is preserved under dimension reduction in Lines 16–21. Assume that \(x^* \in K_{\mathsf {SO}}\) holds before dimension reduction. Since Theorem 2.17 guarantees \(\Vert x_K - \mathsf {cg}(K_{\mathsf {free}})\Vert _{(\Sigma _K)^{-1}} \le \epsilon\) and \((1-\epsilon) \cdot \mathsf {Cov}(K_{\mathsf {free}}) \preceq \Sigma _K \preceq (1+\epsilon) \cdot \mathsf {Cov}(K_{\mathsf {free}})\) with \(\epsilon = 0.01\) , it follows from Theorem 2.12 that (7) still holds with K replaced by \(K_{\mathsf {free}}\) :
    \(\begin{align*} x_K + E\left(\Sigma _K^{-1}\right)/2 \subseteq K_{\mathsf {free}} \subseteq x_K + 2n \cdot E\left(\Sigma _K^{-1}\right). \end{align*}\)
    Proceeding from here, the same argument as in the proof of Lemma 4.2 shows that \(K_{\mathsf {free}} \cap S_\varphi ^n \subseteq P\) . Also note that Algorithm 3 always maintains \(K_{\mathsf {SO}} \subseteq K_{\mathsf {free}}\) . It follows that \(K_{\mathsf {SO}} \cap S_{\varphi }^n \subseteq K_{\mathsf {free}} \cap S_\varphi ^n \subseteq P\) , i.e., all rational points in \(K_{\mathsf {SO}} \cap S_{\varphi }^n\) are preserved during dimension reduction. This implies that \(x^* \in K_{\mathsf {SO}} \cap P\) after dimension reduction and completes the proof of the lemma.□
Lemma 5.2 (Oracle Complexity of Main).
Assuming the conditions in Theorem 1.2 and that f has a unique minimizer \(x^* \in S_\varphi ^n\) , Algorithm 3 makes at most \(O(n (\varphi + \log (\gamma n R)))\) calls to the separation oracle \(\mathsf {SO}\) with high probability.
    Proof.
Note that Algorithm 3 always maintains \(K_{\mathsf {SO}} \subseteq K_{\mathsf {free}}\) , and \(\mathsf {SO}\) is only called in Line 10 when \(K_{\mathsf {SO}} = K_{\mathsf {free}}\) . Since each run of RandomWalkCG in Line 10 fails with probability at most \(\delta = 1/\mathsf {poly}(n, \varphi , \log (\gamma R))\) for a large enough polynomial by Theorem 2.17, a union bound implies that with high probability, the first \(O(n (\varphi + \log (\gamma n R)))\) runs of RandomWalkCG in Line 10 all succeed. We condition on this event. Then applying exactly the same analysis as in the proof of Lemma 4.3 to the potential function
    \(\begin{align*} \Phi _{\mathsf {SO}} := \log (\mathsf {vol}(K_{\mathsf {SO}}) \cdot \det (\Lambda)) \end{align*}\)
    gives the oracle complexity bound in the lemma.□
    Next, we show that Algorithm 3 makes at most \(\mathsf {poly}(n, \varphi , \log (\gamma R))\) calls to FreeCG with high probability. Since each call to FreeCG can be implemented in \(\mathsf {poly}(n, \varphi , \log (\gamma R))\) time by checking all the constraints of \(K_{\mathsf {SO}}\) , this will imply the bound on the number of arithmetic operations in Theorem 1.2.
Lemma 5.3 (Number of FreeCG Calls).
Assuming the conditions in Theorem 1.2 and that f has a unique minimizer \(x^* \in S_\varphi ^n\) , Algorithm 3 makes at most \(\mathsf {poly}(n, \varphi , \log (\gamma R))\) calls to FreeCG with high probability.
    Proof.
As in the proof above, we condition on the high probability event that the first \(\mathsf {poly}(n, \varphi , \log (\gamma R))\) calls to RandomWalkCG as well as the sampling algorithm in Theorem 2.16 all succeed. In the beginning of the algorithm, \(K_{\mathsf {SO}} = B_{\infty }(R)\) and thus can be specified using 2n constraints. An additional constraint is placed on \(K_{\mathsf {SO}}\) each time \(\mathsf {SO}\) is called, and since the number of \(\mathsf {SO}\) calls is at most \(O(n(\varphi + \log (\gamma n R)))\) , the number of constraints Algorithm 3 maintains for the specification of \(K_{\mathsf {SO}}\) is at most \(O(n(\varphi + \log (\gamma n R)))\) throughout.
Now we upper bound the number of calls to FreeCG. In fact, we show that the total number of cutting plane steps for \(K_{\mathsf {free}}\) in Lines 10 and 13 of Algorithm 3 is at most \(\mathsf {poly}(n,\varphi , \log (\gamma R))\) . Our strategy is to consider the potential function
    \(\begin{align*} \Phi _{\mathsf {free}} := \log (\mathsf {vol}(K_{\mathsf {free}}) \cdot \det (\Lambda)) , \end{align*}\)
    and repeat the analysis as in the proof of Lemma 4.3. However, there are two main differences that we highlight below.
The first main difference is that when we reduce the dimension in Lines 16–21 of Algorithm 3, we are not simply slicing \(K_{\mathsf {free}}\) by the hyperplane P. Instead, we first replace \(K_{\mathsf {free}}\) by its outer containing ellipsoid \(x_K + 2n \cdot E(\Sigma _K^{-1})\) , then further replace the sliced ellipsoid \(E(w, A) = P \cap (x_K + 2n \cdot E(\Sigma _K^{-1}))\) by its outer containing hyperrectangle \(K^{\mathsf {new}}_{\mathsf {free}} := w + A^{-1/2} B_\infty\) . Since we have the sandwiching condition that
    \(\begin{align*} x_K + E\left(\Sigma _K^{-1}\right)/2 \subseteq K_{\mathsf {free}} \subseteq x_K + 2n \cdot E\left(\Sigma _K^{-1}\right), \end{align*}\)
replacing \(K_{\mathsf {free}}\) by \(x_K + 2n \cdot E(\Sigma _K^{-1})\) increases its volume by a factor of at most \(n^{O(n)}\) . Also note that replacing an ellipsoid by its outer containing hyperrectangle increases its volume by a factor of at most \(n^{O(n)}\) . It then follows that these replacements contribute at most a factor of \(n^{O(n)}\) to \(\mathsf {vol}(K_{\mathsf {free}})\) for each dimension reduction step. As there are at most n dimension reduction steps, the increase in \(\Phi _{\mathsf {free}}\) due to these replacements is at most \(O(n^2 \log (n))\) additively.
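For concreteness, the blow-up incurred by the second replacement can be computed exactly (a standard volume computation, added here for the reader, with n denoting the current dimension):
\(\begin{align*} \frac{\mathsf {vol}(w + A^{-1/2} B_\infty)}{\mathsf {vol}(E(w, A))} = \frac{2^n \cdot \det (A)^{-1/2}}{\mathsf {vol}(B_2^n) \cdot \det (A)^{-1/2}} = \frac{2^n \cdot \Gamma (n/2 + 1)}{\pi ^{n/2}} = n^{O(n)} . \end{align*}\)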
The second main difference is that not every call to FreeCG decreases \(\mathsf {vol}(K_{\mathsf {free}})\) by a constant factor. In particular, this is the case if \(x_K \in K_{\mathsf {SO}}\) in Algorithm 4 and we add to \(K_{\mathsf {free}}\) one constraint of \(K_{\mathsf {SO}}\) that is currently not a constraint of \(K_{\mathsf {free}}\) . However, since we have shown above that \(K_{\mathsf {SO}}\) has at most \(O(n (\varphi + \log (\gamma n R)))\) constraints, this case can happen at most \(O(n (\varphi + \log (\gamma n R)))\) times in each dimension until all the constraints for \(K_{\mathsf {SO}}\) appear in the list of constraints for \(K_{\mathsf {free}}\) , in which case our algorithm can efficiently certify that \(K_{\mathsf {free}} = K_{\mathsf {SO}}\) . Whenever this happens, no additional call to FreeCG will happen until the dimension is further reduced.
Incorporating the above two differences into the analysis as in the proof of Lemma 4.3, we obtain that the total number of cutting plane steps in Lines 10 and 13 applied to \(K_{\mathsf {free}}\) is at most \(O(n^2(\varphi + \log (\gamma n R)))\) . This is also an upper bound on the number of calls to FreeCG, and thus proves the lemma.□
    Proof of Theorem 1.2
    By the argument in the beginning of Section 1.3, we may assume without loss of generality that f has a unique minimizer \(x^* \in S_\varphi ^n\) . The correctness of Algorithm 3 is given in Lemma 5.1, and its oracle complexity is upper bounded in Lemma 5.2. We are thus left to upper bound the total number of arithmetic operations used by Algorithm 3.
By Lemma 5.3, Algorithm 3 makes at most \(\mathsf {poly}(n, \varphi , \log (\gamma R))\) calls to FreeCG, and each such call can be implemented using \(\mathsf {poly}(n, \varphi , \log (\gamma R))\) arithmetic operations. Since ApproxSVP is called after each cutting plane step in Lines 10 and 13, the total number of calls to ApproxSVP is at most \(\mathsf {poly}(n, \varphi , \log (\gamma R))\) . Note that the remaining part of the algorithm takes \(\mathsf {poly}(n, \varphi , \log (\gamma R))\) arithmetic operations. This gives the upper bound on the number of arithmetic operations and finishes the proof of the theorem.□

    6 Submodular Function Minimization

    In this section, we do not seek to give a comprehensive introduction to submodular functions, but only provide the necessary definitions and properties that are needed for the proof of Theorem 1.3. We refer interested readers to the famous textbook by Schrijver [44] or the extensive survey by McCormick [33] for more details on submodular functions.

    6.1 Preliminaries

    Throughout this section, we use \([n] = \lbrace 1,\ldots , n\rbrace\) to denote the ground set and let \(f: 2^{[n]} \rightarrow \mathbb {Z}\) be a set function defined on subsets of \([n]\) . For a subset \(S \subseteq [n]\) and an element \(i \in [n]\) , we define \(S + i := S \cup \lbrace i\rbrace\) . A set function f is submodular if it satisfies the following property of diminishing marginal differences:
    Definition 6.1 (Submodularity).
    A function \(f: 2^{[n]} \rightarrow \mathbb {Z}\) is submodular if \(f(T + i) - f(T) \le f(S + i) - f(S)\) , for any subsets \(S \subseteq T \subseteq [n]\) and \(i \in [n] \setminus T\) .
    Throughout this section, the set function f we work with is assumed to be submodular even when it is not stated explicitly. We may assume without loss of generality that \(f(\emptyset) = 0\) by replacing \(f(S)\) by \(f(S) - f(\emptyset)\) . We assume that f is accessed by an evaluation oracle, and use \(\mathsf {EO}\) to denote the time to compute \(f(S)\) for a subset S. Our algorithm for SFM is based on a standard convex relaxation of a submodular function, known as the Lovász extension [19].
    Definition 6.2 (Lovász Extension).
    The Lovász extension \(\hat{f}:[0,1]^n \rightarrow \mathbb {R}\) of a submodular function f is defined as
    \(\begin{align*} \hat{f}(x) = \mathbb {E}_{t \sim [0,1]} [f(\lbrace i: x_i \ge t\rbrace)], \end{align*}\)
    where \(t \sim [0,1]\) is drawn uniformly at random from \([0,1]\) .
    The Lovász extension \(\hat{f}\) of a submodular function f has many desirable properties. In particular, \(\hat{f}\) is a convex relaxation of f and it can be evaluated efficiently.
    Theorem 6.3 (Properties of Lovász Extension).
    Let \(f: 2^{[n]} \rightarrow \mathbb {Z}\) be a submodular function and \(\hat{f}\) be its Lovász extension. Then,
    (a)
    \(\hat{f}\) is convex and \(\min _{x \in [0,1]^n} \hat{f}(x) = \min _{S \subseteq [n]} f(S)\) ;
    (b)
    \(f(S) = \hat{f}(I_S)\) for any subset \(S \subseteq [n]\) , where \(I_S\) is the indicator vector for S;
    (c)
    Suppose \(x \in [0,1]^n\) satisfies \(x_1 \ge \cdots \ge x_n\) , then \(\hat{f}(x) = \sum _{i=1}^n (f([i]) - f([i-1])) x_i\) ;
    (d)
    The set of minimizers of \(\hat{f}\) is the convex hull of the set of minimizers of f.
    Next we address the question of implementing the separation oracle (as in Definition 1.1) using the evaluation oracle of f.
    Theorem 6.4 (Separation Oracle Implementation for Lovász Extension, Theorem 61 of [29]).
    Let \(f: 2^{[n]} \rightarrow \mathbb {Z}\) be a submodular function and \(\hat{f}\) be its Lovász extension, then a separation oracle for \(\hat{f}\) can be implemented in time \(O(n \cdot \mathsf {EO}+ n^2)\) .
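As an illustration of the construction behind this theorem, here is a minimal sketch that evaluates \(\hat{f}\) and a subgradient via property (c) of Theorem 6.3 (one natural reading of the implementation in [29], not a verbatim transcription):

```python
def lovasz_extension_and_subgradient(f, x):
    # f: submodular set function on frozensets of {0, ..., n-1} with f(empty) = 0.
    # Returns (f_hat(x), g), where g is a subgradient of the Lovász extension
    # at x computed via property (c): sort coordinates, take marginal values.
    # Uses n evaluation-oracle calls plus O(n log n) arithmetic.
    n = len(x)
    order = sorted(range(n), key=lambda i: -x[i])  # x_{pi(1)} >= ... >= x_{pi(n)}
    g = [0.0] * n
    val, S, prev = 0.0, set(), 0.0
    for i in order:
        S.add(i)
        fS = f(frozenset(S))
        g[i] = fS - prev        # marginal value f(S_j) - f(S_{j-1})
        val += g[i] * x[i]
        prev = fS
    return val, g

# Example: the cut function of a single edge {0, 1}; f_hat([0.8, 0.3]) = 0.5.
cut = lambda S: 1 if len(S) == 1 else 0
print(lovasz_extension_and_subgradient(cut, [0.8, 0.3]))  # ~ (0.5, [1.0, -1.0])
```

Given the subgradient g at a query point x, the halfspace \(\lbrace y: g^\top y \le g^\top x\rbrace\) contains every minimizer of \(\hat{f}\) , which is exactly the kind of separating hyperplane such an oracle can return; computing g costs n evaluation-oracle calls, consistent with the \(O(n \cdot \mathsf {EO}+ n^2)\) bound.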

    6.2 Proof of Theorem 1.3

    Before presenting the proof, we restate Theorem 1.3 for convenience.
    Proof.
We apply Corollary 1.2 to the Lovász extension \(\hat{f}\) of the submodular function f with \(R = 1\) . By parts (a) and (d) of Theorem 6.3, \(\hat{f}\) is a convex function that satisfies the assumption (⋆) in Corollary 1.2. Thus Corollary 1.2 gives a strongly polynomial algorithm for finding an integral minimizer of \(\hat{f}\) that makes \(O(n^2 \log \log (n)/\log (n))\) calls to a separation oracle of \(\hat{f}\) , and an exponential time algorithm that finds an integral minimizer of \(\hat{f}\) using \(O(n \log (n))\) separation oracle calls. This integral minimizer also gives a minimizer of f. Since a separation oracle for \(\hat{f}\) can be implemented using \(O(n)\) calls to \(\mathsf {EO}\) by Theorem 6.4, the total number of calls to the evaluation oracle is thus \(O(n^3 \log \log (n)/\log (n))\) for the strongly polynomial algorithm, and \(O(n^2 \log (n))\) for the exponential time algorithm. This proves the theorem.□

    Acknowledgments

I would like to thank the anonymous referees of Journal of the ACM for very insightful comments. I thank my advisor Yin Tat Lee for advising this project. Part of this work is inspired by earlier notes of Yin Tat Lee and Zhao Song. A special thanks to Daniel Dadush for pointing out the implication of the Grötschel-Lovász-Schrijver approach to our problem, suggesting the high dimensional slicing lemma which greatly simplifies my earlier proofs, and to Daniel Dadush and Thomas Rothvoss for pointing out that our framework implies \(O(n^2 \log (n))\) oracle complexity for SFM by solving SVP exactly. I also thank Thomas Rothvoss for other useful comments and his wonderful lecture notes on integer optimization and lattice theory. I also thank Jonathan Kelner, Janardhan Kulkarni, Aaron Sidford, Zhao Song, Santosh Vempala, and Sam Chiu-wai Wong for helpful discussions on this project.

    Footnotes

    1
    It’s easy to show that strongly polynomial algorithm doesn’t exist if \(\log (R)\) is super-polynomial (see Remark 1.3).
    2
    The original approach by Grötschel, Lovász and Schrijver was given in the context of obtaining exact solutions to LP, but it is immediately applicable to our problem. Their approach was briefly described in [18] with details given in [19]. Their approach originally used the ellipsoid method which is sub-optimal in terms of oracle complexity. The oracle complexity given here uses Vaidya’s cutting plane method [49].
    3
    Here we use a slightly different definition from Grötschel, Lovász and Schrijver’s original definition of vertex complexity in [17, 19] so that \(\varphi = 0\) corresponds to the setting of integral minimizers. More details can be found in Section 2.2.4.
    4
    This algorithm improves the approximation factors of the celebrated LLL algorithm [30] and Schnorr’s block reduction algorithm [41].
    5
    Note that this implementation of the separation oracle for the lexicographically minimal minimizer \(x^*\) does not quite satisfy the conditions in Definition 1.1. In particular, even when \(x^*\) is queried, the separation oracle for finding \(x^*\) might not realize it unless the current working subspace is trivial (i.e., 0-dimensional). However, all our results and proofs still hold under this slightly weaker implementation of the separation oracle.
    6
Perhaps a more natural candidate is the ellipsoid method developed in [27, 45, 53]. This method, however, shrinks the volume of K a factor of \(O(n)\) more slowly than Vaidya’s method. In fact, the Grötschel-Lovász-Schrijver approach [18] originally used the ellipsoid method, which results in an oracle complexity of \(O(n^4)\) for their polynomial time algorithm.
    7
    Equivalently, one could think of finding an approximately shortest vector under the Euclidean norm in the lattice \(A^{1/2} \Lambda\) .
    8
In general, our algorithm might not be able to efficiently verify that the geometric objects \(K_{\mathsf {free}}\) and \(K_{\mathsf {SO}}\) are the same. So whenever we say \(K_{\mathsf {free}} = K_{\mathsf {SO}}\) , we always mean it in the sense that it can be efficiently certified by checking that all constraints for \(K_{\mathsf {SO}}\) appear in the list of constraints for \(K_{\mathsf {free}}\) .

    References

    [1]
    Radosław Adamczak, Alexander Litvak, Alain Pajor, and Nicole Tomczak-Jaegermann. 2010. Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles. Journal of the American Mathematical Society 23, 2 (2010), 535–561.
    [2]
    Ilan Adler and Steven Cosares. 1991. A strongly polynomial algorithm for a special class of linear programs. Operations Research 39, 6 (1991), 955–960.
    [3]
    Divesh Aggarwal, Daniel Dadush, Oded Regev, and Noah Stephens-Davidowitz. 2015. Solving the shortest vector problem in 2n time using discrete Gaussian sampling. In Proceedings of the Forty-seventh Annual ACM Symposium on Theory of Computing. 733–742.
    [4]
    Miklós Ajtai, Ravi Kumar, and Dandapani Sivakumar. 2001. A sieve algorithm for the shortest lattice vector problem. In Proceedings of the Thirty-third Annual ACM Symposium on Theory of Computing. 601–610.
    [5]
    Dimitris Bertsimas and Santosh Vempala. 2004. Solving convex programs by random walks. Journal of the ACM (JACM) 51, 4 (2004), 540–556.
    [6]
    John William Scott Cassels. 1971. An Introduction to the Theory of Numbers. Springer-Verlag.
    [7]
    Sergei Chubanov. 2012. A strongly polynomial algorithm for linear systems having a binary solution. Mathematical Programming 134, 2 (2012), 533–570.
    [8]
    Sergei Chubanov. 2015. A Polynomial Algorithm for Linear Optimization which is Strongly Polynomial Under Certain Conditions on Optimal Solutions.
    [9]
    Edith Cohen and Nimrod Megiddo. 1994. Improved algorithms for linear inequalities with two variables per inequality. SIAM J. Comput. 23, 6 (1994), 1313–1347.
    [10]
    Daniel Dadush. 2012. Integer Programming, Lattice Algorithms, and Deterministic Volume Estimation. Ph. D. Dissertation. Georgia Institute of Technology.
    [11]
    Daniel Dadush, Sophie Huiberts, Bento Natura, and László A. Végh. 2020. A scaling-invariant algorithm for linear programming whose running time depends only on the constraint matrix. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. 761–774.
    [12]
    Daniel Dadush and Daniele Micciancio. 2013. Algorithms for the densest sub-lattice problem. In Proceedings of the Twenty-fourth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 1103–1122.
    [13]
    Daniel Dadush, László A. Végh, and Giacomo Zambelli. 2018. Geometric rescaling algorithms for submodular function minimization. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 832–848.
    [14]
    Daniel Dadush, László A. Végh, and Giacomo Zambelli. 2020. Rescaling algorithms for linear conic feasibility. Mathematics of Operations Research 45, 2 (2020), 732–754.
    [15]
Jack Edmonds. 1970. Submodular functions, matroids, and certain polyhedra. In Combinatorial Structures and Their Applications. Gordon and Breach, 69–87.
    [16]
    Lisa Fleischer and Satoru Iwata. 2003. A push-relabel framework for submodular function minimization and applications to parametric optimization. Discrete Applied Mathematics 131, 2 (2003), 311–322.
    [17]
    Martin Grötschel, László Lovász, and Alexander Schrijver. 1981. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica 1, 2 (1981), 169–197.
    [18]
    Martin Grötschel, László Lovász, and Alexander Schrijver. 1984. Geometric methods in combinatorial optimization. In Progress in Combinatorial Optimization. Elsevier, 167–183.
    [19]
    Martin Grötschel, László Lovász, and Alexander Schrijver. 1988. Geometric Algorithms and Combinatorial Optimization. Springer.
    [20]
    Satoru Iwata. 2003. A faster scaling algorithm for minimizing submodular functions. SIAM J. Comput. 32, 4 (2003), 833–840.
    [21]
    Satoru Iwata. 2008. Submodular function minimization. Mathematical Programming 112, 1 (2008), 45.
    [22]
    Satoru Iwata, Lisa Fleischer, and Satoru Fujishige. 2001. A combinatorial strongly polynomial algorithm for minimizing submodular functions. Journal of the ACM (JACM) 48, 4 (2001), 761–777.
    [23]
    Satoru Iwata and James B. Orlin. 2009. A simple combinatorial algorithm for submodular function minimization. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 1230–1237.
    [24]
He Jia, Aditi Laddha, Yin Tat Lee, and Santosh Vempala. 2021. Reducing isotropy and volume to KLS: An \(O^*(n^3 \psi ^2)\) volume algorithm. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing. 961–974.
    [25]
    Ravi Kannan, László Lovász, and Miklós Simonovits. 1995. Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry 13, 3-4 (1995), 541–559.
    [26]
Ravi Kannan, László Lovász, and Miklós Simonovits. 1997. Random walks and an \(O^*(n^5)\) volume algorithm for convex bodies. Random Structures & Algorithms 11, 1 (1997), 1–50.
    [27]
    Leonid G. Khachiyan. 1980. Polynomial algorithms in linear programming. U.S.S.R. Comput. Math. and Math. Phys. 20, 1 (1980), 53–72.
    [28]
    Leonid G. Khachiyan, Sergei Pavlovich Tarasov, and I. I. Erlikh. 1988. The method of inscribed ellipsoids. In Soviet Math. Dokl, Vol. 37. 226–230.
    [29]
    Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. 2015. A faster cutting plane method and its implications for combinatorial and convex optimization. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science. IEEE, 1049–1065.
    [30]
Arjen Lenstra, Hendrik Lenstra, and László Lovász. 1982. Factoring polynomials with rational coefficients. Math. Ann. 261 (1982), 515–534.
    [31]
    Anatoly Yur’evich Levin. 1965. An algorithm for minimizing convex functions. In Doklady Akademii Nauk, Vol. 160. Russian Academy of Sciences, 1244–1247.
    [32]
    László Lovász and Santosh Vempala. 2007. The geometry of logconcave functions and sampling algorithms. Random Structures & Algorithms 30, 3 (2007), 307–358.
    [33]
    S. Thomas McCormick. 2005. Submodular function minimization. Discrete Optimization 12 (2005), 321–391.
    [34]
    Nimrod Megiddo. 1983. Towards a genuinely polynomial algorithm for linear programming. SIAM J. Comput. 12, 2 (1983), 347–353.
    [35]
    Daniele Micciancio and Panagiotis Voulgaris. 2013. A deterministic single exponential time algorithm for most lattice problems based on Voronoi cell computations. SIAM J. Comput. 42, 3 (2013), 1364–1391.
    [36]
    Hermann Minkowski. 1953. Geometrie der zahlen. Chelsea, Reprint (1953).
    [37]
    Y. E. Nesterov and A. S. Nemirovskii. 1989. Self-concordant functions and polynomial time methods in convex programming. preprint, Central Economic & Mathematical Institute, USSR Acad. Sci. Moscow, USSR (1989).
    [38]
    Donald J. Newman. 1965. Location of the maximum on unimodal surfaces. Journal of the ACM (JACM) 12, 3 (1965), 395–398.
    [39]
    Neil Olver and László A. Végh. 2020. A simpler and faster strongly polynomial algorithm for generalized flow maximization. Journal of the ACM (JACM) 67, 2 (2020), 1–26.
    [40]
    James B. Orlin. 2009. A faster strongly polynomial time algorithm for submodular function minimization. Mathematical Programming 118, 2 (2009), 237–251.
    [41]
    Claus-Peter Schnorr. 1987. A hierarchy of polynomial time lattice basis reduction algorithms. Theoretical Computer Science 53, 2-3 (1987), 201–224.
    [42]
    Alexander Schrijver. 1998. Theory of Linear and Integer Programming. John Wiley & Sons.
    [43]
    Alexander Schrijver. 2000. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. Journal of Combinatorial Theory, Series B 80, 2 (2000), 346–355.
    [44]
    Alexander Schrijver. 2003. Combinatorial Optimization: Polyhedra and Efficiency. Vol. 24. Springer Science & Business Media.
    [45]
    Naum Z. Shor. 1977. Cut-off method with space extension in convex programming problems. Cybernetics 13, 1 (1977), 94–96.
    [46]
    Steve Smale. 1998. Mathematical problems for the next century. The Mathematical Intelligencer 20, 2 (1998), 7–15.
    [47]
Nikhil Srivastava and Roman Vershynin. 2013. Covariance estimation for distributions with \({2+\varepsilon }\) moments. The Annals of Probability 41, 5 (2013), 3081–3111.
    [48]
    Eva Tardos. 1986. A strongly polynomial algorithm to solve combinatorial linear programs. Operations Research 34, 2 (1986), 250–256.
    [49]
    Pravin M. Vaidya. 1989. A new algorithm for minimizing convex functions over convex sets. In 30th Annual IEEE Symposium on Foundations of Computer Science (FOCS’89). 338–343.
    [50]
    Stephen A. Vavasis and Yinyu Ye. 1996. A primal-dual interior point method whose running time depends only on the constraint matrix. Mathematical Programming 74, 1 (1996), 79–120.
    [51]
    László A. Végh. 2017. A strongly polynomial algorithm for generalized flow maximization. Mathematics of Operations Research 42, 1 (2017), 179–211.
    [52]
    Jens Vygen. 2003. A note on Schrijver’s submodular function minimization algorithm. Journal of Combinatorial Theory, Series B 88, 2 (2003), 399–402.
    [53]
    David B. Yudin and Arkadii S. Nemirovski. 1976. Evaluation of the information complexity of mathematical programming problems. Ekonomika i Matematicheskie Metody 12 (1976), 128–142.
