J. ACM, Vol. 65, No. 6, Article 37, Publication date: December 2018.
DOI: https://doi.org/10.1145/3230742
We introduce a new and natural algebraic proof system, whose complexity measure is essentially the algebraic circuit size of Nullstellensatz certificates. This enables us to exhibit close connections between effective Nullstellensätze, proof complexity, and (algebraic) circuit complexity. In particular, we show that any super-polynomial lower bound on any Boolean tautology in our proof system implies that the permanent does not have polynomial-size algebraic circuits ($\mathsf {VNP} \ne \mathsf {VP}$). We also show that super-polynomial lower bounds on the number of lines in Polynomial Calculus proofs imply the Permanent versus Determinant Conjecture. Note that there was no proof system prior to ours for which lower bounds on an arbitrary tautology implied any complexity class lower bound.
Our proof system helps clarify the relationships between previous algebraic proof systems. In doing so, we highlight the importance of polynomial identity testing (PIT) in proof complexity. In particular, we use PIT to illuminate $\mathsf {AC}^0[p]$-Frege lower bounds, which have been open for nearly 30 years, with no satisfactory explanation as to their apparent difficulty.
Finally, we explain the obstacles that must be overcome in any attempt to extend techniques from algebraic circuit complexity to prove lower bounds in proof complexity. Using the algebraic structure of our proof system, we propose a novel route to such lower bounds. Although such lower bounds remain elusive, this proposal should be contrasted with the difficulty of extending $\mathsf {AC}^0[p]$ circuit lower bounds to $\mathsf {AC}^0[p]$-Frege lower bounds.
ACM Reference format:
Joshua A. Grochow and Toniann Pitassi. 2018. Circuit Complexity, Proof Complexity, and Polynomial Identity Testing: The Ideal Proof System. J. ACM 65, 6, Article 37 (December 2018), 59 pages. https://doi.org/10.1145/3230742
$\mathsf {NP}$ versus $\mathsf {coNP}$ is the very natural question of whether, for every graph without a Hamiltonian path, there is a short proof of this fact. One of the arguments for the utility of proof complexity is that proving lower bounds for standard proof systems is a necessary step towards proving $\mathsf {NP} \ne \mathsf {coNP}$. Moreover, standard proof systems correspond to standard circuit classes; for example, Frege corresponds to $\mathsf {NC}^1$, and Extended Frege corresponds to $\mathsf {P/poly}$; since the corresponding proof system can reason over the corresponding circuit class, it is speculated that a proof system lower bound would imply the corresponding circuit class lower bound, e.g., that lower bounds on Frege would imply $\mathsf {NP} \ne \mathsf {NC}^1$. However, until now these arguments have been more the expression of a philosophy or hope, as there is no known proof system for which lower bounds imply computational complexity lower bounds of any kind, let alone $\mathsf {NP} \ne \mathsf {coNP}$.
We remedy this situation by introducing a very natural algebraic proof system, which has tight connections to (algebraic) circuit complexity (albeit a proof system for which we only know a randomized efficient verification procedure). We show that any super-polynomial lower bound on any Boolean tautology in our proof system implies that the permanent does not have polynomial-size algebraic circuits ($\mathsf {VNP} \ne \mathsf {VP}$). Additionally, lower bounds on bounded-depth versions of our system imply the corresponding algebraic circuit lower bounds, e.g., lower bounds on the logarithmic-depth version of our proof system imply $\mathsf {VNP} \not\subseteq \mathsf {VNC^1}$. Note that, prior to our work, essentially all implications went the opposite direction: a circuit complexity lower bound implying a proof complexity lower bound. We use this result to begin to explain why several long-open lower bound questions in proof complexity—lower bounds on Extended Frege, on $\mathsf {AC}^0[p]$-Frege, and on number-of-lines in Polynomial Calculus-style proofs—have been so apparently difficult.
Algebraic Circuit Complexity. The most natural way to compute a polynomial function $f(x_1,\ldots,x_n)$ is with a sequence of instructions $g_1,\ldots,g_m = f$, starting from the inputs $x_1,\ldots, x_n$, and where each instruction $g_i$ is of the form $g_j \circ g_k$ for some $j,k \lt i$, where $\circ$ is either a linear combination or multiplication. Such computations are called algebraic circuits or straight-line programs. The goal of algebraic complexity is to understand the optimal asymptotic complexity of computing a given polynomial family $(f_n(x_1,\ldots,x_{\operatorname{poly}(n)}))_{n=1}^{\infty }$, typically in terms of size (=number of instructions) and depth (the depth of the natural directed acyclic graph associated to the instruction sequence of a straight-line program) of algebraic circuits. In addition to the intrinsic interest in these questions, since Valiant's work [102–104] algebraic complexity has become more and more important for Boolean computational complexity. Valiant argued that understanding algebraic complexity could give new intuitions that may lead to better understanding of other models of computation (see also [108]); several direct connections have been found between algebraic and Boolean complexity [23, 48, 50, 74]; and the Geometric Complexity Theory Program (see, e.g., the overview [76] and references therein) suggests how algebraic techniques might be used to resolve major Boolean complexity conjectures.
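The straight-line-program model can be made concrete in a few lines. The following Python sketch (our illustration, with an arbitrarily chosen polynomial) computes $f(x_1,x_2) = (x_1+x_2)^2 \cdot x_1$ as a sequence of three instructions, so its circuit size is 3.

```python
# A straight-line program (algebraic circuit) for the polynomial
# f(x1, x2) = (x1 + x2)^2 * x1: each instruction g_i is a linear
# combination or a product of earlier lines, and the size of the
# circuit is the number of instructions (here, 3).

def f_slp(x1, x2):
    g1 = x1 + x2   # linear combination of inputs
    g2 = g1 * g1   # product
    g3 = g2 * x1   # product; g3 computes f
    return g3

print(f_slp(2, 3))  # (2 + 3)^2 * 2 = 50
```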
Two central functions in this area are the determinant and permanent polynomials, which are fundamental both because of their prominent role in many areas of mathematics and because they are complete for various natural complexity classes. In particular, the permanent of $\lbrace 0,1\rbrace$-matrices is $\mathsf {\# P}$-complete, and the permanent of arbitrary matrices is $\mathsf {VNP}$-complete in odd characteristic. Valiant's Permanent versus Determinant Conjecture [102] states that the permanent of an $n \times n$ matrix, as a polynomial in $n^2$ variables, cannot be written efficiently as the determinant of any polynomially larger matrix all of whose entries are variables or constants. In some ways this is an algebraic analog of $\mathsf {P} \ne \mathsf {NP}$, although it is in fact much closer to $\mathsf {FNC}^2 \ne \mathsf {\# P}$. In addition to this analogy, the Permanent versus Determinant Conjecture is also known to be a formal consequence of the nonuniform lower bound $\mathsf {NP} \not\subseteq \mathsf {P/poly}$ [23] and is thus thought to be an important step towards showing $\mathsf {P} \ne \mathsf {NP}$.
Unlike in Boolean circuit complexity, (slightly) non-trivial lower bounds for the size of algebraic circuits are known [9, 97]. Their methods, however, only give lower bounds up to $\Omega (n\log n)$. Moreover, their methods are based on a degree analysis of certain algebraic varieties and do not give lower bounds for polynomials of constant degree. Recent work [2, 55, 99] has shown that polynomial-size algebraic circuits computing functions of polynomial degree can in fact be computed by sub-exponential-size depth 4 algebraic circuits. Thus, strong-enough lower bounds for depth 4 algebraic circuits for the permanent would already prove $\mathsf {VP} \ne \mathsf {VNP}$.
Effective Nullstellensätze. A special case of Hilbert's famous Nullstellensatz says that over a field $\mathbb {F}$, a set of polynomials $F_1(x_1,\ldots, x_n)$, $\ldots\,$, $F_m(x_1,\ldots, x_n)$ of degree at most $d$ has no common zero over the algebraic closure $\overline{\mathbb {F}}$ if and only if the ideal they generate contains 1 or, equivalently, if there exist polynomials $G_i$ such that $\sum G_i F_i = 1$. Doubly exponential upper bounds on the degree of the $G_i F_i$, of the form $d^{O(2^n)}$, were shown as early as 1983 [68]. These were later improved to singly exponential bounds of the form $O(d^n)$ [20] (in characteristic zero) and subsequently improved to bounds that hold over an arbitrary algebraically closed field, have tighter dependence on the degrees of each $F_i$, and are essentially tight [56]. Since then, more refined geometric information than mere degree bounds has been obtained by a number of authors [21, 30, 49, 57, 96]. For a good overview of this work, see the introduction of Ein and Lazarsfeld [30] and references therein.
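As a toy illustration of such a certificate (our own example, not one from the literature), sympy can verify that $G_1 = -1$, $G_2 = y$ certify that $F_1 = xy - 1$ and $F_2 = x$ have no common zero:

```python
# Verify a Nullstellensatz certificate sum_i G_i * F_i = 1 with
# sympy. F1 = x*y - 1 and F2 = x have no common zero (x = 0
# forces F1 = -1), witnessed by G1 = -1 and G2 = y.
import sympy as sp

x, y = sp.symbols('x y')
F1, F2 = x * y - 1, x
G1, G2 = sp.Integer(-1), y
cert = sp.expand(G1 * F1 + G2 * F2)
print(cert)  # 1
```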
In this article, we raise the question of extending Effective Nullstellensätze from degree bounds to bounds on the algebraic circuit complexity of Nullstellensatz certificates. (This question was perhaps implicit in Pitassi [81, 82], and a syntactically more complicated variant of this question was raised in Grigoriev-Hirsch [35].) It has long been known [42]—although perhaps not well known—that bounds on algebraic circuit complexity can have geometric consequences; indeed, this is one of the philosophical underpinnings of the current Geometric Complexity Theory Program towards resolving questions like $\mathsf {P}$ versus $\mathsf {NP}$ (see, e.g., References [36, 64, 75–78] and references therein). Here, we show that the algebraic circuit complexity of the Nullstellensatz has deep connections to Boolean proof complexity, and we use ideas motivated by this question to forge new bridges between proof complexity and computational complexity.
Proof Complexity. Despite considerable progress obtaining super-polynomial lower bounds for many weak proof systems (resolution [40], cutting planes [17], and bounded-depth Frege systems [58]), there has been essentially no progress in the last 25 years for stronger proof systems such as Extended Frege systems or Frege systems. More surprisingly, no nontrivial lower bounds are known for the seemingly weak $\mathsf {AC}^0[p]$-Frege system. In contrast, the analogous result in circuit complexity—proving super-polynomial $\mathsf {AC}^0[p]$ lower bounds for an explicit function—was resolved by Razborov and Smolensky over 25 years ago [86, 94]. To date, there has been no satisfactory explanation for this state of affairs.
In proof complexity, there are no known formal barriers such as relativization [8], Razborov–Rudich-natural proofs [87], or algebrization [1] that exist in Boolean function complexity. Moreover, there has not even been progress by way of conditional lower bounds. That is, trivially $\mathsf {NP} \ne \mathsf {coNP}$ implies superpolynomial lower bounds for $\mathsf {AC}^0[p]$-Frege, but we know of no weaker complexity assumption that implies such lower bounds. The only formal implication in this direction shows that certain circuit lower bounds imply lower bounds for proof systems that admit feasible interpolation, but unfortunately only weak proof systems (not Frege nor even $\mathsf {AC}^0$-Frege) have this property, under standard complexity-theoretic assumptions [18, 19]. In the converse direction, there are essentially no implications at all. For example, we do not know if $\mathsf {AC}^0[p]$-Frege lower bounds—nor even Frege nor Extended Frege lower bounds—imply any nontrivial circuit lower bounds.
In this article, we define a simple and natural proof system that we call the Ideal Proof System (IPS) based on Hilbert's Nullstellensatz. Our system is similar in spirit to related algebraic proof systems that have been studied previously but is different in a crucial way that we explain below.
Given a set of polynomials $F_1,\ldots ,F_m$ in $n$ variables $x_1,\ldots ,x_n$ over a field $\mathbb {F}$ without a common zero over the algebraic closure of $\mathbb {F}$, Hilbert's Nullstellensatz says that there exist polynomials $G_i$ such that $\sum F_i G_i =1$, i.e., that 1 is in the ideal generated by the $F_i$. In the Ideal Proof System, we introduce new variables $y_i$ that serve as placeholders into which the original polynomials $F_i$ will eventually be substituted:
An $\text{IPS}$ certificate that a system of $\mathbb {F}$-polynomial equations $F_1(\vec{x})=F_2(\vec{x}) = \cdots = F_m(\vec{x}) = 0$ is unsatisfiable over $\overline{\mathbb {F}}$ is an $\mathbb {F}$-polynomial $C(\vec{x}, \vec{y})$ in the variables $x_1,\ldots ,x_n$ and $y_1,\ldots ,y_m$ such that (1) $C(x_1,\ldots ,x_n,0,\ldots ,0) = 0$, and (2) $C(x_1,\ldots ,x_n,F_1(\vec{x}),\ldots ,F_m(\vec{x})) = 1$.
The first condition is equivalent to $C$ being in the ideal generated by $y_1,\ldots, y_m$, and the two conditions together therefore imply that 1 is in the ideal generated by the $F_i$, and hence that $F_1(\vec{x}) = \cdots = F_m(\vec{x})=0$ is unsatisfiable.
An $\text{IPS}$ proof of the unsatisfiability of the polynomials $F_i$ is an $\mathbb {F}$-algebraic circuit on inputs $x_1,\ldots ,x_n,y_1,\ldots ,y_m$ computing some $\text{IPS}$ certificate of unsatisfiability.
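A tiny worked example may help; the system and certificate below are our own illustrative choices. The two conditions on an $\text{IPS}$ certificate $C$ (vanishing at $\vec{y} = \vec{0}$, and equaling 1 after substituting $y_i \mapsto F_i(\vec{x})$) can be checked mechanically:

```python
# Check an IPS certificate for the unsatisfiable system
# F1 = x, F2 = 1 - x: the Hilbert-like certificate C = y1 + y2
# satisfies C(x, 0, 0) = 0 and C(x, F1, F2) = x + (1 - x) = 1.
import sympy as sp

x, y1, y2 = sp.symbols('x y1 y2')
F1, F2 = x, 1 - x
C = y1 + y2

cond1 = sp.expand(C.subs({y1: 0, y2: 0}))    # must be 0
cond2 = sp.expand(C.subs({y1: F1, y2: F2}))  # must be 1
print(cond1, cond2)  # 0 1
```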
For any class $\mathcal {C}$ of polynomial families, we may speak of $\mathcal {C}$-$\text{IPS}$ proofs of a family of systems of equations $(\mathcal {F}_n)$, where $\mathcal {F}_n$ is $F_{n,1}(\vec{x}) = \cdots = F_{n,\operatorname{poly}(n)}(\vec{x}) = 0$. When we refer to $\text{IPS}$ without further qualification, we mean $\text{IPS}$ certificates whose proofs are computed by circuits of polynomial size (with no a priori bound on the degree), unless specified otherwise.1
The Ideal Proof System is easily shown to be sound, and (without any size bounds) its completeness follows from the Nullstellensatz.
We note that although the Nullstellensatz says that if $F_1(\vec{x}) = \cdots = F_m(\vec{x}) = 0$ is unsatisfiable, then there always exists a certificate that is linear in the $y_i$—that is, of the form $\sum y_i G_i(\vec{x})$—our definition of $\text{IPS}$ certificate does not enforce $\vec{y}$-linearity. The definition of $\text{IPS}$ certificate allows certificates with $\vec{y}$-monomials of higher degree, and it is conceivable that one could achieve a savings in size by considering such certificates rather than only considering $\vec{y}$-linear ones. (Subsequent to this work it was shown [34] that, for general IPS, super-polynomial savings are not possible; but the result does not rule out a savings for $\mathcal {C}$-$\text{IPS}$ proofs for various restricted classes $\mathcal {C}$; see Section 8.2 below for details.) As the linear form is closer to the original way Hilbert expressed the Nullstellensatz (see, e.g., the translation [44]), we refer to certificates of the form $\sum y_i G_i(\vec{x})$ as Hilbert-like $\text{IPS}$ certificates.
We typically consider $\text{IPS}$ as a propositional proof system by translating a CNF tautology $\varphi$ into a system of equations as follows. We translate a clause $\kappa$ of $\varphi$ into a single algebraic equation $F(\vec{x})$ as follows: $x \mapsto 1-x$, $x \vee y \mapsto xy$. This translation has the property that a $\lbrace 0,1\rbrace$ assignment satisfies $\kappa$ if and only if it satisfies the equation $F = 0$. Let $\kappa _1,\ldots, \kappa _m$ denote all the clauses of $\varphi$, and let $F_i$ be the corresponding polynomials. Then the system of equations we consider is $F_1(\vec{x}) = \cdots = F_m(\vec{x}) = x_1^2 - x_1 = \cdots = x_n^2 - x_n = 0$. The latter equations force any solution to this system of equations to be $\lbrace 0,1\rbrace$-valued. Despite our indexing here, when we speak of the system of equations corresponding to a tautology, we always assume that the $x_i^2 - x_i$ are among the equations, unless explicitly stated otherwise (and, indeed, there are a few situations where we do not need the equations $x_i^2 - x_i$).
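This translation is easy to check exhaustively on small clauses. The sketch below (our own encoding choice, with clauses as DIMACS-style signed integers) evaluates the translated polynomial; for negated literals, the map $\lnot x \mapsto x$ is forced by the stated property that a $\lbrace 0,1\rbrace$ assignment satisfies the clause iff the polynomial vanishes.

```python
# Evaluate the polynomial translation of a clause at a {0,1}
# assignment: a positive literal x contributes (1 - x), a negated
# literal contributes x, and the clause maps to the product.
import itertools

def clause_poly(clause, assignment):
    prod = 1
    for lit in clause:
        v = assignment[abs(lit)]
        prod *= (1 - v) if lit > 0 else v
    return prod

# Clause (x1 OR NOT x2) is falsified only by x1 = 0, x2 = 1; its
# polynomial (1 - x1) * x2 vanishes exactly on satisfying assignments.
for a1, a2 in itertools.product([0, 1], repeat=2):
    val = clause_poly([1, -2], {1: a1, 2: a2})
    satisfied = (a1 == 1) or (a2 == 0)
    assert (val == 0) == satisfied
print("translation agrees on all four assignments")
```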
Like previously defined algebraic systems [14, 27, 81, 82], proofs in our system can be checked in randomized polynomial time. The key difference between our system and previously studied ones is that those systems are axiomatic in the sense that they require that every sub-computation (derived polynomial) be in the ideal generated by the original polynomial equations $F_i$ and thus be a sound consequence of the equations $F_1=\cdots =F_m=0$. In contrast our system has no such requirement: An $\text{IPS}$ proof can compute potentially “unsound” sub-computations (whose vanishing does not follow from $F_1=\cdots =F_m=0$), as long as the final polynomial is in the ideal generated by the equations. This key difference allows $\text{IPS}$ proofs to be ordinary algebraic circuits, and thus nearly all results in algebraic circuit complexity apply directly to the Ideal Proof System. To quote the tagline of a common US food chain, the Ideal Proof System is a “No rules, just right” proof system.
Our first main theorem shows one of the advantages of this close connection with algebraic circuits. To the best of our knowledge, this is the first implication showing that a proof complexity lower bound implies any sort of computational complexity lower bound.
Super-polynomial lower bounds for the Ideal Proof System imply that the permanent does not have polynomial-size algebraic circuits, that is, $\mathsf {VNP} \ne \mathsf {VP}$.
The preceding theorem is perhaps somewhat unsurprising—though not completely immediate—given the definition of $\text{IPS}$, because of the close connection between the definition of $\text{IPS}$ proofs and algebraic circuits. However, the following result is significantly more surprising—showing a relation between a standard rule-based algebraic proof system and algebraic circuit lower bounds—and we believe we would not have come to this result had we not first considered the rule-less Ideal Proof System.
Super-polynomial lower bounds on the number of lines in Polynomial Calculus proofs imply the Permanent versus Determinant Conjecture.
Corollary 1.3 follows from the proof of Theorem 1.2 together with one of our simulation results (Proposition 3.4).
Under a reasonable assumption on polynomial identity testing, which we discuss further below, we are able to show that Extended Frege is equivalent to the Ideal Proof System. Polynomial Identity Testing (PIT) is the problem of deciding whether a given algebraic circuit computes the identically zero polynomial or not; unless otherwise stated, we take PIT to allow arbitrary circuits as input, with no restriction on their degree. Even without degree restriction, the standard randomized algorithm places PIT into $\mathsf {coRP}$ by working over an extension field of polynomial degree if needed [29, 89, 109]. Extended Frege (EF) is the strongest natural deduction-style propositional proof system that has been proposed and is the proof complexity analog of $\mathsf {P/poly}$ (that is, Extended Frege = $\mathsf {P/poly}$-Frege).
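The randomized verification underlying this is evaluation at random points, following the Schwartz–Zippel lemma [29, 89, 109]. Here is a toy black-box sketch (our own simplification: it assumes the degree is known and small, whereas the general algorithm handles circuits of exponential degree by working over a large enough extension field):

```python
# One-sided-error PIT by random evaluation: a nonzero polynomial
# of degree d vanishes at a uniform point of S^n with probability
# at most d/|S| (Schwartz-Zippel), so repeated trials drive the
# error probability down exponentially.
import random

def probably_zero(poly, nvars, degree, trials=20):
    S = range(2 * degree + 1)  # |S| > degree, so each trial errs w.p. < 1/2
    for _ in range(trials):
        point = [random.choice(S) for _ in range(nvars)]
        if poly(point) != 0:
            return False       # definitely nonzero
    return True                # zero with high probability

identity = lambda p: (p[0] + p[1])**2 - (p[0]**2 + 2*p[0]*p[1] + p[1]**2)
witness  = lambda p: p[0] * p[1] - 1
print(probably_zero(identity, 2, 2))  # True
print(probably_zero(witness, 2, 2))   # False (with overwhelming probability)
```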
Let $K$ be a family of polynomial-size Boolean circuits for PIT such that the PIT axioms for $K$ (see Definition 5.1) have polynomial-size EF proofs. Then EF polynomially simulates $\text{IPS}$, and hence EF and $\text{IPS}$ are polynomially equivalent.
In light of Theorem 1.4, a promising direction for proving EF lower bounds is to try to prove lower bounds for IPS instead. Since this suggestion seems counter to the usual philosophy of trying to prove lower bounds on the next-hardest proof system, and proving IPS lower bounds may be much harder, this suggestion deserves a little discussion. First, by considering $\mathcal {C}$-IPS for restricted circuit classes $\mathcal {C}$, we recover some of the usual philosophy of trying to prove lower bounds on incrementally harder proof systems first rather than jumping straight to (full) IPS lower bounds.
Second, and more importantly, IPS gives a new way of thinking about propositional proof systems and creates the possibility of harnessing tools from algebra, representation theory, and algebraic circuit complexity. Indeed, tools from algebraic circuit complexity have already been used to prove lower bounds for some restricted IPS systems [34] , and in Section 6 we give one suggestion of how to apply tools from algebraic geometry to obtain IPS lower bounds.
Theorems 1.2 and 1.4 together state that if the PIT axioms are provable in EF, then EF lower bounds imply circuit lower bounds ($\mathsf {VNP} \ne \mathsf {VP}$). The hypothesis that the PIT axioms are provable in EF is orthogonal to, but still closely related to, the more standard hypothesis that PIT is in $\mathsf {P}$. Since upper bounds on PIT are also known to imply lower bounds, we would like to address the differences between the two conclusions. The best lower bound known to follow from PIT $\in \mathsf {P}$ is an algebraic circuit-size lower bound on an integer polynomial that can be evaluated in $\mathsf {NEXP} \cap \mathsf {coNEXP}$ [25, 48] (via personal communication we have learned that Impagliazzo and R. Williams have also proved similar results), whereas our conclusion is a lower bound on algebraic circuit-size for an integer polynomial computable in the much smaller class $\mathsf {\# P} \subseteq \mathsf {PSPACE}$.
Although PIT has long been a central problem of study in computational complexity—both because of its importance in many algorithms, as well as its strong connection to circuit lower bounds—our theorems highlight the importance of PIT in proof complexity. Next we prove that Theorem 1.4 can be scaled down to obtain similar results for weaker Frege systems and discuss some of its more striking consequences.
Let $\mathcal {C}$ be any of the standard circuit classes $\mathsf {AC}^k$, $\mathsf {AC}^k[p]$, $\mathsf {ACC}^k$, $\mathsf {TC}^k$, $\mathsf {NC}^k$. Let $K$ be a family of polynomial-size Boolean circuits for PIT (not necessarily in $\mathcal {C}$) such that the PIT axioms for $K$ have polynomial-size $\mathcal {C}$-Frege proofs. Then $\mathcal {C}$-Frege is polynomially equivalent to $\text{IPS}$ and, consequently, to Extended Frege as well.
Theorem 1.6 also highlights the importance of our PIT axioms for getting $\mathsf {AC}^0[p]$-Frege lower bounds, which has been an open question for nearly 30 years. (For even weaker systems, Theorem 1.6 in combination with known results yields an unconditional lower bound on $\mathsf {AC}^0$-Frege proofs of the PIT axioms.) In particular, we are in the following win-win scenario:
For any $d$, either:
Finally, in Section 6 we show what obstacles must be overcome in any attempt to extend proof techniques from lower bounds on ($\mathcal {C}$-)algebraic circuits to lower bounds on ($\mathcal {C}$-)IPS proofs—which may also apply to Extended Frege via Theorem 1.4. We then use the algebraic structure of $\text{IPS}$ to suggest a new approach to proving lower bounds that we feel has promise. In particular, the set of all $\text{IPS}$-certificates for a given unsatisfiable system of equations is, in a certain precise sense, “finitely generated.” We suggest how one might take advantage of this finite generation to transfer techniques from algebraic circuit complexity to prove lower bounds on $\text{IPS}$, and, consequently, on Extended Frege (since $\text{IPS}$ p-simulates Extended Frege unconditionally), giving hope for the long-sought length-of-proof lower bounds on an algebraic proof system. We hope to pursue this approach in future work.
Other proof systems. We will see in Section 3.3 that many previously studied proof systems can be p-simulated by $\text{IPS}$ and, furthermore, can be viewed simply as different complexity measures on $\text{IPS}$ proofs or as $\mathcal {C}$-$\text{IPS}$ for certain classes $\mathcal {C}$. In particular, the Nullstellensatz system [14], the Polynomial Calculus (or Gröbner) proof system [27], and Polynomial Calculus with Resolution [4] are all particular measures on $\text{IPS}$, and Pitassi's previous algebraic systems [81, 82] are subsystems of $\text{IPS}$.
Raz and Tzameret [85] introduced various multilinear algebraic proof systems. Although their systems are not so easily defined in terms of $\text{IPS}$, the Ideal Proof System nonetheless p-simulates all of their systems. Among other results, they show that a super-polynomial separation between two variants of their system—one representing lines by multilinear circuits, and one representing lines by general algebraic circuits—would imply a super-polynomial separation between general and multilinear circuits computing multilinear polynomials. However, they only get implications to lower bounds on multilinear circuits rather than general circuits, and they do not prove a statement analogous to our Theorem 1.2, that lower bounds on a single system imply algebraic circuit lower bounds.
Grigoriev and Hirsch [35] introduced two proof systems, F-NS and F-PC, analogous to Nullstellensatz and Polynomial Calculus, respectively, but in which the basic objects of the proofs are allowed to be algebraic formulae, rather than only sums of monomials, and equivalence of formulae must be verified line-by-line using the axioms for a polynomial ring (associativity, commutativity, distributivity, etc.). Again, although these systems are not so easily defined in terms of IPS, they are easily p-simulated by IPS; indeed, in IPS the standard axioms for a polynomial ring come nearly for free—the main cost is that the verification is randomized instead of deterministic. They did not draw connections between algebraic circuit lower bounds and lower bounds on their proof systems.
Finally, we mention that Hrubeš and Tzameret [46] have studied the proof complexity of polynomial identity testing (PIT). In particular, they studied the question of how many basic ring identities—associativity, distributivity, and so on—are needed to verify a polynomial identity. While one of the messages of our article is the importance of the proof complexity of PIT, the way it shows up in our work is in proving that a Boolean circuit deciding PIT is correct (see our PIT axioms, Definition 5.1), whereas in Hrubeš and Tzameret [46] they study the complexity of proving individual polynomial identities.
Ideal Membership and Effective Nullstellensätze. Prior to our work, much work was done on bounds for the Ideal Membership Problem ($\mathsf {EXPSPACE}$-complete [70, 71]), the so-called Effective Nullstellensatz (where exponential degree bounds are known, and known to be tight [20, 30, 56, 96]), and the arithmetic Nullstellensatz over $\mathbb {Z}$, where one wishes to bound not only the degree of the polynomials but also the sizes of the integer coefficients appearing [61]. The viewpoint afforded by the Ideal Proof System raises new questions about potential strengthenings of these results (or at least simplifies and highlights questions that were implicit in References [35, 81, 82]).
In particular, the following is a natural extension of Definition 1.1.
An $\text{IPS}$ certificate that a polynomial $G(\vec{x})$ is in the ideal (respectively, radical of the ideal) generated by $F_1(\vec{x}),\ldots, F_m(\vec{x})$ is a polynomial $C(\vec{x}, \vec{y})$ such that (1) $C(\vec{x},0,\ldots ,0)=0$, and (2) $C(\vec{x},F_1(\vec{x}),\ldots ,F_m(\vec{x})) = G(\vec{x})$ (respectively, $= G(\vec{x})^k$ for some integer $k \ge 1$).
An $\text{IPS}$ derivation of $G$ (respectively, $G^k$) from $F_1,\ldots, F_m$ is a circuit computing some $\text{IPS}$ certificate that $G \in \langle F_1,\ldots, F_m \rangle$ (respectively, $G \in \sqrt {\langle F_1,\ldots, F_m \rangle }$).
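For concreteness, here is a toy radical-membership certificate (our own example): $G = x$ lies in $\sqrt {\langle x^2 \rangle }$, witnessed by $k = 2$ and $C(\vec{x}, \vec{y}) = y_1$, since substituting $y_1 \mapsto F_1 = x^2$ yields $G^2$.

```python
# Check a radical-membership IPS certificate: C(x, 0) = 0 and
# C(x, F1(x)) = G(x)^k, with F1 = x^2, G = x, k = 2, C = y1.
import sympy as sp

x, y1 = sp.symbols('x y1')
G, F1, k = x, x**2, 2
C = y1
ok_zero = sp.expand(C.subs(y1, 0)) == 0
ok_power = sp.expand(C.subs(y1, F1) - G**k) == 0
print(ok_zero, ok_power)  # True True
```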
Grigoriev and Hirsch [35, Section 2.5] introduced a related system, denoted (F-)PC$\sqrt {}$, for proving that a polynomial is in the radical of an ideal. The key difference between (F-)PC$\sqrt {}$ and (F-)PC is that the former adds a rule deriving a polynomial $P$ from $P^2$. Otherwise, their system has similar tradeoffs relative to IPS: On the one hand, their system can be deterministically verified; on the other hand, it is restricted to syntactic derivations.
There is no sub-exponential ($\bigcap _{\varepsilon \gt 0} O(2^{n^{\varepsilon }})$) upper bound on the size of constant-free circuits computing $\text{IPS}$-certificates of ideal membership. The same holds for general algebraic circuits in characteristic zero, assuming the Generalized Riemann Hypothesis (GRH).
Under special circumstances, of course, one may be able to achieve better upper bounds.
Suppose that for every $G(\vec{x}) \in \langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle$ there were a constant-free circuit of sub-exponential size computing some $\text{IPS}$ certificate for the membership of $G$ in $\langle F_1,\ldots, F_m \rangle$. Then guessing that circuit and verifying its correctness using PIT gives a $\mathsf {MA}_{\text{subexp}} \subseteq \mathsf {SUBEXPSPACE}$ algorithm for the Ideal Membership Problem. The $\mathsf {EXPSPACE}$-completeness of Ideal Membership [70, 71] would then imply that $\mathsf {EXPSPACE} \subseteq \mathsf {SUBEXPSPACE}$, contradicting the Space Hierarchy Theorem [41]. In characteristic zero, if we assume GRH, we may drop the assumption that the circuits are constant free, using essentially the same argument as in Proposition 3.2.
The preceding observation, however, does not seem to apply to effective Nullstellensätze, which are generally about showing that a function $G$ is in the radical of an ideal (which, in particular, is always the case for $G=1$). We thus raise the following question:
For any $G(\vec{x}) \in \sqrt {\langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle }$, is there always an $\text{IPS}$-certificate, as in Definition 1.8, of sub-exponential size that $G$ is in the radical of $\langle F_1,\ldots, F_m \rangle$? Similarly, over $\mathbb {Z}$, is there a constant-free $\text{IPS}$-certificate of sub-exponential size that $aG(\vec{x})$ is in the radical of the ideal $\langle F_1,\ldots, F_m \rangle$ for some integer $a$?
In Section 2, we give the necessary preliminaries from algebraic circuit complexity, proof complexity, and commutative algebra. The heart of the article begins in Section 3, where we prove several basic facts about $\text{IPS}$. There we discuss the relationship between $\text{IPS}$ and previously studied proof systems, and we highlight several consequences of results from algebraic complexity theory for the Ideal Proof System, such as division elimination [98] and the chasms at depths 3 [39, 99] and 4 [2, 55, 99].
In Section 4, we prove that lower bounds on $\text{IPS}$ imply algebraic circuit lower bounds (Theorem 1.2). We also show how this result gives as a corollary a new, simpler proof that over any field (Corollary 4.1). In Section 5 we introduce our PIT axioms in detail and prove Theorems 1.4 and 1.6.
We also discuss in detail many variants of Theorem 1.6 and their consequences, as briefly mentioned above. In Section 6, we show what obstacles need to be overcome to extend lower bounds from algebraic circuit complexity to (algebraic) proof complexity; we also suggest a new framework for transferring techniques in this direction. Finally, in Section 7, we gather a long list of open questions raised by our work, many of which we believe may be quite approachable.
In Section 8, we discuss some developments that occurred subsequent to the appearance of the preliminary version of this article [37]. Namely, Li, Tzameret, and Wang [65] showed—along the lines suggested in Section 5—that non-commutative formula $\text{IPS}$ is unconditionally quasi-polynomially equivalent to Frege. We discuss their result and its significance for proving Frege lower bounds. Also, Forbes, Shpilka, Tzameret, and Wigderson [34] showed several fundamental results about IPS as well as how to transfer some techniques from circuit complexity to prove lower bounds on some simple polynomials in $\mathcal {C}$-IPS for various $\mathcal {C}$.
In Appendices A and B, we introduce two variants of the Ideal Proof System—one of which allows certificates to be rational functions and not only polynomials and one of which has a more geometric flavor—and discuss their relationship to $\text{IPS}$. These systems further suggest that tools from geometry and algebra could potentially be useful for understanding the complexity of various propositional tautologies and more generally the complexity of individual instances of $\mathsf {NP}$-complete problems.
As general references, we refer the reader to Sipser [93] or Arora–Barak [5] for Boolean computational complexity, to Bürgisser–Clausen–Shokrollahi [24] and two surveys [26, 92] for algebraic complexity, to Krajíček [59] for proof complexity, and to any of the standard books [6, 31, 69, 88] for commutative algebra.
We use $\operatorname{poly}(n)$ as a synonym for $n^{O(1)}$, i.e., any function that is eventually bounded by $n^k$ for some $k$. Different instances of “$\operatorname{poly}(n)$,” even in the same sentence, may mean different polynomials. We use the quantifier $\exists ^p$ to mean “there exists a string of length $\operatorname{poly}(n)$” and $\forall ^p$ to mean “for all strings of length $\operatorname{poly}(n)$.” Similarly, $Pr(X)$ denotes the probability of an event $X$, and $Pr^p_r(X)$ denotes the probability of the event $X$, taken over a uniformly random choice of strings $r$ of length $\operatorname{poly}(n)$.
Over a ring $R$, $\mathsf {VP}_{R}$ is the class of families $f=(f_n)_{n=1}^{\infty }$ of formal polynomials—that is, considered as symbolic polynomials rather than as functions—$f_n$ such that $f_n$ has $\operatorname{poly}(n)$ input variables, is of $\operatorname{poly}(n)$ degree, and can be computed by algebraic circuits over $R$ of $\operatorname{poly}(n)$ size. $\mathsf {VNP}_{R}$ is the class of families $g$ of polynomials $g_n$ such that $g_n$ has $\operatorname{poly}(n)$ input variables and is of $\operatorname{poly}(n)$ degree and can be written as
$$g_n(\vec{x}) = \sum _{\vec{e} \in \lbrace 0,1\rbrace ^{\operatorname{poly}(n)}} f_n(\vec{x}, \vec{e})$$
for some family $(f_n) \in \mathsf {VP}_{R}$.
A linear combination gate is a gate $g$, with incoming edges from gates $f_1,\ldots, f_k$, and with scalar weights on its incoming edges, which computes the linear combination $\sum _i w_i f_i$. For the definitions of $\mathsf {VP}$ and $\mathsf {VNP}$ it does not matter whether we use gates of bounded fan-in or unbounded fan-in, and whether we allow general linear combination gates or merely addition gates (with no weights). But when we consider families of algebraic circuits of bounded-depth, we will by default allow linear combination gates and product gates of unbounded fan-in.
A family of algebraic circuits is said to be constant free if the only constants used in the circuit are $\lbrace 0,1,-1\rbrace$. Other constants can be used but must be built up using algebraic operations, which then count towards the size of the circuit. The class $\mathsf {VP}^0$ is defined by restricting the circuits used in the definition of $\mathsf {VP}$ to be constant free, and similarly for $\mathsf {VNP}^0$. We note that over a fixed finite field $\mathbb {F}_q$, $\mathsf {VP}^0_{\mathbb {F}_q} = \mathsf {VP}_{\mathbb {F}_q}$, since there are only finitely many possible constants. Consequently, $\mathsf {VNP}^0_{\mathbb {F}_q} = \mathsf {VNP}_{\mathbb {F}_q}$ as well. Over the integers, $\mathsf {VP}^0_{\mathbb {Z}}$ coincides with those families in $\mathsf {VP}_{\mathbb {Z}}$ that are computable by algebraic circuits of polynomial total bit-size: Note that any integer of polynomial bit-size can be constructed by a constant-free circuit by using its binary expansion $b_n \cdots b_1 = \sum _{i=1}^{n} b_i 2^{i-1}$, and computing the powers of 2 by linearly many successive multiplications. A similar trick shows that over the algebraic closure $\overline{\mathbb {F}_p}$ of a finite field, $\mathsf {VP}^0_{\overline{\mathbb {F}_p}}$ coincides with those families in $\mathsf {VP}_{\overline{\mathbb {F}_p}}$ that are computable by algebraic circuits of polynomial total bit-size, or equivalently where the constants they use lie in subfields of $\overline{\mathbb {F}_p}$ of total size bounded by $2^{n^{O(1)}}$. (Recall that $\mathbb {F}_{p^a}$ is a subfield of $\mathbb {F}_{p^b}$ whenever $a \mid b$, and that the algebraic closure $\overline{\mathbb {F}_p}$ is just the union of $\mathbb {F}_{p^a}$ over all integers $a$.)
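To make the bit-size accounting concrete, here is a small sketch (ours, not from the paper) that builds an integer from the constant 1 by the binary-expansion trick and counts the ring operations used; the operation count is linear in the bit-length.

```python
# Sketch: count the algebraic operations needed to build an integer
# constant starting only from {0, 1, -1}.  We accumulate powers of 2 by
# repeated doubling and add in the bits that are set, so an n-bit
# integer costs O(n) additions and multiplications.

def constant_free_ops(n: int) -> tuple[int, int]:
    """Return (value, op_count): value built from 1 via ring operations,
    and the number of addition/multiplication gates used."""
    assert n >= 0
    value, ops = 0, 0
    power = 1                     # current power of 2; starts as the constant 1
    for bit in bin(n)[2:][::-1]:  # least-significant bit first
        if bit == "1":
            value += power        # one addition gate
            ops += 1
        power *= 2                # one multiplication gate (by 2 = 1 + 1)
        ops += 1
    return value, ops

val, ops = constant_free_ops(1000003)
assert val == 1000003 and ops <= 2 * (1000003).bit_length()
```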
A polynomial $f(\vec{x})$ is a projection of a polynomial $g(\vec{y})$ if $f(\vec{x}) = g(L(\vec{x}))$ identically as polynomials in $\vec{x}$, for some map $L$ that assigns to each $y_i$ either a variable or a constant. A family of polynomials $(f_n)$ is a polynomial projection or p-projection of another family $(g_n)$, denoted $(f_n) \le _{p} (g_n)$, if there is a function $t(n) = n^{\Theta (1)}$ such that $f_n$ is a projection of $g_{t(n)}$ for all (sufficiently large) $n$. The primary value of projections is that they are very simple and thus preserve bounds on nearly all natural complexity measures. Valiant [102, 104] was the first to point out not only their value but also their ubiquity in computational complexity—nearly all problems that are known to be complete for some natural class, even in the Boolean setting, are complete under p-projections. We say that two families $f=(f_n)$ and $g=(g_n)$ are of the same p-degree if each is a p-projection of the other, which we denote $f \equiv _{p} g$.
Despite its central role in computation, and the fact that $\mathsf {VP} = \mathsf {VNC}^2$ [105], the determinant is not known to be $\mathsf {VP}$-complete under p-projections. The determinant is $\mathsf {VQP}$-complete ($\mathsf {VQP}$ is defined just like $\mathsf {VP}$ but with a quasi-polynomial $n^{(\log n)^{O(1)}}$ bound on the size and degree of the circuits) under qp-projections (like p-projections, but with a quasi-polynomial bound). The complexity of the determinant is clarified by skew and weakly skew circuits. An algebraic circuit is skew if every multiplication gate has only two inputs, one of which is a variable or a constant. An algebraic circuit is weakly skew if every multiplication gate has at least one input that is computed solely for the purposes of that multiplication gate; more precisely, each multiplication gate $f = g_0 \times g_1$ has the property that for at least one of its inputs $g_i$, removing the edge connecting $g_i$ to $f$ in the directed acyclic graph corresponding to the circuit results in a disconnected graph. $\mathsf {VP}_s$ is the class of polynomials of $\operatorname{poly}(n)$ variables computed by skew circuits of $\operatorname{poly}(n)$ size; $\mathsf {VP}_{ws}$ is defined analogously with “skew” replaced by “weakly skew.” In both of these classes, $\operatorname{poly}(n)$-bounded degree follows from the definition for free; thus $\mathsf {VP}_s \subseteq \mathsf {VP}$ and $\mathsf {VP}_{ws} \subseteq \mathsf {VP}$. Let $\mathsf {VP}_{\det }$ denote the class of polynomials that are p-projections of the determinant. It turns out that $\mathsf {VP}_s = \mathsf {VP}_{ws} = \mathsf {VP}_{\det }$ [101] (see also Malod and Portier [67]). We will use weakly skew circuits and $\mathsf {VP}_{\det }$ in Proposition 3.4.
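The weak-skewness condition is purely graph-theoretic, so it can be checked mechanically. The following sketch (our code, with gate encodings of our own choosing; multiplication gates are assumed to have two distinct children) tests whether, for each product gate, some child's subcircuit is attached to the rest of the circuit only through that gate.

```python
# A circuit is a dict: gate name -> ('in',) | ('add', a, b) | ('mul', a, b).

def descendants(circuit, g, skip_edge=None):
    """All gates reachable downward from g, optionally ignoring one edge."""
    seen, stack = set(), [g]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        for c in circuit[u][1:]:
            if (c, u) != skip_edge:
                stack.append(c)
    return seen

def is_weakly_skew(circuit, output):
    """Check: every 'mul' gate has a child whose subcircuit becomes
    disconnected from the rest once the child->gate edge is removed."""
    for g, node in circuit.items():
        if node[0] != 'mul':
            continue
        if not any(
                descendants(circuit, c).isdisjoint(
                    descendants(circuit, output, skip_edge=(c, g)))
                for c in node[1:]):
            return False
    return True

# x1 * (x2 * x3), with fresh leaves: weakly skew (indeed skew).
skew = {'x1': ('in',), 'x2': ('in',), 'x3': ('in',),
        'm1': ('mul', 'x2', 'x3'), 'm2': ('mul', 'x1', 'm1')}
assert is_weakly_skew(skew, 'm2')

# (a + b) * a with the leaf `a` shared: not weakly skew as a DAG.
shared = {'a': ('in',), 'b': ('in',),
          's': ('add', 'a', 'b'), 'p': ('mul', 's', 'a')}
assert not is_weakly_skew(shared, 'p')
```

Note that in the second example the same polynomial does have a weakly skew circuit (duplicate the leaf `a`); the check is about the given DAG, not the polynomial.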
The semantic degree of any gate in an algebraic circuit is just the degree of the polynomial it computes; the semantic degree of a (single-output) algebraic circuit is the semantic degree of its output gate. The syntactic degree of an algebraic circuit is defined inductively as follows: The syntactic degree of a constant is 0; the syntactic degree of a variable is 1; the syntactic degree of a product gate with children $f_1,\ldots, f_k$ is the sum of the syntactic degrees of the $f_i$; and the syntactic degree of a sum or linear combination gate with children $f_1,\ldots, f_k$ is the maximum of the syntactic degrees of the $f_i$. Semantic degree can be exponentially smaller than syntactic degree due to cancellations.
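A tiny worked example (ours) of that gap in miniature: the circuit $(x+1)(x-1) - x \cdot x$ has syntactic degree 2 but computes the constant $-1$, of semantic degree 0.

```python
# Track, in parallel, the actual univariate polynomial (as a dict
# {exponent: coefficient}) and the syntactic degree of each gate.

def poly_add(p, q, a=1, b=1):
    r = {}
    for src, c in ((p, a), (q, b)):
        for e, v in src.items():
            r[e] = r.get(e, 0) + c * v
    return {e: v for e, v in r.items() if v != 0}

def poly_mul(p, q):
    r = {}
    for e1, v1 in p.items():
        for e2, v2 in q.items():
            r[e1 + e2] = r.get(e1 + e2, 0) + v1 * v2
    return r

X = ({1: 1}, 1)    # the variable x: polynomial x, syntactic degree 1
ONE = ({0: 1}, 0)  # the constant 1, syntactic degree 0

def add(f, g, a=1, b=1):  # linear-combination gate: max of syntactic degrees
    return poly_add(f[0], g[0], a, b), max(f[1], g[1])

def mul(f, g):            # product gate: sum of syntactic degrees
    return poly_mul(f[0], g[0]), f[1] + g[1]

# f = (x + 1)(x - 1) - x*x, which is identically -1.
f = add(mul(add(X, ONE), add(X, ONE, 1, -1)), mul(X, X), 1, -1)
semantic_deg = max(f[0]) if f[0] else 0
print(semantic_deg, f[1])  # prints: 0 2
```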
Here we give formal definitions of proof systems and probabilistic proof systems for $\mathsf {coNP}$ languages and discuss several important and standard proof systems for TAUT.
Let $L \subseteq \lbrace 0,1\rbrace ^*$ be a $\mathsf {coNP}$ language. A proof system $P$ for $L$ is a polynomial-time function of two inputs $x,\pi \in \lbrace 0,1\rbrace ^*$ (“$\pi$” for “proof”) with the following properties:
$P$ is polynomially bounded if for every $x \in L$, there exists a $\pi$ such that $|\pi |\le \operatorname{poly}(|x|)$ and $P(x,\pi)=1$.
As this is just the definition of an $\mathsf {NP}$ procedure for $L$, it follows that for any $\mathsf {coNP}$-complete language $L$, $L$ has a polynomially bounded proof system if and only if $\mathsf {coNP} \subseteq \mathsf {NP}$.
Cook and Reckhow [28] formalized proof systems for the language TAUT (all Boolean tautologies) in a slightly different way, although their definition is essentially equivalent to the one above. We mildly prefer the above definition as it is consistent with definitions of interactive proofs.
A Cook–Reckhow proof system is a polynomial-time function $P^{\prime }$ of just one input $\pi$, and whose range is the set of all yes-instances of $L$. If $x \in L$, then any $\pi$ such that $P^{\prime }(\pi)=x$ is called a $P^{\prime }$-proof of $x$. $P^{\prime }$ must satisfy the following properties:
(That is, the image of $P^{\prime }$ must be exactly $L$.)
$P^{\prime }$ is polynomially bounded if for every $x \in L$, there exists a $\pi$ such that $|\pi | \le \operatorname{poly}(|x|)$ and $P^{\prime }(\pi)=x$.
Intuitively, we think of $P^{\prime }$ as a procedure for verifying that $\pi$ is a proof that some $x \in L$ and if so, it outputs $x$. (For all strings $z$ that do not encode valid proofs, $P^{\prime }(z)$ may just output some fixed $x_0 \in L$.) It is a simple exercise to see that for every language $L$, any proof system $P$ for $L$ according to our definition can be converted to a Cook–Reckhow proof system $P^{\prime }$, and vice versa, and furthermore the runtime properties of $P$ and $P^{\prime }$ will be the same. In the forward direction, say $P$ is a proof system for $L$ according to our definition. Define $\pi$ as encoding a pair $(x,\pi ^{\prime })$; on input $\pi =(x,\pi ^{\prime })$, $P^{\prime }$ runs $P$ on the pair $(x,\pi ^{\prime })$. If $P$ accepts, then $P^{\prime }(\pi)$ outputs $x$, and if $P$ rejects, then $P^{\prime }(\pi)$ outputs (the encoding of) a canonical $x_0$ in $L$. Conversely, say that $P^{\prime }$ is a Cook–Reckhow proof system for $L$. $P(x,\pi)$ runs $P^{\prime }$ on $\pi$ and accepts if and only if $P^{\prime }(\pi)=x$.
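The back-and-forth conversion in this paragraph can be written out directly; here is a toy rendering (our sketch, with a made-up example language) in which proofs for the Cook–Reckhow system are JSON-encoded pairs.

```python
import json

def make_cook_reckhow(P, x0):
    """From a two-input verifier P(x, pi) -> bool, build P'(pi) whose
    image is exactly L (malformed proofs map to the canonical x0)."""
    def P_prime(pi):
        try:
            x, inner = json.loads(pi)   # pi encodes the pair (x, pi')
        except (ValueError, TypeError):
            return x0
        return x if P(x, inner) else x0
    return P_prime

def make_two_input(P_prime):
    """From a Cook-Reckhow system P', build the two-input P(x, pi)."""
    return lambda x, pi: P_prime(pi) == x

# Toy language: strings of even length; a "proof" is the string itself.
L_check = lambda x, pi: pi == x and len(x) % 2 == 0
Pp = make_cook_reckhow(L_check, "")
assert Pp(json.dumps(["ab", "ab"])) == "ab"
assert Pp(json.dumps(["abc", "abc"])) == ""   # odd length: canonical x0
P2 = make_two_input(Pp)
assert P2("ab", json.dumps(["ab", "ab"]))
```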
Let $P_1$ and $P_2$ be two proof systems for a language $L$ in $\mathsf {coNP}$. $P_1$ polynomially simulates or p-simulates $P_2$ if for every $x \in L$ and for every $\pi$ such that $P_2(x,\pi)=1$, there exists $\pi ^{\prime }$ such that $|\pi ^{\prime }|\le \operatorname{poly}(|\pi |)$, and $P_1(x,\pi ^{\prime })=1$.
Informally, $P_1$ p-simulates $P_2$ if proofs in $P_1$ are no longer than proofs in $P_2$ (up to polynomial factors).
Let $P_1$ and $P_2$ be two proof systems for a language $L$ in $\mathsf {coNP}$. $P_1$ and $P_2$ are polynomially equivalent or p-equivalent if $P_1$ p-simulates $P_2$ and $P_2$ p-simulates $P_1$.
Standard Propositional Proof Systems. For TAUT (or UNSAT), there are a variety of standard and well-studied proof systems, the most important ones including Extended Frege (EF), Frege, Bounded-depth Frege, and Resolution. A Frege rule is an inference rule of the form: $B_1, \ldots , B_n \Rightarrow B$, where $B_1,\ldots , B_n,B$ are propositional formulas. If $n=0$, then the rule is an axiom. For example, $A \vee \lnot A$ is a typical Frege axiom, and $A, \lnot A \vee B \Longrightarrow B$ is a typical Frege rule. A Frege system is specified by a finite set, $R$, of rules. Given a collection $R$ of rules, a derivation of a 3DNF formula $f$ is a sequence of formulas $f_1,\ldots ,f_m$ such that each $f_i$ is either an instance of an axiom scheme or follows from previous formulas by one of the rules in $R$ and such that the final formula $f_m$ is $f$. In order for a Frege system to be a proof system in the Cook–Reckhow sense, its corresponding set of rules must be sound and complete. Work by Cook and Reckhow in the 1970s [28] showed that Frege systems are very robust in the sense that all Frege systems are polynomially equivalent.
Bounded-depth Frege proofs ($\mathsf {AC}^0$-Frege) are Frege proofs but with the additional restriction that each formula in the proof has bounded depth. (Because our connectives are AND, OR, and negation, by depth we assume the formula has all negations at the leaves, and we count the maximum number of alternations of AND/OR connectives in the formula.) Polynomial-sized $\mathsf {AC}^0$-Frege proofs correspond to the complexity class $\mathsf {AC}^0$ because such proofs allow a polynomial number of lines, each of which must be “syntactically in $\mathsf {AC}^0$” (that is, syntactically it must be described by a bounded-depth circuit).
Bounded-depth Frege proofs with mod $p$ connectives ($\mathsf {AC}^0[p]$-Frege) are bounded-depth Frege proofs that also allow unbounded fan-in $MOD_p$ connectives, namely $MOD_p^i$ for $i \in \lbrace 0,\ldots,p-1\rbrace$. $MOD^i_p(x_1,\ldots, x_k)$ evaluates to true if the number of $x_j$ that are true is congruent to $i \pmod {p}$ and evaluates to false otherwise.
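As a quick sanity check on the semantics, the $MOD_p^i$ connective is just a threshold-free modular counter; a direct evaluator (ours):

```python
def mod_p(i: int, p: int, *args: bool) -> bool:
    """MOD_p^i: true iff the number of true inputs is congruent to i mod p."""
    return sum(bool(a) for a in args) % p == i

assert mod_p(0, 3, True, True, True)   # three trues: 3 = 0 (mod 3)
assert mod_p(2, 3, True, False, True)  # two trues:   2 = 2 (mod 3)
assert not mod_p(1, 2, True, True)     # parity: 2 is not 1 (mod 2)
```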
Extended Frege systems generalize Frege systems by allowing, in addition to all of the Frege rules, a new axiom of the form $y \leftrightarrow A$, where $A$ is a formula and $y$ is a new variable not occurring in $A$ nor in the final formula (i.e., the formula being proved). Whereas polynomial-size Frege proofs allow a polynomial number of lines, each of which must be a polynomial-size formula, using the new axiom, polynomial-size EF proofs allow a polynomial number of lines, each of which can be a polynomial-size circuit. See Krajíček [59] for precise definitions of Frege, $\mathsf {AC}^0$-Frege, and EF proof systems.
Probabilistic Proof Systems. The concept of a proof system for a language in $\mathsf {coNP}$ can be generalized in the natural way to obtain randomized Merlin–Arthur-style proof systems.
Let $L$ be a language in $\mathsf {coNP}$, and let $V$ (for “verifier”) be a probabilistic polynomial-time algorithm with two inputs $x,\pi \in \lbrace 0,1\rbrace ^*$. $V$ is a probabilistic proof system for $L$ if:
$V$ is polynomially bounded if for every $x \in L$, there exists $\pi$ such that $|\pi | \le \operatorname{poly}(|x|)$ and $Pr_r[V(x,\pi)=1] \ge 3/4$.
It is clear that for any $\mathsf {coNP}$-complete language $L$, there is a polynomially bounded probabilistic proof system for $L$ if and only if $\mathsf {coNP} \subseteq \mathsf {MA}$ (which implies the collapse of $\mathsf {PH}$).
Again, we have chosen to define our probabilistic proof systems to match the definition of $\mathsf {MA}$. The probabilistic proof system that would be analogous to the standard Cook–Reckhow proof system would be somewhat different, as defined below. Again, a simple argument like the one above shows that our probabilistic proof systems are essentially equivalent to probabilistic Cook–Reckhow proof systems.
A probabilistic Cook–Reckhow proof system for a language $L \in \mathsf {coNP}$ is a probabilistic polynomial-time algorithm $A$ (whose runtime is independent of its random choices) such that
Such a proof system is polynomially bounded or p-bounded if for every $x \in L$, there is some $\pi$ such that $A(\pi)=x$ and $|\pi | \le \operatorname{poly}(|x|)$.
We note that both Pitassi's algebraic proof system [81] and the Ideal Proof System are probabilistic Cook–Reckhow systems. The verification algorithm takes as input a description of a (constant-free) algebraic circuit $C$ together with a tautology $\varphi$ and then verifies that the circuit is indeed an $\text{IPS}$-certificate for $\varphi$ by using the standard Schwartz–Zippel–DeMillo–Lipton [29, 89, 109] $\mathsf {coRP}$ algorithm for polynomial identity testing. The proof that Pitassi's algebraic proof system is a probabilistic Cook–Reckhow system is essentially the same.
The following preliminaries from commutative algebra are needed only in Section 6 and Appendix A.
A module over a ring $R$ is defined just like a vector space, except over a ring instead of a field. That is, a module $M$ over $R$ is a set with two operations: addition (making $M$ an abelian group) and multiplication by elements of $R$ (“scalars”), satisfying the expected axioms (see any textbook on commutative algebra, e.g., References [6, 31]). A module over a field $\mathbb {F}$ is precisely a vector space over $\mathbb {F}$. Every ring $R$ is naturally an $R$-module (using the ring multiplication as the scalar multiplication), as is $R^{n}$, the set of $n$-tuples of elements of $R$. Every ideal $I \subseteq R$ is an $R$-module—indeed, an ideal could be defined, if one desired, as an $R$-submodule of $R$—and every quotient ring $R/I$ is also an $R$-module, by $r \cdot (r_0 + I) = rr_0 + I$.
Unlike vector spaces, however, there is not so nice a notion of “dimension” for modules over arbitrary rings. Two differences will be particularly relevant in our setting. First, although every vector subspace of $\mathbb {F}^{n}$ is finite-dimensional, hence finitely generated, this need not be true of every submodule of $R^n$ for an arbitrary ring $R$. Second, every (finite-dimensional) vector space $V$ has a basis, and every element of $V$ can be written as a unique $\mathbb {F}$-linear combination of basis elements, but this need not be true of every $R$-module, even if the $R$-module is finitely generated, as in the following example.
Let $R = \mathbb {F}[x,y]$ and consider the ideal $I = \langle x, y \rangle$ as an $R$-module. For clarity, let us call the generators of this $R$-module $g_1 = x$ and $g_2 = y$. First, $I$ cannot be generated as an $R$-module by fewer than two elements: If $I$ were generated by a single element, say, $f$, then we would necessarily have $x=r_1 f$ and $y=r_2 f$ for some $r_1,r_2 \in R$, and thus $f$ would be a common divisor of $x$ and $y$ in $R$ (here we are using the fact that $I$ is both a module and a subset of $R$). But the GCD of $x$ and $y$ is 1, and the only submodule of $R$ containing 1 is $R \ne I$. So $\lbrace g_1, g_2\rbrace$ is a minimum generating set of $I$. But not every element of $I$ has a unique representation in terms of this (or, indeed, any) generating set: For example, $xy \in I$ can be written either as $r_1 g_1$ with $r_1=y$ or $r_2 g_2$ with $r_2 = x$.
A ring $R$ is Noetherian if there is no strictly increasing, infinite chain of ideals $I_1 \subsetneq I_2 \subsetneq I_3 \subsetneq \cdots$. Fields are Noetherian (every field has only two ideals: the zero ideal and the whole field), as are the integers $\mathbb {Z}$. Equivalently, a ring is Noetherian if and only if every one of its ideals is finitely generated. Hilbert's Basis Theorem says that if $R$ is Noetherian, then so is the polynomial ring $R[x]$ (and hence so is any polynomial ring $R[\vec{x}]$ in finitely many variables). Quotient rings of Noetherian rings are Noetherian, so every ring that is finitely generated over a field (or, more generally, over a Noetherian ring) is Noetherian.
Similarly, an $R$-module $M$ is Noetherian if there is no strictly increasing, infinite chain of submodules $M_1 \subsetneq M_2 \subsetneq M_3 \subsetneq \cdots$. If $R$ is Noetherian as a ring, then it is Noetherian as an $R$-module, and since finite direct sums of Noetherian modules are easily verified to be Noetherian, $R^{n}$ is a Noetherian $R$-module for any finite $n$. Just as for ideals, every submodule of a Noetherian module is finitely generated.
For any field $\mathbb {F}$, if every propositional tautology has a polynomial-size constant-free $\text{IPS}$ proof, then $\mathsf {NP} \subseteq \mathsf {coMA}$, and hence the polynomial hierarchy collapses to its second level.
This result and its proof are essentially the same as Pitassi [81, Theorem 4]; here we mainly take advantage of the fact that PIT and $\mathsf {coMA}$ are now much more standard than they were in 1996. We also note that the proof works over arbitrary fields, as long as one is careful about the use of constant freeness.
If we wish to drop the restriction of “constant free” (which, recall, is no restriction at all over a finite field), then we may do so either by using the Blum–Shub–Smale analogs of $\mathsf {NP}$ and $\mathsf {coMA}$ using essentially the same proof or over fields of characteristic zero using the Generalized Riemann Hypothesis (Proposition 3.2).
Merlin nondeterministically guesses the polynomial-size constant-free $\text{IPS}$ proof, and then Arthur must check conditions (1) and (2) of Definition 1.1. (We need constant free so that the algebraic proof has polynomial bit-size and thus can in fact be guessed by a Boolean Merlin.) Both conditions of Definition 1.1 are instances of Polynomial Identity Testing (PIT), which can thus be solved in randomized polynomial time by the standard Schwartz–Zippel–DeMillo–Lipton [29, 89, 109] $\mathsf {coRP}$ algorithm for PIT.
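Arthur's verification amounts to two PIT instances, which the Schwartz–Zippel idea reduces to evaluation at random points. A minimal sketch (ours; the certificate and axioms are ordinary Python callables over a prime field, standing in for algebraic circuits):

```python
import random

P = 2**31 - 1  # a prime; we work in the field F_P

def ips_check(C, F_list, n, trials=20):
    """Randomized test of conditions (1) and (2) of Definition 1.1:
    C(x, 0) = 0 and C(x, F_1(x), ..., F_m(x)) = 1, as identities.
    C(xs, ys) and each F(xs) return values mod P."""
    for _ in range(trials):
        xs = [random.randrange(P) for _ in range(n)]
        if C(xs, [0] * len(F_list)) % P != 0:   # condition (1)
            return False
        ys = [F(xs) % P for F in F_list]
        if C(xs, ys) % P != 1:                  # condition (2)
            return False
    return True

# Tiny example: axioms F1 = x and F2 = 1 - x; the certificate
# C = y1 + y2 witnesses the identity x + (1 - x) = 1.
F1 = lambda xs: xs[0]
F2 = lambda xs: 1 - xs[0]
C = lambda xs, ys: ys[0] + ys[1]
assert ips_check(C, [F1, F2], n=1)
```

A real verifier would evaluate the guessed circuit description gate by gate and, over small fields, move to a sufficiently large extension before sampling.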
Over any field of characteristic zero, if every propositional tautology has a polynomial-size $\text{IPS}$ proof, then $\mathsf {NP} \subseteq \mathsf {coAM}$, assuming the Generalized Riemann Hypothesis.
The key difference between this result and Proposition 3.1 is that we do not need to assume the proofs are constant free. The price we pay is the use of GRH and that we do not know how to improve this result from $\mathsf {coAM}$ to $\mathsf {coMA}$ (as in Proposition 3.1). We thank Pascal Koiran for the second half of the proof.
The key fact we will use is that deciding Hilbert's Nullstellensatz—that is, given a system of polynomials with coefficients in $\mathbb {Z}$, deciding if they have a common solution over $\mathbb {C}$—is in $\mathsf {AM}$ [54]. Rather than looking at solvability of the original set of equations $F_1(\vec{x}) = \cdots = F_m(\vec{x}) = 0$, we consider solvability of a set of equations whose solutions describe all of the polynomial-size $\text{IPS}$-certificates for $F$.
The equations we consider will come from a generic polynomial-size circuit; here we use the model of generic circuits from Mulmuley–Sohoni [77, Section 6]. The generic circuit will have depth $d \le \operatorname{poly}(n)$ and width $n+m \le w \le \operatorname{poly}(n)$, consisting of $d+1$ layers of gates, the 0th layer consisting of the inputs to a potential $\text{IPS}$ certificate—$x_1,\ldots, x_n, y_1,\ldots, y_m$—the $d$th layer consisting of a single output gate, and each intermediate layer containing $w$ nodes. Each node in level $\ell$ is connected to every node in level $\ell +1$. There are also new variables $z_{i,j,k}$. We define what the circuit computes as follows: We use $f_k$ to denote the function computed at gate $k$. If $k$ is an input gate, then $f_k$ is equal to the appropriate input variable; otherwise, $f_k(\vec{x}, \vec{y}, \vec{z}) \stackrel{\textit{def}}{=}\sum _{i,j} z_{i,j,k} f_i f_j$, where the sum is over all pairs of gates $i,j$ in the layer immediately preceding the layer of $k$. The output gate of this generic circuit computes a polynomial $C(\vec{x}, \vec{y}, \vec{z})$, and for any setting of the $z_{i,j,k}$ variables to constants $\zeta _{i,j,k}$, we get a particular polynomial $C_{\vec{\zeta }}(\vec{x}, \vec{y}) \stackrel{\textit{def}}{=}C(\vec{x}, \vec{y}, \vec{\zeta })$ that is easily seen to be computed by circuits of polynomial size. Furthermore, any function computed by a polynomial-size circuit is equal to $C_{\vec{\zeta }}(\vec{x},\vec{y})$ for some setting of $\vec{\zeta }$. In particular, there is a polynomial-size $\text{IPS}$ proof $C^{\prime }$ for $F$ if and only if there is some $\vec{\zeta }$ such that $C^{\prime } = C_{\vec{\zeta }}(\vec{x}, \vec{y})$.
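In code, the generic circuit is just a layered evaluator with the $z$'s as data; fixing them yields a concrete circuit. A small model (ours; we include the constant 1 among the layer-0 inputs so that linear terms are expressible):

```python
def eval_generic(inputs, zeta):
    """inputs: layer-0 values (a constant 1, then the x's and y's).
    zeta[l][k]: dict mapping gate pairs (i, j) of the previous layer to
    the weight z_{i,j,k}; gate k computes sum_{i,j} z_{i,j,k} f_i f_j."""
    layer = list(inputs)
    for level in zeta:
        layer = [sum(w * layer[i] * layer[j]
                     for (i, j), w in weights.items())
                 for weights in level]
    return layer[-1]   # the last layer has a single output gate

# One layer, one gate, with zeta chosen so the gate computes
# 1*x1*x2 + 1*1*x1 = x1*x2 + x1; at (x1, x2) = (3, 5) this is 18.
out = eval_generic([1, 3, 5], [[{(1, 2): 1, (0, 1): 1}]])
assert out == 18
```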
We will translate the conditions that a circuit be an $\text{IPS}$ certificate into equations on the new $z$ variables. Pick sufficiently many random values $\vec{\xi }^{(1)}, \vec{\xi }^{(2)},\ldots, \vec{\xi }^{(h)}$ to be substituted into $\vec{x}$. Heintz and Schnorr [42, Theorem 4.4] showed that by picking $h \sim \operatorname{poly}(n)$ random values from $[N]^{n}$ for some $N \le \exp (n^{O(1)})$ (which therefore have $\operatorname{poly}(n)$ bit-size), with high probability $\lbrace \vec{\xi }^{(1)},\ldots, \vec{\xi }^{(h)}\rbrace$ will be a hitting set against all $n$-variable polynomials of circuit-size $\le \operatorname{poly}(n)$. Then we consider the solvability of the following set of $2h$ equations in $\vec{z}$ (conditions (1) and (2) of Definition 1.1, evaluated at the hitting-set points): for each $i = 1, \ldots, h$,
$$C(\vec{\xi }^{(i)}, \vec{0}, \vec{z}) = 0 \quad \text{and} \quad C(\vec{\xi }^{(i)}, F_1(\vec{\xi }^{(i)}), \ldots, F_m(\vec{\xi }^{(i)}), \vec{z}) = 1.$$
Composing Koiran's $\mathsf {AM}$ algorithm for the Nullstellensatz with the random guesses for the $\vec{\xi }^{(i)}$, and assuming that every family of propositional tautologies has polynomial-size $\text{IPS}$ certificates, we get an $\mathsf {AM}$ algorithm for TAUT.
Recently, many strong depth reduction theorems have been proved for circuit complexity [2, 39, 55, 99], which have been called “chasms” since Agrawal and Vinay [2]. In particular, they imply that sufficiently strong lower bounds against depth 3 or 4 circuits imply super-polynomial lower bounds against arbitrary circuits. Since an $\text{IPS}$ proof is just a circuit, these depth reduction chasms apply equally well to $\text{IPS}$ proof size. Note that it was not clear to us how to adapt the proofs of these chasms to proofs in the Polynomial Calculus or other previous algebraic systems [82], and indeed this was part of the motivation to move to our more general notion of $\text{IPS}$ proof.
If a system of $\operatorname{poly}(n)$ polynomial equations in $n$ variables has an $\text{IPS}$ proof of unsatisfiability of size $s=s(n)$ and (semantic) degree $d=d(n)$, then it also has the following:
This suggests that size lower bounds for $\text{IPS}$ proofs in restricted circuit classes would be interesting, even for restricted kinds of depth 3 circuits.
Similarly, since $\text{IPS}$ proofs are just circuits, any $\text{IPS}$ certificate family of polynomially bounded degree that is computed by a polynomial-size family of algebraic circuits with divisions can also be computed by a polynomial-size family of algebraic circuits without divisions (follows from Strassen [98]). We note, however, that one could in principle consider $\text{IPS}$ certificates that were not merely polynomials, but even rational functions, under suitable conditions; divisions for computing these cannot always be eliminated. We discuss this “Rational Ideal Proof System,” the exact conditions needed, and when such divisions can be effectively eliminated in Appendix A.
Previously studied algebraic proof systems can be viewed as particular complexity measures on the Ideal Proof System, including the Polynomial Calculus (or Gröbner) proof system (PC) [27], Polynomial Calculus with Resolution (PCR) [4], the Nullstellensatz proof system [14], and Pitassi's algebraic systems [81, 82], as we explain below.
All of the previous algebraic proof systems are rule-based systems, in that they syntactically enforce the condition that every line of the proof is a polynomial in the ideal of the original polynomials $F_1(\vec{x}),\ldots, F_m(\vec{x})$. Typically, they do this by allowing two derivation rules: (1) from $G$ and $H$, derive $\alpha G + \beta H$ for $\alpha ,\beta$ constants and (2) from $G$, derive $Gx_i$ for any variable $x_i$. By “rule-based circuits,” we mean circuits with inputs $y_1,\ldots, y_m$ having linear combination gates and, for each $i=1,\ldots,n$, gates that multiply their input by $x_i$. In particular, rule-based circuits necessarily produce Hilbert-like certificates.
In Pitassi's 1998 system [82], a proof is a rule-based derivation of 1, as above, starting from the $F_i$, with size measured by number of lines. This is essentially the same as the Polynomial Calculus, but with size measured by the number of lines rather than by the total number of monomials appearing.
In Pitassi's 1996 system [81], a proof of the unsatisfiability of $F_1(\vec{x}) = \cdots = F_m(\vec{x}) = 0$ is a circuit computing a vector $(G_1(\vec{x}),\ldots, G_m(\vec{x}))$ such that $\sum _i F_i(\vec{x}) G_i(\vec{x}) = 1$. Size is measured by the size of the corresponding circuit.
Now we come to the definitions of previous algebraic proof systems in terms of complexity measures on the Ideal Proof System:
We prove the precise relationships between Pitassi's previous algebraic proof systems [81, 82] and $\text{IPS}$ next.
Recall from Section 2.1 the definitions of $\mathsf {VP}_{\det }$ and $\mathsf {VP}_{ws}$; for readability and ease of speech, we refer to $\mathsf {VP}_{\det }$-$\text{IPS}$ $=\mathsf {VP}_{ws}$-$\text{IPS}$ as “determinantal $\text{IPS}$” or “$\det$-$\text{IPS}$,” for short.
The number-of-lines measure on PC proofs—equivalent to Pitassi's 1998 algebraic proof system [82]—is p-equivalent to Hilbert-like $\det$-$\text{IPS}$ or $\mathsf {VP}_{ws}$-$\text{IPS}$.
Furthermore, Pitassi's 1996 algebraic proof system [81] is p-equivalent to Hilbert-like $\text{IPS}$.
In light of this proposition, we henceforth refer to the systems from Pitassi [81] and [82] as Hilbert-like $\text{IPS}$ and Hilbert-like $\det$-$\text{IPS}$, respectively. Pitassi [81, Theorem 5] showed that Hilbert-like $\text{IPS}$ p-simulates Polynomial Calculus and Frege. Essentially, the same proof shows that Hilbert-like $\text{IPS}$ p-simulates Extended Frege as well. Unfortunately, the proof of the simulation in Pitassi [81] does not seem to generalize to give a depth-preserving simulation. In Section 3.5, we show there is indeed a depth-preserving simulation (Theorem 3.5).
We start with the proof of the second statement, as its proof is a simpler version of the proof of the first statement.
Let $C$ be a proof in the 1996 system [81], namely a circuit computing $(G_1(\vec{x}),\ldots, G_m(\vec{x}))$. Then with $m$ product gates and a single fan-in-$m$ addition gate, we get a circuit $C^{\prime }$ computing the Hilbert-like $\text{IPS}$ certificate $\sum _{i=1}^{m} y_i G_i(\vec{x})$.
Conversely, if $C^{\prime }$ is a Hilbert-like $\text{IPS}$-proof computing the certificate $\sum _i y_i G_i^{\prime }(\vec{x})$, then by Baur–Strassen [9] there is a circuit $C$ of size at most $O(|C^{\prime }|)$ computing the vector $(\frac{\partial C^{\prime }}{\partial y_1},\ldots, \frac{\partial C^{\prime }}{\partial y_m}) = (G_1^{\prime }(\vec{x}),\ldots, G_m^{\prime }(\vec{x}))$, which is exactly a proof in the 1996 system. (Alternatively, more simply, but at slightly more cost, we may create $m$ copies of $C^{\prime }$, and in the $i$th copy of $C^{\prime }$ plug in 1 for $y_i$ and 0 for all of the other $y_j$; the $i$th copy then computes $G_i^{\prime }(\vec{x})$.)
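The "plug in unit vectors" alternative at the end of this argument can be written out directly; a sketch (ours, for a certificate given as a callable that is linear in the $y$'s with zero constant term):

```python
def extract_components(C_prime, m):
    """Recover the m component functions xs -> G_i(xs) from a
    Hilbert-like certificate C_prime(xs, ys) = sum_i ys[i] * G_i(xs),
    by substituting the i-th unit vector for the y's."""
    def component(i):
        def Gi(xs):
            ys = [0] * m
            ys[i] = 1
            return C_prime(xs, ys)
        return Gi
    return [component(i) for i in range(m)]

# Certificate y1*x + y2*(x^2 + 1): the components are x and x^2 + 1.
Cp = lambda xs, ys: ys[0] * xs[0] + ys[1] * (xs[0] ** 2 + 1)
G1, G2 = extract_components(Cp, 2)
assert G1([3]) == 3 and G2([3]) == 10
```

This mirrors the cost accounting in the proof: $m$ copies of $C^{\prime}$ rather than the single Baur–Strassen circuit.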
The proof of the first statement takes a bit more work. At this point, the reader may wish to recall the definition of weakly skew circuit from Section 2.1.
Suppose we have a derivation of 1 from $F_1(\vec{x}),\ldots, F_m(\vec{x})$ in the 1998 system [82]. First, replace each $F_i(\vec{x})$ at the beginning of the derivation with the corresponding placeholder variable $y_i$. Since size in the 1998 system is measured by number of lines in the proof, this has not changed the size. Furthermore, the final step no longer derives 1 but rather derives an $\text{IPS}$ certificate. By structural induction on the two possible rules, one easily sees that this is in fact a Hilbert-like $\text{IPS}$-certificate. Convert each linear combination step into a linear combination gate and each “multiply by $x_i$” step into a product gate one of whose inputs is a new leaf with the variable $x_i$. As we create a new leaf for every application of the product rule, these new leaves are clearly cut off from the rest of the circuit by removing their connection to their product gate. As these are the only product gates introduced, we have a weakly skew circuit computing a Hilbert-like $\text{IPS}$ certificate.
The converse takes a bit more work, so we first show that a Hilbert-like formula-$\text{IPS}$ proof can be converted at polynomial cost into a proof in the 1998 system [82] and then explain why the same proof works for $\mathsf {VP}_{ws}$-$\text{IPS}$. This proof is based on a folklore result (see the remark after Definition 2.6 in Raz–Tzameret [85]); we thank Iddo Tzameret for a conversation clarifying it, which led us to realize that the result also applies to weakly skew circuits.
Let $C$ be a formula computing a Hilbert-like $\text{IPS}$-certificate $\sum _{i=1}^{m} y_i G_i(\vec{x})$. Using the trick above of substituting in $\lbrace 0,1\rbrace$-values for the $y_i$ (one 1 at a time), we find that each $G_i(\vec{x})$ can be computed by a formula $\Gamma _i$ no larger than $|C|$. For each $i$ we show how to derive $F_i(\vec{x}) G_i(\vec{x})$ in the 1998 system. These can then be combined using the linear combination rule. Thus, for simplicity, we drop the subscript $i$ and refer to $y$, $F(\vec{x})$, $G(\vec{x})$, and the formula $\Gamma$ computing $G$. Without loss of generality (with a polynomial blow-up if needed), we can assume that all of $\Gamma$’s gates have fan-in at most 2.
We proceed by induction on the size of the formula $\Gamma$. Our inductive hypothesis is as follows: For all formulas $\Gamma ^{\prime }$ of size $|\Gamma ^{\prime }| \lt |\Gamma |$, for all polynomials $P(\vec{x})$, in the 1998 system one can derive $P(\vec{x}) \Gamma ^{\prime }(\vec{x})$ starting from $P(\vec{x})$, using at most $|\Gamma ^{\prime }|$ lines. The base case is $|\Gamma |=1$, in which case $G(\vec{x})$ is a single variable $x_i$, and from $P(\vec{x})$ we can compute $P(\vec{x}) x_i$ in a single step using the variable-product rule.
Suppose $\Gamma$ has a linear combination gate at the top, say, $\Gamma = \alpha \Gamma _1 + \beta \Gamma _2$. By induction, from $P(\vec{x})$, we can derive $P(\vec{x}) \Gamma _i(\vec{x})$ in $|\Gamma _i|$ steps for $i=1,2$. Do those two derivations and then apply the linear combination rule to derive $\alpha P(\vec{x}) \Gamma _1(\vec{x}) + \beta P(\vec{x}) \Gamma _2(\vec{x}) = P(\vec{x}) \Gamma (\vec{x})$ in one additional step. The total length of this derivation is then $|\Gamma _1| + |\Gamma _2| + 1 = |\Gamma |$.
Suppose $\Gamma$ has a product gate at the top, say $\Gamma = \Gamma _1 \times \Gamma _2$. Unlike the case of linear combinations where we proceeded in parallel, here we proceed sequentially and use more of the strength of our inductive assumption. Starting from $P(\vec{x})$, we derive $P(\vec{x}) \Gamma _1(\vec{x})$ in $|\Gamma _1|$ steps. Now, starting from $P^{\prime }(\vec{x}) = P(\vec{x}) \Gamma _1(\vec{x})$, we derive $P^{\prime }(\vec{x}) \Gamma _2(\vec{x})$ in $|\Gamma _2|$ steps. But $P^{\prime } \Gamma _2 = P \Gamma _1 \Gamma _2 = P \Gamma$, which we derived in $|\Gamma _1| + |\Gamma _2| \le |\Gamma |$ steps. This completes the proof of this direction for Hilbert-like formula-$\text{IPS}$.
For Hilbert-like weakly skew $\text{IPS}$ the proof is similar. However, because gates can now be reused, we must also allow lines in our constructed proof to be reused (otherwise, we would effectively be unrolling our weakly skew circuit into a formula, for which the best-known upper bound is only quasi-polynomial). We still induct on the size of the weakly skew circuit, but now we allow circuits with multiple outputs. We change the induction hypothesis to the following: for all weakly skew circuits $\Gamma ^{\prime }$ of size $|\Gamma ^{\prime }| \lt |\Gamma |$, possibly with multiple outputs that we denote $\Gamma ^{\prime }_{out,1},\ldots, \Gamma ^{\prime }_{out,s}$, from any $P(\vec{x})$ one can derive the tuple $P \Gamma ^{\prime }_{out,1},\ldots, P \Gamma ^{\prime }_{out,s}$ in the 1998 system using at most $|\Gamma ^{\prime }|$ lines.
To simplify matters, we assume that every multiplication gate in a weakly skew circuit has a label indicating which one of its children is separated from the rest of the circuit by this gate.
The base case is the same as before, since a circuit of size one can only have one output, a single variable.
Linear combinations are similar to before, except now we have a multi-output weakly skew circuit of some size, say, $s$, that outputs $\Gamma _1$ and $\Gamma _2$. By the induction hypothesis, there is a derivation of size $\le s$ that derives both $P \Gamma _1$ and $P \Gamma _2$. Then we apply one additional linear combination rule, as before.
For a product gate $\Gamma = \Gamma _1 \times \Gamma _2$, suppose without loss of generality that $\Gamma _2$ is the child that is isolated from the larger circuit by this product gate (recall that we assumed $\Gamma$ comes with an indicator of which child this is). Then we proceed as before, first computing $P \Gamma _1$ from $P$ and then $(P \Gamma _1) \Gamma _2$ from $(P \Gamma _1)$. Because we apply “multiplication by $\Gamma _1$” and “multiplication by $\Gamma _2$” in sequence, it is crucial that the gates computing $\Gamma _2$ do not depend on those computing $\Gamma _1$, for the gates $g$ in $\Gamma _1$ get translated into lines computing $P g$, and if we reused that in computing $\Gamma _2$, rather than getting $g$ as needed, we would be getting $Pg$.
It is interesting to note that the condition of being weakly skew is precisely the condition we needed to make this proof go through.
Throughout this section, all algebraic circuits may have linear combination gates—with weights on their incoming edges (see Section 2.1)—and product gates of unbounded fan-in. We measure the size of all circuits $C$ (Boolean and algebraic), denoted $\text{size}(C)$, by the number of gates. The depth of a circuit $C$ (Boolean or algebraic), denoted $\text{depth}(C)$, is the maximum number of gates encountered on any path from a leaf to the root.
Let $p$ be prime and $\mathbb {F}$ any field of characteristic $p$. Then $\text{IPS}$ over $\mathbb {F}$ p-simulates Frege with $MOD_p$ connectives in such a way that depth-$d$ Frege proofs are simulated by depth-$O(d)$ $\text{IPS}$ proofs.
In particular, $\mathsf {AC}^0[p]$-Frege is p-simulated by bounded-depth $\text{IPS}$, and Frege is p-simulated by logarithmic-depth $\text{IPS}$, i.e., $\mathsf {VNC}^1$-$\text{IPS}$.
The “$O(d)$” above is not tight; in a forthcoming preprint [38], we prove a tighter depth-preserving simulation, which has the advantage of drawing connections to depth-six algebraic circuits; the two proofs follow the same outline, but the tighter result requires several new technical ingredients.
We will use a small modification of the sequent-calculus formalization of $\mathsf {AC}^0[p]$-Frege as given by Maciel and Pitassi [66]. Changing between a Frege system and sequent calculus does not increase the depth by more than an additive constant. The underlying connectives are unbounded fan-in OR, unbounded fan-in $MOD^i_p$ for $i \in \lbrace 0,\ldots,p-1\rbrace$, and unary negation. The inputs are $x_i$ and the constants 0,1.
We will work in a sequent calculus style proof system, where lines are cedents of the form $\Gamma \rightarrow \Delta$, where both $\Gamma$ and $\Delta$ are sets of $\lbrace \vee , \lnot , MOD^0_p,\ldots, MOD^{p-1}_p\rbrace$-formulas whose inputs are $x_i,0,1$, where each of $\Gamma _i \in \Gamma$ and $\Delta _i \in \Delta$ has depth at most $d$; the intended meaning is that the conjunction of the formulas in $\Gamma$ implies the disjunction of the formulas in $\Delta$. The commutativity of the arguments to each connective is implicit. Throughout we use $\Gamma ,\Delta$ to denote sets of formulae and $A$, $A_i$ to denote individual formulae. Although $\Gamma$ and $\Delta$ are sets, we use sequence notation for convenience, viz. “$\Delta ,A$” actually means “$\Delta \cup \lbrace A\rbrace$.”
Because we are working with gates of unbounded fan-in, we use the prefix notation $\vee (A_1,\ldots, A_k)$ for a single OR gate whose inputs are $A_1,\ldots, A_k$, and similarly $MOD^i_p(A_1,\ldots, A_k)$. In particular, $\vee ()$ is the OR with no inputs, which is equal to 0 by convention, $MOD^0_p()$ is equal to 1 by convention, and $MOD^i_p()$ for $i \ne 0$ is equal to 0 by convention.
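The conventions for empty gates can be captured in a small evaluator sketch (the nested-tuple representation of formulas is ours, chosen only for illustration; here $p = 3$):

```python
# A small evaluator for the unbounded fan-in connectives and the stated
# conventions for empty gates: OR() = 0, MOD_p^0() = 1, MOD_p^i() = 0 for i != 0.

p = 3

def ev(f):
    op, args = f[0], f[1:]
    if op == 'var':
        return args[0]                       # a 0/1 constant for this demo
    if op == 'not':
        return 1 - ev(args[0])
    if op == 'or':                           # OR(A_1, ..., A_k), possibly k = 0
        return 1 if any(ev(a) for a in args) else 0
    if op == 'mod':                          # ('mod', i, A_1, ..., A_k)
        i, rest = args[0], args[1:]
        return 1 if sum(ev(a) for a in rest) % p == i else 0
    raise ValueError(op)

assert ev(('or',)) == 0                      # OR with no inputs is 0
assert ev(('mod', 0)) == 1                   # MOD_p^0 with no inputs is 1
assert ev(('mod', 1)) == 0                   # MOD_p^i with no inputs, i != 0, is 0
assert ev(('mod', 2, ('var', 1), ('var', 1))) == 1   # two ones: 2 = 2 (mod 3)
```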
The axioms are as follows:
The rules of inference are as follows; throughout, “$i-1$” is to be interpreted modulo $p$.
Weakening | $\displaystyle \frac{\Gamma \rightarrow \Delta }{\Gamma \rightarrow \Delta ,A}$ | $\displaystyle \frac{\Gamma \rightarrow \Delta }{A, \Gamma \rightarrow \Delta }$
Cut | $\displaystyle \frac{\rightarrow A, \Gamma \qquad A \rightarrow \Gamma }{\rightarrow \Gamma }$
Negation | $\displaystyle \frac{\Gamma , A \rightarrow \Delta }{\Gamma \rightarrow \lnot A, \Delta }$ | $\displaystyle \frac{\Gamma \rightarrow A, \Delta }{\Gamma , \lnot A \rightarrow \Delta }$
Or-Left | $\displaystyle \frac{A_1,\Gamma \rightarrow \Delta \qquad \vee (A_2,\ldots,A_k),\Gamma \rightarrow \Delta }{\vee (A_1,\ldots,A_k),\Gamma \rightarrow \Delta }$
Or-Right | $\displaystyle \frac{\Gamma \rightarrow A_1,\vee (A_2,\ldots,A_k),\Delta }{\Gamma \rightarrow \vee (A_1,\ldots,A_k),\Delta }$
Mod-$p$-Left | $\displaystyle \frac{A_1, MOD^{i-1}_p(A_2,\ldots, A_k), \Gamma \rightarrow \Delta \qquad \lnot A_1, MOD^i_p(A_2,\ldots,A_k),\Gamma \rightarrow \Delta }{MOD^i_p(A_1,\ldots,A_k),\Gamma \rightarrow \Delta }$
Mod-$p$-Right | $\displaystyle \frac{\Gamma \rightarrow \lnot A_1, MOD^{i-1}_p(A_2,\ldots,A_k),\Delta \qquad \Gamma \rightarrow A_1, MOD^i_p(A_2,\ldots,A_k),\Delta }{\Gamma \rightarrow MOD^i_p(A_1,\ldots,A_k),\Delta }$
Mod-$p$ Constants | $\displaystyle \frac{\Gamma \rightarrow MOD^i_p(1,A_2,\ldots, A_k),\Delta }{\Gamma \rightarrow MOD^{i-1}_p(A_2,\ldots,A_k), \Delta }$ | $\displaystyle \frac{\Gamma \rightarrow MOD^i_p(0,A_2,\ldots,A_k),\Delta }{\Gamma \rightarrow MOD^i_p(A_2,\ldots,A_k),\Delta }$
A refutation of a 3CNF formula $\varphi =\kappa _1 \wedge \kappa _2 \wedge \cdots \wedge \kappa _m$ in Frege with mod $p$ connectives is a sequence of cedents, where each cedent is either one of the $\kappa _i$’s, or an instance of an axiom scheme or follows from two earlier cedents by one of the above inference rules, and the final cedent is the empty cedent.
We define a translation $t(A)$ from Boolean formulas to algebraic formulae over $\mathbb {F}$ such that for any assignment $\alpha$, $A(\alpha)=1$ if and only if $t(A)(\alpha)=0$. The translation is defined inductively as follows:
Note that
For a cedent $\Gamma \rightarrow \Delta$, we will translate the cedent by moving everything to the right of the arrow. That is, the cedent $L = A_1,\ldots,A_k \rightarrow B_1,\ldots,B_\ell$ will be translated to $t(L) = t(\lnot A_1 \vee \cdots \vee \lnot A_k \vee B_1 \vee \cdots \vee B_\ell) = (1-t(A_1))(1-t(A_2))\cdots (1-t(A_k))t(B_1)\cdots t(B_\ell)$. This may again increase the depth by 1, since the product gate used to simulate the $\rightarrow$ was not counted in the depth of the $A_i, B_i$.
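As a sanity check on the translation, here is a sketch for the $\lbrace \lnot , \vee \rbrace$ fragment (the $MOD^i_p$ case is omitted, and the nested-tuple representation of formulas is our own simplification), verifying the defining property that $A(\alpha)=1$ if and only if $t(A)(\alpha)=0$:

```python
from itertools import product

def boolean(f, a):
    # Boolean semantics of a formula over variables, not, or
    op = f[0]
    if op == 'var': return a[f[1]]
    if op == 'not': return 1 - boolean(f[1], a)
    if op == 'or':  return 1 if any(boolean(g, a) for g in f[1:]) else 0

def t(f, a):
    # Evaluate the polynomial translation t(f) at the 0/1 point a:
    # t(x_i) = 1 - x_i, t(~A) = 1 - t(A), t(or(A_1..A_k)) = prod_i t(A_i)
    op = f[0]
    if op == 'var': return 1 - a[f[1]]
    if op == 'not': return 1 - t(f[1], a)
    if op == 'or':
        r = 1
        for g in f[1:]: r *= t(g, a)
        return r

A = ('or', ('not', ('var', 0)), ('var', 1))      # ~x0 \/ x1
for a in product([0, 1], repeat=2):
    assert (boolean(A, a) == 1) == (t(A, a) == 0)
```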
Let $R$ be a Frege refutation (with mod $p$ connectives) of $\varphi$. Without loss of generality, we may assume that $R$ is treelike. Recall that a Frege (or sequent calculus) proof is treelike if the underlying directed acyclic graph structure of the proof is a tree, so every cedent in the refutation, other than the final empty cedent, is used exactly once. Any Frege proof can be efficiently converted into a treelike proof with a polynomial increase in size and an increase of one in depth (Krajíček [58, Proposition 1.1]); although not stated there for Frege with mod $p$ connectives, the same proof still works (see Segerlind [90, Section 3.2]).
We will prove by induction on the number of cedents of $R$ that for each cedent $L$ in the refutation, we can derive $t(L)$ via a Hilbert-like IPS proof (see Definition 1.8) whose size is polynomial in the original size, and whose depth is at most a constant factor greater than the depth of $R$.
For the base case, each initial cedent of the form $\rightarrow \kappa _i$ translates to $y_i$, and thus has the right form.
Axioms (2), (3), and (4) translate to the identically zero polynomial, so they have the right form. The axiom $A \rightarrow A$ translates to $t(A)(1-t(A))$; we need to show that $t(A)(1-t(A))$ can be derived from the Boolean axioms $x_i^2 - x_i$ by an $\text{IPS}$ proof of appropriate size and depth.
Let $p$ be any prime and $\mathbb {F}$ any field of characteristic $p$. For any Boolean formula $A$ of size $s$ and depth $d$ with connectives $\lnot$, $\vee$, $MOD^0_p$, $\ldots\,$, $MOD^{p-1}_p$, $t(A)(1-t(A))$ can be derived from $\lbrace x_i^2 - x_i : i \in [n]\rbrace$ by a Hilbert-like derivation of size $O(s^2)$ and depth $O(d)$.
To make our formulae clearer, we write $b(A_j)$ for the $\text{IPS}$ circuit that derives $t(A_j)^2 - t(A_j)$ from the placeholder variables $y_i$ for the Boolean axioms $x_i^2 - x_i$. We build up the $\text{IPS}$ circuit starting from the leaves (inputs) of $A$; in fact, we will derive $t(g)^2 - t(g)$ for every gate $g$ of $A$. Our $\text{IPS}$ circuit will include a single copy of the (natural) circuit for $t(A)$; whenever we write $t(g_i)$ inside an expression for some $b(g)$, we mean to use the gate $t(g_i)$ in this single copy of the circuit for $t(A)$. This incurs an additive cost of $\text{size}(t(A)) \le \text{size}(A)$ and means the depth of $b(A)$ is at least $\text{depth}(t(A))$, which is at most $2\,\text{depth}(A)+1$.
Case 0: For an input gate $x_i$, we have $t(x_i)(1-t(x_i)) = (1-x_i)x_i = x_i^2 - x_i$, so its $\text{IPS}$ derivation is just $b(x_i) = y_i$. Both the input gate and its $\text{IPS}$ derivation have depth zero and size one (or size zero, depending on how you count, but either way will not affect the rest of the result).
Case 1: For a negation gate, say, $g = \lnot g_1$. Then $t(g)(1-t(g)) = (1-t(g_1))t(g_1)$, so $b(g) = b(g_1)$ and the depth and size do not increase at all.
Case 2: $g = \vee (g_1,\ldots, g_k)$. First, we claim that the following is a polynomial identity, which holds over any commutative ring (it is a telescoping sum: the $j$th term turns the factor $z_j$ into $z_j^2$):
$$\left(\prod _{i=1}^{k} z_i\right)^{2} - \prod _{i=1}^{k} z_i \;=\; \sum _{j=1}^{k} (z_j^2 - z_j) \left(\prod _{i \lt j} z_i^2\right) \left(\prod _{i \gt j} z_i\right).$$
Now, as $t(g) = \prod _{i=1}^k t(g_i)$, plugging in $t(g_i)$ for $z_i$ into the above identity, we find that
$$t(g)^2 - t(g) = \sum _{j=1}^{k} \big (t(g_j)^2 - t(g_j)\big) \left(\prod _{i \lt j} t(g_i)^2\right) \left(\prod _{i \gt j} t(g_i)\right),$$
so we may take $b(g) = \sum _{j=1}^{k} b(g_j) (\prod _{i \lt j} t(g_i)^2) (\prod _{i \gt j} t(g_i))$.
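One identity of the required form, a telescoping sum expressing $(\prod_i z_i)^2 - \prod_i z_i$ in terms of the $z_i^2 - z_i$, can be spot-checked numerically (this check is ours, not part of the proof):

```python
# Numeric spot-check of the telescoping identity
#   (prod z_i)^2 - prod z_i
#     = sum_j (z_j^2 - z_j) * (prod_{i<j} z_i^2) * (prod_{i>j} z_i),
# which holds over any commutative ring; we check it on random integers.

import random
from functools import reduce

def check(zs):
    k = len(zs)
    prod = reduce(lambda a, b: a * b, zs, 1)
    lhs = prod * prod - prod
    rhs = 0
    for j in range(k):
        head = reduce(lambda a, b: a * b, (z * z for z in zs[:j]), 1)
        tail = reduce(lambda a, b: a * b, zs[j + 1:], 1)
        rhs += (zs[j] * zs[j] - zs[j]) * head * tail
    return lhs == rhs

random.seed(0)
for _ in range(100):
    zs = [random.randint(-50, 50) for _ in range(random.randint(1, 6))]
    assert check(zs)
```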
Case 3: $g = MOD^i_p(g_1,\ldots, g_k)$. The idea of the proof in this case is as follows: First, we show that adding up a bunch of $\lbrace 0,1\rbrace$ values over $\mathbb {F}$, possibly with a constant from $\lbrace 0,\ldots, p-1\rbrace$, results in an element $z$ of the prime field (which sits inside any field of characteristic $p$); in other words, we derive $z^p - z$, where $z = -i + \sum _j t(g_j)$. (Note that the proof holds for symbolic polynomials in characteristic $p$, not only for the corresponding functions, so it is allowed within the framework of $\text{IPS}$.) Then we show that $z^{p-1} \in \lbrace 0,1\rbrace$ by noting that $(z^{p-1})^2 - z^{p-1} = z^{2p - 2} - z^{p-1} = z^{p-2} (z^p - z)$.
Let us now implement the preceding idea carefully: Let $z = -i + \sum _j t(g_j)$. Then we have
$$z^p - z = \big ((-i)^p - (-i)\big) + \sum _j \big (t(g_j)^p - t(g_j)\big) = \sum _j \big (t(g_j)^2 - t(g_j)\big)\big (t(g_j)^{p-2} + \cdots + t(g_j) + 1\big),$$
where the first equality is the Frobenius identity $(a+b)^p = a^p + b^p$ in characteristic $p$, and the second uses Fermat's little theorem ($(-i)^p = -i$ in the prime field) together with the factorization $t^p - t = (t^2 - t)(t^{p-2} + \cdots + t + 1)$. Thus $z^p - z$ can be derived as a linear combination of the $b(g_j)$ with polynomial coefficients.
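The algebraic facts used in Case 3 can be spot-checked numerically (a sanity check of ours, with $p = 5$):

```python
# Spot-checks for the facts used in Case 3, with p = 5:
#  (i)  z^p - z = (z^2 - z)(z^{p-2} + ... + z + 1), an identity over Z;
#  (ii) (z^{p-1})^2 - z^{p-1} = z^{p-2} (z^p - z);
#  plus the Frobenius identity (a + b)^p = a^p + b^p mod p, which splits
#  z^p - z into per-summand terms.

p = 5
for z in range(p):
    geom = sum(z ** j for j in range(p - 1))           # z^{p-2} + ... + z + 1
    assert z ** p - z == (z * z - z) * geom             # (i)
    assert (z ** (p - 1)) ** 2 - z ** (p - 1) == z ** (p - 2) * (z ** p - z)  # (ii)

for a in range(p):
    for b in range(p):
        assert (a + b) ** p % p == (a ** p + b ** p) % p   # Frobenius mod p
```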
In total, we have included a copy of $t(A)$, and for each gate we add at most $O(ps) = O(s)$ gates to our $\text{IPS}$ derivation, so the total size is at most $s + sO(s) = O(s^2)$, and the depth has increased by a factor of at most 3.
The preceding lemma handled the only nontrivial axiom; we now conclude the proof of Theorem 3.5. For the inductive step, it is a matter of going through all of the rules. We assume inductively that we have an $\text{IPS}$ proof of appropriate size and depth of all the antecedents.
At each step, we have increased the depth by at most 4 and, except for the axiom $A \rightarrow A$, the size by at most a constant as well. Since Lemma 3.6 can increase the size quadratically, our overall size increase is quadratic, and the depth has been multiplied by at most 4.
Let $F_1 = \cdots = F_m = 0$ be a polynomial system of equations in $n$ variables $x_1,\ldots, x_n$, and let $C(\vec{x}, \vec{y})$ be an $\text{IPS}$-certificate of the unsatisfiability of this system. Let $D = \max _{i} \deg _{y_i} C$ and let $t$ be the number of terms of $C$, when viewed as a polynomial in the $y_i$ with coefficients in $\mathbb {F}[\vec{x}]$. Suppose $C$ and each $F_i$ can be computed by a circuit of size $\le s$. Then a Hilbert-like $\text{IPS}$-certificate for this system can be computed by a circuit of size $\operatorname{poly}(D,t,n,s)$.
The proof uses known sparse multivariate polynomial interpolation algorithms. The threshold $T$ is essentially the number of points at which the polynomial must be evaluated in the course of the interpolation algorithm. Here we use one of the early, elegant interpolation algorithms due to Zippel [110]. Although Zippel's algorithm chooses random points at which to evaluate polynomials for the interpolation, in our nonuniform setting it suffices merely for points with the required properties to exist (which they do as long as $\mathbb {F}$ is sufficiently large). Better bounds may be achievable using more recent interpolation algorithms such as those of Ben-Or and Tiwari [16] or Kaltofen and Yagati [51]. We note that all of these interpolation algorithms only give limited control on the depth of the resulting Hilbert-like $\text{IPS}$-certificate (as a function of the depth of the original $\text{IPS}$-certificate $C$), because they all involve solving linear systems of equations, which is not known to be computable efficiently in constant depth.
Forbes, Shpilka, Tzameret, and Wigderson [34] subsequently improved on this result; see Section 8.
Using a sparse multivariate interpolation algorithm such as Zippel's [110], for each monomial in the placeholder variables $\vec{y}$ that appears in $C$, there is a polynomial-size algebraic circuit for its coefficient, which is an element of $\mathbb {F}[\vec{x}]$. For each such monomial $\vec{y}^{\vec{e}} = y_1^{e_1} \cdots y_m^{e_m}$, with coefficient $c_{\vec{e}}(\vec{x})$, there is a small circuit $C^{\prime }$ computing $c_{\vec{e}}(\vec{x}) \vec{y}^{\vec{e}}$. Since every $\vec{y}$-monomial appearing in $C$ is non-constant, at least one of the exponents $e_i \gt 0$. Let $i_0$ be the least index of such an exponent. Then we get a small circuit computing $c_{\vec{e}}(\vec{x}) y_{i_0} F_{i_0}(\vec{x})^{e_{i_0}-1} F_{i_0 + 1}(\vec{x})^{e_{i_0 + 1}} \cdots F_{m}(\vec{x})^{e_m}$ as follows. Divide $C^{\prime }$ by $y_{i_0}$, and then eliminate this division using Strassen [98] (or, alternatively, consider $\frac{1}{e_{i_0}} \frac{\partial C^{\prime }}{\partial y_{i_0}}$ using Baur–Strassen [9]). In the resulting circuit, replace each input $y_i$ by a small circuit computing $F_i(\vec{x})$. Then multiply the resulting circuit by $y_{i_0}$. Repeat this procedure for each monomial appearing (the list of monomials appearing in $C$ is one of the outputs of the sparse multivariate interpolation algorithm), and then add them all together.
A super-polynomial lower bound on [constant-free] Hilbert-like $\mathsf {VP}$-$\text{IPS} _{R}$ proofs of any family of tautologies implies $\mathsf {VNP}_{R} \ne \mathsf {VP}_{R}$ [respectively, $\mathsf {VNP}^0_{R} \ne \mathsf {VP}^0_{R}$], for any ring $R$.
A super-polynomial lower bound on the number of lines in Polynomial Calculus proofs implies the Permanent versus Determinant Conjecture ($\mathsf {VNP} \ne \mathsf {VP}_{ws}$).
Together with Proposition 3.1, this immediately gives an alternative, and we believe simpler, proof of the following result:
If $\mathsf {NP} \not\subseteq \mathsf {coMA}$, then $\mathsf {VNP}_{\mathbb {F}} \ne \mathsf {VP}_{\mathbb {F}}$, for any field $\mathbb {F}$.
The previous proofs we are aware of all depend crucially on the random self-reducibility of the permanent or of some function complete for $\mathsf {Mod}_{p}\mathsf {P/poly}$. In contrast, our proof is quite different, in that it avoids random self-reducibility altogether and does not need any completeness results: Indeed, we do not even know whether there exist tautologies and a choice of ordering of the clauses such that the $\mathsf {VNP}$-$\text{IPS}$ certificates of Lemma 4.2 are random self-reducible, nor (separately) whether they are $\mathsf {VNP}$-complete.
For comparison, here is a brief sketch of three previous proofs (we thank Lance Fortnow for one and an anonymous reviewer for the other two). Note that all of these proofs rely on several seminal results from computational complexity (all of them rely on Valiant's completeness result [102], and each relies on a subset of References [7, 23, 33, 50, 100, 102, 106]), whereas our proof uses little more than the Nullstellensatz, a result over 100 years old.
Proof 1: This proof seems to only work when $\mathbb {F}$ is a finite field or, assuming the Generalized Riemann Hypothesis, a field of characteristic zero. First, Bürgisser's results [23] relate $\mathsf {VP}$ and $\mathsf {VNP}$ over various fields to standard Boolean complexity classes such as $\mathsf {NC/poly}$, $\mathsf {\# P/poly}$ (uses GRH), and $\mathsf {Mod}_{p}\mathsf {P/poly}$. The result then follows from the implication $\mathsf {NP} \not\subseteq \mathsf {coMA} \Rightarrow \mathsf {NC/poly} \ne \mathsf {\# P/poly}$ (and similarly with $\mathsf {\# P/poly}$ replaced by $\mathsf {Mod}_{p}\mathsf {P/poly}$), which uses the downward self-reducibility of complete functions for $\mathsf {\# P/poly}$ (the permanent [102]) and $\mathsf {Mod}_{p}\mathsf {P/poly}$ [33], as well as Valiant–Vazirani [106].
Proof 2: This proof seems to only work when $R$ is a finite ring of odd characteristic. If $\mathsf {VNP}_R \subseteq \mathsf {VP}_R$ for a finite ring $R$ of odd characteristic, then the permanent over $R$ has polynomial-size $R$-algebraic circuits, and hence—since $R$ is finite—polynomial-size Boolean circuits. This implies $\mathsf {P}^\mathsf {\mathsf {Mod}_m \mathsf {P}} \subseteq \mathsf {coMA}$ by Babai–Fortnow–Nisan–Wigderson [7], where $m$ is the characteristic of $R$. The proof concludes by using either Toda's Theorem [100]—or the slightly weaker result of Valiant and Vazirani [106]—to show that $\mathsf {NP} \subseteq \mathsf {P}^{\mathsf {Mod}_m \mathsf {P}}$.
Proof 3: Similar to Proof 2, but instead of Babai–Fortnow–Nisan–Wigderson [7], uses Kabanets–Impagliazzo [50] to conclude from the polynomial-size $R$-algebraic circuits for the permanent that $\mathsf {NP} \subseteq \mathsf {coNP}^{\mathsf {RP}} \subseteq \mathsf {coMA}$.
The following lemma is the key to Theorem 1.2.
Every family of CNF tautologies $(\varphi _n)$ has a Hilbert-like family of $\text{IPS}$ certificates $(C_n)$ in $\mathsf {VNP}^{0}_{R}$.
We first show how Theorem 1.2 follows from Lemma 4.2, and then return to the proof of the lemma.
For a given set $\mathcal {F}$ of unsatisfiable polynomial equations $F_1=\cdots =F_m=0$, a lower bound on $\text{IPS}$ refutations of $\mathcal {F}$ is equivalent to giving the same circuit lower bound on all $\text{IPS}$ certificates for $\mathcal {F}$. A super-polynomial lower bound on Hilbert-like $\text{IPS}$ implies that some function in $\mathsf {VNP}$—namely, the $\mathsf {VNP}$-$\text{IPS}$ certificate guaranteed by Lemma 4.2—cannot be computed by polynomial-size algebraic circuits and hence that $\mathsf {VNP} \ne \mathsf {VP}$. Since Lemma 4.2 even guarantees a constant-free certificate, we get the analogous consequence for constant-free lower bounds.
The second part of Theorem 1.2 follows from the fact that the number of lines in a PC proof is p-equivalent to Hilbert-like $\det$-$\text{IPS}$ (Proposition 3.4). As in the first part, a super-polynomial lower bound on Hilbert-like $\det$-$\text{IPS}$ implies that some function family in $\mathsf {VNP}$ is not a p-projection of the determinant. Since the permanent is $\mathsf {VNP}$-complete under p-projections, the result follows.
We mimic one of the proofs of completeness for Hilbert-like $\text{IPS}$ [81, Theorem 1] (recall Proposition 3.4) and then show that this proof can in fact be carried out in $\mathsf {VNP}^{0}$. We omit any mention of the ground ring, as it will not be relevant.
Let $\varphi _n(\vec{x}) = \kappa _1(\vec{x}) \wedge \cdots \wedge \kappa _m(\vec{x})$ be an unsatisfiable CNF, where each $\kappa _i$ is a disjunction of literals. Let $C_i(\vec{x})$ denote the (negated) polynomial translation of $\kappa _i$ via $\lnot x \mapsto x$, $x \mapsto 1-x$ and $f \vee g \mapsto fg$; in particular, $C_i(\vec{x}) = 0$ if and only if $\kappa _i(\vec{x}) = 1$, and thus $\varphi _n$ is unsatisfiable if and only if the system of equations $C_1(\vec{x})=\cdots =C_m(\vec{x})=x_1^2 - x_1 = \cdots = x_n^2 - x_n = 0$ is unsatisfiable. In fact, as we will see in the course of the proof, we will not need the equations $x_i^2 - x_i = 0$. It will be convenient to introduce the function $b(e,x)=ex + (1-e)(1-x)$, i.e., $b(1,x) = x$ and $b(0,x)=1-x$. For example, the clause $\kappa _i(\vec{x}) = (x_1 \vee \lnot x_{17} \vee x_{42})$ gets translated into $C_i(\vec{x}) = (1-x_1)x_{17}(1-x_{42}) = b(0,x_1)b(1,x_{17})b(0,x_{42})$, and therefore an assignment falsifies $\kappa _i$ if and only if $(x_1,x_{17},x_{42}) \mapsto (0,1,0)$.
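The clause translation can be sketched directly from the definition of $b$ (our own toy check, with a made-up clause): a positive literal $x_j$ contributes $b(0,x_j) = 1-x_j$ and a negated literal $\lnot x_j$ contributes $b(1,x_j) = x_j$, and $C_i(\alpha)=0$ exactly on the satisfying assignments of $\kappa_i$.

```python
from itertools import product

def b(e, x):
    # b(e, x) = e*x + (1-e)*(1-x), so b(1, x) = x and b(0, x) = 1 - x
    return e * x + (1 - e) * (1 - x)

def translate(clause, alpha):
    # clause: list of (variable index, negated?) pairs; returns C_i(alpha)
    r = 1
    for j, neg in clause:
        r *= b(1 if neg else 0, alpha[j])
    return r

kappa = [(0, False), (1, True), (2, False)]   # x0 \/ ~x1 \/ x2
for alpha in product([0, 1], repeat=3):
    satisfied = alpha[0] == 1 or alpha[1] == 0 or alpha[2] == 1
    assert (translate(kappa, alpha) == 0) == satisfied
```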
Just as $1 = x_1 x_2 + x_1(1-x_2) + (1-x_1)x_2 + (1-x_1)(1-x_2)$, an easy induction shows that
$$1 = \sum _{\vec{e} \in \lbrace 0,1\rbrace ^n} \prod _{j=1}^{n} b(e_j, x_j). \qquad (4)$$
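The identity $1 = \sum _{\vec{e} \in \lbrace 0,1\rbrace ^n} \prod _{j} b(e_j, x_j)$ is a polynomial identity, not merely a Boolean one, so it can be spot-checked at non-Boolean integer points (a sanity check of ours):

```python
import random
from itertools import product

def b(e, x):
    return e * x + (1 - e) * (1 - x)

random.seed(1)
for _ in range(20):
    n = random.randint(1, 5)
    xs = [random.randint(-10, 10) for _ in range(n)]
    total = 0
    for e in product([0, 1], repeat=n):
        term = 1
        for ej, xj in zip(e, xs):
            term *= b(ej, xj)
        total += term
    # each factor pair sums to b(0,x) + b(1,x) = 1, so the whole sum is 1
    assert total == 1
```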
Let $c_i$ be the placeholder variable corresponding to $C_i(\vec{x})$. For any property $\Pi$, we write $[\![\Pi ]\!]$ for the indicator function of $\Pi$: $[\![\Pi (\vec{e})]\!] = 1$ if and only if $\Pi (\vec{e})$ holds and 0 otherwise. We claim that the following is a $\mathsf {VNP}^0$-$\text{IPS}$ certificate:
$$\sum _{i=1}^{m} c_i \sum _{\vec{e} \in \lbrace 0,1\rbrace ^n} [\![\vec{e} \text{ falsifies } \kappa _i \text{ and satisfies } \kappa _j \text{ for all } j \lt i]\!] \prod _{j : x_j \notin \kappa _i} b(e_j, x_j). \qquad (5)$$
To see that Equation (5) is a certificate, we claim that on substituting $C_i(\vec{x})$ for $c_i$, the resulting sum becomes syntactically identical to Equation (4) and therefore sums to 1. (It is clear from its form that it is in the ideal generated by the $c_i$.) Note that an assignment $\vec{e}$ falsifies clause $\kappa _i$ if and only if $C_i(\vec{x}) = \prod _{j : x_j \in \kappa _i} b(e_j, x_j)$. Let $A_i$ be the set of assignments $\vec{e}$ satisfying the $i$th condition in brackets: $A_i = \lbrace \vec{e} \in \lbrace 0,1\rbrace ^n : \vec{e} \text{ falsifies } \kappa _i \text{ and satisfies $\kappa _j$ for all } j \lt i\rbrace$. Then, on substituting the $C_i(\vec{x})$ for the $c_i$, Equation (5) becomes
$$\sum _{i=1}^{m} \sum _{\vec{e} \in A_i} \prod _{j=1}^{n} b(e_j, x_j) = \sum _{\vec{e} \in \lbrace 0,1\rbrace ^n} \prod _{j=1}^{n} b(e_j, x_j) = 1,$$
where the first equality holds because the $A_i$ partition $\lbrace 0,1\rbrace ^n$ (as $\varphi _n$ is unsatisfiable, every assignment falsifies some clause, and it is counted once, at the least such index), and the second equality is Equation (4).
Indeed, as noted in Pitassi [81, Theorem 1], the same proof would have worked had the $A_i$ been any partition of $\lbrace 0,1\rbrace ^n$ such that every $\vec{e} \in A_i$ falsified clause $\kappa _i$; we will now use this particular partition to show that the certificate (5) is in $\mathsf {VNP}^0$. We have
In this section, we state our PIT axioms and prove Theorems 1.4 and 1.6, which say that Extended Frege (EF) (respectively, $\mathsf {AC}^0$- or $\mathsf {AC}^0[p]$-Frege) is polynomially equivalent to the Ideal Proof System if there are polynomial-size circuits for PIT whose correctness—suitably formulated—can be efficiently proved in EF (respectively, $\mathsf {AC}^0$- or $\mathsf {AC}^0[p]$-Frege). More precisely, we identify a small set of natural axioms for PIT and show that if these axioms can be proven efficiently in EF, then EF is p-equivalent to $\text{IPS}$. Theorem 1.6 begins to explain why $\mathsf {AC}^0[p]$-Frege lower bounds have been so difficult to obtain and highlights the importance of our PIT axioms for $\mathsf {AC}^0[p]$-Frege lower bounds. We begin by describing and discussing these axioms.
Fix some standard Boolean encoding of constant-free algebraic circuits, so that the encoding of any size-$m$ constant-free algebraic circuit has size $\operatorname{poly}(m)$. We use “$[C]$” to denote the encoding of the algebraic circuit $C$. Let $K = (K_{m,n})$ denote a family of Boolean circuits for solving polynomial identity testing. That is, $K_{m,n}$ is a Boolean function that takes as input the encoding of a size $m$ constant-free algebraic circuit, $C$, over variables $x_1, \ldots , x_n$, and if $C$ has polynomial degree, then $K$ outputs 1 if and only if the polynomial computed by $C$ is the 0 polynomial.
Notational convention: We underline parts of a statement that involve propositional variables. For example, if in a propositional statement we write “$[C]$,” then this refers to a fixed Boolean string that is encoding the (fixed) algebraic circuit $C$. In contrast, if we write $\underline{[C]}$, then this denotes a Boolean string of propositional variables, which is to be interpreted as a description of an as-yet-unspecified algebraic circuit $C$; any setting of the propositional variables corresponds to a particular algebraic circuit $C$. Throughout, we use $\vec{p}$ and $\vec{q}$ to denote propositional variables (which we do not bother underlining except when needed for emphasis) and $\vec{x}, \vec{y}, \vec{z},\ldots\,$ to denote the algebraic variables that are the inputs to algebraic circuits. Thus, $C(\vec{x})$ is an algebraic circuit with inputs $\vec{x}$, $[C(\vec{x})]$ is a fixed Boolean string encoding some particular algebraic circuit $C$, $\underline{[C(\vec{x})]}$ is a string of propositional variables encoding an unspecified algebraic circuit $C$, and $[C(\underline{\vec{p}})]$ denotes a Boolean string together with propositional variables $\vec{p}$ that describes a fixed algebraic circuit $C$ whose inputs have been set to the propositional variables $\vec{p}$.
Our PIT axioms for a Boolean circuit $K$ are as follows. (This definition makes sense even if $K$ does not correctly compute PIT, but that case is not particularly interesting or useful.)
If there is a family $K$ of polynomial-size Boolean circuits computing PIT, such that the PIT axioms for $K$ have polynomial-size EF proofs, then EF is polynomially equivalent to $\text{IPS}$.
Note that the issue is not the existence of small circuits for PIT, since we would be happy with nonuniform polynomial-size PIT circuits, which do exist. Unfortunately, the known constructions are highly nonuniform—they involve picking random points—and we do not see how to prove axiom 1 for these constructions. Nonetheless, it seems very plausible to us that there exists a polynomial-size family of PIT circuits where the above axioms are efficiently provable in EF.
To prove the theorem, we will first show that EF is p-equivalent to $\text{IPS}$ if a family of propositional formulas expressing the soundness of $\text{IPS}$ is efficiently EF-provable. Then we will show that efficient EF proofs of $Soundness_{\text{IPS}}$ follow from efficient EF proofs of our PIT axioms.
It is standard for two proof systems $P_1$ and $P_2$ that if $P_2$ can prove the soundness of $P_1$, then $P_2$ can p-simulate $P_1$. What is more interesting here is that we show (Lemma 5.4) that a natural set of axioms for PIT (Definition 5.1) imply $Soundness_{\text{IPS}}$. This allows us to draw on intuitions (and, hopefully, results) about PIT to get a better sense of the plausibility of efficient EF proofs of $Soundness_{\text{IPS}}$. The power of this connection to PIT has already led to new results building on ours: Li, Tzameret, and Wang [65] showed that noncommutative formula IPS is qp-equivalent to Frege by showing that a noncommutative formula PIT algorithm [84] could be proved correct in Frege (see Section 8 for details).
Soundness of $\text{IPS}$. It is well known that for standard Cook–Reckhow proof systems, a proof system $P$ can p-simulate another proof system $P^{\prime }$ if and only if $P$ can prove the soundness of $P^{\prime }$. Our proof system is not standard, because verifying a proof requires probabilistic, rather than deterministic, polynomial time. Still, we will show how to formalize the soundness of $\text{IPS}$ propositionally, and we will show that if EF can efficiently prove the soundness of $\text{IPS}$, then EF is p-equivalent to $\text{IPS}$.
Let $\varphi = \kappa _1 \wedge \ldots \wedge \kappa _m$ be an unsatisfiable propositional 3CNF formula over variables $p_1,\ldots ,p_n$, and let $Q^\varphi _1, \ldots , Q^\varphi _m$ be the corresponding polynomial equations (each of degree at most 3) such that $\kappa _i(\alpha)=1$ if and only if $Q^\varphi _i(\alpha)=0$ for $\alpha \in \lbrace 0,1\rbrace ^{n}$. An $\text{IPS}$-refutation of $\varphi$ is an algebraic circuit, $C$, which demonstrates that 1 is in the ideal generated by the polynomial equations $\vec{Q}^\varphi$. (This demonstrates that the polynomial equations $\vec{Q}^\varphi =0$ are unsolvable, which is equivalent to proving that $\varphi$ is unsatisfiable.) In particular, recall that $C$ has two types of inputs: $x_1,\ldots,x_n$ (corresponding to the propositional variables $p_1,\ldots,p_n$) and the placeholder variables $y_1, \ldots , y_m$ (corresponding to the equations $Q^{\varphi }_1,\ldots,Q^\varphi _m$) and satisfies the following two properties:
Encoding $\text{IPS}$ Proofs. Let $K$ be a family of polynomial-size circuits for PIT. Using $K_{m,n}$, we can create a polynomial-size Boolean circuit, $Proof_{\text{IPS}} ([C], [\varphi ])$ that is true if and only if $C$ is an $\text{IPS}$-proof of the unsatisfiability of $\vec{Q}^\varphi =0$. The polynomial-sized Boolean circuit $Proof_{\text{IPS}} ([C],[\varphi ])$ first takes the encoding of the algebraic circuit $C$ (which has $x$-variables and placeholder variables), and creates the encoding of a new algebraic circuit, $[C^{\prime }]$, where $C^{\prime }$ is like $C$ but with each $y_i$ variable replaced by 0. Second, it takes the encoding of $C$ and $[\varphi ]$ and creates the encoding of a new circuit $C^{\prime \prime }$, where $C^{\prime \prime }$ is like $C$ but now with each $y_i$ variable replaced by $Q^\varphi _i$. (Note that whereas $C$ has $n+m$ underlying algebraic variables, both $C^{\prime }$ and $C^{\prime \prime }$ have only $n$ underlying variables.) $Proof_{\text{IPS}} ([C], [\varphi ])$ is true if and only if $K([C^{\prime }])$—that is, $C^{\prime }(\vec{x})=C(\vec{x},\vec{0})$ computes the 0 polynomial—and $K([1-C^{\prime \prime }])=0$—that is, $C^{\prime \prime }(\vec{x})=C(\vec{x},\vec{Q}^{\varphi }(\vec{x}))$ computes the 1 polynomial.
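A toy version of this check may clarify the construction (our own sketch: evaluation at random points stands in for the PIT circuit $K$, and all names below are invented). It forms $C^{\prime}(\vec{x}) = C(\vec{x},\vec{0})$ and $C^{\prime\prime}(\vec{x}) = C(\vec{x},\vec{Q}(\vec{x}))$ and tests "$C^{\prime} \equiv 0$" and "$C^{\prime\prime} \equiv 1$" by random evaluation:

```python
import random

def proof_ips(C, Qs, n, trials=50):
    """Randomized stand-in for Proof_IPS: accept iff C(x, 0) looks like the
    zero polynomial and C(x, Q(x)) looks like the constant 1 polynomial."""
    random.seed(2)
    m = len(Qs)
    for _ in range(trials):
        x = [random.randint(2, 10 ** 6) for _ in range(n)]  # random test points
        if C(x, [0] * m) != 0:                  # C' must be the 0 polynomial
            return False
        if C(x, [Q(x) for Q in Qs]) != 1:       # C'' must be the 1 polynomial
            return False
    return True

# Unsatisfiable system in one variable: Q1 = x and Q2 = 1 - x cannot both be 0.
Qs = [lambda x: x[0], lambda x: 1 - x[0]]
# Certificate: 1 = 1*Q1 + 1*Q2, i.e., C(x, y) = y1 + y2.
assert proof_ips(lambda x, y: y[0] + y[1], Qs, n=1)
# A non-certificate: C(x, y) = y1 alone gives C'' = x, not the constant 1.
assert not proof_ips(lambda x, y: y[0], Qs, n=1)
```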
Let the formula $Truth_{bool}(\vec{p},\vec{q})$ state that the truth assignment $\vec{q}$ satisfies the Boolean formula coded by $\vec{p}$. The soundness of $\text{IPS}$ says that if $\varphi$ has a refutation in $\text{IPS}$, then $\varphi$ is unsatisfiable. That is, $Soundness_{\text{IPS},m,n}([C], [\varphi ], \vec{p})$ has variables that encode a size-$m$ $\text{IPS}$-proof $C$, variables that encode a 3CNF formula $\varphi$ over $n$ variables, and $n$ additional Boolean variables, $\vec{p}$. $Soundness_{\text{IPS},m,n}([C], [\varphi ], \vec{p})$ states:
If EF can efficiently prove $Soundness_{\text{IPS}}$ for some polynomial-size Boolean circuit family $K$ computing PIT, then EF is p-equivalent to $\text{IPS}$.
Because $\text{IPS}$ can p-simulate EF, it suffices to show that if EF can prove Soundness of $\text{IPS}$, then EF can p-simulate $\text{IPS}$. Assume that we have a polynomial-size EF proof of $Soundness_{\text{IPS}}$. Now suppose that $C$ is an $\text{IPS}$-refutation of an unsatisfiable 3CNF formula $\varphi$ on variables $\vec{p}$. We will show that EF can also prove $\lnot \varphi$ with a proof of size polynomial in $|C|$.
First, we claim that it follows from a natural encoding (see Section 5.4) that EF can efficiently prove $\varphi \rightarrow Truth_{bool}([\varphi ], \vec{p})$.
Second, if $C$ is an $\text{IPS}$-refutation of $\varphi$, then EF can prove $Proof_{\text{IPS}} ([C],[\varphi ])$.5 This holds because both $C$ and $\varphi$ are fixed, so this formula is variable free. Thus, EF can just verify that it is true.
Third, by soundness of $\text{IPS}$, which we are assuming is EF-provable, and the fact that EF can prove $Proof_{\text{IPS}} ([C],[\varphi ])$ (step 2), it follows by modus ponens that EF can prove $\lnot \textit{Truth}_{bool}([\varphi ], \vec{p})$. (The statement $Soundness_{\text{IPS}} ([C],[\varphi ],\vec{p})$ for this instance will only involve variables $\vec{p}$: The other two sets of inputs to the $Soundness_{\text{IPS}}$ statement, $[C]$ and $[\varphi ]$, are constants here, since both $C$ and $\varphi$ are fixed.)
Finally, by modus ponens and the contrapositive of $\varphi \rightarrow Truth_{bool}([\varphi ], \vec{p})$, we conclude in EF $\lnot \varphi$, as desired.
Theorem 1.4 follows from Lemma 5.4 and the following lemma.
If EF can efficiently prove the PIT axioms for some polynomial-size Boolean circuit family $K$ computing PIT, then EF can efficiently prove $Soundness_{\text{IPS}}$ (for that same $K$).
Starting with $Truth_{bool}(\underline{[\varphi ]},\vec{p})$, $K(\underline{[C(\vec{x},\vec{0})]})$, $K(\underline{[1-C(\vec{x},\vec{Q}(\vec{x}))]})$, we will derive a contradiction.
Finally, (6) and (7) give a contradiction.
Let $\mathcal {C}$ be any class of circuits closed under $\mathsf {AC}^0$ circuit reductions. If there is a family $K$ of polynomial-size Boolean circuits for PIT such that the PIT axioms for $K$ have polynomial-size $\mathcal {C}$-Frege proofs, then $\mathcal {C}$-Frege is polynomially equivalent to $\text{IPS}$ and, consequently, polynomially equivalent to Extended Frege.
Note that here we do not need to restrict the circuit $K$ to be in the class $\mathcal {C}$. This requires one more technical device compared to the proofs in the previous section. The proof of Theorem 1.6 follows the proof of Theorem 1.4 very closely. The main new ingredient is a folklore technical device that allows even very weak systems such as $\mathsf {AC}^0$-Frege to make statements about arbitrary circuits $K$—such as those needed to reason about the PIT axioms—together with a careful analysis of what was needed in the proof of Theorem 1.4. Before proving Theorem 1.6, we discuss some of its more interesting consequences.
As $\mathsf {AC}^0$-Frege is known unconditionally to be strictly weaker than Extended Frege [3], we immediately get that $\mathsf {AC}^0$-Frege cannot efficiently prove the PIT axioms for any Boolean circuit family $K$ correctly computing PIT.
Using essentially the same proof as Theorem 1.6, we also get the following result. By “depth-$d$ PIT axioms,” we mean a variant where the algebraic circuits $C$ (encoded as $[C]$ in the statement of the axioms) have depth at most $d$. Note that, even over finite fields, super-polynomial lower bounds on depth-$d$ algebraic circuits are notoriously open problems even for $d$ as small as 4 or 5.6
For any $d$, if there is a family of tautologies with no polynomial-size $\mathsf {AC}^0[p]$-Frege proof, and $\mathsf {AC}^0[p]$-Frege has polynomial-size proofs of the [depth-$d$] PIT axioms for some $K$, then $\mathsf {VNP}_{\mathbb {F}_p}$ does not have polynomial-size [depth-$d$] algebraic circuits.
This corollary makes the following question of central importance in getting lower bounds on $\mathsf {AC}^0[p]$-Frege:
For some $d \ge 4$, is there some $K$ computing depth-$d$ PIT, for which the depth-$d$ PIT axioms have $\mathsf {AC}^0[p]$-Frege proofs of polynomial size?
This question has the virtue that answering it either way is highly interesting:
This dichotomy is in some sense like a “completeness result for $\mathsf {AC}^0[p]$-Frege, modulo proving strong algebraic circuit lower bounds on $\mathsf {VNP}$”: If one hopes to prove $\mathsf {AC}^0[p]$-Frege lower bounds without proving strong lower bounds on $\mathsf {VNP}$, then one must prove $\mathsf {AC}^0[p]$-Frege lower bounds on the PIT axioms. For example, if you believe that proving $\mathsf {VP} \ne \mathsf {VNP}$ [or that proving $\mathsf {VNP}$ does not have bounded-depth polynomial-size circuits] is very difficult, and that proving $\mathsf {AC}^0[p]$-Frege lower bounds is comparatively easy, then to be consistent you must also believe that proving $\mathsf {AC}^0[p]$-Frege lower bounds on the [bounded-depth] PIT axioms is easy.
Similarly, by combining Theorems 1.6 and 3.5, we get the following corollary.
If for every constant $d$, there is a constant $d^{\prime }$ such that the depth-$d$ PIT axioms have polynomial-size $\mathsf {AC}^0_{d^{\prime }}[p]$-Frege proofs, then $\mathsf {AC}^0[p]$-Frege is polynomially equivalent to constant-depth $\text{IPS}$.
Using the chasms at depths 3 and 4 for algebraic circuits [2, 55, 99] (see Observation 3.3), we can also help explain why sufficiently strong exponential lower bounds for $\mathsf {AC}^0$-Frege—that is, lower bounds that do not depend on the depth, or do not depend so badly on the depth (the current best bounds are of the form $\exp (\Omega (n^{\exp (-d + O(1))}))$ [15, 60, 83]), which have also been open for nearly thirty years—have been difficult to obtain:
Let $\mathbb {F}$ be any field, and let $c$ be a sufficiently large constant. If there is a family of tautologies $(\varphi _n)$ such that any $\mathsf {AC}^0$-Frege proof of $\varphi _n$ has size at least $2^{c\sqrt {n} \log n}$, and $\mathsf {AC}^0$-Frege has polynomial-size proofs of the depth 4 PIT axioms for some $K$, then $\mathsf {VP}^0 \ne \mathsf {VNP}^0$ over $\mathbb {F}$.
If the field has characteristic zero, then we may replace “depth 4” above with “depth 3.”
Suppose that $\mathsf {AC}^0$-Frege can efficiently prove the depth-4 PIT axioms for some Boolean circuit $K$. Let $(\varphi _n)$ be a family of tautologies. If $\mathsf {VP}^0 = \mathsf {VNP}^0$, then there is a polynomial-size $\text{IPS}$ proof of $\varphi _n$. By Observation 3.3, the same certificate is computed by a depth-4 algebraic circuit of size $2^{O(\sqrt {n} \log n)}$. By assumption, $\mathsf {AC}^0$-Frege can efficiently prove the depth 4 PIT axioms for $K$, and therefore $\mathsf {AC}^0$-Frege p-simulates depth 4 $\text{IPS}$. Thus there are $\mathsf {AC}^0$-Frege proofs of $\varphi _n$ of size $2^{O(\sqrt {n} \log n)}$.
If the field has characteristic zero, then we may instead use the best-known chasm at depth 3, for which we only need depth-3 PIT and depth-3 $\text{IPS}$; this yields the same bounds.
As with Corollary 1.7, we conclude a similar dichotomy: Either $\mathsf {AC}^0$-Frege can efficiently prove the depth-4 PIT axioms (depth 3 in characteristic zero) or proving $2^{\omega (\sqrt {n} \log n)}$ lower bounds on $\mathsf {AC}^0$-Frege implies $\mathsf {VP}^0 \ne \mathsf {VNP}^0$.
Encoding $K$ into Weak Proof Systems. Extended Frege can easily reason about arbitrary circuits $K$: For each gate $g$ of $K$ (or even each gate of each instance of $K$ in a statement, if so desired), with children $g_{\ell }, g_{r}$, EF can introduce a new variable $k_g$ together with the requirement that $k_g \leftrightarrow k_{g_{\ell }} \, op_{g} \, k_{g_{r}}$, where $\, op_{g} \,$ is the corresponding operation $g = g_{\ell } \, op_{g} \, g_{r}$ (e.g., $\wedge$, $\vee$, etc.). But weaker proof systems such as Frege (=$\mathsf {NC}^1$-Frege), $\mathsf {AC}^0[p]$-Frege, or $\mathsf {AC}^0$-Frege do not have this capability. We thus need to help them out by introducing these new variables and formulae ahead of time.
For each gate $g$, the statement $k_g \leftrightarrow k_{g_{\ell }} \, op_{g} \, k_{g_{r}}$ only involves three variables and thus can be converted into a 3CNF of constant size. We refer to these clauses as the “$K$-clauses.” Note that the $K$-clauses do not set the inputs of $K$ to any particular values nor require its output to be any particular value. We denote the variables corresponding to $K$’s inputs as $k_{in,i}$ and the variable corresponding to $K$’s output as $k_{out}$.
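To make this concrete, here is a small sketch (our own naming conventions and data representation, not the article's) that generates the $K$-clauses for a gate list: each constraint $k_g \leftrightarrow k_{g_{\ell }} \, op_{g} \, k_{g_{r}}$ over three variables becomes three clauses.

```python
def k_clauses(gates):
    """Tseitin-style 'K-clauses' for a circuit given as a list of
    (gate, op, left, right) tuples, with op in {'and', 'or'}.  A clause is a
    list of (variable, sign) literals; sign True means a positive literal.
    The clauses assert k_g <-> (k_l op k_r) without fixing any inputs or the
    output to particular values."""
    clauses = []
    for g, op, l, r in gates:
        if op == 'and':   # g <-> (l AND r)
            clauses += [[(g, False), (l, True)],
                        [(g, False), (r, True)],
                        [(g, True), (l, False), (r, False)]]
        elif op == 'or':  # g <-> (l OR r)
            clauses += [[(g, True), (l, False)],
                        [(g, True), (r, False)],
                        [(g, False), (l, True), (r, True)]]
    return clauses
```

The clauses are satisfied by exactly those assignments in which each gate variable carries the value its gate computes from its children, which is what lets a weak system reason about an arbitrary circuit $K$.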
The modified statement $\textit{Proof}_{\text{IPS}} (\underline{[C]},\underline{[\varphi ]})$ now takes the following form. Recall that $Proof_{\text{IPS}}$ involves two uses of $K$: $K(\underline{[C(\vec{x},\vec{0})]})$ and $K(\underline{[1-C(\vec{x}, \vec{Q}^\varphi (\vec{x}))]})$. Each of these instances of $K$ needs to get its own set of variables, which we denote $k^{(1)}_{g}$ for gate $g$ in the first instance and $k^{(2)}_{g}$ for gate $g$ in the second instance, together with their own copies of the $K$-clauses. For an encoding $[C]$ or $[\varphi ]$, let $[C]_{i}$ denote its $i$th bit, which may be a constant, a propositional variable, or even a propositional formula. Then $\textit{Proof}_{\text{IPS}} (\underline{[C]}, \underline{[\varphi ]})$ is the conjunction of the two copies of the $K$-clauses, together with statements setting the inputs of the two copies—$k^{(1)}_{in,i} \leftrightarrow [C(\vec{x},\vec{0})]_i$ and $k^{(2)}_{in,i} \leftrightarrow [1-C(\vec{x},\vec{Q}^\varphi (\vec{x}))]_i$—implying the conjunction of the two output variables $k^{(1)}_{out} \wedge k^{(2)}_{out}$.
The Proofs. Lemmata 5.10 and 5.11 are the $\mathsf {AC}^0$-analogs of Lemmata 5.4 and 5.5, respectively. The proof of Lemma 5.10 will cause no trouble, and the proof of Lemma 5.11 will need one additional technical device (the “dummy statements” above).
Before getting to their proofs, we state the main additional lemma that we use to handle the new $K$ variables. We say that a variable $k^{(i)}_{in,j}$ corresponding to an input gate of $K$ is set to $\psi$ by a propositional statement if $k^{(i)}_{in,j} \leftrightarrow \psi$ occurs in the statement.
Let $(\varphi _n)$ be a sequence of tautologies on $\operatorname{poly}(n)$ variables, including any number of copies of the $K$ variables, of the form $\varphi = \left(\left(\bigwedge _{i} \alpha _i\right) \rightarrow \omega \right)$. Let $\vec{p}$ denote the other (non-$K$) variables. Suppose that (1) there are at most $O(\log n)$ non-$K$ variables in $\varphi$; (2) for each copy of $K$, the corresponding $K$-clauses appear amongst the $\alpha _i$; (3) the only $K$ variables that appear in $\omega$ are output variables $k^{(i)}_{out}$; and (4) if $k^{(i)}_{out}$ appears in $\omega$, then all the inputs to $K^{(i)}$ are set to formulas that syntactically depend only on the variables $\vec{p}$.
Then there is a $\operatorname{poly}(n)$-size $\mathsf {AC}^0$-Frege proof of $\varphi$.
The basic idea is that $\mathsf {AC}^0$-Frege can brute force over all $\operatorname{poly}(n)$-many assignments to the $O(\log n)$ non-$K$ variables and for each such assignment can then just evaluate each copy of $K$ gate by gate to verify the tautology. Any copy $K^{(i)}$ of $K$ all of whose input variables are unset must not affect the truth of $\varphi$, since none of the $k^{(i)}$ variables can appear in the consequent $\omega$ of $\varphi$. In fact, for such copies of $K$, the $K$-clauses merely appear as disjuncts of $\varphi$, since it then takes the form $\varphi = \bigvee _{i} (\lnot \alpha _i) \vee \omega = (\bigvee _{g} \lnot (k^{(i)}_{g} \leftrightarrow k^{(i)}_{g_{\ell }} \, op_{g} \, k^{(i)}_{g_r})) \vee (\bigvee _{\text{remaining clauses $i$}} \lnot \alpha _i) \vee \omega$. Thus, if $\mathsf {AC}^0$-Frege can prove that the rest of $\varphi$, namely $(\bigvee _{\text{remaining clauses $i$}} \lnot \alpha _i) \vee \omega$, is a tautology, then it can prove that $\varphi$ is a tautology.
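The brute-force strategy can be sketched semantically as follows. This merely checks the truth of such a $\varphi$—it is not itself an $\mathsf {AC}^0$-Frege proof—and all names and representations are ours; the gates are assumed to be listed in topological order.

```python
from itertools import product

def brute_force_check(n_p, gates, input_setting, omega):
    """Sketch of the brute-force strategy in Lemma 5.9: enumerate all 2^O(log n)
    assignments to the non-K variables p, set K's inputs via the given formulas
    of p, evaluate K gate by gate, and check the consequent omega.
    gates: topologically ordered (name, op, left, right) tuples;
    input_setting: dict mapping input gate name -> function of p;
    omega: predicate of (p, gate_values)."""
    for p in product([False, True], repeat=n_p):
        vals = {g: f(p) for g, f in input_setting.items()}
        for name, op, l, r in gates:
            vals[name] = (vals[l] and vals[r]) if op == 'and' else (vals[l] or vals[r])
        if not omega(p, vals):
            return False
    return True
```

With $O(\log n)$ non-$K$ variables there are only $\operatorname{poly}(n)$ assignments to enumerate, and each gate-by-gate evaluation is a constant-size derivation, which is why the whole argument fits in a $\operatorname{poly}(n)$-size $\mathsf {AC}^0$-Frege proof.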
Now we state the analogs of Lemmata 5.4 and 5.5 for $\mathcal {C}$-Frege. Because of the similarity of the proofs to the previous case, we merely indicate how their proofs differ from the Extended Frege case.
Let $\mathcal {C}$ be a class of circuits closed under $\mathsf {AC}^0$ circuit reductions. If there is a family $K$ of polynomial-size Boolean circuits computing PIT, such that the PIT axioms for $K$ have polynomial-size $\mathcal {C}$-Frege proofs, then $\mathcal {C}$-Frege is polynomially equivalent to $\text{IPS}$.
Mimic the proof of Lemma 5.4. The third and fourth steps of that proof are just modus ponens, so we need only check the first two steps.
The first step is to show that $\mathcal {C}$-Frege can prove $\varphi \rightarrow \textit{Truth}_{bool}([\varphi ], \underline{\vec{p}})$. This follows directly from the details of the encoding of $[\varphi ]$ and the full definition of $Truth_{bool}$; see Lemma 5.12.
The second step is to show that $\mathcal {C}$-Frege can prove $\textit{Proof}_{\text{IPS}} ([C],[\varphi ])$ for a fixed $C,\varphi$. In Lemma 5.4, this followed because this statement was variable free. Now this statement is no longer variable free, since it involves two copies of $K$ and the corresponding variables and $K$-clauses. However, $\textit{Proof}_{\text{IPS}} ([C],[\varphi ])$ satisfies the requirements of Lemma 5.9, and applying that lemma we are done.
Let $\mathcal {C}$ be a class of circuits closed under $\mathsf {AC}^0$ circuit reductions. If $\mathcal {C}$-Frege can efficiently prove the PIT axioms for some polynomial-sized family of circuits $K$ computing PIT, then $\mathcal {C}$-Frege can efficiently prove $Soundness_{\text{IPS}}$ (for that same $K$).
We mimic the proof of Lemma 5.5. In steps (1), (2), and (4) of that proof we used $m$ additional copies of $K$, where $m$ is the number of clauses in the CNF $\varphi$ encoded by $\underline{[\varphi ]}$, and thus $m \le \operatorname{poly}(n)$. To talk about these copies of $K$ in $\mathcal {C}$-Frege, however, the $K$ variables must already be present in the statement we wish to prove in $\mathcal {C}$-Frege. The “dummy statements” in the new version of soundness are the $K$-clauses—with inputs and outputs not set to anything—for each of $m$ new copies of $K$, which we denote $K^{(3)},\ldots, K^{(m+2)}$ (recall that the first two copies $K^{(1)}$ and $K^{(2)}$ are already used in the statement of $Proof_{\text{IPS}}$). We will not actually need these clauses anywhere in the proof; we just need their variables to be present from the beginning.
Starting with $\textit{Truth}_{bool}(\underline{[\varphi ]},\vec{p})$, $K^{(1)}(\underline{[C(\vec{x},\vec{0})]})$, $K^{(2)}(\underline{[1-C(\vec{x},\vec{Q}(\vec{x}))]})$, we derive a contradiction. The only step of the proof of Lemma 5.5 that was not either the use of an axiom or modus ponens was step (1), so it suffices to verify that this can be carried out in $\mathsf {AC}^0$-Frege with the $K$-clauses.
Step (1) was to show for every $i \in [m]$, $Truth_{bool}([\varphi ],\vec{p}) \rightarrow K(\underline{[Q_i^\varphi (\vec{p})]})$, where $Q_i^\varphi$ is the low-degree polynomial corresponding to the clause, $\kappa _i$, of $\varphi$. Note that, as $\varphi$ is not a fixed formula but is determined by the propositional variables encoding $\underline{[\varphi ]}$, the encoding $\underline{[Q_i^\varphi ]}$ depends on a subset of these variables.
$Truth_{bool}(\underline{[\varphi ]},\vec{p})$ states that each clause $\kappa _i$ in $\varphi$ evaluates to true under $\vec{p}$. It is a tautology that if $\kappa _i$ evaluates to true under $\vec{p}$, then $Q_i^{\varphi }$ evaluates to 0 at $\vec{p}$. Since $K$ correctly computes PIT, $K(\underline{[Q_i^{\varphi }(\vec{p})]})$ holds in this case, giving the desired implication.
For an $\le m$-clause, $\le n$-variable 3CNF $\varphi = \kappa _1 \wedge \cdots \wedge \kappa _m$, its encoding is a Boolean string of length $3m(\lceil \log _2(n) \rceil +1)$. Each literal $x_i$ or $\lnot x_i$ is encoded as the binary encoding of $i$ ($\lceil \log _2(n) \rceil$ bits) plus a single other bit indicating whether the literal is positive (1) or negative (0). The encoding of a single clause is just the concatenation of the encodings of the three literals, and the encoding of $\varphi$ is the concatenation of these encodings.
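A direct transcription of this encoding, with hypothetical helper names of our own, following the convention that $[i]$ denotes the binary encoding of $i-1$ (as in the definition of $Truth_{bool}$):

```python
from math import ceil, log2

def encode_literal(i, positive, n):
    """Encode the literal x_i or NOT x_i (1-indexed i, i <= n) as
    ceil(log2 n) bits of the binary encoding of i-1 (most significant bit
    first), followed by one sign bit: 1 = positive, 0 = negative."""
    k = max(1, ceil(log2(n)))
    bits = [(i - 1) >> j & 1 for j in reversed(range(k))]
    return bits + [1 if positive else 0]

def encode_3cnf(clauses, n):
    """Encoding of a 3CNF: the concatenation of the encodings of the three
    literals of each clause, giving 3m(ceil(log2 n) + 1) bits in total."""
    return [b for clause in clauses for lit in clause for b in encode_literal(*lit, n)]
```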
We define $Truth_{bool}(\underline{[\varphi ]}, \vec{p})$ to be the conjunction, over the clauses $\kappa _1, \ldots , \kappa _m$ of $\varphi$, of the single-clause formulas $Truth_{bool,n}(\underline{[\kappa _i]}, \vec{p})$ defined next.
For a single 3-literal clause $\kappa$, we define $Truth_{bool,n}(\underline{[\kappa ]}, \vec{p})$ as follows. For an integer $i$, let $[i]$ denote the standard binary encoding of $i-1$ (so that the numbers $1,\ldots,2^k$ are put into bijective correspondence with $\lbrace 0,1\rbrace ^{k}$). Let $\underline{[\kappa ]} = \vec{q_1} s_1 \vec{q_2} s_2 \vec{q_3} s_3$, where each $s_i$ is the sign bit (positive/negative) and each $\vec{q_i}$ is a length-$\lceil \log _2 n \rceil$ string of variables corresponding to the encoding of the index of a variable. We write $\vec{q} = [k]$ as shorthand for $\bigwedge _{i=1}^{\lceil \log _2 n \rceil } (q_i \leftrightarrow [k]_i)$, where $x \leftrightarrow y$ is shorthand for $(x \wedge y) \vee (\lnot x \wedge \lnot y)$. Finally, we define $Truth_{bool,n}(\underline{[\kappa ]}, \vec{p})$ to be $\bigvee _{j=1}^{3} \bigvee _{i=1}^{n} \left(\vec{q}_j = [i] \wedge (p_i \leftrightarrow s_j)\right)$.
For any 3CNF $\varphi$ on $n$ variables, there are $\operatorname{poly}(n)$-size $\mathsf {AC}^0$-Frege proofs of $\varphi (\vec{p}) \rightarrow Truth_{bool}([\varphi ], \vec{p})$.
In fact, we will see that for a fixed clause $\kappa$, after simplifying constants—that is, $\varphi \wedge 1$ and $\varphi \vee 0$ both simplify to $\varphi$, $\varphi \wedge 0$ simplifies to 0, and $\varphi \vee 1$ simplifies to 1—$Truth_{bool}([\kappa ], \vec{p})$ in fact becomes syntactically identical to $\kappa (\vec{p})$. By the definition of $Truth_{bool}([\varphi ], \vec{p})$, we get the same conclusion for any fixed CNF $\varphi$. Simplifying constants can easily be carried out in $\mathsf {AC}^0$-Frege.
For a fixed $\kappa$, $\vec{q}_j$ and $s_j$ become fixed to constants for $j=1,2,3$. Denote the indices of the three variables in $\kappa$ by $i_1, i_2, i_3$. The only variables left in the statement $Truth_{bool}([\kappa ], \vec{p})$ are $\vec{p}$. Since the $\vec{q}_{j}$ and $[i]$ are all fixed, every term in $\bigvee _{i}(\vec{q}_j = [i] \wedge (p_i \leftrightarrow s_j))$ except for the $i_j$ term simplifies to 0, so this entire disjunction simplifies to $(p_{i_j} \leftrightarrow s_j)$. Since the $s_j$ are also fixed, if $s_j=1$, then $(p_{i_j} \leftrightarrow s_j)$ simplifies to $p_{i_j}$, and if $s_j=0$, then it simplifies to $\lnot p_{i_j}$. With this understanding, we write $\pm p_{i_j}$ for the corresponding literal. Then $Truth_{bool}([\kappa ], \vec{p})$ simplifies to $(\pm p_{i_1} \vee \pm p_{i_2} \vee \pm p_{i_3})$ (with signs as described previously). This is exactly $\kappa (\vec{p})$.
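As a sanity check on this simplification, here is a semantic evaluator of $Truth_{bool}$ on a single clause (our own illustrative code, using the encoding described above with $[i]$ the binary encoding of $i-1$); it agrees with $\kappa (\vec{p})$ on every assignment.

```python
def truth_bool_clause(enc, p, n):
    """Evaluate Truth_bool([kappa], p): an OR over the three literals j and all
    variable indices i of (q_j = [i]) AND (p_i <-> s_j), where q_j are the
    index bits and s_j the sign bit of literal j in the encoding enc."""
    k = len(enc) // 3 - 1  # ceil(log2 n) index bits plus 1 sign bit per literal
    for j in range(3):
        chunk = enc[j * (k + 1):(j + 1) * (k + 1)]
        q, s = chunk[:k], chunk[k]
        for i in range(1, n + 1):
            if q == [(i - 1) >> b & 1 for b in reversed(range(k))] and p[i - 1] == s:
                return True
    return False

# kappa = (x1 OR x2 OR NOT x3) over n = 4 variables, encoded as in the text:
# per literal, 2 index bits (binary of i-1, MSB first) then a sign bit.
enc_kappa = [0, 0, 1,   0, 1, 1,   1, 0, 0]
```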
Theorem 1.2 shows that proving lower bounds on (even Hilbert-like) $\text{IPS}$, or on the number of lines in Polynomial Calculus proofs (equivalent to Hilbert-like $\det$-$\text{IPS}$), is at least as hard as proving algebraic circuit lower bounds. In this section, we begin to make the difference between proving proof complexity lower bounds and proving circuit lower bounds precise.
The key difference, which any technique must grapple with, is that while an algebraic circuit complexity lower bound is a lower bound on a single function family $(f_n)_{n=1,2,3,\ldots\,}$, one for each $n$, an IPS lower bound is instead a lower bound on $(\mathcal {C}_n)_{n=1,2,3,\ldots\,}$, where $\mathcal {C}_n$ is the set of all IPS certificates for the $n$th system of equations $\mathcal {F}_n$. Even over finite fields, $\mathcal {C}_n$ will be infinite for each $n$ (when it is not empty). However, we observe that $\mathcal {C}_n$ is finitely generated, and we use this to suggest a direction for proving new proof complexity lower bounds, aimed at proving the long-sought length-of-proof lower bounds on an algebraic proof system.
The key fact we use is embodied in Lemma 6.1, which says that the set of (Hilbert-like) certificates for a given unsatisfiable system of equations is, in a precise sense, “finitely generated.” The basic idea is then to leverage this finite generation to extend lower bound techniques from individual polynomials to entire “finitely generated” sets of polynomials.
Because Hilbert-like certificates are somewhat simpler to deal with, we begin with those and then proceed to general certificates. But keep in mind that all our key conclusions about Hilbert-like certificates will also apply to general certificates. For this section, we will need the notion of a module over a ring (the ring-analogue of a vector space over a field) and a few basic results about such modules (see Section 2.3).
Recall that a Hilbert-like $\text{IPS}$-certificate $C(\vec{x}, \vec{y})$ is one that is linear in the $y$-variables, that is, it has the form $\sum _{i=1}^{m}G_i(\vec{x}) y_i$. Each function of the form $\sum _i G_i(\vec{x})y_i$ is completely determined by the tuple $(G_1(\vec{x}), \ldots , G_m(\vec{x}))$, and the set of all such tuples is exactly the $R[\vec{x}]$-module $R[\vec{x}]^{m}$.
The algebraic circuit size of a Hilbert-like certificate $C=\sum _i G_i(\vec{x}) y_i$ is equivalent (up to a small constant factor and an additive $O(n)$) to the algebraic circuit size of computing the entire tuple $(G_1(\vec{x}),\ldots, G_m(\vec{x}))$. A circuit computing the tuple can easily be converted to a circuit computing $C$ by adding $m$ times gates and a single plus gate. Conversely, for each $i$ we can recover $G_i(\vec{x})$ from $C(\vec{x}, \vec{y})$ by plugging in 0 for all $y_j$ with $j \ne i$ and 1 for $y_i$. So from the point of view of lower bounds on Hilbert-like certificates, we may consider their representation as tuples essentially without loss of generality. This holds even in the setting of Hilbert-like depth 3 $\text{IPS}$-proofs.
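Both conversions can be sketched concretely; in this toy sketch (names and the numeric evaluation domain are ours) circuits are modeled as Python functions.

```python
def tuple_from_certificate(C, m):
    """Recover the tuple (G_1,...,G_m) from a Hilbert-like certificate
    C(x,y) = sum_i G_i(x) y_i by plugging in the i-th unit vector for y."""
    def G(i):
        return lambda x: C(x, [1 if j == i else 0 for j in range(m)])
    return [G(i) for i in range(m)]

def certificate_from_tuple(Gs):
    """Inverse direction: m product gates feeding a single plus gate."""
    return lambda x, y: sum(g(x) * yi for g, yi in zip(Gs, y))
```

For example, for $C(\vec{x},\vec{y}) = x_1 y_1 + (1-x_1) y_2$, the recovered tuple is $(x_1,\; 1-x_1)$, and rebuilding the certificate from the tuple reproduces $C$.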
Using the representation of Hilbert-like certificates as tuples, we find that Hilbert-like $\text{IPS}$-certificates are in bijective correspondence with $R[\vec{x}]$ solutions (in the new variables $g_i$) to the following $R[\vec{x}]$-linear equation: $g_1 F_1(\vec{x}) + g_2 F_2(\vec{x}) + \cdots + g_m F_m(\vec{x}) = 1$.
Just as in linear algebra over a field, the set of such solutions can be described by taking one solution and adding to it all solutions to the associated homogeneous equation $g_1 F_1(\vec{x}) + \cdots + g_m F_m(\vec{x}) = 0$; the solutions of the homogeneous equation are the syzygies of $(F_1, \ldots , F_m)$.
We now come to the key lemma for Hilbert-like certificates.
For a given set of unsatisfiable polynomial equations $F_1(\vec{x})=\cdots =F_m(\vec{x})=0$ over a Noetherian ring $R$ (such as a field or $\mathbb {Z}$), the set of Hilbert-like $\text{IPS}$-certificates is a coset of a finitely generated submodule of $R[\vec{x}]^{m}$.
The discussion above shows that the set of Hilbert-like certificates is a coset of an $R[\vec{x}]$-submodule of $R[\vec{x}]^{m}$, namely the solutions to Equation (7). As $R$ is a Noetherian ring, so is $R[\vec{x}]$ (by Hilbert's Basis Theorem). Thus $R[\vec{x}]^{m}$ is a Noetherian $R[\vec{x}]$-module, and hence every submodule of it is finitely generated.
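To make Lemma 6.1 concrete, consider the following toy example (ours, chosen for illustration, not from the article). Take $m = 2$ with $F_1 = x$ and $F_2 = 1-x$ over a field $\mathbb {F}$. One Hilbert-like certificate is $(G_1, G_2) = (1, 1)$, since $1 \cdot x + 1 \cdot (1-x) = 1$. The syzygies are the solutions of $g_1 x + g_2 (1-x) = 0$; since $x$ and $1-x$ are coprime, $x$ must divide $g_2$, and one checks that the syzygy module is generated by the single element $(1-x, \, -x)$. Hence the set of all Hilbert-like certificates is exactly

$$(g_1, g_2) \;=\; \bigl (1 + (1-x)h, \;\; 1 - x h \bigr), \qquad h \in \mathbb {F}[x],$$

and indeed $\bigl (1 + (1-x)h\bigr) x + \bigl (1 - xh\bigr)(1-x) = x + (1-x) = 1$ for every $h$: a single certificate plus the finitely generated module of syzygies describes them all.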
Lemma 6.1 seems so conceptually important that it is worth re-stating:
The set of all Hilbert-like $\text{IPS}$-certificates for a given system of equations can be described by a single Hilbert-like $\text{IPS}$-certificate, together with a finite generating set for the syzygies.
Its importance may be underscored by contrasting the preceding statement with the structure (if any?) of the set of all proofs in other proof systems, particularly non-algebraic ones.
Note that a finite generating set for the syzygies (indeed, even a Gröbner basis) can be found in the process of computing a Gröbner basis for the $R[\vec{x}]$-ideal $\langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle$. This process is to Buchberger's Gröbner basis algorithm as the extended Euclidean algorithm is to the usual Euclidean algorithm; an excellent exposition can be found in the book by Ene and Herzog [32] (see also Eisenbud [31, Section 15.5]).
Lemma 6.1 suggests that one might be able to prove size lower bounds on Hilbert-like-$\text{IPS}$ along the following lines: (1) Find a single family of Hilbert-like $\text{IPS}$-certificates $(G_n)_{n=1}^{\infty }$, $G_n = \sum _{i=1}^{\operatorname{poly}(n)} y_i G_i(\vec{x})$ (one for each input size $n$); (2) use your favorite algebraic circuit lower bound technique to prove a lower bound on the polynomial family $G$; (3) find a (hopefully nice) generating set for the syzygies; and (4) show that when adding to $G$ any $R[\vec{x}]$-linear combinations of the generators of the syzygies, whatever useful property was used in the lower bound on $G$ still holds. Although this indeed seems significantly more difficult than proving a single algebraic circuit complexity lower bound, it at least suggests a recipe for proving lower bounds on Hilbert-like $\text{IPS}$ (and its subsystems such as homogeneous depth 3, depth 4, multilinear, etc.), which should be contrasted with the amorphous difficulty of transferring lower bounds for a circuit class to lower bounds on previous related proof systems, e.g., transferring $\mathsf {AC}^0[p]$ lower bounds [86, 94] to $\mathsf {AC}^0[p]$-Frege.
This entire discussion also applies to general $\text{IPS}$-certificates, with the following modifications. We leave a certificate $C(\vec{x}, \vec{y})$ as is, and instead of a module of syzygies we get an ideal (still finitely generated) of what we call zero-certificates. The difference between any two $\text{IPS}$-certificates is a zero-certificate; equivalently, a zero-certificate is a polynomial $C(\vec{x}, \vec{y})$ such that $C(\vec{x}, \vec{0}) = 0$ and $C(\vec{x}, \vec{F}(\vec{x})) = 0$ as well (contrast with the definition of $\text{IPS}$ certificate, which has $C(\vec{x}, \vec{F}(\vec{x})) = 1$). The set of $\text{IPS}$-certificates is then the coset intersection $\langle y_1, \ldots , y_m \rangle \cap \bigl (1 + \langle y_1 - F_1(\vec{x}), \ldots , y_m - F_m(\vec{x}) \rangle \bigr)$.
A finite generating set for the ideal of zero-certificates can be computed using Gröbner bases (see, e.g., Ene and Herzog [32, Section 3.2.1]).
Just as for Hilbert-like certificates, we get:
The set of all $\text{IPS}$-certificates for a given system of equations can be described by a single $\text{IPS}$-certificate, together with a finite generating set for the ideal of zero-certificates.
Our suggestions above for lower bounds on Hilbert-like $\text{IPS}$ apply mutatis mutandis to general $\text{IPS}$-certificates, suggesting a route to proving true size lower bounds on $\text{IPS}$ using known techniques from algebraic complexity theory.
The discussion here raises many basic and interesting questions about the complexity of sets of (families of) functions in an ideal or module, which we propose in Section 7.
We introduced the Ideal Proof System $\text{IPS}$ (Definition 1.1) and showed that it is a very close algebraic analog of Extended Frege—the most powerful, natural system currently studied for proving propositional tautologies. We showed that lower bounds on $\text{IPS}$ imply (algebraic) circuit lower bounds, which to our knowledge is the first time that lower bounds on a proof system have been shown to imply any sort of complexity class lower bounds. Using the same techniques, we were also able to show that lower bounds on the number of lines (rather than the usual measure of number of monomials) in Polynomial Calculus proofs also imply strong algebraic circuit lower bounds. Because proofs in $\text{IPS}$ are just algebraic circuits satisfying certain polynomial identity tests, many results from algebraic circuit complexity apply immediately to $\text{IPS}$. In particular, the chasms at depths 3 and 4 in algebraic circuit complexity imply that lower bounds on even depth 3 or depth 4 $\text{IPS}$ proofs would be very interesting.
We introduced natural propositional axioms for polynomial identity testing (PIT) and showed that these axioms play a key role in understanding the 30-year open question of $\mathsf {AC}^0[p]$-Frege lower bounds: Either there are $\mathsf {AC}^0[p]$-Frege lower bounds on the PIT axioms or any $\mathsf {AC}^0[p]$-Frege lower bounds are as hard as showing $\mathsf {VP} \ne \mathsf {VNP}$ over a field of characteristic $p$. We expect PIT to be in $\mathsf {P}$ (given the connection to circuit lower bounds [50]); if this is the case, then IPS becomes a deterministic Cook–Reckhow system. Furthermore, in this case there should be some proof that PIT is in $\mathsf {P}$, which we expect to be in ZFC; if the full ZFC proof translates into a ZFC propositional proof of the PIT axioms for some specific Boolean circuit family $K$, then we would have that ZFC (used as a propositional proof system) p-simulates IPS.
In appendices, we discuss a variant of the Ideal Proof System that allows divisions, and its utility and limitations, as well as a geometric variant of the Ideal Proof System which suggests further geometric properties that might be of interest for computational and proof complexity. And finally, through an analysis of the set of all $\text{IPS}$ proofs of a given unsatisfiable system of equations, we suggest how one might transfer techniques from algebraic circuit complexity to prove lower bounds on $\text{IPS}$ (and thus on Extended Frege).
The Ideal Proof System raises many new questions, not only about itself but also about PIT, new examples of $\mathsf {VNP}$ functions coming from propositional tautologies, and the complexity of ideals or modules of polynomials.
In Proposition 3.7 we show that if a general $\text{IPS}$-certificate $C$ has only polynomially many $\vec{y}$-monomials (with coefficients in $\mathbb {F}[\vec{x}]$), and the maximum degree of each $y_i$ is polynomially bounded, then $C$ can be converted to a polynomial-size Hilbert-like certificate. However, without this sparsity assumption general $\text{IPS}$ appears to be stronger than Hilbert-like $\text{IPS}$.
What, if any, is the difference in size between the smallest Hilbert-like and general $\text{IPS}$ certificates for a given unsatisfiable system of equations? What about for systems of equations coming from propositional tautologies?
For general IPS, the preceding question was essentially answered [34] after an initial version of our article appeared (see Section 8 below); however, for $\mathcal {C}$-IPS for various $\mathcal {C}$, the question remains interesting.
Is there a super-polynomial size separation—or indeed any nontrivial size separation—between $\text{IPS}$ certificates of degree $\le d_{small}(n)$ and $\text{IPS}$ certificates of degree $\ge d_{large}(n)$ for some bounds $d_{small} < d_{large}$?
This question is particularly interesting in the following cases: (a) certificates for systems of equations coming from propositional tautologies, where $d_{small}(n) = n$ and $d_{large}(n) \ge \omega (n)$, since we know that every such system of equations has some (not necessarily small) certificate of degree $\le$ $n$, and (b) certificates for unsatisfiable systems of equations taking $d_{small}$ to be the bound given by the best-known effective Nullstellensätze, which are all exponential [20, 56, 96].
Are there tautologies for which the certificate family constructed in Theorem 1.2 is the one of minimum complexity (under p-projections or c-reductions8)?
If there is any family $\varphi = (\varphi _n)$ of tautologies for which Question 7.3 has a positive answer and for which the certificates constructed in Theorem 1.2 are $\mathsf {VNP}$-complete (Question 7.8 below), then super-polynomial size lower bounds on $\text{IPS}$-proofs of $\varphi$ would be equivalent to $\mathsf {VP} \ne \mathsf {VNP}$. This highlights the potential importance of understanding the structure of the set of certificates under computational reducibilities.
Since the set of all (Hilbert-like) $\text{IPS}$-certificates is a coset of a finitely generated ideal (respectively, module), the preceding question is a special case of considering, for a given family of cosets of ideals or modules $(f^{(0)}_n + I_n)$ ($I_n \subseteq R[x_1,\ldots, x_{\operatorname{poly}(n)}]$), the relationships under various reductions between all families of functions $(f_n)$ with $f_n \in f^{(0)}_n + I_n$ for each $n$. This next question is of a more general nature than the others we ask; we think it deserves further study.
Given a family of cosets of ideals $f^{(0)}_n + I_n$ (or more generally modules) of polynomials, with $I_n \subseteq R[x_1,\ldots, x_{\operatorname{poly}(n)}]$, consider the function families $(f_n) \in (f^{(0)}_n + I_n)$ (meaning that $f_n \in f^{(0)}_n + I_n$ for all $n$) under any computational reducibility $\le$ such as p-projections. What can the $\le$ structure look like? When, if ever, is there such a unique $\le$-minimum (even a single nontrivial example would be interesting, as in Question 7.3)? Can there be infinitely many incomparable $\le$-minima?
Say a $\le$-degree $\mathbf {d}$ is “saturated” in $(f^{(0)}_n + I_n)$ if every $\le$-degree $\mathbf {d^{\prime }} \ge \mathbf {d}$ has some representative in $f^{(0)} + I$. Must saturated degrees always exist? We suspect yes, given that one may multiply any element of $I$ by arbitrarily complex polynomials. What can the set of saturated degrees look like for a given $(f^{(0)}_n + I_n)$? Must every $\le$-degree in $f^{(0)} + I$ be below some saturated degree? What can the $\le$-structure of $f^{(0)} + I$ look like below a saturated degree?
Question 7.4 is of interest even when $f^{(0)} = 0$, that is, for ideals and modules of functions rather than their nontrivial cosets.
Can we leverage the fact that the set of $\text{IPS}$ certificates is not only a finitely generated coset intersection but also closed under multiplication?
We note that it is not difficult to show that a coset $c + I$ of an ideal is closed under multiplication if and only if $c^2 - c \in I$. Equivalently, this means that $c$ is idempotent ($c^2 = c$) in the quotient ring $R/I$. For example, if $I$ is a prime ideal, then $R/I$ has no zero-divisors, and thus the only choices for $c+I$ are $I$ and $1+I$. We note that the ideal generated by the $n^2$ entries of $XY-I$ in the setting of the Hard Matrix Identities is prime (see Appendix A). It seems unlikely that all ideals coming from propositional tautologies are prime, however.
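The idempotence criterion is easy to check computationally in small cases. The following sketch (ours, not from the text, using sympy; the toy ideal $I = \langle x^2 - x \rangle$ and representative $c = x$ are illustrative choices) verifies that $c^2 - c \in I$ and that the product of two coset elements lands back in the coset:

```python
# Checking the criterion "c + I is closed under multiplication iff c^2 - c in I"
# for the toy ideal I = <x^2 - x> and coset representative c = x.
from sympy import symbols, expand, reduced

x = symbols('x')
I_gens = [x**2 - x]

# c = x is idempotent mod I: c^2 - c reduces to 0 modulo I.
c = x
_, rem = reduced(c**2 - c, I_gens, x)
assert rem == 0

# Hence the product of two elements of the coset c + I lands back in c + I:
a = c + 3 * (x**2 - x)
b = c + (x + 1) * (x**2 - x)
_, rem = reduced(expand(a * b) - c, I_gens, x)
assert rem == 0  # a*b is congruent to c (mod I)
```

Here `reduced` performs polynomial division with remainder by the generators of $I$, so a zero remainder certifies ideal membership.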
An IPS certificate $C(\vec{x}, \vec{y})$ for a system of equations $F_1(\vec{x}) = \cdots = F_m(\vec{x}) = 0$ can be viewed as an $\mathbb{A}^1$-homotopy [73, 107] as follows. Let $V$ be the graph of the map defined by $\vec{x} \mapsto (F_1(\vec{x}),\ldots, F_m(\vec{x}))$. Let $t$ be a new variable and consider the function $C^{\prime }(\vec{x}, \vec{f}, t) \stackrel{def}{=}C(\vec{x}, t\vec{f})$. Then $C^{\prime }$ is an $\mathbb{A}^1$-homotopy from a function that vanishes identically on $V$ (namely, $C(\vec{x}, 0)$) to a function that is identically 1 on $V$ (namely, $C(\vec{x}, \vec{F}(\vec{x}))$). We have not yet found any use of this fact but hope it might inspire some of our readers.
The complexity of Gröbner basis computations obviously depends on the degrees and the number of polynomials that one starts with. From this point of view, Mayr and Meyer [71] showed that the doubly exponential upper bound on the degree of a Gröbner basis [43] (see also References [68, 91]) could not be improved in general. However, in practice, many Gröbner basis computations seem to work much more efficiently, and even theoretically many classes of instances—such as proving that 1 is in a given ideal—can be shown to have only a singly-exponential degree upper bound [20, 56, 96]. These points of view are reconciled by the more refined measure of the (Castelnuovo–Mumford) regularity of an ideal or module. For the definition of regularity and a discussion of its close connection with the complexity of Gröbner basis and syzygy computations, we refer the reader to the original articles [11, 12, 13] or the survey [10].
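As a concrete illustration (ours, not from the text) of the "is 1 in the ideal" computations mentioned above: for an unsatisfiable system, a Gröbner basis collapses to $\{1\}$, while a satisfiable system retains a nontrivial basis. A minimal sympy sketch:

```python
# Groebner bases detect that 1 lies in the ideal of an unsatisfiable system:
# the computed basis collapses to [1].
from sympy import symbols, groebner, QQ

x, y = symbols('x y')

# Satisfiable system (the Boolean axioms): the basis is nontrivial.
G_sat = groebner([x**2 - x, y**2 - y], x, y, order='grevlex', domain=QQ)
assert 1 not in G_sat.exprs

# Unsatisfiable system: x(x - 1) = 0 together with 2x - 1 = 0.
G_unsat = groebner([x**2 - x, 2*x - 1], x, order='grevlex', domain=QQ)
assert list(G_unsat.exprs) == [1]
```

The degree of the intermediate polynomials in such a computation is exactly the kind of quantity the regularity bounds cited above control.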
Given that the syzygy module or ideal of zero-certificates is so crucial to the complexity of $\text{IPS}$-certificates, and given the tight connection between these modules/ideals and the computation of a Gröbner basis of the ideal one started with, we ask:
Is there a formal connection between the proof complexity of individual instances of TAUT (in, say, the Ideal Proof System), and the Castelnuovo–Mumford regularity of the corresponding syzygy module or ideal of zero-certificates?
The certificates constructed in the proof of Theorem 1.2 provide many new examples of polynomial families in $\mathsf {VNP}$. There are many natural questions one can ask about these polynomials. For example, the construction itself depends on the order of the clauses; does the complexity of the resulting polynomial family depend on this order? As another example, we suspect that, for any $\equiv _{p}$ or $\equiv _{c}$-degree within $\mathsf {VNP}$ (see Section 2.1), there is some family of tautologies for which the above polynomials are of that degree. However, we do not yet know this for even a single degree.
Are there tautologies for which the certificates constructed in Theorem 1.2 are $\mathsf {VNP}$-complete? More generally, for any given $\equiv _{p}$ or $\equiv _{c}$-degree within $\mathsf {VNP}$, are there tautologies for which this certificate is of that degree?
Finally, we wish to highlight an important and very basic question:
Find a function $f$ that vanishes on $\lbrace 0,1\rbrace ^n$ such that any IPS certificate showing that $f \in \langle x_i^2 - x_i \mid i \in [n] \rangle$ requires super-polynomial algebraic circuit size.
If $\mathsf {NP} \not\subseteq \mathsf {coAM}$, then such an $f$ must exist, but even if we assume just $\mathsf {VP} \ne \mathsf {VNP}$, then the existence of such an $f$ is currently unknown.
After the appearance of the preliminary version of this article [37], there were two significant follow-up works [34, 65], whose main results we briefly mention in the next two sections.
Li, Tzameret, and Wang [65] considered a noncommutative version of the Ideal Proof System. They consider precisely what one would imagine from the name “noncommutative formula IPS,” with the one caveat that—because it is designed to consider systems of polynomial equations coming from Boolean formulas—they always include the equations $x_i x_j - x_j x_i$ among the initial equations $F_i$. Their main result is as follows.
Noncommutative formula IPS p-simulates Frege, and Frege quasi-polynomially simulates noncommutative formula IPS. In particular, noncommutative formula IPS is quasi-polynomially equivalent to Frege.
Their proof follows the conditional proof in this article; they get an unconditional result by giving quasi-polynomial-size Frege proofs of the correctness of the deterministic polynomial-time algorithm for noncommutative formula PIT [84].
They go on to suggest that proving lower bounds on noncommutative formula IPS is potentially a more promising avenue for getting Frege lower bounds than by considering commutative formula IPS (which is p-equivalent to Frege if the formula PIT axioms have short Frege proofs). Their reasoning is that (a) noncommutative formula IPS is unconditionally quasi-polynomially equivalent to Frege and (b) exponential lower bounds on computing functions by noncommutative formulas have been known for decades [80]. Despite these facts, there are a few issues making this approach more difficult than it might appear in light of known noncommutative formula lower bounds [80]. In particular, although it remains the case that the set of noncommutative IPS certificates for a given tautology is a coset of an ideal—using essentially the same proof as in Section 6—it is now a coset of an ideal in a noncommutative polynomial ring. The issue here is that the remaining discussion in Section 6 does not go through a priori, because noncommutative polynomial rings are not Noetherian: For example, the ideal $\langle yxy, yx^2 y, yx^3 y, \ldots \rangle$ in two noncommuting variables is not finitely generated. This raises a potentially important question about noncommutative IPS:
Are the noncommutative analogs of the ideals from Section 6 finitely generated, when the initial system of equations comes from a CNF tautology and includes both $x_i^2 - x_i$ and $x_i x_j - x_j x_i$ for all $i,j$?
However, Nisan's original noncommutative circuit lower bound applied to the permanent and determinant regardless of the ordering of variables within a monomial [80]. This gives some hope that even if the answer to the preceding question is negative, one might be able to prove lower bounds on noncommutative IPS by proving noncommutative circuit lower bounds on (the noncommutative versions of) a finite generating set of the commutative version of the relevant coset of an ideal.
Forbes, Shpilka, Tzameret, and Wigderson [34] improved some of our foundational simulations and used circuit complexity lower bounds (some of which they developed in their article) to prove lower bounds on simple systems of equations (often the Boolean axioms plus a single equation) in restricted forms of IPS.
First, they show that Hilbert-like IPS is essentially equivalent to IPS:
Let $F_1 = \cdots = F_m = 0$ be an unsatisfiable system of equations of degree at most $d$ over a sufficiently large field. Let $s$ be such that each $F_i$ can be computed by an algebraic circuit of size $s$, and such that there is an IPS certificate of the unsatisfiability of $F_1 = \cdots = F_m = 0$ computable by a circuit of size $s$. Then a Hilbert-like IPS certificate for this system can be computed by a circuit of size $\operatorname{poly}(d,s)$.
As with our simulation result Proposition 3.7, in their result it is also difficult to get a good handle on the depth, so the result seems to hold only for IPS of unrestricted depth.
In their article [34], they prove many results; here we just highlight the main IPS lower bounds that they get and some open questions that are underscored by their results. Though we do not discuss their techniques, they surely deserve further investigation.
For definitions of the circuit classes considered, we refer to their article [34]. For some of their results, they introduce a new variant of IPS, which we call “weakly Hilbert-like”: This is IPS where the initial equations include the Boolean axioms $x_i^2 - x_i$, but the certificate is only required to be linear in the placeholder variables for the initial equations other than the Boolean axioms.
As they point out in their article, all of these lower bounds have the form of the Boolean axioms plus a single polynomial involving all of the variables. In particular, this allowed them to use techniques treating these formal polynomials as functions on the Boolean cube, implicitly handling the syzygies between the Boolean axioms and the one other function. But their techniques seem ill suited to handle situations with more complicated syzygies. Even the following question would be an interesting extension of their results:
Let $\beta \notin \lbrace 0,\ldots, 2n\rbrace$, and let $\mathbb{F}$ be a field of characteristic at least $2n+1$. Prove lower bounds on restricted versions of IPS certificates for the unsatisfiable system of equations
We begin with an example where it is advantageous to include divisions in an $\text{IPS}$-certificate. Note that this is different from merely computing a polynomial $\text{IPS}$-certificate using divisions. In the latter case, divisions can be eliminated [98]. In the case we discuss here, the certificate itself is no longer a polynomial but is a rational function.
The inversion principle, one of the “Hard Matrix Identities” [95], states that $XY = I \Rightarrow YX = I$.
In terms of ideals, the inversion principle says that the $n^2$ polynomials $(YX - I)_{i,j}$ (the entries of the matrix $YX -I$) are in the ideal generated by the $n^2$ polynomials $(XY-I)_{i,j}$. The simplest rational proof of the inversion principle that we are aware of is as follows:
To introduce an $\text{IPS}$-like proof system that allows rational certificates, we generalize the preceding reasoning. We must be careful what we allow ourselves to divide by. If we are allowed to divide by arbitrary polynomials, then this would yield an unsound proof system, because then from any polynomials $F_1(\vec{x}),\ldots, F_m(\vec{x})$ we could derive any other polynomial $G(\vec{x})$ via the false “certificate” $\frac{G(x)}{F_1(x)}y_1$.
Unfortunately, although we try to eschew as many definitions as possible, our definition of the Rational Ideal Proof System and our results about it are made much cleaner by using some additional standard terminology from commutative algebra, which we now review for the reader's convenience, such as prime ideals, irreducible components of algebraic sets, and localization of rings.
The following preliminaries from commutative algebra are only needed in this appendix. We refer to the standard textbooks [6, 31, 69, 88] for proofs and further details.
The radical of an ideal $I \subseteq R$ is the ideal $\sqrt {I}$ consisting of all $r \in R$ such that $r^k \in I$ for some $k \gt 0$. An ideal $P$ is prime if whenever $rs \in P$, at least one of $r$ or $s$ is in $P$. For any ideal $I$, its radical is equal to the intersection of the prime ideals containing $I$: $\sqrt {I} = \bigcap _{\text{prime } P \supseteq I} P$. We refer to prime ideals that are minimal under inclusion, subject to containing $I$, as “minimal over $I$”; there are only finitely many such prime ideals. The radical $\sqrt {I}$ is thus also equal to the intersection of the primes minimal over $I$.
An algebraic set in $\overline{\mathbb{F}}^n$ is any set of the form $\lbrace \vec{a} \in \overline{\mathbb{F}}^n : F_1(\vec{a}) = \cdots = F_m(\vec{a}) = 0 \rbrace$, which we denote $V(F_1,\ldots, F_m)$ (“$V$” for “variety”). The algebraic set $V(F_1,\ldots, F_m)$ depends only on the ideal $\langle F_1,\ldots, F_m \rangle$, and in fact only on its radical, in the sense that $V(F_1,\ldots, F_m) = V(\sqrt {\langle F_1,\ldots, F_m \rangle })$. Conversely, the set of all polynomials vanishing on a given algebraic set $V$ is a radical ideal, denoted $I(V)$. An algebraic set is irreducible if it cannot be written as a union of two proper algebraic subsets. $V$ is irreducible if and only if $I(V)$ is prime. The irreducible components of an algebraic set $V = V(I)$ are the maximal irreducible algebraic subsets of $V$, which are exactly the algebraic sets corresponding to the prime ideals minimal over $I$.
If $U$ is any subset of a ring $R$ that is closed under multiplication—$a, b \in U$ implies $ab \in U$—then we may define the localization of $R$ at $U$ to be the ring in which we formally adjoin multiplicative inverses to the elements of $U$. Equivalently, we may think of the localization of $R$ at $U$ as the ring of fractions over $R$ where the denominators are all in $U$. If $P$ is a prime ideal, then its complement is a multiplicatively closed subset (this is an easy and instructive exercise in the definition of prime ideal). In this case, rather than speak of the localization of $R$ at the complement $R \backslash P$, it is common usage to refer to the localization of $R$ at $P$, denoted $R_P$. Similar statements hold for the union of finitely many prime ideals. We will use the fact that the localization of a Noetherian ring is again Noetherian (however, if $R$ is merely finitely generated, then its localizations need not be, e.g., the localization of $\mathbb{Z}$ at $P = \langle 2 \rangle$ consists of all rationals with odd denominators; this is one of the ways in which the condition of being Noetherian is nicer than that of merely being finitely generated).
A rational $\text{IPS}$ certificate or R$\text{IPS}$-certificate that a polynomial $G(\vec{x})$ is in the radical of the ideal of $\mathbb{F}[x_1,\ldots,x_n]$ generated by $F_1(\vec{x}),\ldots, F_m(\vec{x})$ is a rational function $C(\vec{x}, \vec{y})$ such that
A R$\text{IPS}$ proof that $G(\vec{x})$ is in the radical of the ideal $\langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle$ is an $\mathbb{F}$-algebraic circuit with divisions on inputs $x_1,\ldots ,x_n,y_1,\ldots ,y_m$ computing some R$\text{IPS}$ certificate.
Condition (0) is equivalent to: If $G(\vec{x})$ is an invertible constant, then $D(\vec{x}, \vec{y})$ is also an invertible constant and thus $C$ is a polynomial; otherwise, after substituting the $F_i(\vec{x})$ for the $y_i$, the denominator $D(\vec{x}, \vec{F}(\vec{x}))$ does not vanish identically on any of the irreducible components (over the algebraic closure $\overline{\mathbb{F}}$) of the algebraic set $V(F_1,\ldots, F_m)$. In particular, for proofs of unsatisfiability of systems of equations, the Rational Ideal Proof System reduces by definition to the Ideal Proof System. For derivations of one polynomial from a set of polynomials, this need not be the case, however; indeed, there are examples for which every R$\text{IPS}$-certificate has a nonconstant denominator, that is, there is a R$\text{IPS}$-certificate, but there are no $\text{IPS}$-certificates (see Example A.4).
Grigoriev and Hirsch [35, Section 2.5] introduced a related system, denoted (F-)PC$\sqrt {}$, for proving that a polynomial is in the radical of an ideal. Beyond the differences between IPS and F-PC (discussed just after Definition 1.8), RIPS also allows potentially more general divisions than (F-)PC$\sqrt {}$.
The Rational Ideal Proof System is sound. That is, if there is a R$\text{IPS}$-certificate that $G(\vec{x})$ is in the radical of $\langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle$, then $G(\vec{x})$ is in fact in the radical of $\langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle$.
Let $C(\vec{x}, \vec{y}) = \frac{1}{D(\vec{x}, \vec{y})} C^{\prime }(\vec{x}, \vec{y})$ be a R$\text{IPS}$ certificate that $G$ is in $\sqrt {\langle F_1,\ldots, F_m \rangle }$, where $D$ and $C^{\prime }$ are relatively prime polynomials. Then $C^{\prime }(\vec{x}, \vec{y})$ is an $\text{IPS}$-certificate that $G(\vec{x})D(\vec{x}, \vec{F}(\vec{x}))$ is in the ideal $\langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle$ (recall Definition 1.8). Let $D_{F}(\vec{x}) = D(\vec{x}, \vec{F}(\vec{x}))$.
Geometric proof: Since $G(\vec{x}) D_{F}(\vec{x}) \in \langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle$, $GD_{F}$ must vanish identically on every irreducible component of the algebraic set $V(F_1,\ldots, F_m)$. On each irreducible component $V_i$, since $D_{F}(\vec{x})$ does not vanish identically on $V_i$, $G(\vec{x})$ must vanish everywhere on $V_i$ except possibly on the proper subset $V(D_{F}(\vec{x})) \cap V_i$. Since $D_{F}$ does not vanish identically on $V_i$, we have $\dim V(D_{F}) \cap V_i \le \dim V_i - 1$ (in fact this is an equality). In particular, this means that $G$ must vanish on a dense subset of $V_i$. Since $G$ is a polynomial, by (Zariski-) continuity, $G$ must vanish on all of $V_i$. Finally, since $G$ vanishes on every irreducible component of $V(F_1,\ldots, F_m)$, it vanishes on $V(F_1,\ldots, F_m)$ itself, and by the Nullstellensatz, $G \in \sqrt {\langle F_1,\ldots, F_m\rangle }$.
Algebraic proof: For each prime ideal $P_i$ that is minimal subject to containing $\langle F_1,\ldots, F_m \rangle$, $D_{F} \notin P_i$ by the definition of a R$\text{IPS}$-certificate. Since $GD_{F} \in \langle F_1,\ldots, F_m \rangle \subseteq P_i$, by the definition of prime ideal $G$ must be in $P_i$. Hence $G$ is in the intersection $\bigcap _i P_i$ over all minimal prime ideals $P_i \supseteq \langle F_1,\ldots, F_m \rangle$. This intersection is exactly the radical $\sqrt {\langle F_1,\ldots, F_m \rangle }$.
Any derivation of a polynomial $G$ that is in the radical of an ideal $I$ but not in $I$ itself will require divisions. Although it is not a priori clear that R$\text{IPS}$ could derive even one such $G$, the next example shows that this is the case. In other words, the next example shows that certain derivations require rational functions.
Let $G(x_1, x_2) = x_1$, $F_1(\vec{x}) = x_1^2$, $F_2(\vec{x}) = x_1 x_2$. Then $C(\vec{x}, \vec{y}) = \frac{1}{x_1-x_2}(y_1 - y_2)$ is a R$\text{IPS}$-certificate that $G \in \sqrt {\langle F_1, F_2 \rangle }$: By plugging in, one can verify that $C(\vec{x}, \vec{F}(\vec{x})) = G(\vec{x})$. For Condition (0), we see that $V(F_1, F_2)$ is the entire $x_2$-axis, on which $x_1 - x_2$ vanishes only at the origin. However, there is no $\text{IPS}$-certificate that $G \in \langle F_1, F_2 \rangle$, since $G$ is not in $\langle F_1, F_2 \rangle$: $\langle F_1, F_2 \rangle = \lbrace x_1(H_1(\vec{x}) x_1 + H_2(\vec{x}) x_2)\rbrace$, where $H_1, H_2$ may be arbitrary polynomials. Since the only constant of the form $H_1(\vec{x}) x_1 + H_2(\vec{x}) x_2$ is zero, $G = x_1 \notin \langle F_1, F_2 \rangle$.
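The computations in this example are small enough to verify mechanically. The following sympy sketch (ours) checks the substitution identity $C(\vec{x}, \vec{F}(\vec{x})) = G(\vec{x})$ and the behavior of the denominator on the $x_2$-axis:

```python
# Verifying the RIPS certificate C = (y1 - y2)/(x1 - x2) for
# G = x1, F1 = x1^2, F2 = x1*x2.
from sympy import symbols, cancel

x1, x2, y1, y2 = symbols('x1 x2 y1 y2')
G = x1
F1, F2 = x1**2, x1 * x2
C = (y1 - y2) / (x1 - x2)

# Substitute the F_i for the placeholder variables and simplify:
assert cancel(C.subs({y1: F1, y2: F2}) - G) == 0

# On the x2-axis (x1 = 0), the denominator is -x2: nonzero except at the origin.
assert (x1 - x2).subs(x1, 0) == -x2
```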
In the following circumstances a R$\text{IPS}$-certificate can be converted into an $\text{IPS}$-certificate.
Notational convention. Throughout, we continue to use the notation that if $D$ is a function of the placeholder variables $y_i$ (and possibly other variables), then $D_{F}$ denotes $D$ after substituting in $F_i(\vec{x})$ for the placeholder variable $y_i$.
If $C = C^{\prime }/D$ is a R$\text{IPS}$ proof that $G(\vec{x}) \in \sqrt {\langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle }$, such that $D_{F}(\vec{x})$ does not vanish anywhere on the algebraic set $V(F_1(\vec{x}),\ldots, F_m(\vec{x}))$, then $G(\vec{x})$ is in fact in the ideal $\langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle$. Furthermore, there is an $\text{IPS}$ proof that $G(\vec{x}) \in \langle F_1(\vec{x}),\ldots, F_m(\vec{x}) \rangle$ of size $\operatorname{poly}(|C|,|E|)$ where $E$ is an $\text{IPS}$ proof of the unsolvability of $D_{F}(\vec{x}) = F_1(\vec{x}) = \cdots = F_m(\vec{x}) = 0$.
Since $D_{F}(\vec{x})$ does not vanish anywhere on $V(F_1,\ldots, F_m)$, the system of equations $D_F(\vec{x}) = F_1(\vec{x}) = \cdots = F_m(\vec{x}) = 0$ is unsolvable.
Geometric proof idea: The preceding means that when restricted to the algebraic set $V(F_1,\ldots, F_m)$, $D_{F}$ has a multiplicative inverse $\Delta$. Rather than dividing by $D$, we then multiply by $\Delta$, which, for points on $V(F_1,\ldots, F_m)$, amounts to the same thing.
Algebraic proof: Let $E(\vec{x}, \vec{y}, d)$ be an $\text{IPS}$-certificate for the unsolvability of this system, where $d$ is a new placeholder variable corresponding to the polynomial $D_{F}(\vec{x}) = D(\vec{x}, \vec{F}(\vec{x}))$. By separating out all of the terms involving $d$, we may write $E(\vec{x}, \vec{y}, d)$ as $d\Delta (\vec{x}, \vec{y}, d) + E^{\prime }(\vec{x}, \vec{y})$. As $E(\vec{x}, \vec{F}(\vec{x}), D_{F}(\vec{x})) = 1$ (by the definition of $\text{IPS}$), we get:
Finally, we give an upper bound on the size of a circuit for $C_{\Delta }$. The numerator and denominator of a rational function computed by a circuit of size $s$ can be computed individually by circuits of size $O(s)$. The basic idea, going back to Strassen [98], is to replace each wire by a pair of wires explicitly encoding the numerator and denominator, to replace a multiplication gate by a pair of multiplication gates—since $(A/B) \times (C/D) = (A \times C)/(B \times D)$—and to replace an addition gate by the appropriate gadget encoding the expression $(A/B) + (C/D) = (AD + BC)/BD$. In particular, we may assume that a circuit computing $C^{\prime }/D$ has the following form: It first computes $C^{\prime }$ and $D$ separately and then has a single division gate computing $C^{\prime }/D$. Thus from a circuit for $C$, we can get circuits of essentially the same size for both $C^{\prime }$ and $D$. Given a circuit for $E = d \Delta + E^{\prime }$, we get a circuit for $E^{\prime }$ by setting $d=0$. We can then get a circuit for $d\Delta$ as $E - E^{\prime }$. From a circuit for $d\Delta$, we can get a circuit for $\Delta$ alone by first dividing $d\Delta$ by $d$ and then eliminating that division using Strassen [98]. Combining these, we then easily construct a circuit for the $\text{IPS}$-certificate $C_{\Delta }$ of size $\operatorname{poly}(|C|, |E|)$.
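The numerator/denominator bookkeeping described above can be sketched concretely. In the toy code below (ours; small expression trees stand in for circuits, and sympy expressions for gate values), each gate is mapped to a pair $(A, B)$ representing the fraction $A/B$:

```python
# A minimal sketch of Strassen's numerator/denominator transformation:
# every gate of a circuit with divisions is replaced by a (num, den) pair,
# using (A/B)*(C/D) = (A*C)/(B*D) and (A/B)+(C/D) = (A*D + B*C)/(B*D).
from sympy import symbols, expand, simplify

def num_den(node):
    """node: ('var', symbol) | ('const', c) | (op, left, right), op in '+*/'."""
    kind = node[0]
    if kind in ('var', 'const'):
        return node[1], 1
    op, l, r = node
    (a, b), (c, d) = num_den(l), num_den(r)
    if op == '*':
        return expand(a * c), expand(b * d)
    if op == '+':
        return expand(a * d + b * c), expand(b * d)
    if op == '/':
        return expand(a * d), expand(b * c)
    raise ValueError(op)

x, y = symbols('x y')
# The expression (x/y + y) * x, written as an expression tree:
tree = ('*', ('+', ('/', ('var', x), ('var', y)), ('var', y)), ('var', x))
n, d = num_den(tree)
assert simplify(n / d - (x / y + y) * x) == 0
```

Each original gate becomes $O(1)$ division-free gates, which is why the numerator and denominator each have circuits of size $O(s)$.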
Returning to the inversion principle, we find that the certificate from Example A.1 only divided by $\det (X)$, which we already remarked does not vanish anywhere that $XY - I$ vanishes. By the preceding proposition, there is thus an $\text{IPS}$-certificate for the inversion principle of polynomial size, if there is an $\text{IPS}$-certificate for the unsatisfiability of $\det (X) = 0 \wedge XY-I=0$ of polynomial size. In this case, we can guess the multiplicative inverse of $\det (X)$ modulo $XY-I$, namely $\det (Y)$, since we know that $\det (X)\det (Y) = 1$ if $XY=I$. Hence, we can try to find a certificate for the unsatisfiability of $\det (X) = 0 \wedge XY-I=0$ of the form
In fact, for this particular example, we could have anticipated that a rational certificate was unnecessary, because the ideal generated by $XY-I$ is prime and hence radical. (Indeed, the ring $\mathbb{F}[X, Y]/\langle XY - I \rangle$ is the coordinate ring of the algebraic group $\mathrm{GL}_n$, which is an irreducible variety.)
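For small $n$, the ideal memberships discussed here can be confirmed directly with a computer algebra system. The following sympy sketch (ours, not part of the original argument) checks, for $n = 2$, that the entries of $YX - I$ and the polynomial $\det (X)\det (Y) - 1$ all reduce to zero modulo a Gröbner basis of the ideal generated by the entries of $XY - I$:

```python
# Checking the 2x2 inversion principle computationally: the entries of YX - I,
# and det(X)*det(Y) - 1, lie in the ideal generated by the entries of XY - I.
from sympy import symbols, Matrix, eye, groebner, QQ

xs = symbols('x11 x12 x21 x22')
ys = symbols('y11 y12 y21 y22')
X = Matrix(2, 2, xs)
Y = Matrix(2, 2, ys)

gens = list(X * Y - eye(2))            # the 4 entries of XY - I
G = groebner(gens, *xs, *ys, order='grevlex', domain=QQ)

for p in (Y * X - eye(2)):             # the inversion principle, entrywise
    assert G.reduce(p)[1] == 0         # remainder 0 <=> ideal membership
assert G.reduce(X.det() * Y.det() - 1)[1] == 0
```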
Unfortunately, the Rational Ideal Proof System is not complete, as the next example shows.
Let $F_1(x) = x^2$ and $G(x) = x$. Then $G(x) \in \sqrt {\langle F_1(\vec{x}) \rangle }$, but any R$\text{IPS}$ certificate would show $G(x) D(x) = F_1(x) H(x)$ for some $D, H$. Plugging in, we get $x D(x) = x^2 H(x)$, and by unique factorization we must have that $D(x) = x D^{\prime }(x)$ for some $D^{\prime }$. But then $D$ vanishes identically on $V(F_1)$, contrary to the definition of R$\text{IPS}$-certificate.
To get a more complete proof system, we could generalize the definition of R$\text{IPS}$ to allow dividing by any polynomial that does not vanish to appropriate multiplicity on each irreducible component (see, e.g., Eisenbud [31, Section 3.6] for the definition of multiplicity). For example, this would allow dividing by $x$ to show that $x \in \sqrt {\langle x^2 \rangle }$ but would disallow dividing by $x^2$ or any higher power of $x$. However, the proof of soundness of this generalized system is more involved, and the results of the next section seem not to hold for such a proof system. As of this writing, we do not know of any better characterization of when R$\text{IPS}$ certificates exist other than the definition itself.
A R$\text{IPS}$ certificate is Hilbert-like if the denominator does not involve the placeholder variables $y_i$ and the numerator is $\vec{y}$-linear. In other words, a Hilbert-like R$\text{IPS}$ certificate has the form $\frac{1}{D(\vec{x})}\sum _{i} y_i G_i(\vec{x})$.
If there is a R$\text{IPS}$ certificate that $G \in \sqrt {\langle F_1,\ldots, F_m \rangle }$, then there is a Hilbert-like R$\text{IPS}$ certificate proving the same.
Let $C = C^{\prime }(\vec{x}, \vec{y})/D(\vec{x}, \vec{y})$ be a R$\text{IPS}$ certificate. First, replace the denominator by $D_{F}(\vec{x}) = D(\vec{x}, \vec{F}(\vec{x}))$. Next, for each monomial appearing in $C^{\prime }$, replace all but one of the $y_i$ in that monomial with the corresponding $F_i(\vec{x})$, reducing the monomial to one that is $\vec{y}$-linear.
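The monomial-rewriting step in this proof is mechanical. Here is a small sympy sketch (ours, for two placeholder variables and illustrative choices of $F_1, F_2$) that makes a numerator $\vec{y}$-linear while preserving its value under the substitution $y_i \mapsto F_i(\vec{x})$:

```python
# In each monomial, keep one placeholder variable and replace every other
# occurrence of a placeholder y_i by the corresponding polynomial F_i(x).
from sympy import symbols, Poly, expand

x, y1, y2 = symbols('x y1 y2')
F = {y1: x**2, y2: x + 1}   # illustrative choices of F_1, F_2

def make_y_linear(Cnum):
    """Rewrite a numerator so that it is linear in the placeholder variables."""
    out = 0
    p = Poly(expand(Cnum), y1, y2)
    for (e1, e2), coeff in p.terms():
        term = coeff
        if e1 > 0:
            term *= y1 * F[y1]**(e1 - 1)   # keep one copy of y1
            term *= F[y2]**e2              # replace every y2
        elif e2 > 0:
            term *= y2 * F[y2]**(e2 - 1)   # keep one copy of y2
        out += term
    return expand(out)

C = y1**2 * y2 + 3 * y2**2 + 5
L = make_y_linear(C)
assert Poly(L, y1, y2).total_degree() <= 1          # now y-linear
assert expand(C.subs(F)) == expand(L.subs(F))       # same value after y -> F
```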
As in the case of $\text{IPS}$, we only know how to guarantee a size-efficient reduction under a sparsity condition. The following is the R$\text{IPS}$-analogue of Proposition 3.7.
If $C = C^{\prime }/D$ is a R$\text{IPS}$ proof that $G \in \sqrt {\langle F_1,\ldots, F_m \rangle }$, where the numerator $C^{\prime }$ satisfies the same sparsity condition as in Proposition 3.7, then there is a Hilbert-like R$\text{IPS}$ proof that $G \in \sqrt {\langle F_1,\ldots, F_m \rangle }$, of size $\operatorname{poly}(|C|)$.
We follow the proof of Lemma A.9, making each step effective. As in the last paragraph of the proof of Proposition A.5, any circuit with divisions computing a rational function $C^{\prime }/D$, where $C^{\prime },D$ are relatively prime polynomials, can be converted into a circuit without divisions computing the pair $(C^{\prime }, D)$. By at most doubling the size of the circuit, we can assume that the subcircuits computing $C^{\prime }$ and $D$ are disjoint. Now replace each $y_i$ input to the subcircuit computing $D$ with a small circuit computing $F_i(\vec{x})$. Next, we apply sparse multivariate interpolation to the numerator $C^{\prime }$ exactly as in Proposition 3.7. The resulting circuit now computes a Hilbert-like R$\text{IPS}$ certificate.
We begin by noting that, since the numerator and denominator can be computed separately (originally due to Strassen [98]; see the proof of Proposition A.5 above for the idea), it suffices to prove, for each R$\text{IPS}$-certificate, a lower bound on either its numerator or its denominator.
As in the case of Hilbert-like $\text{IPS}$ and general $\text{IPS}$ (recall Section 6), the set of R$\text{IPS}$ certificates showing that $G \in \sqrt {\langle F_1,\ldots, F_m \rangle }$ is a coset of a finitely generated ideal.
The set of R$\text{IPS}$-certificates showing that $G \in \sqrt {\langle F_1,\ldots, F_m \rangle }$ is a coset of a finitely generated ideal in $R$, where $R$ is the localization of $\mathbb{F}[x_1,\ldots, x_n, y_1,\ldots, y_m]$ at $\bigcup _i P_i$, where the union is over the prime ideals $P_i$ minimal over $\langle F_1,\ldots, F_m \rangle$.
Similarly, the set of Hilbert-like R$\text{IPS}$ certificates is a coset of a finitely generated submodule of $R^{\prime m}$, where $R^{\prime }$ is the localization of $\mathbb{F}[x_1,\ldots, x_n]$ at $\bigcup _i P_i$.
The proof is essentially the same as that of Lemma 6.1, but with one more ingredient. Namely, we need to know that the rings $R$ and $R^{\prime }$ are Noetherian. This follows from the fact that polynomial rings over fields are Noetherian, together with the general fact that any localization of a Noetherian ring is again Noetherian.
Exactly analogous to the case of $\text{IPS}$ certificates, we define general and Hilbert-like R$\text{IPS}$ zero-certificates to be those for which, after plugging in the $F_i$ for $y_i$, the resulting function is identically zero. In the case of Hilbert-like R$\text{IPS}$, these are again syzygies of the $F_i$, but now syzygies with coefficients in the localization of $\mathbb{F}[x_1,\ldots, x_n]$ at $\bigcup _i P_i$.
However, somewhat surprisingly, we seem to be able to go further in the case of R$\text{IPS}$ than $\text{IPS}$, as follows. In general, such a localization is a Noetherian semi-local ring, that is, in addition to being Noetherian, it has finitely many maximal ideals, namely $P_1,\ldots, P_k$. Modules over semi-local rings, including ideals, enjoy properties not shared by ideals and modules over arbitrary rings.
In the special case when there is just a single prime ideal $P_1$, the localization is a local ring (just one maximal ideal). We note that this is the case in the setting of the Inversion Principle, as the ideal generated by the $n^2$ entries of $XY-I$ is prime. Local rings are in some ways very close to fields—if $R$ is a local ring with unique maximal ideal $P$, then $R/P$ is a field—and modules over local rings are much closer to vector spaces than are modules over more general rings. This follows from the fact that $M/PM$ is then in fact a vector space over the field $R/P$, together with Nakayama's Lemma (see, e.g., Eisenbud [31, Corollary 4.8] or Reid [88, Section 2.8]). One nice feature is that, if $M$ is a module over a local ring, then every minimal generating set has the same size, which is the dimension of $M/PM$ as an $R/P$-vector space. We also get that for every minimal generating set $b_1,\ldots, b_k$ of $M$ (“$b$” for “basis,” even though the word basis is reserved for free modules), for each $m \in M$, any two representations $m = \sum _{i=1}^{k} r_i b_i$ with $r_i \in R$ differ by an element in $PM$. This near-uniqueness could be very helpful in proving lower bounds, as normal forms have proved useful in proving many circuit lower bounds.
Does every R$\text{IPS}$ proof of the $n \times n$ Inversion Principle $XY = I \Rightarrow YX = I$ require computing a determinant? That is, is it the case that for every R$\text{IPS}$ certificate $C=C^{\prime }/D$, some determinant of size $n^{\Omega (1)}$ reduces to one of $C, C^{\prime }, D$ by a $O(\log n)$-depth circuit reduction?
A positive answer to this question would imply that the Hard Matrix Identities do not have $O(\log n)$-depth R$\text{IPS}$ proofs unless the determinant can be computed by a polynomial-size algebraic formula. Since $\text{IPS}$ (and hence R$\text{IPS}$) simulates Frege-style systems in a depth-preserving way (Theorem 3.5), a positive answer would also imply that there are no ($\mathsf {NC}^1$-)Frege proofs of the Boolean Hard Matrix Identities unless the determinant has polynomial-size algebraic formulas. Although answering this question may be difficult, the fact that we can even state such a precise question on this matter should be contrasted with the preceding state of affairs regarding Frege proofs of the Boolean Hard Matrix Identities (which was essentially just a strong intuition that they should not exist unless the determinant is in $\mathsf {NC}^1$).
Let $R$ be a ring. A function $(F_1,\ldots, F_m) = F :R^n \rightarrow R^m$ is called a polynomial map if each coordinate $F_i(\vec{x})$ is a polynomial. Given a polynomial map $F:R^n \rightarrow R^m$, define $F_{*}:R[y_1,\ldots, y_m] \rightarrow R[x_1,\ldots, x_n]$ to be the map of $R$-algebras—i.e., $F_{*}(1) = 1$ and $F_{*}(rf) = r F_{*}(f)$ for all $r \in R$ and all $f \in R[y_1,\ldots, y_m]$—such that $F_{*}(y_i) = F_i(\vec{x})$. For convenience, let $A = R[y_1,\ldots, y_m]$ and $B = R[x_1,\ldots, x_n]$. Then the map $F_{*}:A \rightarrow B$ makes $B$ into an $A$-module by $a \cdot b \stackrel{def}{=}F_{*}(a)b$. The following is a standard definition in commutative algebra and algebraic geometry:
The map $F:R^n \rightarrow R^m$ is finite if the corresponding map $F_{*}$ makes $R[x_1,\ldots, x_n]$ into a finitely generated module over $R[y_1,\ldots, y_m]$.
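As a toy illustration of this definition (ours, not from the text): the squaring map $F(x) = x^2$ is finite, since $R[x]$ is generated as a module over $R[y]$ (with $y$ acting as $x^2$) by $\lbrace 1, x \rbrace$. The sympy sketch below computes the module coordinates of a polynomial via its even and odd parts:

```python
# Every polynomial p(x) splits as a(x^2) + x*b(x^2), exhibiting {1, x} as
# module generators of R[x] over R[y] for the finite map y = x^2.
from sympy import symbols, expand, simplify

x, y = symbols('x y')

def module_coords(p):
    """Write p(x) = a(y)*1 + b(y)*x, with y standing for x^2."""
    even = expand((p + p.subs(x, -x)) / 2)             # even part: a(x^2)
    odd_over_x = expand((p - p.subs(x, -x)) / (2*x))   # odd part / x: b(x^2)
    return expand(even.subs(x**2, y)), expand(odd_over_x.subs(x**2, y))

p = 3*x**5 - x**4 + 2*x**2 + x + 7
a, b = module_coords(p)
assert simplify(a.subs(y, x**2) + x * b.subs(y, x**2) - p) == 0
```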
The key fact that we will need about finite maps is as follows:
Suppose $F:R^n \rightarrow R^m$ is a finite map. Then the image of $F$ is Zariski-closed—equivalently, an algebraic set—in $R^m$.
We may consider $F_1(x_1,\ldots, x_n),\ldots, F_m(x_1,\ldots,x_n)$ as a polynomial map $F :R^n \rightarrow R^m$. Then this system of polynomials has a common zero if and only if $\vec{0}$ is in the image of $F$. In fact, we show that for any system of equations coming from a Boolean tautology, the system of polynomials has a common zero if and only if $\vec{0}$ is in the closure of the image of $F$ (this is true regardless of whether the equations include $x_i^2 - x_i = 0$, $x_i^2 -1 = 0$, or neither of these).
The preceding is the geometric picture we pursue in this section; now we describe the corresponding algebra. The set of $\text{IPS}$ certificates is the intersection of the ideal $\langle y_1,\ldots, y_m \rangle$ with the coset $1 + \langle y_1 - F_1(\vec{x}),\ldots, y_m - F_m(\vec{x}) \rangle$. The map $a \mapsto 1 - a$ is a bijection between this coset intersection and the coset intersection $\left(1 + \langle y_1,\ldots, y_m \rangle \right) \cap \langle y_1 - F_1(\vec{x}),\ldots, y_m - F_m(\vec{x}) \rangle$. In particular, the system of equations $F_1 = \cdots = F_m = 0$ is unsatisfiable if and only if the latter coset intersection is nonempty.
We show below that if the latter coset intersection contains a polynomial involving only the $y_i$’s—that is, its intersection with the subring $R[y_1,\ldots, y_m]$ (rather than the much larger ideal $\langle y_1,\ldots, y_m \rangle$) is nonempty—then $\vec{0}$ is not even in the closure of the image of $F$. Hence we call such polynomials “geometric certificates”:
A geometric $\text{IPS}$ certificate that a system of $\mathbb {F}$-polynomial equations $F_1(\vec{x}) = \cdots = F_m(\vec{x}) = 0$ is unsatisfiable over $\overline{\mathbb {F}}$ is a polynomial $C(y_1,\ldots,y_m) \in \mathbb {F}[y_1,\ldots,y_m]$ such that (1) $C(0,\ldots,0) = 1$, and (2) $C(F_1(\vec{x}),\ldots,F_m(\vec{x})) = 0$ identically as a polynomial in $\vec{x}$ (equivalently, $C$ vanishes everywhere on the image of the polynomial map $F = (F_1,\ldots,F_m)$).
A geometric $\text{IPS}$ proof of the unsatisfiability of $F_1 = \cdots = F_m = 0$, or a geometric $\text{IPS}$ refutation of $F_1 = \cdots = F_m = 0$, is an $\mathbb {F}$-algebraic circuit on inputs $y_1,\ldots,y_m$ computing some geometric certificate of unsatisfiability.
If $C$ is a geometric certificate, then $1-C$ is an $\text{IPS}$ certificate that involves only the $y_i$’s, in some sense the “opposite” of a Hilbert-like certificate. Hence the smallest circuit size of any geometric certificate is at least the smallest circuit size of any algebraic certificate. We do not know, however, if these complexity measures are polynomially related, as highlighted in the next question.
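To make the definitions concrete, here is a minimal sketch (our own toy system, not an example from the paper) checking the two conditions on a geometric certificate for the unsatisfiable system $x = 0$, $1 - x = 0$:

```python
import sympy as sp

x, y1, y2 = sp.symbols('x y1 y2')
F = [x, 1 - x]        # unsatisfiable: F1 + F2 = 1, so no common zero
C = 1 - y1 - y2       # candidate geometric certificate (involves only the y's)

# Condition (1): C does not vanish at the origin (here C(0,0) = 1).
assert C.subs({y1: 0, y2: 0}) == 1
# Condition (2): C(F1(x), F2(x)) = 0 identically, i.e., C vanishes on Im(F).
assert sp.expand(C.subs({y1: F[0], y2: F[1]})) == 0
# Correspondingly, 1 - C = y1 + y2 is an IPS certificate involving only the y's.
```

The bijection $a \mapsto 1-a$ described above is visible here: $1 - C = y_1 + y_2$ certifies $F_1 + F_2 = 1$.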
We call a system of equations “standard Boolean” if it includes $x_i^2 = x_i$ for all $i$ and “multiplicative Boolean” if it includes $x_i^2 = 1$ for all $i$; by “Boolean system of equations” we mean either of these.
For Boolean systems of equations, is Geometric $\text{IPS}$ polynomially equivalent to $\text{IPS}$? That is, is there always a geometric certificate whose circuit size is at most a polynomial in the circuit size of the smallest algebraic certificate?
Although the Nullstellensatz does not guarantee the existence of geometric certificates for arbitrary unsatisfiable systems of equations—and, indeed, geometric certificates need not always exist—for Boolean systems of equations geometric certificates always exist. In fact, this holds for any system of equations that contains at least one polynomial containing only the variable $x_i$, for each variable $x_i$:
Let $R$ be any ring. A Boolean system of equations over $R$—or more generally any system of equations containing, for each variable $x_i$, at least one non-constant equation involving only $x_i$—has a common root if and only if it does not have a geometric certificate.
Let $F_1,\ldots, F_m$ be an unsatisfiable system of equations over $R$ satisfying the conditions of Proposition B.5, and let $F = (F_1,\ldots, F_m) :R^{n} \rightarrow R^{m}$ be the corresponding polynomial map.
First, suppose that $F_1 = \cdots = F_m = 0$ has a solution. Then $\vec{0} \in \mathrm{Im}(F)$, so any $C(y_1,\ldots, y_m)$ that vanishes everywhere on $\mathrm{Im}(F)$, as required by condition (2) of Definition B.3, must vanish at $\vec{0}$. In other words, $C(0,\ldots,0) = 0$, contradicting condition (1). So there are no geometric certificates.
Conversely, suppose that there are no geometric certificates. Then every polynomial $C(\vec{y})$ that vanishes on $\mathrm{Im}(F)$ also vanishes at $\vec{0}$. Hence, by definition, the origin is in the Zariski closure $\overline{\mathrm{Im}(F)}$. But we will now show that in fact the image of $F$ is already closed, so $\mathrm{Im}(F) = \overline{\mathrm{Im}(F)}$, and thus $\vec{0}$ is in the image of $F$.
Since $F$ contains, for each variable $x_i$, one equation involving only $x_i$, $F$ is finite (see Definition B.1). To see this, let $d_i$ be the smallest degree of any of the $F_j$ that depend only on $x_i$; by assumption each $d_i$ is finite and at least one. Then $R[x_1,\ldots, x_n]$ is generated, as a module over $R[F_1,\ldots, F_m]$, by the finite set $\{\vec{x}^{\vec{e}} | ( \forall i)[0 \leq e_i \leq d_i]\}$. By Proposition B.2, the image of $F$ is thus closed, hence is equal to its closure. By the preceding paragraph, this completes the proof.
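The module-generation step can be sanity-checked in one variable (our own toy check, not from the paper): if some $F_j$ depends only on $x_i$ and has degree $d_i$, then repeated polynomial division reduces every power of $x_i$, modulo $F_j$, to a combination of the bounded-exponent monomials:

```python
import sympy as sp

x = sp.symbols('x')
f = x**2 - x   # a Boolean axiom: an equation of degree d_i = 2 in x alone

# Any power of x reduces modulo f to a remainder of degree < 2, so
# (iterating the division) R[x] is spanned over R[f] by {1, x}.
q, r = sp.div(x**5, f, x)
assert r == x                          # x^5 ≡ x (mod x^2 - x)
assert sp.expand(q * f + r - x**5) == 0
```

This is the sense in which the finite set $\{\vec{x}^{\vec{e}}\}$ above generates the module.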
As we see, the preceding proposition followed almost immediately from standard facts in algebraic geometry, without concern for the nature of the equations coming from a Boolean tautology. However, we also show that even without the equations $x_i^2 = x_i$ (nor $x_i^2 = 1$), if $\varphi$ is an unsatisfiable CNF, then the corresponding set of polynomial equations has a geometric $\text{IPS}$ certificate:
Let $\mathbb {F}$ be either (1) any algebraically closed field or (2) a dense subfield of $\mathbb {C}$ (in the Euclidean topology). For a Boolean CNF formula $\varphi (x_1,\ldots, x_n)$ with $m$ clauses, let $F_{\varphi } :\mathbb {F}^n \rightarrow \mathbb {F}^m$ denote the corresponding polynomial map. Note here that we did not add $x_i^2 - x_i$ (nor $x_i^2 -1$, nor any similar) to $F$.
A Boolean CNF formula $\varphi$ is unsatisfiable if and only if $F_{\varphi }$ has a geometric certificate.
The field $\overline{\mathbb {Q}}$ of algebraic numbers, and even the field $\mathbb {Q}[i]$ (the smallest subfield of $\mathbb {C}$ containing both the rationals and $i$) are potentially interesting examples of fields satisfying (2).
Note that in this case we cannot merely apply the idea of Proposition B.5, since the polynomial map $F$ corresponding to $\varphi$ need not be finite nor have closed image, as the following example shows.
Let $\varphi$ be the unsatisfiable CNF $\lnot IND_2$ (for “induction”), namely $\varphi = x \wedge (x \rightarrow y) \wedge (y \rightarrow z) \wedge (\lnot z) = x \wedge (\lnot x \vee y) \wedge (\lnot y \vee z) \wedge (\lnot z)$. This translates into the polynomials $$F_1 = 1 - x, \quad F_2 = x(1-y), \quad F_3 = y(1-z), \quad F_4 = z,$$ where a positive literal contributes a factor $1 - x_j$ and a negated literal contributes a factor $x_j$, so that each $F_i$ vanishes at a $0/1$ point exactly when the $i$th clause is satisfied there.
(This example can easily be modified to give an example of a satisfiable CNF with non-closed image; namely, un-negate $z$ and consider $\varphi = x \wedge (\lnot x \vee y) \wedge (\lnot y \vee z) \wedge z$. The intersection of the image of the corresponding $F$ with the $(1,0,c,d)$-plane is the union of the point $(1,0,0,0)$ and the non-closed set $\lbrace (1,0,c,d) | d \ne 0\rbrace$.)
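The clause-to-product translation in this example can be checked mechanically. The following sketch (our own helper names; it assumes the translation in which a positive literal contributes $1-x_j$ and a negated literal contributes $x_j$, consistent with the example above) enumerates Boolean points for both $\lnot IND_2$ and its satisfiable variant:

```python
import itertools
import sympy as sp

x, y, z = sp.symbols('x y z')

# ~IND_2 = x ∧ (¬x ∨ y) ∧ (¬y ∨ z) ∧ (¬z): one product polynomial per clause.
F_unsat = [1 - x, x*(1 - y), y*(1 - z), z]
# Satisfiable variant with z un-negated:
F_sat = [1 - x, x*(1 - y), y*(1 - z), 1 - z]

def boolean_common_zero(polys, variables):
    """Return a 0/1 common zero of polys, or None if there is none."""
    for point in itertools.product([0, 1], repeat=len(variables)):
        if all(p.subs(dict(zip(variables, point))) == 0 for p in polys):
            return point
    return None

assert boolean_common_zero(F_unsat, (x, y, z)) is None       # unsatisfiable
assert boolean_common_zero(F_sat, (x, y, z)) == (1, 1, 1)    # x = y = z = 1
```

A common $0/1$ zero of the clause polynomials is exactly a satisfying assignment of the CNF.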
Let $F :\mathbb {F}^n \rightarrow \mathbb {F}^m$ be the polynomial map corresponding to $\varphi$ as above. As with Proposition B.5, the key to the proof is that $\vec{0}$ is in the closure $\overline{\mathrm{Im}(F)}$ if and only if $\vec{0}$ is in fact in the image of $F$. The rest of the reasoning is the same as in Proposition B.5. Because $\varphi$ is in CNF, each polynomial $F_i$ is a product of terms, each of which is either some $x_j$ or $(1-x_j)$. For such maps $F$, we will show that $\vec{0} \in \overline{\mathrm{Im}(F)}$ if and only if $\vec{0} \in \mathrm{Im}(F)$. Only the “only if” direction is nontrivial.
So suppose $\vec{0}$ is in the closure of the image of $F$. We first prove case (2) (the characteristic zero case) using very little beyond arguments about convergence of Cauchy sequences, then we prove case (1), the case of an arbitrary algebraically closed field.
In both cases, the basic idea is that a clause gets mapped to a product like $x_1 x_2 \cdots x_k (1-x_{k+1}) \cdots (1-x_{\ell })$, and for such a polynomial to approach zero in the limit, one of its factors must approach zero. For each such factor, rather than considering a limit, we simply set its value to zero. (Setting $1-x_k$ to zero amounts to setting $x_k=1$.) In both cases (1) and (2), making this idea rigorous requires some technicalities. We do (2) first only because the technicalities in that case may be more familiar to more readers.
(2) (Dense subfields of $\mathbb {C}$.) First, we note that the closure of the image of $F$ in the Zariski topology agrees with its closure in the standard Euclidean topology on $\mathbb {F}^m$, induced by the Euclidean topology on $\mathbb {C}^m$. For $\mathbb {F} = \mathbb {C}$, see, e.g., Mumford [79, Theorem 2.33]. For other dense $\mathbb {F} \subseteq \mathbb {C}$, suppose $\vec{y}$ is in the $\mathbb {F}$-Zariski closure of $F(\mathbb {F}^n)$, that is, every $\mathbb {F}$-polynomial that vanishes everywhere on $F(\mathbb {F}^n)$ also vanishes at $\vec{y}$. By the density of $\mathbb {F}$ in $\mathbb {C}$, every $\mathbb {C}$-polynomial that vanishes on $F(\mathbb {C}^n)$ also vanishes at $\vec{y}$, so $\vec{y}$ is in the $\mathbb {C}$-Zariski closure of $F(\mathbb {C}^n)$ and therefore also in the Euclidean closure of $F(\mathbb {C}^n)$. By the aforementioned result for $\mathbb {C}$, there is a Cauchy sequence of points $\vec{v}^{(1)}, \vec{v}^{(2)}, \ldots$ such that each $\vec{v}^{(i)}$ is in $F(\mathbb {C}^n)$ and $\vec{y} = \lim _{k \rightarrow \infty } \vec{v}^{(k)}$. As $\mathbb {F}$ is dense in $\mathbb {C}$ in the Euclidean topology, $F(\mathbb {F}^n)$ is dense in $F(\mathbb {C}^n)$ in the Euclidean topology. Thus there is a Cauchy sequence of points $\vec{v}^{\prime (k)} \in F(\mathbb {F}^n)$ such that $|\vec{v}^{(k)} - \vec{v}^{\prime (k)}| \le 1/k$ for all $k$. Hence $\lim _{k \rightarrow \infty } \vec{v}^{\prime (k)} = \lim _{k \rightarrow \infty } \vec{v}^{(k)} = \vec{y}$.
In particular, $\vec{0}$ is in the (Zariski-)closure of the image of $F$ if and only if there is a Cauchy sequence of points $\vec{v}^{(1)}, \vec{v}^{(2)}, \vec{v}^{(3)}, \ldots$ in $\mathrm{Im}(F)$ such that $\lim _{k \rightarrow \infty } \vec{v}^{(k)} = \vec{0}$. As each $\vec{v}^{(k)}$ is in the image of $F$, there is some point $\vec{\nu }^{(k)} \in \mathbb {F}^n$ such that $\vec{v}^{(k)} = F(\vec{\nu }^{(k)})$. As the $\vec{v}^{(k)}$ approach the origin, each $F_i(\vec{\nu }^{(k)})$ approaches 0, since it is the $i$th coordinate of $\vec{v}^{(k)}$ ($\vec{v}^{(k)}_i = F_i(\vec{\nu }^{(k)})$).
We will show how to construct a $\vec{\mu } \in \lbrace 0,1\rbrace ^n$ such that $\vec{F}(\vec{\mu }) = \vec{0}$. Without loss of generality, by renumbering if necessary, suppose that $F_1(\vec{x})$ is $x_1 x_2 \cdots x_j (1-x_{j+1})(1-x_{j+2}) \cdots (1-x_{\ell })$. As $F_1(\vec{\nu }^{(k)})$ approaches 0, at least one of its factors must get arbitrarily close to zero infinitely often, say $x_1$. (The case of $1-x_i$ approaching zero, for $j+1 \le i \le \ell$, is handled similarly.) Then there is an infinite subsequence $(\vec{\nu }^{(k_i)})_{i=1,2,3,\ldots }$ of $(\vec{\nu }^{(k)})_{k=1,2,3,\ldots }$ such that the first coordinates of this subsequence form a Cauchy sequence in $\mathbb {F}$ whose limit is 0. Since $\vec{F}(\vec{\nu }^{(k)})$ is a Cauchy sequence whose limit is $\vec{0}$, and $\vec{\nu }^{(k_i)}$ is an infinite subsequence of $\vec{\nu }^{(k)}$, we have that $\vec{F}(\vec{\nu }^{(k_i)})$ is also a Cauchy sequence whose limit is $\vec{0}$. Replace $\vec{\nu }^{(k)}$ by its subsequence $\vec{\nu }^{(k_i)}$ and renumber.
We now have a Cauchy sequence $\vec{F}(\vec{\nu }^{(k)})$ whose limit is zero, and such that at least one of the factors of $F_1$ corresponds to a coordinate of $\vec{\nu }^{(k)}$ that is itself a Cauchy sequence approaching 0 or 1. We now repeat this argument with the new sequence $\vec{\nu }^{(k)}$ for $F_2$, then for $F_3$, and so on. The result is a sequence $(\vec{\nu }^{(k)})_k$ such that $\vec{F}(\vec{\nu }^{(k)})$ is a Cauchy sequence with limit $\vec{0}$, and such that each $F_j$ has at least one of its factors corresponding to a coordinate $i \in [n]$ such that $\nu ^{(k)}_i$ is a Cauchy sequence in $\mathbb {F}$ approaching 0 or 1. For each such coordinate, replace $\nu ^{(k)}_i$ with 0 (respectively, 1) for all $k$. Then each $F_j(\vec{\nu }^{(k)})$ is identically zero as a function of $k$. Thus, any coordinates of $\vec{\nu }^{(k)}$ that were not just set are irrelevant, so we may set them to 0 or 1 arbitrarily. The result is that $\vec{\nu }^{(k)} = \vec{\mu } \in \lbrace 0,1\rbrace ^n$ is constant, and we have that $\vec{F}(\vec{\mu })=\vec{0}$.
(1) ($\mathbb {F}$ any algebraically closed field.) Here we cannot use an argument based on the Euclidean topology, but there is a suitable purely algebraic analogue, encapsulated in the following lemma:
If $p$ is a point in the Zariski closure of the image of a polynomial map $F :\mathbb {F}^n \rightarrow \mathbb {F}^m$ over an algebraically closed field $\mathbb {F}$, then there are formal Laurent series $\chi _1(\varepsilon),\ldots, \chi _n(\varepsilon)$ in a new variable $\varepsilon$ such that $F_i(\chi _1(\varepsilon),\ldots, \chi _n(\varepsilon))$ is in fact a power series—that is, involves no negative powers of $\varepsilon$—for each $i=1,\ldots,m$, and such that evaluating the power series $(F_1(\vec{\chi }(\varepsilon)),\ldots, F_m(\vec{\chi }(\varepsilon)))$ at $\varepsilon =0$ yields the point $p$.
Note that the evaluation at $\varepsilon =0$ must occur after applying $F_i$, since each individual $\chi _i$ may involve negative powers of $\varepsilon$.
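A concrete instance of the lemma (our own sketch, using the satisfiable variant from the earlier example, whose image is not closed): the point $(1,0,1,0)$ lies in the closure but not the image, and a Laurent-series witness is $\vec{\chi } = (0,\ 1/\varepsilon,\ 1-\varepsilon)$:

```python
import sympy as sp

eps = sp.symbols('eps')

# Witness that (1, 0, 1, 0) lies in the closure of the image of
# F = (1-x, x(1-y), y(1-z), 1-z): the y-coordinate has a negative power
# of eps, yet each F_i(chi) is a power series in eps.
chi = (0, 1/eps, 1 - eps)
F_chi = [1 - chi[0], chi[0]*(1 - chi[1]), chi[1]*(1 - chi[2]), 1 - chi[2]]
values = [sp.limit(sp.expand(f), eps, 0) for f in F_chi]  # evaluate at eps = 0
assert values == [1, 0, 1, 0]
```

As the lemma requires, the evaluation at $\varepsilon = 0$ happens only after applying each $F_i$: the coordinate $1/\varepsilon$ itself has no value at $\varepsilon = 0$, but it is cancelled inside $F_3$.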
For a Laurent series $\chi$, let $\deg \chi (\varepsilon)$ denote the lowest degree of $\varepsilon$ that appears in $\chi$ with nonzero coefficient. Note that in a product of several Laurent series, the product of the lowest-degree terms is the lowest-degree term of the product, as this term cannot be cancelled by any other term. So for any Laurent series $\chi _1,\ldots, \chi _k$, we have that $\deg \prod _{i \in [k]} \chi _i = \sum _{i \in [k]} \deg \chi _i$. By a natural convention that is consistent with the preceding facts, we define $\deg 0 = \infty$.
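The degree additivity can be sanity-checked on Laurent polynomials (a sketch; `ldeg` is our own helper, which clears denominators up to a fixed pole bound):

```python
import sympy as sp

eps = sp.symbols('eps')

def ldeg(f, pole_bound=50):
    """Lowest eps-degree of a Laurent polynomial f (deg 0 = infinity)."""
    g = sp.expand(f * eps**pole_bound)   # clear denominators up to pole_bound
    if g == 0:
        return sp.oo
    return min(m[0] for m in sp.Poly(g, eps).monoms()) - pole_bound

a = eps**-2 + 1          # lowest degree -2
b = 3*eps + eps**3       # lowest degree 1
assert ldeg(a) == -2 and ldeg(b) == 1
# The lowest-degree terms multiply without cancellation:
assert ldeg(sp.expand(a * b)) == ldeg(a) + ldeg(b)   # -1
```

This is exactly the fact $\deg \prod _i \chi _i = \sum _i \deg \chi _i$ used in the argument below.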
Now, assume that $F_1(\vec{x}) = x_1 \cdots x_{k} (1-x_{k+1}) \cdots (1-x_{\ell })$. The fact that $F_1(\vec{\chi }(\varepsilon))|_{\varepsilon =0} = 0$ means that $F_1(\vec{\chi }(\varepsilon))$ is a power series in $\varepsilon$ whose constant term is 0 or, equivalently, that $\deg F_1(\vec{\chi }(\varepsilon)) \gt 0$. But this is equivalent to $\sum _{i=1}^{k} \deg \chi _i + \sum _{i=k+1}^{\ell } \deg (1-\chi _i) \gt 0$, so at least one summand is strictly positive. If $\deg \chi _i \gt 0$, we set $\mu _i = 0$; if $\deg (1-\chi _i) \gt 0$, we set $\mu _i = 1$.
All that remains to check is that when we make these assignments across all the $F_i$ we do not run into a contradiction. For this, note that if $\deg \chi _i \gt 0$, then $\deg (1-\chi _i) = 0$, since $1-\chi _i$ has constant term 1; similarly, if $\deg (1-\chi _i) \gt 0$, then $\deg \chi _i = 0$. Thus, these two possibilities are mutually exclusive, so we arrive at a consistent setting of the $\mu _i$. As argued above, any index $i$ for which $\mu _i$ has not been set is irrelevant, so we may set them arbitrarily. Finally, we arrive at $\vec{F}(\vec{\mu }) = \vec{0}$.
Finally, as with $\text{IPS}$ certificates and Hilbert-like $\text{IPS}$ certificates (see Section 6), a geometric zero-certificate for a system of equations $F_1(\vec{x}),\ldots, F_m(\vec{x})$ is a polynomial $C(y_1,\ldots, y_m) \in \langle y_1,\ldots, y_m \rangle$—that is, such that $C(0,\ldots,0) = 0$—and such that $C(F_1(\vec{x}),\ldots, F_{m}(\vec{x})) = 0$ identically as a polynomial in $\vec{x}$. The same arguments as in the case of algebraic certificates show that any two geometric certificates differ by a geometric zero-certificate, and that the geometric certificates are closed under multiplication. Furthermore, the set of geometric zero-certificates is the intersection of the ideal of (algebraic) zero-certificates $\langle y_1,\ldots, y_m \rangle \cap \langle y_1 - F_1(\vec{x}),\ldots, y_m - F_m(\vec{x}) \rangle$ with the subring $R[y_1,\ldots, y_m]$. As such, it is an ideal of $R[y_1,\ldots, y_m]$ and so is finitely generated. Thus, as in the case of $\text{IPS}$ certificates, the set of all geometric certificates can be specified by giving a single geometric certificate and a finite generating set for the ideal of geometric zero-certificates, suggesting an approach to lower bounds on the Geometric Ideal Proof System.
We note that geometric zero-certificates are also called syzygies amongst the $F_i$—sometimes “geometric syzygies” or “polynomial syzygies” to distinguish them from the “module-type syzygies” or “linear syzygies” we discussed above in relation to Hilbert-like $\text{IPS}$. As in all the other cases we have discussed, a generating set of the geometric syzygies can be computed using Gröbner bases, this time using elimination theory: compute a Gröbner basis for the ideal $\langle y_1 - F_1(\vec{x}),\ldots, y_m - F_m(\vec{x}) \rangle$ using an order that eliminates the $x$-variables, and then take the subset of the Gröbner basis that consists of polynomials only involving the $y$-variables. The ideal of geometric syzygies is exactly the ideal of the closure of the image of the map $F$, and for this reason this kind of syzygy is also well-studied. This suggests that geometric properties of the image of the map $F$ (or its closure) may be useful in understanding the complexity of individual instances of $\mathsf {coNP}$-complete problems.
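For the toy unsatisfiable system $x = 0$, $1 - x = 0$ (our own example), the elimination computation just described can be sketched in sympy; the $y$-only part of the Gröbner basis generates the ideal of (the closure of) the image of $F$:

```python
import sympy as sp

x, y1, y2 = sp.symbols('x y1 y2')
F = [x, 1 - x]                       # toy unsatisfiable system

# Groebner basis of <y1 - F1, y2 - F2> in a lex order eliminating x:
G = sp.groebner([y1 - F[0], y2 - F[1]], x, y1, y2, order='lex')
eliminated = [g for g in G.exprs if not g.has(x)]

# The y-only part cuts out the image of F; here it contains y1 + y2 - 1,
# witnessing that the origin is not in (the closure of) the image.
assert any(sp.expand(g - (y1 + y2 - 1)) == 0 for g in eliminated)
```

Here the image of $F$ is the line $y_1 + y_2 = 1$, which misses the origin, matching the existence of a geometric certificate for this system.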
We thank David Liu for many interesting discussions and for collaborating with us on some of the open questions posed in this article. We thank Eric Allender and Andy Drucker for asking whether “Extended Frege-provable PIT” implied that $\text{IPS}$ was equivalent to Extended Frege, which led to the results of Section 5.2. We thank Pascal Koiran for providing the second half of the proof of Proposition 3.2. We thank Iddo Tzameret for useful discussions that led to the second half of Proposition 3.4. We thank Pavel Hrubeš, Iddo Tzameret, and anonymous reviewers for useful feedback. Finally, in addition to several useful discussions, we also thank Eric Allender for suggesting the name “Ideal Proof System”—none of our other potential names held a candle to this one.
We gratefully acknowledge financial support from NSERC; in particular, J.A.G. was supported by A. Borodin's NSERC Grant #482671. During the preparation of this article, J.A.G. was also supported by a Santa Fe Institute Omidyar Fellowship and NSF grant DMS-1620484.
Authors' addresses: J. A. Grochow, 1111 Engineering Dr., ECOT 717, 430 UCB Boulder, CO, 80309, USA; email: jgrochow@colorado.edu; T. Pitassi, 10 Kings College Road, University of Toronto, Department of Computer Science, Toronto ON M5S 3G4, Canada; email: toni@cs.toronto.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
©2018 Copyright held by the owner/author(s). Publication rights licensed to ACM. 0004-5411/2018/12-ART37 $15.00
DOI: https://doi.org/10.1145/3230742
Publication History: Received January 2017; revised February 2018; accepted June 2018