Frederick Trick

Fredman’s Trick Meets Dominance Product: Fine-Grained
Complexity of Unweighted APSP, 3SUM Counting, and More

Timothy M. Chan* Virginia Vassilevska Williams† Yinzhan Xu‡
UIUC MIT MIT
arXiv:2303.14572v1 [cs.DS] 25 Mar 2023
tmc@illinois.edu virgi@mit.edu xyzhan@mit.edu
Abstract
In this paper we carefully combine Fredman’s trick [SICOMP’76] and Matoušek’s approach for
dominance product [IPL’91] to obtain powerful results in fine-grained complexity:
• Under the hypothesis that APSP for undirected graphs with edge weights in {1, 2, . . . , n} requires
n3−o(1) time (when ω = 2), we show a variety of conditional lower bounds, including an n7/3−o(1)
lower bound for unweighted directed APSP and an n2.2−o(1) lower bound for computing the Minimum
Witness Product between two n × n Boolean matrices, even if ω = 2, improving upon their trivial
n2 lower bounds. Our techniques can also be used to reduce the unweighted directed APSP problem
to other problems. In particular, we show that (when ω = 2), if unweighted directed APSP requires
n2.5−o(1) time, then Minimum Witness Product requires n7/3−o(1) time.
• We show that, surprisingly, many central problems in fine-grained complexity are equivalent to their
natural counting versions. In particular, we show that Min-Plus Product and Exact Triangle are subcu-
bically equivalent to their counting versions, and 3SUM is subquadratically equivalent to its counting
version.
• We obtain new algorithms using new variants of the Balog-Szemerédi-Gowers theorem from additive
combinatorics. For example, we get an O(n3.83 ) time deterministic algorithm for exactly counting the
e 4 ) time algorithm.
number of shortest paths in an arbitrary weighted graph, improving the textbook O(n
We also get faster algorithms for 3SUM in preprocessed universes, and deterministic algorithms for
3SUM on monotone sets in {1, 2, . . . , n}d .
* Supported by NSF Grant CCF-2224271.

†
Supported by an NSF CAREER Award, NSF Grant CCF-2129139 and BSF Grant BSF:2012338, a Google Research Fellowship
and a Sloan Research Fellowship.
‡
Partially supported by NSF Grant CCF-2129139.
Contents
1 Introduction 1
1.1 Summary of Our Contributions and New Tool . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Conditional Lower Bounds for Unweighted Directed APSP, Min-Witness Product and Other
Problems with Intermediate Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Equivalence of Counting and Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 New Variants of the BSG Theorem with Algorithmic Applications . . . . . . . . . . . . . . 9
1.5 Paper Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Problem Definitions 11
3 Conditional Lower Bounds for Problems with Intermediate Complexity:

u-dir-APSP under the Strong APSP Hypothesis 13
3.1 Preliminaries: Generalized Equality Products . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 The Key Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Consequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 Equivalences Between Counting and Detection Problems:

#Exact-Triangle vs. Exact-Triangle 17
5 Alternative to BSG: A Triangle Decomposition Theorem and Its Applications 18

5.1 Application 1: Exact Triangle in Preprocessed Universes . . . . . . . . . . . . . . . . . . . 19
5.2 Application 2: 3SUM in Preprocessed Universes . . . . . . . . . . . . . . . . . . . . . . . 20
5.3 Application 3: A Deterministic 3SUM Algorithm for Bounded Monotone Sets . . . . . . . . 20
5.4 Application 4: A Truly Subquartic #APSP Algorithm . . . . . . . . . . . . . . . . . . . . . 21
6 More Lower Bounds under the Strong APSP Hypothesis 22

6.1 Min-Witness Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
6.2 All-Pairs Shortest Lightest Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
6.3 Batched Range Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.4 Dynamic Shortest Paths in Unweighted Planar Graphs . . . . . . . . . . . . . . . . . . . . 25
6.5 Min-Witness Equality Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
7 Lower Bounds under the u-dir-APSP Hypothesis 27

7.1 Min-Witness Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
7.2 All-Pairs Shortest Lightest Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
7.3 Batched Range Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
7.4 Dynamic Shortest Paths in Planar Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
7.5 An Equivalence Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
8 More Equivalences Between Counting and Detection Problems 29

8.1 Min-Plus Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
8.2 3SUM Convolution and Min-Plus Convolution . . . . . . . . . . . . . . . . . . . . . . . . 31
8.3 All-Numbers 3SUM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
8.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
9 Counting Algorithms in Other Models 35
9.1 Co-Nondeterministic Algorithms for Counting Problems with Integer Inputs . . . . . . . . . 35
9.2 Co-Nondeterministic Algorithms for Counting Problems with Real Inputs . . . . . . . . . . 37
9.3 Quantum Algorithms for Counting Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 43
9.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
10 BSG Theorems Revisited 43

10.1 A New Simpler Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
10.2 Application: Improved 3SUM in Preprocessed Universes . . . . . . . . . . . . . . . . . . . 46
10.3 Reinterpreting Gower’s Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
11 Lower Bounds for Min-Equality Convolution 49
12 Acknowledgement 52
A Still More Equivalences Between Counting and Detection Problems 60

A.1 Exact k-Clique and Minimum k-Clique . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
A.2 Monochromatic Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
A.3 All-Pairs Shortest Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
B Bounded-Difference or Monotone Min-Plus Product from the Triangle Decomposition Theo-

rem 63
C More on BSG 64
1 Introduction
There are many examples of computational problems for which a slight variant can become much harder
than the original: 2-SAT (in P) vs. 3-SAT (NP-complete), perfect matching (in P) vs. its counting version
(#P-complete), and so on. Fine-grained complexity (FGC) has provided explanations for such differences
for many natural problems – e.g. finding a minimum weight triangle in a node-weighted graph is “easy”
(in O(nω ) time [CL09], where ω < 2.373 [AV21] is the matrix multiplication exponent) while finding a
minimum weight triangle in an edge-weighted graph is “hard” (subcubically equivalent to All-Pairs Shortest
Paths (APSP) [VW10, VW18]). Nevertheless, many fundamental questions remain unanswered:
• Seidel [Sei95] showed that APSP in unweighted undirected graphs can be solved in Õ(nω ) time1 . The
fastest algorithm for APSP in unweighted directed graphs by Zwick [Zwi02] runs in O(n2.53 ) for the
current bound of rectangular matrix multiplication [LU18]. If ω = 2, then undirected APSP would have
an essentially optimal Õ(n2 ) time algorithm, whereas Zwick’s algorithm for directed APSP would run
in, slower, Õ(n2.5 ) time.
Assuming that the runtime of Zwick’s algorithm is the best possible for unweighted directed APSP (ab-
breviated u-dir-APSP) has been used as a hardness hypothesis (see e.g. [LPV20, VX20a]). However, so
far there has been no explanation for why u-dir-APSP should be harder than its undirected counterpart.
Q1: Does APSP in directed unweighted graphs require superquadratic time, even if ω = 2?
• Boolean matrix multiplication (BMM) asks to compute, for two n × n Boolean matrices A and B, for
every i, j ∈ {1, . . . , n} whether there is some k so that A[i, k] = B[k, j] = 1. The Min-Witness product
(Min-Witness-Prod) is a well-studied [KL05, VWY10, SYZ11, CY14, KL21] generalization of BMM
in which for every i, j, one needs to instead compute the minimum k for which A[i, k] = B[k, j] = 1.
Similarly to u-dir-APSP, the fastest algorithm for Min-Witness-Prod runs in O(n2.53 ) time [CKL07],
and would run in Õ(n2.5 ) time if ω = 2, whereas BMM would have an essentially optimal algorithm in
that case.
The assumption that Min-Witness-Prod requires n2.5−o(1) time has been used as a hardness hypothesis
(e.g. [LPV20]), but similar to u-dir-APSP, so far there has been no explanation as to why Min-Witness-Prod
should be harder than BMM.
Q2: Does Min-Witness-Prod require superquadratic time, even if ω = 2?
• For a large number of problems of interest within FGC, the known algorithms for finding a solution can
also compute the number of solutions in the same time. This is true for triangle detection in graphs,
BMM, Min-Plus product, 3SUM, Exact Triangle, Negative Triangle, and many more. Is this merely a
coincidence, or are the decision variants of these problems equivalent to the counting variants?
A natural and important question is:
Q3: Are the core FGC problems like 3SUM, Min-Plus product and Exact Triangle easier than their
exact counting variants?
1
The notation Õ(f (n)) denotes f (n) poly log(n).
1
A recent line of work [DL21, DLM20] gives fine-grained reductions from approximate counting to deci-
sion for several core problems of FGC, showing that if the decision versions have improved algorithms,
then one can also obtain fast approximation schemes. As an approximate count can solve the decision
problem, we get that for many key problems approximate counting and decision are equivalent.
This does not imply that exact counting would be equivalent to decision however. In fact, quite often the
decision and counting versions of computational problems can have vastly different complexities. There
are many examples of polynomial time decision problems (e.g. perfect matching [Val79]) whose counting
variants become #P-complete. For many of these problems, polynomial time approximation schemes
are possible (see e.g. [JSV04] for perfect matching), however obtaining a fast exact counting algorithm is
considered infeasible.
Within FGC, there are more examples. Consider for example the case of induced subgraph isomorphism
for pattern graphs of constant size k. It is known (see e.g. [CDM17]) that for every k-node pattern graph H,
counting the induced copies of H in an n-node graph is fine-grained equivalent to counting the k-cliques
in an n-node graph. Meanwhile, the work of [DL21, DLM20] implies that approximately counting the in-
duced k-paths in a graph is fine-grained equivalent to detecting a single k-path. However, induced k-paths
can be found (and hence can be approximately counted) faster than counting k-cliques: combinatorially,
there’s an O(nk−2 ) time algorithm [BKS18], whereas k-clique counting is believed to require nk−o(1)
time; with the use of fast matrix multiplication, if k < 7, k-paths can be found and approximately counted
in the best known running time for (k − 1)-clique detection [VWWY15, BKS18], faster than k-clique.
To reiterate Question Q3 in this context, it asks whether the core FGC problems are like induced k-path
above, or perhaps surprisingly, are equivalent to their counting versions?
The fine-grained problems of interest such as 3SUM, Exact-Triangle, Orthogonal Vectors and more, ad-
mit efficient self-reductions. For such problems, prior work [VW18] has given generic techniques to
show that the problems are fine-grained equivalent to the problem of listing any “small” number of so-
lutions. For example, 3SUM is subquadratically equivalent to listing any truly subquadratic2 number of
3SUM solutions. Thus, as long as the count is small, the listing problem is equivalent to the decision
problem. This technique has become an important technique in FGC, used in many subsequent works
(e.g. [HKNS15, BIS17, KPS17, CMWW19]). The issue is, however, that when the count is actually
“large”, say Ω(n2 ) for 3SUM, the listing approach is too expensive. The question becomes, how do we
count faster than listing when the count is large? Until now (more than 10 years after the conference
version of [VW18]) there has been no technique to do this.
• Recently, wide classes of structured instances of Min-Plus matrix products, Min-Plus convolutions, and
3SUM have been identified which can be solved faster. Chan and Lewenstein [CL15] obtained the first
truly subquadratic algorithm for Min-Plus convolution for monotone sequences over [n] := {1, . . . , n}
(and also integer sequences with bounded differences), and Bringmann, Grandoni, Saha and Vassilevska
W. [BGSV16] obtained the first truly subcubic algorithm for Min-Plus product for integer matrices with
bounded differences. The latter result is in some sense more general, as bounded-difference Min-Plus
convolution reduces to (row/column) bounded-difference Min-Plus matrix product. Both results have
subsequently been generalized or improved [VX20b, GPVX21, CDX22, CDXZ22]. Interestingly, Chan
and Lewenstein’s approach made use of a famous result in additive combinatorics known as the Balog–
Szemerédi–Gowers Theorem (the “BSG Theorem”), whereas [BGSV16]’s approach and all subsequent
2
Truly subquadratic means O(n2−ε ) for some constant ε > 0; truly subcubic means O(n3−ε ) for constant ε > 0, etc.
2
algorithms use more direct techniques without the need for additive combinatorics. So far, there has been
no explanation for why there are two seemingly different approaches. This leads to our last main question:
Q4: What is the relationship between the BSG-based approach and the direct approach to monotone
Min-Plus product/convolution, and could they be unified?
Question Q4 might appear unrelated to the preceding questions, and is more vague or conceptual. But
the hope is that by understanding the relationship better, we may obtain new improved algorithms, since
Chan and Lewenstein’s approach has several applications, e.g. to 3SUM with preprocessed universes and
3SUM for d-dimensional monotone sets in [n]d , which are not handled by the subsequent approaches.
1.1 Summary of Our Contributions and New Tool

1. Conditional hardness for u-dir-APSP. One of the main hypotheses of fine-grained complexity, known
as the “APSP Hypothesis”, is that APSP in n-node graphs with polynomial integer edge weights requires
n3−o(1) time. Meanwhile, the fastest algorithms for APSP [Sei95, Zwi02, Wil18] still fail to solve the
problem in truly subcubic time when the edge weights are in [n]. In fact, even Min-Plus product on n × n
matrices with entries in [n] is not known to be solvable in truly subcubic time: the known algorithm
for small integer entries runs in Õ(M nω ) time [AGM97], and even if ω = 2, this is no better than the
brute-force cubic time algorithm when M = n. Moreover, Zwick [Zwi02] explicitly asks whether there
is a truly subcubic time algorithm for APSP with weights in [n].
We formulate a version of the APSP Hypothesis, which we call the Strong APSP Hypothesis, stating (for
ω = 2) that APSP in graphs with edge weights in [n] requires n3−o(1) time. Under this hypothesis we
show that u-dir-APSP requires n7/3−o(1) time, even if ω = 2.
Thus, we conditionally resolve question Q1: either unweighted directed APSP requires super-quadratic
time and is harder than unweighted undirected APSP (when ω = 2), or APSP in weighted graphs with
weights in [n] is in truly subcubic time.
This is the first fine-grained connection between unweighted and weighted APSP.
2. Conditional hardness for Min-Witness-Prod. We present the first fine-grained reduction from APSP
in unweighted directed graphs to Min-Witness-Prod (which was left open by [LPV20]). Our reduction
implies that, when ω = 2, Min-Witness-Prod requires n7/3−o(1) time unless the current best algorithm
for u-dir-APSP [Zwi02] can be improved. Alternatively, working under the Strong APSP Hypothesis,
we also obtain an n11/5−o(1) lower bound for Min-Witness-Prod.
Thus, either Min-Witness-Prod is truly harder than BMM (if ω = 2), or there is a breakthrough in
unweighted or weighted APSP algorithms. This gives an answer to question Q2.
See Section 1.2 for further discussion and details of our results on Min-Witness-Prod and u-dir-APSP.
3. More hardness results. We give many more fine-grained lower bounds under the Strong APSP Hypoth-
esis for problems such as Batched Range Mode, All Pairs Shortest Lightest Paths, Min Witness Equality
Product, dynamic shortest paths in unweighted planar graphs and more.
4. Counting is equivalent to detection. We resolve question Q3 for several core problems in FGC. We
show that the Min-Plus-Product problem is subcubically equivalent to its counting version, the 3SUM
problem is subquadratically equivalent to its counting version, and the Exact-Weight Triangle problem
3
(Exact-Tri) is subcubically equivalent to its counting version. These are the first fine-grained equiva-
lences between exact counting and decision problem variants, to our knowledge. See Section 1.3 for
further results and discussion.
5. New variants of the BSG Theorem and new algorithms. We formulate a new decomposition theorem
for zero-weight triangles of weighted graphs, which may be viewed as a substitute to the BSG Theorem
but has a simple direct proof, providing an answer to question Q4. Besides being applicable to mono-
tone Min-Plus convolution/products, 3SUM with preprocessed universes and 3SUM with d-dimensional
monotone sets, the theorem yields the first truly subquartic algorithm for the counting version of the
general weighted APSP problem. Our ideas also lead to a new bound on the BSG Theorem itself which
is an improvement for a certain range of parameters. As a result, we obtain an improved new algorithm
for 3SUM with preprocessed universes. See Sections 1.4 and 10 for more details.
Surprisingly, we are able to achieve all of these results with a single new tool, a careful combination
of two known techniques in tackling shortest paths problems in graphs: Fredman’s trick [Fre76] and Ma-
toušek’s approach for dominance product [Mat91].
It has long been known that APSP in general n-node graphs is equivalent to computing the Min-Plus
product of two n×n matrices A and B (Min-Plus-Product), defined as the matrix C with Cij = mink (Aik +
Bkj ). Fredman [Fre76] introduced the following powerful “trick” for dealing with the above minimum: to
determine if Aik + Bkj ≤ Aiℓ + Bℓj , we simply need to check if
Aik − Aiℓ ≤ Bℓj − Bkj .
While seemingly trivial, this idea of comparing a left-hand side that is purely in terms of entries of A to
a right-hand side that is purely in terms of entries of B, leads to a variety of amazing results. Fredman used
it to show that the decision tree complexity of Min-Plus-Product is O(n2.5 ), and not cubic as previously
thought. Practically all subcubic
√ algorithms for APSP (e.g. [Fre76, Tak98, Cha10, Wil18]) including the
3
current fastest n / exp(Θ( log n)) time algorithm by Williams [Wil18] use Fredman’s trick. Several new
truly subcubic time algorithms for variants of APSP (e.g. [CDX22] and, implicitly, [BGSV16]) and recent
algorithms for 3SUM (e.g. [GP18, Cha20]) also use it.
A completely different technique is Matoušek’s truly subcubic time algorithm [Mat91] for Dominance
Product (Dominance-Product). The dominance product of two n × n matrices A and B is the n × n matrix
C such that for all i, j ∈ [n], Cij = |{k ∈ [n] : Aik ≤ Bkj }|. Matoušek gave an approach that combined fast
matrix multiplication with brute-force to obtain an O(n e (3+ω)/2 ) time algorithm for Dominance-Product.
Dominance-Product is known to be equivalent to the so-called Equality Product (Equality-Product) prob-
lem3 which asks to compute Cij = |{k ∈ [n] : Aik = Bkj }|, so we sometimes refer to Equality-Product
instead.
Matoušek’s subcubic time techniques have been used to obtain truly subcubic time algorithms for All-
Pairs Bottleneck Paths [VWY07, DP09], All-Pairs Nondecreasing Paths [Vas10, DJW19], APSP in node-
weighted graphs [Cha10] and more. Unfortunately, the technique has fallen short when applied directly to
the general APSP problem.
In this paper we give a combination of these two techniques that allows us to obtain reductions that
exploit fast matrix multiplication in a new way, thus allowing us to overcome many difficulties, such
as counting solutions when the number of solutions is large. The main ideas involve: (i) division into
3
The equivalence was proven e.g. by [LUWG19, Vas15], but also Matoušek’s algorithm almost immediately works for
Equality-Product.
4
“few-witnesses” and “many-witnesses” cases, (ii) standard witness-finding techniques to handle the “few-
witnesses” case, (iii) hitting sets to hit the “many-witnesses”, (iv) Fredman’s trick (the obvious equivalence
of a + b = a′ + b′ with a − a′ = b′ − b), and lastly (v) Matoušek’s technique for dominance or equality
products (which also involves a division into “low-frequency” and “high-frequency” cases). Steps (iv) and
(v) crucially allow us to handle the “many-witnesses” case, delicately exploiting the small hitting set from
(iii). Individually, each of these ideas is simple and has appeared before. But the novelty lies in how they are
pieced together (see Sections 3, 4, and 5), and the realization that these ideas are powerful enough to yield
all the new results in the above bullets!
In the remainder of the introduction we give more details on each of the bullets above.
1.2 Conditional Lower Bounds for Unweighted Directed APSP, Min-Witness Product and
Other Problems with Intermediate Complexity
u-dir-APSP and Min-Witness-Prod are both “intermediate” problems as dubbed by Lincoln, Polak and
Vassilevska W. [LPV20], a class of matrix product and all-pairs graph problems whose running times are
e 2.5 ) when ω = 2 (and hence right in the middle between the brute-force n3 and and the desired optimal
O(n
n2 ), and for which no O(n2.5−ε ) time algorithms are known. More examples of intermediate problems in-
clude Min-Equality Product and Max-Min Product. A similar class of “intermediate” convolution problems
that are known to be solvable in O(ne 1.5 ) time but not much faster [LPV20] includes Max-Min Convolu-
tion, Minimum-Witness Convolution, Min-Equality Convolution, and pattern-to-text Hamming distances.
e 1.5 )-time algorithms were known since the 1980s for Max-Min convolution [Kos89] and
For instance, O(n
pattern-to-text Hamming distances [Abr87], and these remain the fastest algorithms for these problems.
None of these intermediate problems currently have nontrivial conditional lower bounds under stan-
dard hypotheses in FGC. (Two exceptions are All-Edges Monochromatic Triangle (AE-Mono-Tri) and
Monochromatic Convolution (Mono-Convolution), where a near-n2.5 lower bound is known for the for-
mer under the 3SUM or the APSP Hypothesis [LPV20, VX20a] and a near-n1.5 lower bound is known for
the latter under the 3SUM Hypothesis [LPV20], but one may argue that these monochromatic problems are
not true matrix product or convolution problems since their inputs involve three matrices or sequences rather
than two.)
Thus, an important research direction is to prove super-quadratic conditional lower bounds for interme-
diate matrix product problems, and super-linear lower bounds for intermediate convolution problems. Some
relationships are known between intermediate problems, some of which are illustrated in Figure 1. However,
many questions remain; for instance, it was open whether u-dir-APSP and Min-Witness-Prod are related.
As u-dir-APSP and Min-Witness-Prod are among the “lowest” problems in this class, proving con-
ditional lower bounds for these two problems is especially fundamental. (Besides, Min-Witness-Prod has
been extensively studied, arising in many applications [KL05, VWY10, SYZ11, CY14, KL21].) The precise
time bounds of the current best algorithms for u-dir-APSP by Zwick [Zwi02] and Min-Witness-Prod by
Czumaj, Kowaluk and Lingas [CKL07] are both O(n e 2+ρ ), where ρ ∈ [1/2, 0.529) satisfies4 ω(1, ρ, 1) =
1 + 2ρ.
To explain the hardness of u-dir-APSP, Min-Witness-Prod and more, we introduce a strong version of
the APSP Hypothesis:
Hypothesis 1 (Strong APSP Hypothesis). In the Word-RAM model with O(log n)-bit words, computing
APSP for an undirected5 graph with edge weights in [n3−ω ] requires n3−o(1) randomized time.
4
ω(a, b, c) denotes the rectangular matrix multiplication exponent between an na × nb matrix and an nb × nc matrix.
5
The version of this hypothesis for directed graphs is equivalent: see Remark 6.4.
5
n3 n2.5
[CVX22]
Real-APSP AE-Mono-Tri Max-Min-Prod Min-Witness-Eq-Prod
[LPV20] [CVX21]
[VX20a]
Int-APSP [LPV20]
[LPV20]
[CVX21]+Lem. 6.1
APSP in [n] u-dir-APSP Min-Witness-Prod
open open
open
Figure 1: Previous work on intermediate matrix product problems, assuming ω = 2. Here, APSP in [n] de-
notes APSP with edge weights in [n], Int-APSP denotes APSP with arbitrary O(log n)-bit integer weights,
and Real-APSP denotes APSP with real weights. All unlabelled arrows follow by trivial reductions.
For ω = 2, the above asserts that the standard textbook cubic time algorithms for APSP are near-optimal
when the edge weights are integers in [n]. Due to a tight equivalence [SZ99] between undirected APSP with
edge weights in [M ] and Min-Plus-Product of two matrices with entries in [O(M )], the above hypothesis
is equivalent to the following:
Hypothesis 1′ (Strong APSP Hypothesis, restated). In the Word-RAM model with O(log n)-bit words, com-
puting the Min-Plus product between two n × n matrices with entries in [n3−ω ] requires n3−o(1) randomized
time.
The current upper bound for Min-Plus-Product between two n × n matrices with entries in [M ] is
e
O(min{M nω , n3 }) [AGM97], which is cubic for M = n3−ω , and this has not been improved for over 30
years.
The above strengthening of the APSP Hypothesis is reminiscent of a strengthening of the 3SUM Hy-
pothesis proposed by Amir, Chan, Lewenstein and Lewenstein [ACLL14], which they called the “Strong
3SUM-hardness assumption”, asserting that the 3SUM Convolution problem requires near-quadratic time
for integers in [n].
We prove the following results:
Theorem 1.1. Under the Strong APSP Hypothesis, u-dir-APSP requires n7/3−o(1) time, and Min-Witness-Prod
requires n11/5−o(1) time (on a Word-RAM with O(log n)-bit words).
In fact, for u-dir-APSP, we can obtain a conditional lower bound of n2+β−o(1) for graphs with n1+β
edges, for any 0 < β ≤ 13 . This time bound is tight for graphs of such sparsity. In other words, under the
Strong APSP Hypothesis, the naive O(mn) time algorithm for u-dir-APSP with m edges by repeated BFSs
is essentially optimal for sufficiently sparse graphs, and fast matrix multiplication helps only for sufficiently
large densities. Such lower bounds for sparse graphs were known only for weighted APSP [LWW18, AR18].
Our technique also yields new conditional lower bounds for many other problems, such as All-Pairs
Shortest Lightest Paths (undir-APSLP) or All-Pairs Lightest Shortest Paths (undir-APLSP) for undirected
6
small-weighted graphs (which was first studied by Zwick [Zwi99]), a batched version of the range mode
problem (Batch-Range-Mode) (which has received considerable recent attention [CDL+ 14, VX20b, GPVX21,
GH22, JX22]), and dynamic shortest paths in planar graphs [AD16]. See Table 1 for some of the specific
results, and Section 6 for more discussion on all these problems. As demonstrated by the applications to
range mode and dynamic planar shortest paths, our new technique will likely be useful to proving condi-
tional lower bounds for other data structure problems, serving as an alternative to existing techniques based
on the Combinatorial BMM Hypothesis (see e.g. [AV14]) or the OMv Hypothesis [HKNS15].
We also consider conditional lower bounds based on the hardness of u-dir-APSP itself. The following
hypothesis has been proposed by Chan, Vassilevska W. and Xu [CVX21]:
Hypothesis 2 (u-dir-APSP Hypothesis). In the Word-RAM model with O(log n)-bit words, computing APSP
for an n-node unweighted directed graph requires at least n2+ρ−o(1) randomized time where ρ is the constant
satisfying ω(1, ρ, 1) = 1 + 2ρ.
As noted in Remark 6.3, the u-dir-APSP Hypothesis implies the Strong APSP Hypothesis if ω = 2.
The Strong APSP Hypothesis is thus more believable in some sense, but the u-dir-APSP Hypothesis al-
lows us to prove higher lower bounds. For example, we prove the following conditional lower bound for
Min-Witness-Prod:
Theorem 1.2. Under the u-dir-APSP Hypothesis, Min-Witness-Prod requires n2.223 time, or n7/3−o(1)
time if ω = 2 (on a Word-RAM with O(log n)-bit words).
Earlier papers [LPV20, CVX21] were unable to obtain such a reduction from u-dir-APSP to Min-Witness-Prod
(Chan, Vassilevska W. and Xu [CVX21] were only able to reduce from u-dir-APSP to Min-Witness-Eq-Prod,
but not Min-Witness-Prod). We similarly obtain higher lower bounds for all the other problems, as indi-
cated in Table 1. Our results thus show that the u-dir-APSP Hypothesis is far more versatile for proving
conditional lower bounds than what previous papers [LPV20, CVX21] were able to show.
Lastly, to deal with convolution-type problems, we introduce a similar strong version of the Min-Plus
Convolution Hypothesis:
Hypothesis 3 (Strong Min-Plus Convolution Hypothesis). In the Word-RAM model with O(log n)-bit words,
Min-Plus convolution between two length n arrays with entries in [n] requires n2−o(1) randomized time.
The best algorithm for Min-Plus-Convolution with numbers in [M ] runs in O(M e n) time (by combining
Alon, Galil and Margalit’s technique [AGM97] and Fast Fourier transform), and no truly subquadratic time
algorithm is known for M = n.
We do not know any direct relationship between the Strong APSP Hypothesis and the Strong Min-Plus
Convolution Hypothesis (although when there are no restrictions on weights, it was known that Min-Plus-Convolution
reduces to Min-Plus-Product [BCD+ 14]).
We prove the first conditional lower bounds for one intermediate convolution problem, Min-Equal-Convolution,
under the Strong APSP Hypothesis, the u-dir-APSP Hypothesis or the Strong Min-Plus Convolution Hypoth-
esis:
Theorem 1.3. Min-Equal-Convolution requires n1+1/6−o(1) time under the Strong APSP Hypothesis; or
n1+ρ/2−o(1) time under the u-dir-APSP Hypothesis; or n1+1/11−o(1) time under the Strong Min-Plus Con-
volution Hypothesis.
In the above theorem, the lower bound under the Strong Min-Plus Convolution Hypothesis is the most
interesting, requiring the use of one of our new variants of the BSG Theorem.
7
We should emphasize that the significance of Theorem 1.3 stems from the relative rarity of nontrivial
lower bounds for convolution and related string problems. Some problems may have nω/2 lower bounds by
reduction from BMM (e.g. there was a well-known reduction from BMM to pattern-to-text Hamming dis-
tances, attributed to Indyk – see also [GU18]), but such bounds become meaningless when ω = 2. A recent
paper [CVX22] obtained an n1.5−o(1) lower bound under the OV Hypothesis for a problem called “pattern-
to-text distinct Hamming similarity”, which is far less natural and basic than Min-Equal-Convolution.
Lower Bounds Hypotheses Strong Min-Plus

Strong APSP u-dir-APSP
Problems Convolution
u-dir-APSP n7/3−o(1) Cor. 3.5 n5/2−o(1) trivial N/A
Min-Witness-Prod n11/5−o(1) Cor. 6.6 n7/3−o(1) Cor. 7.2 N/A
undir-APSLP n11/5−o(1) Cor. 6.8 n7/3−o(1) Cor. 7.3 N/A
Batch-Range-Mode n7/6−o(1) Cor. 6.10 n5/4−o(1) Cor. 7.4 N/A
Min-Equal-Convolution n7/6−o(1) Thm. 11.1 n5/4−o(1) Thm. 11.2 n12/11−o(1) Thm. 11.3
Table 1: New conditional lower bounds obtained under the Strong APSP Hypothesis, the u-dir-APSP Hy-
pothesis and the Strong Min-Plus Convolution Hypothesis, assuming ω = 2. (We also have nontrivial lower
bounds even if ω > 2; see the corresponding theorems/corollaries for the precise bounds.)
1.3 Equivalence of Counting and Detection

Here we summarize our main results on counting vs detection.
Equivalence between counting and detection for 3SUM. In #3SUM we are given three sets of numbers
A, B, C and we want to count the number of triples a ∈ A, b ∈ B, c ∈ C such that a + b = c.6 More
complex variants include #All-Nums-3SUM in which we want to count for every c ∈ C, the number of
a ∈ A, b ∈ B such that a + b = c. All of 3SUM, #3SUM and #All-Nums-3SUM can be solved in
O(n2 ) time, via the folklore algorithm. The 3SUM Hypothesis (see [Vas18]) asserts that (in the word-RAM
model of computation with O(log n)-bit words), n2−o(1) time is needed to solve 3SUM. As no reduction
from #3SUM to 3SUM is known till now, a priori it could be that the 3SUM Hypothesis is false, but that
#3SUM still requires n2−o(1) time. We prove:
Theorem 1.4. #3SUM and 3SUM are equivalent under truly subquadratic randomized fine-grained reduc-
tions.
Counting of APSP and Min-Plus-Product. It is known that APSP in n-node graphs is equivalent to
Min-Plus-Product of n × n matrices [FM71]. The counting variant #APSP of APSP asks to count for
every pair of nodes in the given graph, the number of shortest paths between them. The counting variant
of Min-Plus-Product asks, given two matrices A, B to count for all pairs i, j, the number of k such that
Aik + Bkj = minℓ (Aiℓ + Bℓj ), i.e. the number of witnesses of the Min-Plus-Product.
While APSP and Min-Plus-Product are equivalent, #APSP and #Min-Plus-Product are not known to
be. The main issue is that the shortest paths counts can be exponential, and operations on such large numbers
are costly in any reasonable model of computation such as the Word-RAM (see the discussion in Section
1.4). Because of this the fastest known algorithm for #APSP is actually quartic in the number of nodes, and
6
There are several equivalent definitions of 3SUM: the predicate can be a + b = c, or a + b + c = 0; the integers could come
from the same set A, or the input could consist of three sets A, B, C and we require a ∈ A, b ∈ B, c ∈ C. We work with the
version with three input sets and predicate a + b = c.
8
not cubic. While we are able to improve upon the quartic running time (see Section 1.4), the running time is
still supercubic, so it is unclear whether a tight reduction is possible from #APSP to #Min-Plus-Product.
Chan, Vassilevska W. and Xu [CVX21] defined several variants of #APSP that mitigate the existence of
exponential counts. One variant, #mod U APSP computes the counts modulo any O(poly log n)-bit integer
U , and thus no computations with large integers are necessary. The problem can be solved in O(ne 3 ) time,
and can be reduced to #Min-Plus-Product (see Appendix A.3). We prove:
Theorem 1.5. #Min-Plus-Product and Min-Plus-Product are equivalent under truly subcubic fine-grained
reductions. For any O(poly log n)-bit integer U ≥ 2, #mod U APSP and APSP are equivalent under truly
subcubic fine-grained reductions.
Counting Exact Triangles. The Exact Triangle Problem (Exact-Tri) is: given an n-node graph with
O(log n)-bit integer edge weights, to determine whether the graph contains a triangle whose edge weights
sum to some target value t. This problem is known to be at least as hard as both 3SUM and APSP [Păt10,
VW13, VW18], so that if its brute-force cubic time algorithm can be improved upon, then both the 3SUM
Hypothesis and the APSP Hypothesis would be false. Exact-Tri is among the hardest problems in fine-
grained complexity. The counting variant #Exact-Tri asks for the number of triangles with weight t. We
prove:
Theorem 1.6. Exact-Tri and #Exact-Tri are equivalent under truly subcubic fine-grained reductions.
Abboud, Feller and Weimann [AFW20] previously considered the problem of computing the parity of
the number of zero-weight triangles, and it is equivalent to the problem of computing the parity of the
number of triangles with weight t for a given t. Let’s call this Parity-Exact-Tri. They showed that Exact-Tri
can be reduced to Parity-Exact-Tri via a randomized subcubic fine-grained reduction. They were not able to
obtain an equivalence, as it is not obvious at all that the decision problem should be able to solve the parity
problem. Since #Exact-Tri can easily solve Parity-Exact-Tri (if one knows the count, one can take it mod
2), we also get that Parity-Exact-Tri is equivalent to Exact-Tri.
More equivalence results can be found in Section 8 and Appendix A. See Section 8.4 for further discus-
sion on possible implications of our results.
New nondeterministic and quantum counting algorithms. Using our new techniques, we can also pro-
vide efficient nondeterministic algorithms for #Neg-Tri (counting the number of triangles with negative
weights), #Exact-Tri and #3SUM even when there are real-valued inputs. Ours are the first nondetermin-
istic algorithms for these problems that beat their essentially cubic and quadratic deterministic algorithms.
See Section 9.4 for the complexity-theoretic implications of these results.
Our equivalence results can be used to obtain new quantum algorithms. It is known that 3SUM can be
e
solved in O(n) quantum time by an algorithm that uses Grover search (see e.g. [AL20]). However, it is not
clear how to adapt that algorithm to solve #3SUM since Grover search cannot count the number of solutions
e
exactly. By combining the O(n) quantum time 3SUM algorithm with our subquadratic equivalence between
3SUM and #3SUM in a black box way, we immediately obtain the first truly subquadratic time quantum
algorithm for #3SUM.
1.4 New Variants of the BSG Theorem with Algorithmic Applications

As a way to address question Q4, we formulate a new “Triangle Decomposition Theorem”, which fol-
lows from our techniques. The theorem can be stated roughly as follows:
For any weighted n-node graph G and parameter s, there exist O(s3 ) subgraphs G(λ) such that
the set of all triangles of weight zero (or any fixed target value) in G can be decomposed as the
9
union of the set of all triangles in G(λ) , plus a small remainder set of O(n3 /s) triangles. (All
triangles in each G(λ) are guaranteed to have weight zero.)
Thus, the set of all zero-weight triangles in a graph is highly structured in some sense (in particular, if
there are many zero-weight triangles, one can extract large subgraphs in which all triangles are zero-weight
triangles). See Theorem 5.1 for the precise statement. From this theorem, we can easily rederive a subcubic
algorithm for monotone Min-Plus product (as shown in Appendix B), if we don’t care about optimizing the
exponent in the running time. The theorem also leads to several new algorithms, as listed below.
This Triangle Decomposition Theorem resembles a covering version of the BSG Theorem, as formulated
by Chan and Lewenstein [CL15], which can be roughly stated as follows:
For any sets A, B, C in an abelian group and parameter s, there exist O(s) pairs of subsets
(A(λ) , B (λ) ) such that {(a, b) ∈ A×B : a+b ∈ C} can be covered by the union of A(λ) ×B (λ) ,
plus a small remainder set of O(n2 /s) pairs, where the total size of the sum sets7 A(λ) + B (λ)
is O(s6 n).
Thus, the set of all solutions to 3SUM is highly structured in some sense (in particular, if there are many
solutions to 3SUM, one can extract pairs of large subsets (A(λ) , B (λ) ) whose sum sets are small). See
Section 10 for the precise statement and for more background on the BSG Theorem.
We show that our approach, combined with some extra ideas, can actually prove a form of the BSG
Theorem, namely, with the above O(s6 n) bound replaced by O(s e 2 n3/2 ). At first, this new bound appears
weaker—indeed, researchers in additive combinatorics were concerned more with obtaining a linear bound
in n on the sum set size, and less with the dependency on s. However, Chan and Lewenstein’s algorithmic
applications require nonconstant choices of s, and the new bound lowering the s6 factor to s2 turns out to
yield better results in at least one application below (namely, 3SUM with preprocessed universes).
We now mention a few applications of the new theorems:
#APSP for arbitrary weighted graphs. Recall the #APSP problem (counting the number of shortest
paths from u to v, for every pair of nodes u, v in a weighted graph). This basic problem has applications to
betweenness centrality. As mentioned earlier, because counts may be exponentially big, known algorithms
run in near-n4 time in the standard Word-RAM model with O(log n)-bit words. An O(n e 3 )-time algorithm
for #APSP was given by Chan, Vassilevska W. and Xu [CVX21], but only for unweighted graphs. We give
the first truly subquartic #APSP algorithm for arbitrary weighted graphs with running time O(n3.83 ) (or
e 15/4 ) if ω = 2).
O(n
3SUM in preprocessed universes. Chan and Lewenstein [CL15] studied a “preprocessed universe” set-
ting for the 3SUM problem: preprocess three sets A, B, C of n integers (the universes) so that for any given
query subsets A′ ⊆ A, B ′ ⊆ B, and C ′ ⊆ C, we can solve 3SUM on the instance (A′ , B ′ , C ′ ) more quickly
than from scratch. (The problem was first stated by Bansal and Williams [BW12].) Chan and Lewenstein
showed intriguingly that after preprocessing the universes in O(n e 2 ) expected time, 3SUM on the given
e 13/7 ). Our techniques yield a new,
query subsets can be solved in time truly subquadratic in n, namely, O(n
e 2 ) expected preprocessing time and an improved query time of O(n
simpler solution with O(n e 11/6 ). Fur-
e
thermore, with the same O(n 11/6 ) query time, the new algorithm can solve a slight generalization where
′
the query subset C can be an arbitrary set of n integers (i.e., the universe C needs not be specified). Here,
our improvement is even bigger: Chan and Lewenstein’s previous solution for this generalized case required
e 19/10 ) query time.
O(n
7
Throughout the paper, A + B denotes the sum set {a + b : a ∈ A, b ∈ B}, and A − B denotes the difference set {a − b : a ∈
A, b ∈ B}.
10
We also obtain the first result for the analogous problem of Exact-Tri in preprocessed universes: we can
preprocess any weighted n-node graph in O(n e 3 ) time, so that for any given query subgraph, we can solve
e 11/4 ) if ω = 2). This result can be viewed
Exact-Tri in time truly subcubic in n, namely, O(n2.83 ) (or O(n
as a generalization of the result for 3SUM, by known reductions from Exact-Tri to 3SUM (for integers).
3SUM for monotone sets in [n]d . Chan and Lewenstein [CL15] also studied a special case of 3SUM
for monotone sets A, B, C in [n]d for a constant dimension d, where a set of points is monotone if it is of
the form {a1 , . . . , an } where the j-th coordinates of a1 , . . . , an is a monotone sequence for each j. The
problem is related to an important special case of Min-Plus convolution for monotone sequences in [n] (or
integer sequences with bounded differences), which reduces to monotone 3SUM in 2 dimensions. This
is also related to a data structure problem for strings known as jumbled indexing, where d corresponds to
the alphabet size. Chan and Lewenstein gave the first truly subquadratic algorithm for the problem, with
running time of the form O(n2−1/(d+O(1)) ) using randomization. (See [ACLL14, HU17] for conditional
lower bounds on this and related problems.) However, they obtained subquadratic deterministic algorithms
only for d ≤ 7 under the current fast matrix multiplication bounds. Our techniques give the first deterministic
algorithm for all constant d, with running time O(n2−1/O(d) ). Although the new bound is not better (and for
monotone Min-Plus convolution, a recent paper by Chi, Duan, Xie and Zhang [CDXZ22] presented even
faster randomized algorithms), the new approach is simpler, besides being deterministic.
1.5 Paper Organization

In Section 2, we define notations and problems. The rest of the paper has three main threads:
• Conditional lower bounds for problems with intermediate complexity: In Section 3, we illustrate our
approach by proving the first superquadratic lower bound for u-dir-APSP under the Strong APSP Hy-
pothesis. In Sections 6–7, we prove lower bounds for other problems, including Min-Witness-Prod,
under both the Strong APSP Hypothesis and u-dir-APSP Hypothesis.
• Equivalences between counting and detection problems: In Section 4, we illustrate our basic idea by
proving the subcubic equivalence between #Exact-Tri and Exact-Tri. In Section 8, we prove more results
of this kind, including the subquadratic equivalence between #3SUM and 3SUM. (Still more examples
can be found in Appendix A.) In Section 9, we further adapt these ideas to obtain new nondeterministic
and quantum algorithms for counting problems.
• BSG-related theorems: In Section 5, we present our new Triangle Decomposition Theorem and describe
its applications. In Section 10, we describe our new variants of the BSG theorem. (Still more applications
and variants can be found in Appendices B and C.) Finally, in Section 11, we prove the conditional lower
bounds for Min-Equal-Convolution, the most sophisticated of which use one of our new BSG theorems.
Although the paper is lengthy, Sections 3, 4 and 5 should suffice to give the readers an overview of our
main proof techniques. (Readers interested in diving deeper into any of the above threads may proceed to
the subsequent sections.)
2 Problem Definitions
For integer n ≥ 1, we use [n] to denote {1, 2, . . . , n} and use [±n] to denote {−n, −(n − 1), . . . , n}.
For a predicate X, we use [X] to denote a {0, 1} value which evaluates to 1 if X is true, and 0 otherwise.
11
We use M (n1 , n2 , n3 ) to denote the (randomized) running time to multiply an n1 × n2 matrix with an
n2 × n3 matrix. We consider the Word-RAM model of computation with O(log n)-bit words, and all input
numbers are O(log n)-bit integers, unless otherwise stated.
Problem 2.1 (Min-Plus-Product). Given an n1 × n2 matrix A and an n2 × n3 matrix B, compute the

n1 × n3 matrix C where Cij = mink∈[n2] (Aik + Bkj ). We denote C by A ⋆ B.
Furthermore, we use M ∗ (n1 , n2 , n3 | ℓ) to denote the (randomized) running time to compute the Min-
Plus product between an n1 × n2 matrix and an n2 × n3 matrix where the entries of both matrices are from
[ℓ] ∪ {∞}.
Problem 2.2 (APSP). Given an edge-weighted n-node graph, compute its all-pairs shortest path distances.
Problem 2.3 (Min-Plus-Convolution). Given two length n arrays A, B, compute the length n array C,
where Ci = mink∈[i−1] (Ak + Bi−k ).
Problem 2.4 (Exact-Tri). Given an edge-weighted n-node graph G and a target value t, determine whether
G contains a triangle whose edge weights sum up to t. In its All-Edges version (AE-Exact-Tri), we need to
determine for every edge e, whether e is contained in a triangle whose edge weights sum up to t.
Problem 2.5 (Neg-Tri). Given an edge-weighted n-node graph G, determine whether G contains a triangle
whose edge weights sum up to a negative value. In its All-Edges version (AE-Neg-Tri), we need to determine
for every edge e, whether e is contained in a triangle whose edge weights sum up to a negative value.
Problem 2.6 (Exact-k-Clique). Given an edge-weighted n-node graph G and a target value t, determine
whether G contains a k-clique whose edge weights sum up to t.
Problem 2.7 (Min-k-Clique). Given an edge-weighted n-node graph G, determine the minimum edge
weight sum over all k-cliques in G.
Problem 2.8 (3SUM). Given three sets of numbers A, B, C of size n, determine if there exist a ∈ A, b ∈
B, c ∈ C such that a + b = c. In its All-Numbers version (All-Nums-3SUM), we need to determine for
each c ∈ C, whether there exist a ∈ A, b ∈ B such that a + b = c.
Problem 2.9 (3SUM-Convolution). Given three arrays of numbers A, B, C of lengths n, determine if there
exist i, j ∈ [n] such that Ai + Bj = Ci+j . In its All-Numbers version (All-Nums-3SUM-Convolution), we
need to determine for each k ∈ [n], whether there exist i, j ∈ [n] such that i + j = k and Ai + Bj = Ck .
Problem 2.10 (Mono-Convolution). Given three arrays of numbers A, B, C of length n, determine for
every k ∈ [n], whether there exist i, j ∈ [n] such that i + j = k and Ai = Bj = Ck .
All the problems defined above have natural counting variants, which will be denoted by adding a “#”
prefix to the problems’ names. For a detection problem that outputs a single bit or multiple bits, each
bit represents whether an object that satisfies some requirements exists among a set of candidates; in its
counting variant, we need to output a number in place of each bit to denote the number of objects that satisfy
the requirements among the same set of candidates. For instance, #Exact-Tri asks to count the number of
triangles in the graph whose edge weights sum up to t, and #AE-Exact-Tri asks to count, for each edge
e, the number of triangles containing e whose edge weights sum up to t. For a minimization problem that
outputs one or multiple optimal values, its counting variant needs to output the number of optimal solutions
in place of each optimal value. For instance, #Min-Plus-Product asks for each (i, j) ∈ [n1 ] × [n3 ], the
12
number of k where Aik + Bkj = (A ⋆ B)ij . One special problem is the #mod U APSP problem, in which we
need to compute the number of shortest paths mod U for every pair of nodes.
Clearly, for any of the detection problems, its counting variant is (not necessarily strictly) harder than its
original version. The same is not obviously true for minimization problems.
Problem 2.11 (u-dir-APSP). Given an unweighted directed graph, compute its all-pairs shortest path dis-
tances.
Problem 2.12 (Min-Witness-Prod). Given an n1 × n2 Boolean matrix A and an n2 × n3 Boolean matrix
B, compute min{k ∈ [n2 ] : Aik ∧ Bkj } for every (i, j) ∈ [n1 ] × [n3 ].
Problem 2.13 (Equality-Product). Given an n1 × n2 matrix A and an n2 × n3 matrix B, compute the
number of k ∈ [n2 ] such that Aik = Bkj for every (i, j) ∈ [n1 ] × [n3 ].
Problem 2.14 (Min-Witness-Eq-Prod). Given an n1 × n2 matrix A and an n2 × n3 matrix B, compute
min{k ∈ [n2 ] : Aik = Bkj } for every (i, j) ∈ [n1 ] × [n3 ].
Problem 2.15 (Min-Equal-Prod). Given an n1 ×n2 matrix A and an n2 ×n3 matrix B, compute min{Aik :
k ∈ [n2 ] ∧ Aik = Bkj } for every (i, j) ∈ [n1 ] × [n3 ].
Problem 2.16 (undir-APSLP1,2). Given an n-node undirected graph whose edge weights are either 1 or
2, compute for every pair of nodes s, t, the smallest number of edges required to travel from s to t, and the
minimum weight over all paths using the smallest number of edges.
Problem 2.17 (undir-APLSP1,2). Given an n-node undirected graph whose edge weights are either 1 or 2,
compute for every pair of nodes s, t, the shortest path distance from s to t, and the smallest number of edges
among all shortest paths.
Problem 2.18 (Batch-Range-Mode). Given an length N array and Q intervals of the array, compute the
element that appears the most (breaking ties arbitrarily) for each of the intervals.
Problem 2.19 (Min-Equal-Convolution). Given two length n arrays A, B, compute the length n array C,
where Ci = min{Aj : j ∈ [i − 1] ∧ Aj = Bi−j }.
3 Conditional Lower Bounds for Problems with Intermediate Complexity:

u-dir-APSP under the Strong APSP Hypothesis
In this section, we apply the idea of combining Fredman’s trick with Equality Product to prove condi-
tional lower bounds for problems with intermediate complexity. To illustrate the idea, we focus on lower
bounds for the u-dir-APSP problem under the Strong APSP Hypothesis, but the approach also leads to
lower bounds for Min-Witness-Prod and other problems, under the Strong APSP Hypothesis as well as
the u-dir-APSP Hypothesis, as we will explain later in Sections 6–7. As noted earlier, the Strong APSP
Hypothesis is equivalent to the hypothesis that M ∗ (n, n, n | n3−ω ) is not truly subcubic. Thus, it suffices to
describe fine-grained reductions from the Min-Plus-Product problem for two n × n matrices with bounded
integer entries from [n3−ω ], to the u-dir-APSP problem.
When devising a reduction from one problem to another (as in typical NP-hardness proofs), we of-
ten concentrate on understanding the power of the latter problem. In our new reductions, we will focus
mostly on the former problem instead, interestingly: we will attempt to design an algorithm to solve the
Min-Plus-Product problem for bounded integers in subcubic time, and only at the end, reveal how an
oracle for u-dir-APSP (or Min-Witness-Prod and other problems) could help.
13
3.1 Preliminaries: Generalized Equality Products
By Matoušek’s technique for dominance product [Mat91, Yus09], the equality product of an n1 × n2
and an n2 × n3 matrix can be computed in time O e minr (n1 n2 n3 /r + M (n1 , rn2 , n3 )) . For example, in
e
the case n1 = n2 = n3 = n, the bound is at most O(min 3 ω e (3+ω)/2 ), as we have stated
r (n /r + rn )) = O(n
before, if we don’t use rectangular matrix multiplication exponents. We begin with the following lemma
describing a straightforward generalization, which will be useful later:
Lemma 3.1. Given n1 × n2 matrices A and A′ , and n2 × n3 matrices B and B ′ , define the generalized
equality product of (A, A′ ) and (B, B ′ ) to be the following n1 × n3 matrix E:
Eij := min (A′ik + Bkj

′
).
k: Aik =Bkj
Suppose that all matrix entries of A′ and B ′ are in [±ℓ] ∪ {∞}. For any r, we can compute E in time

e n1 n2 n3 /r + M ∗ (n1 , rn2 , n3 | ℓ) .
O
Proof. Sort Bk = (Bkj )j∈[n3 ] , i.e., the k-th column of B. Let Fk be the set of elements that have frequency
more than n3 /r in Bk . Note that |Fk | ≤ r. We divide into two cases, computing two n1 × n3 matrices E L
and E H ; the answers will be Eij = min{Eij L , E H }.
ij
L
– Low-frequency case: computing Eij = min ′
(A′ik + Bkj L = ∞. For each
). Initially set Eij
k: Aik =Bkj 6∈Fk
i ∈ [n1 ] and k ∈ [n2 ], if Aik 6∈ Fk , we examine each of the at most n3 /r indices j with Aik = Bkj , and
L = min{E L , A′ + B ′ }. All this takes O(n n · n /r) time.
reset Eij ij ik kj 1 2 3
H
– High-frequency case: computing Eij = min ′
(A′ik + Bkj ). For each i ∈ [n1 ], k ∈ [n3 ], and p ∈
k: Aik =Bkj ∈Fk
Fk , let Âi,(k,p) = A′ik if Aik = p, and Âi,(k,p) = ∞ otherwise. For each k ∈ [n3 ], j ∈ [n2 ], and p ∈ Fk , let
′ if B H
B̂(k,p),j = Bkj kj = p, and B̂(k,p),j = ∞ otherwise. We let Eij = mink∈[n2 ],p∈Fk (Âi,(k,p) + B̂(k,p),j ).
This can be computed by a Min-Plus product in O(M ∗ (n1 , rn2 , n3 | ℓ)) time.

3.2 The Key Reduction

We now present an approach to solve the Min-Plus-Product problem for two matrices with integer
entries in [ℓ], by reducing it to instances that have simultaneously a smaller inner dimension s and smaller
integer entries in [t] with t ≤ ℓ:
Theorem 3.2. For any r, s, t with s ≤ n2 and t ≤ ℓ,

e (n2 /s)M ∗ (n1 , s, n3 | t) + sn1 n2 n3 /r + sM ∗ (n1 , rn2 , n3 | ℓ/t) .
M ∗ (n1 , n2 , n3 | ℓ) = O
Proof. Let g = ⌈ℓ/t⌉. Let A be an n1 × n2 matrix and B be an n2 × n3 matrix, where all matrix entries
are in [ℓ] ∪ {∞}. We describe an algorithm to compute the Min-Plus product of A and B. Without loss
of generality, we may assume that (Aik mod g) < g/2 for all i, k with Aik finite, since we can separate
the problem into two instances (A< , B) and (A≥ , B) for two matrices A< and A≥ , where A<ik = Aik and
≥ < ≥
Aik = ∞ if (Aik mod g) < g/2, and Aik = ∞ and Aik = Aik − g/2 if (Aik mod g) ≥ g/2. Similarly, we
may assume that (Bkj mod g) < g/2 for all k, j with Bkj finite.
14
For each i, k, write Aik as A′ik g + A′′ik with 0 ≤ A′ik ≤ t and 0 ≤ A′′ik < g/2. Similarly, for each k, j,
write Bkj as Bkj ′ g + B ′′ with 0 ≤ B ′ ≤ t and 0 ≤ B ′′ < g/2. (Set A′ = A′′ = ∞ if A = ∞, and
kj kj kj ik ik ik
′ ′′
Bkj = bkj = ∞ if Bkj = ∞.)
We first compute the Min-Plus product C ′ of A′ and B ′ (i.e., Cij ′ = min (A′ + B ′ )), in time
k ik kj
O(M ∗ (n1 , n2 , n3 | t)) ≤ O((n2 /s) · M ∗ (n1 , s, n3 | t)).
Let Wij = {k ∈ [n2 ] : A′ik + Bkj ′ = C ′ }; the elements in W are the witnesses for C ′ . The Min-Plus
ij ij ij
product C of A and B is given by Cij = Cij ′ g + C ′′ , where
ij
′′
Cij := min (A′′ik + Bkj
′′
).
k∈Wij
It suffices to describe how to compute C ′′ . We divide into two cases:
• Few-witnesses case: computing Cij ′′ for all i, j with |W | ≤ n /s. For each such (i, j), we will
ij 2
explicitly enumerate all witnesses in Wij . This can be done by standard techniques for witness find-
ing [AGMN92, Sei95]: first, observe that if the witness is unique (i.e., |Wij | = 1), it can be found by per-
forming O(log n2 ) Min-Plus products (namely, for each ℓ ∈ [log n2 ], the ℓ-th bit of the witness for Cij′ is 1
′ ′
iff mink∈Kℓ (Aik + Bkj ) = Cij , where Kℓ := {k ∈ [n2 ] : the ℓ-th bit of k is 1}). We take a random subset
′ )
R ⊆ [n2 ] of s indices, and find witnesses for the Min-Plus product of (A′ik )i∈[n1 ],k∈R and (Bkj k∈R,j∈[n3 ]
if the witnesses are unique. This takes O(M e ∗ (n1 , s, n3 | t)) time. Fix i, j with |Wij | ≤ n2 /s. For a
fixed element w ∈ Wij , the probability that w is found, i.e., w is in R but no other element of Wij is in
R, is Ω((s/n2 ) · (1 − s/n2 )n2 /s ) = Ω(s/n2 ). By repeating O((n2 /s) log(n1 n2 n3 )) times (with different
choices of R), all witnesses in Wij would be found w.h.p. Once the entire witness set Wij is found, we
can compute each Cij ij
e 2 /s) · M ∗ (n1 , s, n3 | t)).
′′ naively in O(|W |) time. The total running time is O((n
• Many-witnesses case: computing Cij ′′ for all i, j with |W | > n /s. Pick a random subset H of size
ij 2
c0 s log(n1 n2 n3 ) for a sufficiently large constant c0 . Then H hits (i.e., intersects) every witness set Wij
with |Wij | > n2 /s w.h.p.
′ (i.e.,
We do the following: for each k0 ∈ H and for each i ∈ [n1 ], j ∈ [n3 ], if A′ik0 + Bk′ 0 j = Cij
k0 ∈ Wij ), set
′′
Cij = ′ ′
min ′ ′
′′
(A′′ik + Bkj ). (1)
k: Aik −Aik =Bk −Bkj
0 0j
Correctness of (1) follows immediately from “Fredman’s trick”: A′ik − A′ik0 = Bk′ 0 j − Bkj
′ is equivalent
′ ′ ′ ′ ′ ′
to Aik + Bkj = Aik0 + Bk0 j , which is equivalent to k ∈ Wij , assuming Aik0 + Bk0 j = Cij ′ . Thus, the
above correctly computes Cij ′′ for every i, j with |W | > n /s, since H hits W .
ij 2 ij
Finally, we observe that for a fixed k0 , the right-hand side in (1) corresponds precisely to a generalized
equality product! By Lemma 3.1, they can be computed in O(n e 1 n2 n3 /r + M ∗ (n1 , rn2 , n3 | g)) time, for
e
each of the O(s) choices of k0 .

Note that Fredman’s trick was originally introduced to solve APSP or compute Min-Plus products for
arbitrary real-valued inputs. It is interesting that the trick is useful even when input values are in a restricted
integer range (in [t]).
Note also that a more naive attempt to prove the above theorem is to just bound M ∗ (n1 , n2 , n3 | ℓ) by
(n2 /s)M ∗ (n1 , s, n3 | ℓ), and once the inner dimension n2 is reduced to s in the subproblems, try to use
hashing to reduce the range of the integers to, say, [O(se 2 )]. However, while such a hashing approach might
work for equality-type problems (e.g., AE-Exact-Tri), it does not work at all for Min-Plus-Product.
15
3.3 Consequences
In Theorem 3.2, we can directly bound the third term by using existing matrix multiplication results [AGMN92],
leading to the following corollary:
Corollary 3.3. For any constant 0 < β ≤ (3 − ω)/2, if M ∗ (n, nβ , n | n2β ) = O(n2+β−ε ), then
M ∗ (n, n, n | n3−ω ) = O(n3−Ω(ε) ).
Proof. By Theorem 3.2,

e (n/s)M ∗ (n, s, n | t) + sn3 /r + (sn3−ω /t)M (n, rn, n)
M ∗ (n, n, n | n3−ω ) = O

e (n/s)M ∗ (n, s, n | t) + sn3 /r + rsn3 /t .
≤ O
′
Setting r = nβ , s = nβ−ε , and t = n2β with ε′ = ε/2 yields M ∗ (n, n, n | n3−ω ) = O(n3−Ω(ε) ).
The above corollary establishes a conditional lower bound of n2+β−ε for the subproblem of Min-Plus
product for rectangular matrices of dimension n × nβ and nβ × n for integers bounded by n2β , under the
Strong APSP Hypothesis. This lower bound is tight in the sense that O(n2+β ) is an obvious upper bound
(though the range of allowed integer values [n2β ] may not be tight). We will now use this corollary to derive
conditional lower bounds for u-dir-APSP.
Let U - DIR - APSP (n) be the time complexity of u-dir-APSP on n-node graphs. More generally, let
U - DIR - APSP (n, m) be the time complexity of APSP on an unweighted directed graph with n nodes and
m edges. Chan, Vassilevska W., and Xu [CVX21] have given a simple reduction of Min-Plus product for
rectangular matrices of certain inner dimensions and integer ranges to u-dir-APSP, as summarized by the
following lemma (it is easy to check that the graph in their reduction has O(nx) edges).
Lemma 3.4. For any x, y, we have M ∗ (n, x, n | y) = O(U - DIR - APSP (n, nx)) if xy ≤ n.
Combining Corollary 3.3 and Lemma 3.4 immediately gives the following:
Corollary 3.5. For any constant β ≤ min{1/3, (3 − ω)/2}, if U - DIR - APSP (n, n1+β ) = O(n2+β−ε ), then
M ∗ (n, n, n | n3−ω ) = O(n3−Ω(ε) ).
By setting β = 1/3, we have thus proved that u-dir-APSP cannot be solved in O(n7/3−ε ) time under
the Strong APSP Hypothesis, assuming 1/3 ≤ (3 − ω)/2 (this assumption can be removed, as we observe
later in Section 7). In particular, if ω = 2, this implies that u-dir-APSP is strictly harder than unweighted
undirected APSP, as the latter problem can be solved in O(n e ω ) time [Sei95].
Furthermore, u-dir-APSP for a graph with n1+β edges cannot be solved in O(n2+β−ε ) time for any
β ≤ 1/3 under the same hypothesis, assuming 1/3 ≤ (3 − ω)/2 (again this assumption can be removed).
In other words, the naive algorithm by repeated BFSs is essentially optimal for sufficiently sparse graphs.
If we assume a weaker hypothesis that APSP does not have truly subcubic algorithms for edge weights
in [n] instead of [n3−ω ], it can be checked that we still get a lower bound near n2+(3−ω)/3 for u-dir-APSP.
In fact, assuming that APSP does not have truly subcubic algorithms for edge weights in [nλ ], we can still
obtain a super-quadratic lower bound for u-dir-APSP for λ as large as 1.99, if ω = 2. For simplicity, we
will concentrate only on the version of the Strong APSP Hypothesis with λ = 3 − ω throughout the paper.
In Sections 6–7, we will use the same approach to derive further conditional lower bounds for Min-Witness-Prod,
undir-APLSP1,2, and Batch-Range-Mode, from both the Strong APSP Hypothesis and the u-dir-APSP
Hypothesis.
16
4 Equivalences Between Counting and Detection Problems:
#Exact-Triangle vs. Exact-Triangle
In this section, we describe a simple approach to proving equivalence between counting and detection
problems, by combining Fredman’s trick with Equality Product. To illustrate the basic idea, we focus on the
equivalence of #AE-Exact-Tri and AE-Exact-Tri. With more work and additional ideas, the approach can
also establish the equivalence of #3SUM and 3SUM, as we will later explain in Section 8.
We use G to denote the input graph of an AE-Exact-Tri or #AE-Exact-Tri instance, we use w to denote
the weight function, and we use Wij to denote the set of k where (i, j, k) forms a triangle whose edge
weights sum up to the target value t.
Lemma 4.1. Given an n node graph G, a target value t, and a subset S ⊆ V (G), we can compute a matrix
e
D in O(|S| · n(3+ω)/2 ) time such that Dij = |Wij | whenever S ∩ Wij 6= ∅.
(s)
Proof. For every s ∈ S, we do the following. Let A(s) be a matrix where Aik = w(i, k) − w(i, s) and
(s) e (3+ω)/2 )
Bkj = w(s, j) − w(k, j). Then we use compute the equality product C (s) of A(s) and B (s) in O(n
(s)
time for each s [Mat91]. Finally, if there exists s ∈ S such that Ais + Bsj + w(i, j) = t, we let Dij be Cij
for an arbitrary s with the property; otherwise, we let Dij be 0 (we don’t care about its value in this case).
e
The running time for computing D is clearly O(|S| · n(3+ω)/2 ).
(s)
Suppose S ∩ Wij 6= ∅ for some (i, j). Then Dij equals Cij for some s where Ais + Bsj + w(i, j) = t.
(s) (s)
By Fredman’s trick, Aik = Bkj if and only if w(i, k) + w(k, j) + w(i, j) = w(i, s) + w(s, j) + w(i, j) = t.
(s)
Therefore, Dij = Cij = |Wij |.
Theorem 4.2. If AE-Exact-Tri for n-node graphs has an O(n3−ε ) time algorithm for some ε > 0, then
′
#AE-Exact-Tri for n-node graphs has an O(n3−ε ) time algorithm for some ε′ > 0
Proof. Given a #AE-Exact-Tri instance on an n-node graph G, we first list up to n0.99 elements in Wij for
every i, j. By well-known techniques (e.g. [VW18]), an O(n3−ε ) time AE-Exact-Tri algorithm implies an
′′
O(n3−ε ) for ε′′ > 0 time algorithm for listing up to n0.99 witnesses for each (i, j) in an AE-Exact-Tri
instance.
If we list less than n0.99 elements for some (i, j), we can output the number of elements we list as the
e 2.99 ) time, we can
exact witness count for (i, j). By the standard greedy algorithm for hitting set, in O(n
e 0.01
find a set S of size O(n ) that intersect with Wij for the remaining pairs (i, j). Therefore, we can apply
Lemma 4.1 to compute the number of witnesses for these remaining (i, j) pairs in O(|S| e · n(3+ω)/2 ) ≤
2.70
O(n ) time.
e 3−ε′′ + n2.99 + n2.70 ), which is truly
The total running time for the #AE-Exact-Tri instance is thus O(n
subcubic.
The reduction from AE-Exact-Tri to #AE-Exact-Tri is trivial.
Remark 4.3. Given Theorem 4.2, it is simple to derive a subcubic equivalence between Exact-Tri and
#Exact-Tri. First, the reduction from Exact-Tri to #Exact-Tri is trivial. To reduce #Exact-Tri to Exact-Tri,
we first reduce #Exact-Tri to #AE-Exact-Tri in the trivial way, then use Theorem 4.2 to further reduce it to
AE-Exact-Tri, and finally reduce it to Exact-Tri by known reductions [VW18].
17
In Section 8, we will use a similar approach to obtain other equivalence results between counting and
detection problems. In particular, the proof of subquadratic equivalence between #3SUM and 3SUM will re-
quire further technical ideas: we will need to exploit or modify known reductions from All-Nums-3SUM-Convolution
to AE-Exact-Tri, and 3SUM to 3SUM-Convolution.
5 Alternative to BSG: A Triangle Decomposition Theorem and Its Applica-

tions
In this section, we introduce a decomposition theorem for zero-weight triangles, which encapsulates
some of the key ideas we have used, and which may be viewed as an alternative to the BSG Theorem. We
will describe applications of this decomposition theorem to some algorithmic problems that were previously
solved via the BSG Theorem by Chan and Lewenstein [CL15], as well as a new application to the #APSP
problem for arbitrary weighted graphs.
In a weighted tripartite graph G with node sets U , X, and V , let Triangles(G) denote the set of all
triangles in U × X × V in G, and let Zero-Triangles(G) denote the set of all zero-weight triangles in
U × X × V in G.
Theorem 5.1. (Triangle Decomposition) Given a real-weighted tripartite graph G with n1 , n2 , and n3
e 3)
nodes in its three parts U , X, and V , and given a parameter s, there exist a collection of ℓ = O(s
e 1 n2 n3 /s) triangles, such that
subgraphs G(1) , . . . , G(ℓ) of G, and a set R of O(n
ℓ
[
Zero-Triangles(G) = R ∪ Triangles(G(λ) ).
λ=1
e 1 n2 n3 + sn1 n2 +
The subsets in the union above are disjoint. The G(λ) ’s and R can be constructed in O(n
sn2 n3 + s n1 n3 ) deterministic time. And if the edge weights between U and V later change, then the G(λ) ’s
2
e 1 n2 n3 /s + s2 n1 n3 ) time.
and R can be updated in O(n
Furthermore, the subgraphs can be grouped into O(s) e e 2 ) subgraphs each, such that if
categories of O(s
an edge uv ∈ U × V is present in one subgraph, it is present in all subgraphs of the same category.
Proof. Let {ui : i ∈ [n1 ]}, {xk : k ∈ [n2 ]}, and {vj : j ∈ [n3 ]} be the nodes of the three parts. Let the
weight of ui xk be aik , the weight of xk vj be bkj , and the weight of ui vj be −cij .
As a preprocessing step, we sort the multiset {aik + bkj : k ∈ [n2 ]} for each i ∈ [n1 ] and j ∈ [n3 ], in
e 1 n2 n3 ) total time. Let Wij = {k ∈ [n3 ] : aik + bkj = cij }. Note that we can generate the elements in
O(n
e
Wij by searching in the sorted lists in O(1) time per element.
• Few-witnesses case. For each i, j with |Wij | ≤ n2 /s, add the triangle ui xk vj to R for every k ∈ Wij . The
number of triangles added to R is O(n1 n3 · n2 /s). The running time of this step is also O(ne 1 n3 · n2 /s).
• Many-witnesses case. Find a set H ⊆ [n2 ] of O(s log(n1 n2 n3 )) nodes that hit all Wij with |Wij | >
n2 /s. Here, we can use the standard greedy algorithm for hitting sets, which is deterministic and takes
time linear in the total size of the sets Wij ; as we can reduce each set’s size to n2 /s before running the
e 1 n2 n3 /s) time. For each i, j with |Wij | > n2 /s, let k0 [i, j] be some
hitting set algorithm, this takes O(n
k0 ∈ Wij ∩ H.
For each k0 ∈ H and k ∈ [n2 ], let Lk0 k be the multiset {bk0 j − bkj : j ∈ [n3 ]}, and let Fk0 k be the
elements that have frequency more than n3 /r in Lk0 k . Note that |Fk0 k | ≤ r.
18
– Low-frequency case. For each k0 ∈ H and i ∈ [n1 ] and k ∈ [n2 ], if aik − aik0 6∈ Fk0 k , we examine
each of the at most n3 /r indices j with aik −aik0 = bk0 j −bkj , and add ui xk vj to R if it is a zero-weight
triangle and |Wij | > n2 /s. The number of triangles added to R is O(sne e
1 n2 · n3 /r) = O(n1 n2 n3 /s),
2 e
by choosing r := s . The running time of this step is also bounded by O(n1 n2 n3 /s).
– High-frequency case. For each k0 ∈ H and p ∈ [r], create a subgraph G(k0 ,p) of G:
* For each i ∈ [n1 ] and j ∈ [n3 ], keep edge ui vj iff k0 [i, j] = k0 (in particular, aik0 + bk0 j = cij ).
* For each i ∈ [n1 ] and k ∈ [n2 ], keep edge ui xk iff aik − aik0 is the p-th element of Fk0 k .
* For each j ∈ [n3 ] and k ∈ [n2 ], keep edge xk vj iff bk0 j − bkj is the p-th element of Fk0 k .
Note that if ui xk vj is a triangle in G(k0 ,p) , then aik − aik0 = bk0 j − bkj and aik0 + bk0 j = cij , and by
Fredman’s trick, aik + bkj = aik0 + bk0 j = cij , implying that ui xk vj is a zero-weight triangle. The
running time of this step is bounded by O(sn e 1 n2 + sn2 n3 + rn1 n3 ).
e
The number of subgraphs created is O(sr) e 3 ).
= O(s
Correctness. Consider a zero-weight triangle ui xk vj in G. If |Wij | ≤ n2 /s, the triangle is in R due to
the “few-witnesses” case. So assume |Wij | > n2 /s. Let k0 = k0 [i, j]. We know that aik + bkj = cij =
aik0 + bk0 j and by Fredman’s trick, aik − aik0 = bk0 j − bkj . If aik − aik0 6∈ Fk0 k , then the triangle is
in R due to the “low-frequency” case. Otherwise, it is a triangle in G(k0 ,p) for some p ∈ [r] due to the
“high-frequency” case.
Update edge weights. If the edge weights between U and V change, we only need to rerun the parts
of our algorithms that use the weights c. In particular, we need to rerun the low-frequency case, which
e 1 n2 n3 /s), and the first subcase of the high-frequency case, which has running time
has running time O(n
e
O(rn e 2
1 n3 ) = O(s n1 n3 ). Overall, the running time for updating the edge weights between U and V is
e 1 n2 n3 /s + s2 n1 n3 ).
O(n
5.1 Application 1: Exact Triangle in Preprocessed Universes

As one simple application of the decomposition theorem, we can solve AE-Exact-Tri in truly subcubic
time in a “preprocessed universe” setting, where the input is a subgraph of a preprocessed graph. It is
convenient to define a variant of the problem, AE-Exact-Tri’, which is AE-Exact-Tri where the input graph
is tripartite with three parts U , X, and V , but we only need to report if each edge between U and V is in an
exact triangle. If |U | = |V | = |X|, AE-Exact-Tri’ is equivalent to AE-Exact-Tri.
Corollary 5.2. Given a real-weighted tripartite graph G with n1 , n2 , and n3 nodes in its three parts U , X,
e 1 n2 n3 + sn1 n2 + sn2 n3 + s2 n1 n3 ) time, so that for any
and V , and given s, we can preprocess G in O(n
given subgraph G of G, we can solve AE-Exact-Tri’ on G′ in O(n
′ e 1 n2 n3 /s + sM (n1 , s2 n2 , n3 )) time.
e 3 ) time, we can solve AE-Exact-Tri on
For example, for n1 = n2 = n3 = n, after preprocessing in O(n
e 11/4 ).
G′ in time O(n2.83 ), or if ω = 2, O(n
Proof. During preprocessing, we apply Theorem 5.1 to compute the subgraphs G(λ) and R in O(n e 1 n2 n3 /s+
2
sn1 n2 + sn2 n3 + s n1 n3 ) time.
During a query for a given subgraph G′ and a target value t, if t has changed, we first subtract t from
all the edge weights between U and V to transform the problem to detecting zero-weight triangles. We can
e 1 n2 n3 /s + s2 n1 n3 ) time.
update the G(λ) ’s and R in O(n
Next, for each λ, we check whether after removing edges not present in G′ , the subgraph G(λ) has a
triangle (which would automatically be a zero-weight triangle) through each edge in U × V . Since triangle
e 3 M (n1 , n2 , n3 )).
finding (without weights) reduces to matrix multiplication, the running time is O(s
19
We can do slightly better using the grouping of the subgraphs: for each category Λ, define a tripartite
graph G(Λ) with parts U , X × Λ, and V , and for each λ ∈ Λ, include an edge between u and (x, λ) if
ux ∈ G(λ) ∩ G′ , and between (x, λ) and v if xv ∈ G(λ) ∩ G′ . For each category Λ, we check whether the
subgraph G(Λ) has a triangle through each edge. The running time becomes O(sMe (n1 , s2 n2 , n3 )). Also, the
term O(s2 n1 n3 ) for the update cost can be lowered to O(n1 n3 ) now when working with the G(Λ) ’s instead
of the G(λ) ’s, as can be checked from our construction (since each edge ui vj occurs in one category).
For n1 = n2 = n3 = n, we choose s = n0.17+ε (using the fact that ω(1, 1.34, 1) < 2.657 [LU18]), or
if ω = 2, s = n1/4 .
5.2 Application 2: 3SUM in Preprocessed Universes

The above implies also a new algorithm for 3SUM with a preprocessed universe, previously studied by
Chan and Lewenstein [CL15], who obtained O(n e 13/7 ) query time, after preprocessing in O(n e 2 ) randomized
e
time or O(n ω ) deterministic time. Our query time is strictly better if ω is 2 (or is sufficiently close to 2), and
e 2 ) preprocessing time regardless of ω.
we also obtain deterministic O(n
e 2 ) deterministic time, so that
Corollary 5.3. We can preprocess sets A, B, C of n integers in [nO(1) ] in O(n
given any subsets A ⊆ A, B ⊆ B, and C ⊆ C, we can solve All-Nums-3SUM on (A′ , B ′ , C ′ ) in time
′ ′ ′
e 11/6 ).
O(n1.891 ), or if ω = 2, O(n
Proof. We use a known reduction from (All-Nums-) 3SUM to (All-Nums-) 3SUM-Convolution (one of
the reductions by Chan and He [CH20] is deterministic and increases running time only by polylogarith-
mic factors when the input numbers are polynomially bounded), in combination with a known reduction
from All-Nums-3SUM-Convolution to AE-Exact-Tri [VW09]. The problem is reduced to O(1) e instances
of the problem in Corollary 5.2 with n1 = n/q, n2 = q and n3 = n for a parameter q (the original re-
√
duction [VW09] has q = n, but we will do better with a different choice of q). It is straightforward
to check that these reductions carry over to the preprocessed universe setting. We then obtain prepro-
e 2 + snq + s2 n2 /q) = O(n
cessing time O(n e 2 + s3 n1.5 ) and query time O(n
e 2 /s + sM (n/q, s2 q, n)) =
√ √ √
e /s + sM (s n, s n, n)) by setting q = n/s. We choose s = n0.109+ε (using the fact that
2
O(n
ω(0.609, 0.609, 1) < 1.781 [LU18]), or if ω = 2, s = n1/6 .
Remark 5.4. Corollary 5.2 and Corollary 5.3 can also be used to solve #AE-Exact-Tri and #All-Nums-3SUM
in the preprocessed universe. This is because Theorem 5.1 provides a decomposition, so we can sum up the
counts in all the cases (In Corollary 5.3, we also have to be careful when applying the reduction in [CH20],
to make sure we do not over count, by using inclusion-exclusion). Prior method for 3SUM in the prepro-
cessed universe [CL15] cannot compute counts, because it relies on the BSG theorem, which only provides
a covering.
In Section 10.2, we will describe a still better solution to 3SUM in preprocessed universes, with ran-
domization, using FFT instead of fast matrix multiplication.
5.3 Application 3: A Deterministic 3SUM Algorithm for Bounded Monotone Sets

Another interesting application is the following:
Corollary 5.5. Given monotone sets A, B, C ⊆ [n]d for any constant d ≥ 2, we can solve All-Nums-3SUM
on A, B, C in O(n2−1/O(d) ) deterministic time.
Proof. Chan and Lewenstein [CL15] observed that 3SUM for bounded monotone sets in any constant
dimension d reduces to 3SUM for “clustered” sets, which in turn reduces to 3SUM with preprocessed
20
universes on O(n/ℓ) elements and O(ℓ3d ) queries. Using Corollary 5.3 gives total deterministic time
e
O((n/ℓ) e 2−0.218/(3d+0.109) ) by setting ℓ = n0.109/(3d+0.109) . (The ex-
2 + ℓ3d (n/ℓ)1.891 ), which is O(n
ponent here is certainly improvable, by solving the problem using our techniques more directly, instead of
applying a black-box reduction to 3SUM with preprocessed universes.)
The above result provides the first truly subquadratic deterministic algorithm for bounded monotone
3SUM in arbitrary constant dimensions—Chan and Lewenstein [CL15] gave subquadratic randomized al-
gorithms with O(n2−1/(d+O(1)) ) running time, but they had nontrivial deterministic algorithms only for
d ≤ 7 under the current matrix multiplication bounds.
We can also apply the triangle decomposition theorem to obtain subquadratic algorithms for monotone or
bounded-difference Min-Plus convolution (which were first obtained by Chan and Lewenstein [CL15], and
followed by [CDXZ22]), and subcubic algorithms for monotone or bounded-difference Min-Plus products
(which were first obtained by Bringmann et al. [BGSV16], and followed by [VX20b, GPVX21, CDX22,
CDXZ22]). Since previous algorithms have been found for these problems, we will omit the details here
and refer to Appendix B.
The main message is that many of the results in Chan and Lewenstein’s paper can be obtained alterna-
tively using our decomposition theorem, which is simpler and more elementary than the BSG Theorem, if
we are interested in subquadratic algorithms but don’t care about the precise values in the exponents. The
advantage is simplicity—additive combinatorics is not needed after all! (However, the BSG Theorem is still
potentially useful in optimizing those exponents.)
5.4 Application 4: A Truly Subquartic #APSP Algorithm

As another simple, interesting application of the triangle decomposition theorem, we obtain the first
truly subquartic algorithm for #APSP for arbitrary weighted graphs:
Theorem 5.6. #APSP for n-node graphs with positive edge weights has an algorithm running in O(n3.83 )
e 15/4 ) time.
time, or if ω = 2, O(n
define the following “funny” matrix product ⊗: if (C, C ′ ) = (A, A′ ) ⊗ (B, B ′ ), then C = A ⋆ B
Proof. We P
and Cij = k∈[n]:Cij =Aik +Bkj A′ik Bkj
′ ′ .
Claim 5.7. Let (A, A′ ) and (B, B ′ ) be two pairs of n×n matrices where the entries of A′ and B ′ are (large)
e 3 + ℓn2.83 ), or if ω = 2, O(n
ℓ-bit integers. Then we can compute (A, A′ ) ⊗ (B, B ′ ) in time O(n e 3 + ℓn11/4 ).
Proof. First compute C = A ⋆ B naively in O(n3 ) time. Initialize the entries of C ′ to 0. Consider the
tripartite graph with nodes {ui : i ∈ [n]}, {xk : k ∈ [n]}, and {vj : j ∈ [n]}, where ui xk has weight Aik ,
and xk vj has weight Bkj , and ui vj has weight −Cij . Apply Theorem 5.1 to obtain subgraphs G(λ) and a
e 3 + s2 n2 ) time.
set R in O(n
We first examine each triangle ui xk vj ∈ R and add A′ik Bkj e 3 /s) time.
′ to C ′ . This takes O(ℓn
ij
P
Next, for each λ, we compute k [ui xk ∈ G(λ) ]A′ik · [xk vj ∈ G(λ) ]Bkj ′ and add it to C ′ for every
ij
(λ) e ω
ui vj ∈ G . This reduces to a standard matrix product on ℓ-bit integers and takes O(ℓn ) time for each
e · s3 nω ). As before, we can do slightly better using the grouping of the subgraphs,
λ. The total time is O(ℓ
which improve the running time to O(ℓe · sM (n, s2 n, n)).
As in Corollary 5.2, we choose s = n0.17+ε , or if ω = 2, s = n1/4 .
i
Given an input graph G with positive edge weights, let D (=2 ) be the distance matrix for paths of
i
(unweighted) length exactly 2i , and D ′(=2 ) be the number of paths of (unweighted) length exactly 2i that
21
i i
match the distance in D (=2 ) . Similarly, we define D (<2 ) as the distance matrix for paths of (unweighted)
i
length less 2i , and define D ′(<2 ) similarly.
For i = 0, it is easy to see that D (=1) is exactly the weight matrix of G, and D ′(=1) is the adjacency
matrix of G. Also, D (<1) is the matrix whose diagonal entries are all 0, and other entries are all ∞. Finally
D ′(<1) equals the n × n identity matrix.
For i > 0, we can use the following recurrences:
i i i−1 ) i−1 ) i−1 ) i−1 )
(D (=2 ) , D ′(=2 ) ) = (D (=2 , D ′(=2 ) ⊗ (D (=2 , D ′(=2 ),
(<2i ) ′(<2i ) (<2i−1 ) ′(<2i−1 ) (=2i−1 ) ′(=2i−1 )
(D ,D ) = (D ,D ) ⊗ (D ,D ).
It is not difficult, though a bit tedious, to verify the correctness of these recurrences.
i
The matrix D ′(<2 ) gives the result for #APSP when 2i > n. Therefore, #APSP reduces to O(log n)
instances of the funny product ⊗, when the matrices A′ and B ′ are O(n)-bit e numbers. Then applying
Claim 5.7 with ℓ = O(n) e yields the theorem.
In Section 10, we will return to the BSG Theorem and describe more variants and applications.
6 More Lower Bounds under the Strong APSP Hypothesis

Continuing the approach in Section 3, we now derive conditional lower bounds for more problems
with intermediate complexity from the Strong APSP Hypothesis (which as noted before is equivalent to the
hypothesis that M ∗ (n, n, n | n3−ω ) is not truly subcubic). We begin with a useful lemma:
Lemma 6.1. For any positive constants β, γ, c with 0 < γ < β, if M ∗ (n, nβ , n | ncβ ) = O(n2+β−ε ), then
M ∗ (n, nγ , n | ncγ ) = O(n2+γ−Ω(ε) ).
Proof.
M ∗ (n, nγ , n | ncγ ) ≤ O(n2(1−γ/β) M ∗ (nγ/β , nγ , nγ/β | ncγ ))
e 2(1−γ/β) · (nγ/β )2+β−ε ) = O(n2+γ−Ω(ε) ).
≤ O(n

Lemma 6.1 allows us to remove the assumption β ≤ (3 − ω)/2 in Corollary 3.3 (since we can replace
β with any sufficiently small positive constant γ). Consequently, Corollary 3.5 holds for all β ≤ 1/3,
regardless of the value of ω. For convenience, we repeat the statement of Corollary 3.3 below, with the
assumption removed:
Corollary 6.2. For any constant β > 0, if M ∗ (n, nβ , n | n2β ) = O(n2+β−ε ), then M ∗ (n, n, n | n3−ω ) =
O(n3−Ω(ε) ).
Remark 6.3. By applying Lemma 6.1 with β = 1, γ = 1/2, and c = 1, we see that M ∗ (n, n, n |
√ √
n) = O(n3−ε ) implies M ∗ (n, n, n | n) = O(n5/2−Ω(ε) ). If ω = 2, Chan, Vassilevska W. and
√ √
Xu [CVX21] showed that the u-dir-APSP Hypothesis is equivalent to the claim that M ∗ (n, n, n | n)
is not in O(n5/2−ε ) for any ε > 0. Consequently, the u-dir-APSP Hypothesis implies the Strong APSP
Hypothesis when ω = 2.
Remark 6.4. In the definition of the Strong APSP Hypothesis, it does not matter whether the input graph is
undirected or directed—the directed version is also equivalent to the statement that M ∗ (n, n, n | n3−ω ) is
not truly subcubic: Directed APSP for integer weights in [n3−ω ] can be solved by Zwick’s algorithm [Zwi02,
22
e
CVX21] in time O(max ∗
ℓ M (n, n/ℓ, n | ℓn
3−ω )). If M ∗ (n, n, n | n3−ω ) = O(n3−ε ), then for ℓ ≤ nδ , we
e
have M ∗ (n, n/ℓ, n | ℓn3−ω ) ≤ O(M ∗ (n, n, n | n3−ω+δ ) ≤ O(n(3−ε)(3−ω+δ)/(3−ω) ) = O(n3−Ω(ε) ) for a
sufficiently small δ. On the other hand, for ℓ > nδ , we trivially have M ∗ (n, n/ℓ, n | ℓn3−ω ) ≤ O(n3 /ℓ) =
O(n3−δ ).
6.1 Min-Witness Product

Let MIN - WITNESS - PROD (n) be the time complexity of computing min-witness product for two n × n
Boolean matrices. More generally, let MIN - WITNESS - PROD (n1 , n2 , n3 ) be the time complexity of Min-Witness-Prod
for an n1 × n2 Boolean matrix and an n2 × n3 Boolean matrix. We observe that Min-Plus product for rectan-
gular matrices of sufficiently small inner dimensions and integer ranges can be reduced to Min-Witness-Prod:
Lemma 6.5. For any x, y, we have M ∗ (n, x, n | y) = O(MIN - WITNESS - PROD (n, xy 2 , n)).
Proof. Suppose that we are given an n × x matrix A and an x × n matrix B where all matrix entries are in
[y]. For each i ∈ [n], k ∈ [x], and u, v ∈ [y], let A′i,(k,u,v) = 1 if Aik = u, and A′i,(k,u,v) = 0 otherwise. For
′
each j ∈ [n], k ∈ [x], u, v ∈ [y], let B(k,u,v),j ′
= 1 if Bkj = v, and B(k,u,v),j = 0 otherwise. The number of
2
triples τ = (k, u, v) is xy ≤ n. By sorting the triples in increasing order of u + v, the Min-Plus product of
A and B can be computed from the Min-Witness product of (A′i,τ )i∈[n],τ ∈[x]×[y]2 and (Bτ,j ′ )
τ ∈[x]×[y]2 ,j∈[n] .

Corollary 6.6. If MIN - WITNESS - PROD (n, n5β , n) = O(n2+β−ε ), then M ∗ (n, n, n | n3−ω ) = O(n3−Ω(ε) ).
By setting β = 1/5, we have thus proved that Min-Witness-Prod of two n × n Boolean matrices cannot
be computed in O(n11/5−ε ) time under the Strong APSP Hypothesis. In particular, if ω = 2, this implies
that Min-Witness-Prod is strictly harder than Boolean matrix multiplication.
Furthermore, by setting β = γ/5, Min-Witness-Prod of an n × nγ and an nγ × n Boolean matrix
cannot be computed in O(n2+γ/5−ε ) time for any γ under the same hypothesis. This implies that for any
γ ≤ 0.3138, Min-Witness-Prod is strictly harder than Boolean matrix multiplication, as ω(1, 0.3138, 1) =
2 [LU18]. This result interestingly rules out the possibility that the polynomial method [Wil18, AVY15,
CW21] could be used to transform the Min-Witness product of an n × d and a d × n matrix into a stan-
dard product of an n × dO(1) and a dO(1) × n matrix. It also contrasts Min-Witness-Prod with, for
example, Dominance-Product or Equality-Product, which does have near-quadratic time complexity
when the inner dimension d is smaller than n0.1569 (since as mentioned in Section 3.1, Matoušek’s tech-
e
nique [Mat91, Yus09] yields a time bound of O(min 2 e 2 2
r (dn /r + M (n, dr, n)) ≤ O(n + M (n, d , n))).
6.2 All-Pairs Shortest Lightest Paths

Let UNDIR - APSLP 1,2 (n, m) be the time complexity of undir-APSLP1,2 on graphs with n nodes and m
edges. We observe the following:
Lemma 6.7. For any x, y, we have M ∗ (n, x, n | y) = O(UNDIR - APSLP 1,2 (n, nx)) if xy 2 ≤ n.
[y]. We construct an undirected graph as follows:
• For each i ∈ [n], create a node s[i].
• For each k ∈ [x] and u ∈ [y], create a node w1 [k, u].
23
• For each k ∈ [x], create a node w2 [k].
• For each k ∈ [x] and v ∈ [y], create a node w3 [k, v].
• For each j ∈ [n], create a node t[j].
• For each i ∈ [n] and k ∈ [x], create an edge s[i]w1 [k, u] of weight 1 where u = aik .
• For each k ∈ [x] and u ∈ [y], create a path between w1 [k, u] and w2 [k] that has u edges of weight 2 and
y − u edges of weight 1 (so that the path has y edges and weight y + u).
• For each k ∈ [x] and v ∈ [y], create a path between w2 [k] and w3 [k, v] that has v edges of weight 2 and
y − v edges of weight 1 (so that the path has y edges and weight y + v).
• For each j ∈ [n] and k ∈ [x], create an edge w3 [k, v]t[j] of weight 1 where v = bkj .
The number of nodes in the graph is O(n+xy 2 ) = O(n), and the number of edges is O(nx+xy 2 ) = O(nx).
For each i, j ∈ [n], all lightest paths from s[i] to t[j] have 2y + 2 edges, and among them, the shortest has
weight 2y + 2 + mink (aik + bkj ).
Corollary 6.8. For any constant β ≤ 1/5, if UNDIR - APSLP 1,2 (n, n1+β ) = O(n2+β−ε ), then M ∗ (n, n, n |
n3−ω ) = O(n3−Ω(ε) ).
By setting β = 1/5, we have thus proved that undir-APSLP1,2 cannot be solved in O(n11/5−ε ) time
under the Strong APSP Hypothesis. If ω = 2, this implies that undir-APSLP1,2 is strictly harder than
undirected APSP for such weighted graphs. (The current best algorithm for undir-APSLP1,2 has running
e 2.58 ), or if ω = 2, O(n
time O(n e 5/2 ) [CVX21].) The same result holds for undir-APLSP1,2.
Furthermore, undir-APSLP1,2 with n1+β edges cannot be solved in O(n2+β−ε ) time for any β ≤ 1/5
under the same hypothesis. Thus, the naive algorithm is essentially optimal for sufficiently sparse graphs.
The same results hold for the similar problem of undir-APLSP1,2 (in the proof of Lemma 6.7, we just
modify the path from w1 [k, u] and w2 [k] to use y − u edges of weight 2 and 2u edges of weight 1, so that
the path has y + u edges and weight 2y).
6.3 Batched Range Mode

For a different application about data structures, we next consider the range mode problem, which has
received considerable attention and has been extensively studied in the literature [KMS05, CDL+ 14, VX20b,
GH22, JX22]. Let BATCHED - RANGE - MODE (N, Q | σ) be the total time complexity of Batch-Range-Mode
for Q range mode queries in an array of N elements in [σ].
Lemma 6.9. For any x, y, we have M ∗ (n, x, n | y) = O(BATCHED - RANGE - MODE (nxy, n2 | x)).
[y]. We create an array holding a string S over the alphabet [x], defined as follows:
• Let σi = 1Ai1 2Ai2 · · · xAix and σi′ = 1y−Ai1 2y−Ai2 · · · xy−Aix , which have length O(xy).
• Let τj = 1B1j 2B2j · · · xBxj and τi′ = 1y−B1j 2y−B2j · · · xy−Bxj , which have length O(xy).
• Let S = σn σn′ · · · σ2 σ2′ σ1 σ1′ τ1′ τ1 τ2′ τ2 · · · τn′ τn , which has length O(nxy).
24
For each i, j ∈ [n], consider the substring Sij = σi′ σi−1 σi−1
′ · · · σ1 σ1′ τ1′ τ1 · · · τj−1
′ τ ′
j−1 τj . For each
k ∈ [x], the frequency of k in Sij is precisely iy + jy − Aik − Bkj . Thus, the mode of Sij is an index k
minimizing Aik + Bkj . So, the Min-Plus product can be computed by answering O(n2 ) range mode queries
on S.
Corollary 6.10. For any constant β, if BATCHED - RANGE - MODE (n1+3β , n2 | nβ ) = O(n2+β−ε ), then
M ∗ (n, n, n | n3−ω ) = O(n3−Ω(ε) ).
√
By setting β = 1/3 and n = N , we have thus proved that Batch-Range-Mode for N queries on
N elements cannot be solved in O(N 7/6−ε ) time under the Strong APSP Hypothesis. Previously, Chan et
al. [CDL+ 14] gave a reduction from Boolean matrix multiplication implying a better, near-N 3/2 conditional
lower bound for combinatorial algorithms (under the Combinatorial BMM Hypothesis); this matched upper
bounds of known combinatorial algorithms [KMS05, CDL+ 14]. However, for noncombinatorial algorithms,
their lower bound was near N ω/2 , which is trivial if ω = 2. The distinction between combinatorial vs.
noncombinatorial algorithms is especially important for the range mode problem, as it is actually possible to
beat N 3/2 using fast matrix multiplication, as first shown by Vassilevska W. and Xu [VX20b]. The current
fastest algorithm by Gao and He [GH22] runs in O(N 1.4797 ) time. Our new lower bound reveals that there
is a limit on how much fast matrix multiplication can help.
(For still more recent work on range mode, see [JX22] for a conditional lower bound for the dynamic
version of the range mode problem, but again this is only for combinatorial algorithms.)
Furthermore, by using the fact that BATCHED - RANGE - MODE (n1+3β , n2 | nβ ) ≤ O(n1−3β )·
BATCHED - RANGE - MODE (n1+3β , n1+3β | nβ ) and setting N = n1+3β and γ = β/(1 + 3β), we see that
Batch-Range-Mode for N queries on N elements in a universe of size σ = N γ cannot be answered in
O(N 1+γ−ε ) time for any γ ≤ 1/6, under the same hypothesis. This lower bound is tight, since an O(N σ)
upper bound is known [CDL+ 14]. √ √
Furthermore, by setting n = Q and nβ = (N/ Q)1/3 , Batch-Range-Mode for Q queries on N
elements cannot be solved in O(Q5/6 N 1/3−ε ) time for any Q ≤ N 2 under the same hypothesis. For
example, for Q = N 1.6 , the lower bound is near N 1.666 (in other words, we need at least N 0.066 time
per query). In contrast, the previous
√ reduction
√ √ by Chan et al. [CDL+ 14] from Boolean matrix multipli-
cation gives a lower bound of M ( Q, N/ Q, Q), which is only near linear in Q √ when Q = N 1.6 , as
ω(0.8, 0.2, 0.8) = 1.6. (Known combinatorial algorithms have running time near O( QN ) as a function
of N and Q [KMS05, CDL+ 14].)
The same results hold for the similar problem of range minority [CDSW15] (finding a least frequent
element in a range).
6.4 Dynamic Shortest Paths in Unweighted Planar Graphs

As another example of an application to data structure problems, we now consider the dynamic shortest
path problem for unweighted planar graphs. Let PLANAR - DYN - SP (N, Q, U ) be the time complexity of
performing an offline sequence of Q shortest path distance queries and U edge updates on an unweighted,
undirected planar graph with N nodes.
Lemma 6.11. For any α, β ≤ 1,
M ∗ (n, nβ , n | X) = O(n1−α · PLANAR - DYN - SP ((n2α+β + nα+2β )X, n1+α , n1+β )).
25
Proof. Abboud and Dahlgaard [AD16, Proof of Theorem 1] reduced the computation of the Min-Plus prod-
uct of an n × nβ and an nβ × nα matrix with entries from [X], to the problem of performing an offline
sequence of O(n1+α ) shortest path queries and O(n1+β ) edge-weight changes on a weighted planar graph
with O(nα+β ) nodes. In their graph construction, all edges have integer weights bounded by O(nα X),
except for O(nβ ) edges having integer weights bounded by O(nα+β X). (The X factor was stated as X 2 in
their paper, but as they remarked at the end of their Section 2, X 2 can be lowered to X + 1.) The weighted
graph can be turned into an unweighted graph, simply by subdividing each edge. More precisely, we create a
path πe of length ℓ for an edge e with weight upper-bounded by ℓ. Whenever the weight of e changes, we can
redirect an endpoint of e to an appropriate node in the path πe , using O(1) updates in this unweighted graph.
The resulting unweighted planar graph has O((n2α+β + nα+2β )X) nodes. Hence, Abboud and Dahlgaard’s
reduction implies that M ∗ (n, nβ , nα | X) = O(PLANAR - DYN - SP ((n2α+β + nα+2β )X, n1+α , n1+β )).
The lemma then follows, as M ∗ (n, nβ , n | X) = O(n1−α · M ∗ (n, nβ , nα | X)).
Combining Corollary 6.2 and Lemma 6.11 (with α = β) immediately gives the following:
Corollary 6.12. For any constant β, if PLANAR - DYN - SP (n5β , n1+β , n1+β ) = O(n1+2β−ε ), then M ∗ (n, n, n |
n3−ω ) = O(n3−Ω(ε) ).
By setting β = 1/4 and N = n5/4 , we have thus proved that an offline sequence of N shortest
path queries and N updates on an unweighted, undirected N -node planar graph cannot be processed in
O(N 6/5−ε ) time, under the Strong APSP Hypothesis. This rules out the existence of data structures with
N o(1) time per operation. Abboud and Dahlgaard [AD16] proved a better lower bound near N 3/2 in the
weighted case under the APSP Hypothesis, or near N 4/3 in the unweighted case under the OMv Hypoth-
esis [HKNS15], but the latter bound under the OMv Hypothesis holds only for online queries and updates,
when considering general noncombinatorial algorithms. We have obtained the first conditional lower bounds
for the unweighted case that hold in the offline setting.
Gawrychowski and Janczewski [GJ21] have adapted Abboud and Dahlgaard’s technique to prove condi-
tional lower bounds for certain dynamic data structure versions of the longest increasing subsequence (LIS)
problem. In the unweighted case, their reduction was again based on the OMv Hypothesis and applicable
only for the online setting. Our approach should similarly yield new conditional lower bounds in the offline
setting for their problem.
The preceding applications are not meant to be exhaustive, but the applications to Batch-Range-Mode
and dynamic planar shortest paths should suffice to illustrate the potential usefulness of our technique to (un-
weighted) data structure problems in general. A common way to obtain conditional lower bounds for such
data structure problems is via reduction from Boolean matrix multiplication, which is useful only for com-
binatorial algorithms, or via the OMv Hypothesis, which is only for online settings. Our technique provides
a new avenue, allowing us to obtain (weaker, but still nontrivial) lower bounds for general noncombinatorial
algorithms in offline or batched settings: namely, it suffices to reduce from rectangular Min-Plus products
when the inner dimension and the integer range are both small.
6.5 Min-Witness Equality Product

Lastly, we revisit the Min-Witness-Eq-Prod problem. Let MIN - WITNESS - EQ - PROD (n) be the time
complexity of Min-Witness-Eq-Prod for n × n matrices. Chan, Vassilevska W., and Xu [CVX21] showed
e MIN - WITNESS - EQ - PROD (n)), so we immediately obtain a near-n7/3 lower bound
that U - DIR - APSP (n) = O(
for Min-Witness-Eq-Prod under the Strong APSP Hypothesis. However, the following corollary gives an
alternative lower bound, which is worse if ω = 2, but is better if the current bound of ω turns out to be close
to tight. Note that (2ω + 5)/4 is strictly larger than ω for all ω ∈ [2, 2.373).
26
Corollary 6.13. If MIN - WITNESS - EQ - PROD (n) = O(n(2ω+5)/4−ε ), then M ∗ (n, n, n | n3−ω ) = O(n3−Ω(ε) ).
Proof. Let GEN - EQ - PROD (n1 , n2 , n3 | ℓ) be the time complexity of the generalized equality product prob-
lem in Lemma 3.1. Let EQ - PROD (n1 , n2 , n3 ) be the time complexity of ∃Equality-Product, which is
a variant of Equality-Product where we only need to determine if each of the outputs of the standard
Equality-Product is nonzero or not. It is easy to see that GEN - EQ - PROD (n1 , n2 , n3 | ℓ) ≤ O(ℓ2 ·
EQ - PROD (n1 , n2 , n3 )).
The proof of Theorem 3.2 shows that for any s ≤ n2 and t ≤ ℓ,

e (n2 /s)M ∗ (n1 , s, n3 | t) + s · GEN - EQ - PROD (n1 , n2 , n3 | ℓ/t) .
M ∗ (n1 , n2 , n3 | ℓ) = O
Consequently, for any t ≤ n3−ω ,

e (n/s)M ∗ (n, s, n | t) + s(n3−ω /t)2 · EQ - PROD (n, n, n) .
M ∗ (n, n, n | n3−ω ) = O
Set t = n/s. Chan, Vassilevska W. and Xu [CVX21] gave a reduction showing that M ∗ (n, s, n | n/s) =
O(MIN - WITNESS - EQ - PROD (n)). Trivially, EQ - PROD (n, n, n) = O(MIN - WITNESS - EQ - PROD (n)). It fol-
lows that M ∗ (n, n, n | n3−ω ) = O((n/s + s3 /n2ω−4 ) · MIN - WITNESS - EQ - PROD (n)). The result follows
by setting s = n(2ω−3)/4 .
7 Lower Bounds under the u-dir-APSP Hypothesis

We can similarly apply our key reduction in Theorem 3.2 to obtain (better) lower bounds under the
u-dir-APSP Hypothesis, using the following corollary:
Corollary 7.1. Let ρ be such that ω(1, ρ, 1) = 1 + 2ρ. Fix any constant σ > ρ, and let κ = ω(1,σ,1)−1−2ρ
σ−ρ .
∗ β
For any constant β, if M (n, n , n | n (1+κ)β ) = O(n 2+β−ε ), then U - DIR - APSP (n) = O(n 2+ρ−Ω(ε) ).
Proof. Chan, Vassilevska W. and Xu [CVX21] have shown that U - DIR - APSP (n) = O(n2+ρ−Ω(ε) ) is equiv-
alent to M ∗ (n, nρ , n | n1−ρ ) = O(n2+ρ−Ω(ε) ).
By Theorem 3.2,

M ∗ (n, nρ , n | n1−ρ ) = O e (nρ /s)M ∗ (n, s, n | t) + sn2+ρ /r + (sn1−ρ /t)M (n, rnρ , n) .
′
Setting s = nβ−ε , t = n(1+κ)β , and r = nβ with ε′ = ε/2 yields

e nρ−β+ε′ M ∗ (n, nβ , n | n(1+κ)β ) + n2+ρ−ε′ + n1−ρ−κβ+ω(1,ρ+β,1)−ε′ ,
M ∗ (n, nρ , n | n1−ρ ) = O
e 2+ρ−Ω(ε) ), since ω(1, ρ + β, 1) ≤ ω(1, ρ, 1) + ω(1,σ,1)−ω(1,ρ,1) β = 1 + 2ρ + κβ, by convexity

which is O(n σ−ρ
of ω(1, ·, 1).
The above assumes (1 + κ)β ≤ 1 − ρ and β ≤ σ − ρ, but this assumption may be removed, since
Lemma 6.1 allows us to replace β with any sufficiently small positive constant γ.
We pick σ = 0.85. By known bounds [LU18], ω(1, 0.85, 1) < 2.258317. Since ρ ≥ 0.5, we have
κ ≤ 1.258317−2ρ
0.85−ρ < 0.7381. If ω = 2, then κ = 0.
27
7.1 Min-Witness Product
Corollary 7.2. Let ρ and κ be as in Corollary 7.1. For any constant β, if MIN - WITNESS - PROD (n, n(3+2κ)β , n) =
O(n2+β−ε ), then U - DIR - APSP (n) = O(n2+ρ−Ω(ε) ).
By setting β = 1/(3 + 2κ) (which is 1/3 if ω = 2, or < 0.223 regardless), we have thus proved that
Min-Witness-Prod of two n × n Boolean matrices cannot be computed in O(n7/3−ε ) time if ω = 2, or
O(n2.223 ) time regardless of the value of ω, under the u-dir-APSP Hypothesis. (This is better than the near-
n11/5 lower bound we obtained from the Strong APSP Hypothesis.) The question of proving lower bounds
for Min-Witness-Prod from the u-dir-APSP Hypothesis was left open in the paper by Chan, Vassilevska
W. and Xu [CVX21] (they were only able to do so for Min-Witness-Eq-Prod).
Furthermore, by setting β = γ/(3 + 2κ), Min-Witness-Prod of an n × nγ and an nγ × n Boolean
matrix cannot be computed in O(n2+0.223γ−ε ) time for any γ ≤ 1 under the same hypothesis.
7.2 All-Pairs Shortest Lightest Paths

1
Corollary 7.3. Let ρ and κ be as in Corollary 7.1. For any constant β ≤ 3+2κ and UNDIR - APSLP 1,2 (n, n1+β ) =
O(n2+β−ε ), then U - DIR - APSP (n) = O(n2+ρ−Ω(ε) ).
By setting β = 1/(3 + 2κ), we have thus proved that undir-APSLP1,2 cannot be solved in O(n7/3−ε )
time if ω = 2, or O(n2.223 ) time regardless of the exact value of ω, under the u-dir-APSP Hypothesis. (This
is better than the near-n11/5 lower bound we obtained from the Strong APSP Hypothesis.) The same result
holds for undir-APLSP1,2. Previously, Chan, Vassilevska W. and Xu [CVX21] proved a still better near-nρ
lower bound for {0, 1}-weighted undir-APLSP from the same hypothesis, but their proof crucially relied on
zero-weight edges and also did not work for undir-APSLP (leaving open the question of finding nontrivial
conditional lower bounds for both undir-APLSP1,2 and undir-APSLP1,2, which we answer here).
7.3 Batched Range Mode

By combining Chan, Vassilevska W. and Xu’s observation e
that U - DIR - APSP (n) = O(max ∗
ℓ M (n, n/ℓ, n |
√
ℓ)) [CVX21, Zwi02] with Lemma 6.9, and setting n = N , we see that Batch-Range-Mode for N queries
on N elements cannot be solved in O(N 5/4−ε ) time if ω = 2, or in O(N 1+ρ/2−ε ) time regardless, under
the u-dir-APSP Hypothesis.
To obtain a lower bound for general Q, we combine Corollary 7.1 and Lemma 6.9:
Corollary 7.4. Let ρ and κ be as in Corollary 7.1. For any constant β, if BATCHED - RANGE - MODE (n1+(2+κ)β , n2 |
nβ ) = O(n2+β−ε ), then U - DIR - APSP (n) = O(n2+ρ−Ω(ε) ).
√ √
Thus, by setting n = Q and nβ = (N/ Q)1/(2+κ) , Batch-Range-Mode of Q queries on N elements
cannot be answered in O(Q3/4 N 1/2−ε ) time if ω = 2, or O(Q1−0.365/2 N 0.365−ε ) time regardless, for any
Q ≤ N 2 , under the u-dir-APSP Hypothesis. For example, for Q = N 1.6 , the lower bound is near N 1.7
if ω = 2, or near N 1.673 regardless. (This is slightly better than the lower bound we obtained in previous
section from the Strong APSP Hypothesis.)
28
7.4 Dynamic Shortest Paths in Planar Graphs
Combining Corollary 7.1 and Lemma 6.11 with α = β immediately gives the following:
Corollary 7.5. Let ρ and κ be as in Corollary 7.1. For any constant β, if PLANAR - DYN - SP (n(4+κ)β , n1+β , n1+β ) =
O(n1+2β−ε ), then U - DIR - APSP (n) = O(n2+ρ−Ω(ε) ).
By setting β = 1/(3 + κ) and N = n1+β , we have thus proved that an offline sequence of N shortest
path queries and N updates on an unweighted, undirected N -node planar graph cannot be processed in
O(N 5/4−ε ) time if ω = 2, of O(N 1.211 ) time regardless, under the u-dir-APSP Hypothesis. (This is slightly
better than the lower bound we obtained in previous section from the Strong APSP Hypothesis.)
7.5 An Equivalence Result

We also obtain an interesting equivalence result:
Corollary 7.6. Let α be such that ω(1, α, 1) = 2. For any constants β, γ ∈ (0, α), there exists ε > 0 such
that M ∗ (n, nβ , n | nβ ) = O(n2+β−ε ) if and only if there exists ε′ > 0 such that M ∗ (n, nγ , n | nγ ) =
′
O(n2+γ−ε ).
Proof. W.l.o.g., assume γ < β. The “only if” direction is shown in Lemma 6.1. For the “if” direction,
′
suppose M ∗ (n, nγ , n | nγ ) = O(n2+γ−ε ). By Theorem 3.2,

M ∗ (n, nβ , n | nβ ) = Oe (nβ /s)M ∗ (n, s, n | t) + sn2+β /r + (snβ /t)M (n, rnβ , n) .
′
Setting s = nγ−ε , t = nγ , and r = nγ , with ε = ε′ /2 yields M ∗ (n, nβ , n | nβ ) = O(n2+β−Ω(ε ) ),
assuming that γ ≤ α − β.
This assumption may be removed, since Lemma 6.1 allows us to replace γ with a sufficiently small
positive constant γ ′ .
Corollary 7.7. If ω = 2, then for any constant β ∈ (0, 1), there exists ε > 0 such that U - DIR - APSP (n) =
e 2+β−ε′ ).
O(n2.5−ε ) iff there exists ε′ > 0 such that M ∗ (n, nβ , n | nβ ) = O(n
Proof. If ω = 2, then ρ = 1/2 and then Chan, Vassilevska W. and Xu’s result [CVX21] showed that
√ √ ′
U - DIR - APSP (n) = O(n2.5−ε ) for some ε > 0 is equivalent to M ∗ (n, n, n | n) = O(n2.5−ε ) for some
ε′ > 0. Since ω = 2 implies α = 1, we can apply the preceding corollary for any β ∈ (0, 1) and γ = 1/2.

Let Φ(β) be the claim that M ∗ (n, nβ , n | nβ ) is not in O(n2+β−ε ) for any ε > 0. If ω = 2, Φ(1)
is just the Strong APSP Hypothesis, but intriguingly, by the above corollary, Φ(0.99) is equivalent to the
u-dir-APSP Hypothesis, which has given us strictly better conditional lower bound results.
8 More Equivalences Between Counting and Detection Problems

Continuing the approach in Section 4 for proving equivalence between #AE-Exact-Tri and AE-Exact-Tri,
we now derive more equivalence results between other counting and detection problems.
29
8.1 Min-Plus Product
In this section, we use A and B to denote the inputs to a Min-Plus-Product or #Min-Plus-Product
instance, and we use Wij to denote the set of k where Aik + Bkj = (A ⋆ B)ij , i.e., the set of witnesses for
(i, j).
e
Lemma 8.1. Given two n × n matrices A, B and a subset S ⊆ [n], we can compute a matrix D in O(|S| ·
n (3+ω)/2 ) time such that Dij = |Wij | for pairs of (i, j) where S ∩ Wij 6= ∅.
(s) (s)
Proof. For every s ∈ S, we do the following. Let A(s) be a matrix where Aik = Aik − Ais and Bkj =
e (3+ω)/2 ) time for each s
Bsj − Bkj . Then we compute the equality product C (s) of A(s) and B (s) in O(n
(s)
using Matoušek’s algorithm [Mat91]. Finally, let Dij be Cij where Ais + Bsj is the smallest over all s ∈ S
e
(breaking ties arbitrarily). The running time for computing D is clearly O(|S| · n(3+ω)/2 ).
(s)
Suppose S ∩ Wij 6= ∅ for some (i, j). Then Dij equals Cij where Ais + Bsj = (A ⋆ B)ij . For
(s) (s)
any k, Aik = Bkj if and only if Aik + Bkj = Ais + Bsj = (A ⋆ B)ij by Fredman’s trick. Therefore,
(s)
Dij = Cij = |Wij |.
Theorem 8.2. If Min-Plus-Product for n × n matrices has an O(n3−ε ) time algorithm for some ε > 0,
′
then #Min-Plus-Product for n × n matrices has an O(n3−ε ) time algorithm for some ε′ > 0.
Proof. Given a #Min-Plus-Product instance on n × n matrices A, B, we first list up to n0.99 elements in

Wij for every i, j. By well-known techniques (e.g. [VW18]), an O(n3−ε ) time Min-Plus-Product algo-
′′
rithm implies an O(n3−ε ) for ε′′ > 0 time algorithm for listing up to n0.99 witnesses for each (i, j) in a
Min-Plus-Product instance.
If we list less than n0.99 elements for some (i, j), then these elements are all the elements in Wij . Thus
we can output the number of elements we list as the exact witness count for (i, j). For each of the remaining
pairs of (i, j), we have found n0.99 witnesses. By the standard greedy algorithm for hitting set, in O(ne 2.99 )
e
time, we can find a set S of size O(n 0.01 ) that intersects with Wij for each of these remaining (i, j) pairs.
Therefore, we can apply Lemma 8.1 to compute the number of witnesses for these remaining (i, j) pairs in
e
O(|S| · n(3+ω)/2 ) ≤ O(n2.70 ) time.
The total running time for the #Min-Plus-Product instance is thus O(n e 3−ε′′ + n2.99 + n2.70 ), which is
truly subcubic.
We then show the reduction in the other direction. The proof is similar to the reduction from a certain
version of Min-Plus product to certain versions of APSP counting in unweighted directed graphs [CVX21].
Theorem 8.3. If #Min-Plus-Product for n × n matrices has an O(n3−ε ) time algorithm for some ε > 0,
′
then Min-Plus-Product for n × n matrices has an O(n3−ε ) time algorithm for some ε′ > 0.
Proof. Let A, B be two n × n matrices of a Min-Plus-Product instance. Let A′ be another n × n matrix

where A′ik = M · Aik + k for some large enough integer M (say M > n). Similarly, we create an
n × n matrix B ′ where Bkj′ = M · B . This way, for every i, j, there exists exactly one k such that
kj ij
′ ′ ′ ′
Aikij + Bkij j = (A ⋆ B )ij . Furthermore, we clearly also have Ai,kij + Bkij ,j = (A ⋆ B)ij .
Then for each integer p ∈ [⌈log(n)⌉], we do the following. Let A′(p) be a copy of A′ , but we duplicate all
columns k where the p-th bit in k’s binary representation is 1. We similarly create B ′(p) which is a copy of
B ′ but we duplicate all rows k where the p-th bit in k’s binary representation is 1. Then we run the O(n3−ε )
30
time #Min-Plus-Convolution algorithm for A′(p) and B ′(p) . Suppose for some i, j, the number of witnesses
is 2, then we know that the p-th bit of kij is 1; otherwise, the p-th bit of kij is 0.
After all [⌈log(n)⌉] rounds, we can compute kij for every i, j. Since Ai,kij + Bkij ,j = (A ⋆ B)ij , we
can then compute the Min-Plus product between A and B in O(n e 2 ) time.
8.2 3SUM Convolution and Min-Plus Convolution

Let A, B and C be the inputs of an All-Nums-3SUM-Convolution or #All-Nums-3SUM-Convolution
instance, and let Wk be the set of i where Ai + Bk−i = Ck , i.e., the set of witnesses for k.
Lemma 8.4. Given a #All-Nums-3SUM-Convolution instance for length n arrays A, B, C, we can com-
e (9+ω)/4 /L) randomized time.
pute |Wk | for every k such that |Wk | ≥ L in O(n
e
Proof. First, we find an arbitrary prime p between 2n and 4n, which can be done in O(n) time. Also, let x
be a uniformly random number sampled from Fp \ {0} and let y be a uniformly random number sampled
from Fp . Then we create three arrays A′ , B ′ and C ′ , indexed by Fp as follows:

′ Ax−1 (i−y) mod p : x−1 (i − y) mod p ∈ [n]
Ai = ,
M : otherwise

′ Bx−1 (i+y) mod p : x−1 (i + y) mod p ∈ [n]
Bi = ,
M : otherwise

′ Cx−1 i mod p : x−1 i mod p ∈ [n]
Ci = ,
3M : otherwise
where M is a large enough number (say M is larger than 10 times the largest absolute value of the input
numbers).
If we use Wk′ to denote the set of i such that A′i + B(k−i)
′ ′
mod p = Ck , then it is not difficult to verify
′
that |Wxk ′ ′
mod p | = |Wk |. Thus, from now on, we aim to compute |Wk | for indices k where |Wk | ≥ L.
We start with the following claim.
√ √
Claim 8.5. Let I ⊆ Fp be any fixed interval of length Θ( n) and let 1 ≤ L′ ≤ n be a fixed value. Then
e (5+ω)/4 /L′ ) time algorithm that computes |W ′ ∩ I| for every k such that |W ′ ∩ I| ≥ L′ ,
there exists an O(n k k
with high probability. Furthermore, for other values of k, we either also compute |Wk′ ∩ I| correctly, or
declare that we don’t know the value of |Wk′ ∩ I|.
Proof. We first reduce the problem of computing |Wk′ ∩ I| to an instance of #AE-Exact-Tri. Similar reduc-
tions from convolution problems to matrix-product type problems were known before [BCD+ 14, VW13].
√
Without loss of generality, we assume I = {0, 1, . . . , ℓ − 2, ℓ − 1} for some ℓ = Θ( n), by subtracting
min I from all indices of A′ and adding min I to all indices of B ′ .
We then create the following tripartite weighted graph G with three parts I, J, T , where |I| = ℓ, |J| =
[⌈p/ℓ⌉] and |T | = 2ℓ − 1. We use Ii to denote the i-th node in I, Jj to denote the j-th node in J and Tt to
denote the t-th node in T . We then add the following edges to the graph:
• For every i ∈ [|I|], t ∈ [|T |] such that i − t + ℓ − 1 ∈ I, we add an edge between Ii and Tt with weight
w(Ii , Tt ) = A′i−t+ℓ−1 .
′
• For every t ∈ [|T |] and j ∈ [|J|], we add an edge between Tt and Jj with weight w(Tt , Jj ) = B((j−1)ℓ+t−ℓ) mod p .
31
• For every i ∈ [|I|], j ∈ [|J|] such that (j − 1)ℓ + i − 1 < p, we add an edge between Ii and Jj with weight
′
w(Ii , Jj ) = −C(j−1)ℓ+i−1 .
Consider any (i, j) ∈ [ℓ] × [⌈p/ℓ⌉] such that (j − 1)ℓ + i − 1 < p. The nodes Tt such that i − t + ℓ − 1 ∈ I
form triangles with edge (Ii , Jj ). The multiset of the weights of these triangles is
n oi+ℓ−1
i+ℓ−1
{w(Ii , Tt ) + w(Jj , Tt ) + w(Ii , Jj )}t=i = A′i−t+ℓ−1 + B((j−1)ℓ+t−ℓ)
′
mod p − C ′
(j−1)ℓ+i−1
t=i
n oℓ−1
= A′r + B((j−1)ℓ+(i−1)−r)
′ ′
mod p − C(j−1)ℓ+i−1 .
r=0
Thus, the number of triangles with weight 0 containing edge (Ii , Jj ) in G is exactly |Wk′ ∩ I| for k =
(j−1)ℓ+i−1. In particular, if |Wk′ ∩I| ≥ L′ , then the number of witnesses for (Ii , Jj ) in the #AE-Exact-Tri
instance on graph G and target value 0 is also at least L′ . Now let S be a random subset of V (G) of size
Cn0.5 log n/L′ for a sufficiently large constant C. Then with high probability, S intersects with the set of
witnesses for every edge (Ii , Jj ) for which k = (j − 1)ℓ + (i − 1) has at least L′ witnesses in I. Now we
can apply Lemma 4.1 on graph G, target value 0 and set S to compute the number of witnesses for these
e √ (3+ω)/2 e (5+ω)/4 /L′ ) time.
edges (Ii , Jj ) in O(|S| n ) = O(n
If S does not intersect with the witnesses for some edge (Ii , Jj ) (which is easy to check in O(|S|n) total
time), we declare that we don’t know the value of |Wk′ ∩ I| for k = (j − 1)ℓ + (i − 1).
Claim 8.6. Let I ⊆ Fp be any fixed interval and k ∈ [n] be any fixed index. If |Wk | ≥ L, then

′ L|I| n
Pr Wxk (mod p) ∩ I ≤ = O( ).
x∼Fp \{0} 2p L|I|
y∼Fp
′
Proof. First, note that Wxk = {xw+y (mod p) : w ∈ Wk }. Let X be the random variable denoting
(mod p)
′
Wxk ∩ I . First, since for any w ∈ Wk , xw + y (mod p) is uniformly at random, Pr[xw + y
(mod p)
(mod p) ∈ I] = |I| p , and consequently E[X] =
|Wk ||I|
p .
For any w, w ∈ Wk where w 6= w , the probability that both xw + y (mod p) and xw′ + y (mod p)
′ ′
fall in I can be expressed as

X
Pr[xw + y ≡ i1 (mod p) ∧ xw′ + y ≡ i2 (mod p)].
i1 ,i2 ∈I
If i1 = i2 , then xw + y ≡ i1 (mod p) and xw′ + y ≡ i2 (mod p) cannot both happen; otherwise, there
exists at most one pair (x, y) ∈ Fp × Fp for which xw + y ≡ i1 (mod p) and xw′ + y ≡ i2 (mod p) are
|I|2
both true. Thus, Pr[xw + y (mod p) ∈ I ∧ xw′ + y (mod p) ∈ I] ≤ |I|(|I|−1)
p(p−1) ≤ p2 . Thus,

|I|2
2 2
Var[X] = E[X ] − E[X] ≤ |Wk |(|Wk | − 1) 2 + E[X] − E[X]2 ≤ E[X].
p
Also,
L|I| |Wk ||I| 1
Pr X ≤ ≤ Pr X ≤ ≤ Pr |X − E[X]| ≤ E[X] .
2p 2p 2
Var[X] n
By Chebyshev’s inequality, this probability can be upper bounded by ( 12 E[X])2
= O( L|I| ).
32
√
We now describe our algorithm for computing |Wk |. First, we split Fp into ℓ = Θ( n) intervals
√ ′
I1 , I2 , . . . , Iℓ , each of size Θ( n). Then it suffices to compute |Wxk mod p ∩ Ii | for each i ∈ [ℓ], since
′
P ℓ ′
|Wk | = |Wxk mod p | = i=1 |Wxk mod p ∩ Ii |.
We first run the algorithm in Claim 8.5 for each i with L′ = L|Ii | , which takes O(n
2p
e (5+ω)/4 /L′ ) =
e (7+ω)/4 /L) time. Claim 8.5 computes |W ′ ′ ′
O(n xk mod p ∩ Ii |√as long as |Wxk mod p ∩ Ii | ≥ L . For each fixed
′
k, it fails to compute |Wxk mod p ∩ Ii | with probability O( n/L) by Claim 8.6. For these k, we enumerate
′
over j ∈ Ii , check if j ∈ Wxk ′
√ mod p , and then compute |Wxk mod p ∩ Ii |. In expectation, the cost of these k
is O( n·L n · |Ii |) = O(n2 /L).
e (9+ω)/4 /L).
Summing over all i ∈ [ℓ], the total expected running time of the algorithm is O(n
Theorem 8.7. If All-Nums-3SUM-Convolution for length n arrays has an O(n2−ε ) time algorithm for
′
some ε > 0, then #All-Nums-3SUM-Convolution for length n arrays has an O(n2−ε ) time randomized
algorithm for some ε′ > 0
Proof. Similar as before, given an #All-Nums-3SUM-Convolution instance on length n arrays A, B, C, we
′′
can count the number of witnesses for Ck that have at most n0.99 witnesses in O(n2−ε ) time by well-known
techniques [VW18] when All-Nums-3SUM-Convolution has a truly subquadratic algorithm.
For the rest values of k, we run the algorithm in Lemma 8.4 which runs in O(n e (9+ω)/4 /n0.99 ) =
O(n1.86 ) time.
′′
Overall, the algorithm for #All-Nums-3SUM-Convolution runs in O(n2−ε + n1.86 ) time, which is
truly subquadratic.
As in Remark 4.3, Theorem 8.7 implies that 3SUM-Convolution is subquadratically equivalent to

#3SUM-Convolution.
Theorem 8.8. If Min-Plus-Convolution for length n arrays has an O(n2−ε ) time algorithm for some ε > 0,
′
then #Min-Plus-Convolution for length n arrays has an O(n2−ε ) time randomized algorithm for some
ε′ > 0.
Proof. Given an #Min-Plus-Convolution instance for length n arrays A, B, we first run the assumed
Min-Plus-Convolution algorithm to compute the Min-Plus convolution C of A and B in O(n2−ε ) time.
Similar as before, given the O(n2−ε ) time algorithm for Min-Plus-Convolution, we can count the
′′
number of witnesses for Ck that have at most n0.99 witnesses in O(n2−ε ) time by well-known tech-
niques [VW18].
For the rest values of k, we run the algorithm in Lemma 8.4 with arrays A, B, C and L = n0.99 which
e (9+ω)/4 /n0.99 ) = O(n1.86 ) time.
runs in O(n
′′
Overall, the algorithm for #Min-Plus-Convolution runs in O(n2−ε + n2−ε + n1.86 ) time, which is
truly subquadratic.
We then show a reduction from Min-Plus-Convolution to #Min-Plus-Convolution.

Theorem 8.9. If #Min-Plus-Convolution for length n arrays has an O(n2−ε ) time algorithm for some
′
ε > 0, then Min-Plus-Convolution for length n arrays has an O(n2−ε ) time randomized algorithm for
some ε′ > 0.
Proof. Let A and B be two length n arrays for a Min-Plus-Convolution instance. Let C be their Min-Plus
convolution. As in proof of Theorem 8.3, we can assume |Wk | = 1 for every k, i.e., there exists a unique ik
such that Aik + Bk−ik = Ck .
33
For each p ∈ [⌈log(n)⌉], we perform the following round. Let A′ be a length 2n array such that
A′2i−1 = Ai for every i ∈ [n], A′2i = Ai for every i ∈ [n] whose p-th bit in its binary representation is 1,
and A′2i = ∞ for the rest of i. Also, let B ′ be a length 2n array such that B2i−1
′ = B2i ′ = B for every
i
i ∈ [n]. Now we use the #Min-Plus-Convolution algorithm for arrays A and B . Suppose C ′ is the Min-
′ ′
Plus convolution between A′ and B ′ . Clearly, for any k ∈ [n], C2k−1

′ = Ck . Also, suppose C2k−1′ has 2
′
witnesses, then we know that A2ik = Aik and thus the p-th bit in ik -th binary representation is 1; otherwise
the p-th bit in ik -th binary representation is 0.
Thus, after the ⌈log(n)⌉ rounds, we can compute ik for each k ∈ [n], which can then be used to compute
the Min-Plus convolution C between A and B in O(n) time.
8.3 All-Numbers 3SUM

Theorem 8.10. If All-Nums-3SUM for sets of n numbers has an O(n2−ε ) time algorithm for some ε > 0,
′
then #All-Nums-3SUM for sets of n numbers has an O(n2−ε ) time randomized algorithm for some ε′ > 0.
Proof. If All-Nums-3SUM for sets of n numbers has an O(n2−ε ) time algorithm for ε > 0, then so does
All-Nums-3SUM-Convolution for length n arrays, since All-Nums-3SUM-Convolution is not harder
than All-Nums-3SUM. Then by Theorem 8.7, #All-Nums-3SUM-Convolution for length n arrays has
′′
an O(n2−ε ) time algorithm for some ε′′ > 0. Therefore, it suffices to reduce #All-Nums-3SUM to
#All-Nums-3SUM-Convolution. Some previous reductions from 3SUM to 3SUM-Convolution actu-
ally work for the counting variants as well [Păt10, CH20]. Arguably the simplest such reduction is given
in [CH20, Section 3]. Applying their reduction finishes the proof.
As in Remark 4.3, Theorem 8.10 implies that 3SUM is subquadratically equivalent to #3SUM.
Still more equivalence results for other counting and detection problems are given in Appendix A.
8.4 Discussion
Abboud, Feller and Weimann [AFW20] showed that counting the number of Negative Triangles in a
graph (even mod 2) can solve Exact-Tri, thus presenting a barrier to showing that the Negative Triangle
problem (Neg-Tri) is equivalent to its counting variant: Vassilevska W. and Williams [VW18] showed that
Neg-Tri is equivalent to APSP under subcubic fine-grained reductions; then if #Neg-Tri can be reduced
to Neg-Tri, one can also reduce it to APSP, and since there are fine-grained reductions from 3SUM to
Exact-Tri [VW13], and from Exact-Tri to #Neg-Tri [AFW20], one would get a very surprising reduction
from 3SUM to APSP. There is some evidence that such a reduction would be difficult to obtain: for
instance, while APSP has a superlogarithmic improvement over its simple cubic algorithm [Wil18], the
best improvement over the simple quadratic algorithm of 3SUM only shaves two logarithmic factors (e.g.
[BDP08])!
Our equivalences between Min-Plus-Product (and thus Minimum Weight Triangle) and Exact-Tri re-
spectively with their counting variants exhibit a strange phenomenon: Neg-Tri seems different from these
problems! Or, perhaps, if we believe that Neg-Tri is like these problems and is equivalent to #Neg-Tri, then
we should be more optimistic about the existence of a fine-grained reduction from 3SUM to APSP.
Another line of work in which counting variants of fine-grained problems have been considered is in
worst-case to average-case reductions and fine-grained cryptography [BRSV17, BRSV18, BBB19, GR20,
DLV20, LLV19, Mer78]: building cryptographic primitives from worst-case fine-grained assumptions that
might still hold even if P = NP. The known techniques for worst-case to average-case reductions for fine-
grained problems only work for counting problems, whereas the design of fine-grained public key protocols
[LLV19, Mer78] seem to require that the decision variants are hard on average.
34
Suppose that one can use the known toolbox for worst-case to average-case reductions for counting
problems to show that #Exact-Tri or #3SUM is hard on average. Then via our reductions back to Exact-Tri
and 3SUM, one would get some distributions for which these decision problems are actually hard. This
could pave the way to new public-key protocols.
9 Counting Algorithms in Other Models

9.1 Co-Nondeterministic Algorithms for Counting Problems with Integer Inputs
First, we show how to modify the co-nondeterministic algorithms in [CGI+ 16] to work for the counting
versions of AE-Exact-Tri, AE-Neg-Tri and All-Nums-3SUM.
Before we show these algorithms, we start with the following simple observation.
Claim 9.1. For any integers a, b, c, a + b + c ≤ 0 if and only if one of the followings is true:
• ⌈ a2 ⌉ + ⌈ 2b ⌉ + ⌈ 2c ⌉ ≤ 0;
• At least 2 of a, b, c are odd and ⌈ a2 ⌉ + ⌈ 2b ⌉ + ⌈ 2c ⌉ = 1.

It has the following immediate consequence:
Lemma 9.2. Given a #AE-Neg-Tri instance on an n-node tripartite graph G with node partitions A, B, and
C and with edge weights in [±nO(1) ], we can reduce it to O(log n) instances of #AE-Exact-Tri on graphs
e 2 ) time. Furthermore,
with the same vertex set and with edge weights in [±nO(1) ] deterministically in O(n
the count for an edge in the original #AE-Neg-Tri instance can be written as a sum of the count for the
corresponding edge among the #AE-Exact-Tri instances (if this edge does not exist in a certain instance,
then the count is simply 0).
Proof. Without loss of generality, we assume that the instance G has weight function w and we would like
to compute for every a ∈ A, c ∈ C, the number of b ∈ B such that w(a, b) + w(b, c) + w(a, c) < 0. Then
we add 1 to the weight of every edge (a, c) ∈ A × C. Now we need to count the number of number of b ∈ B
such that w(a, b) + w(b, c) + w(a, c) ≤ 0.
Consider recursing with respect to the bound [±W ] on the edge weights, initially W = nO(1) .
1. If W = O(1). Then we enumerate all possible combinations of 3 weights w1 , w2 , w3 ∈ [±W ] such
that w1 + w2 + w3 ≤ 0, and for each combination, we create an #AE-Exact-Tri instance with all edges
(a, b) ∈ A × B s.t. w(a, b) = w1 , all edges (b, c) ∈ B × C s.t. w(b, c) = w2 and all edges (c, a) ∈ C × A
s.t. w(c, a) = w3 . In these instances, we make all edge weights 0, so that all triangles have weights 0.
Thus, the total number of exact triangles through edge (a, c) in these O(1) instances is exactly the number
of non-positive-weight triangles through edge (a, c) in the original instance.
2. Otherwise, consider Claim 9.1. A triangle (a, b, c) has w(a, b) + w(b, c) + w(c, a) ≤ 0 if and only if
⌈ w(a,b) w(b,c) w(c,a) w(a,b)
2 ⌉+⌈ 2 ⌉+⌈ 2 ⌉ ≤ 0 or at least 2 of w(a, b), w(b, c), w(c, a) are odd and ⌈ 2 ⌉+⌈ 2 ⌉+
w(b,c)
⌈ w(c,a)
2 ⌉ = 1. Note that these two cases are disjoint, so we can separately consider them and sum up the
counts. For the first case, we can create a graph G′ by replacing the weight w(u, v) of each edge with
⌈ w(u,v) ′
2 ⌉ and recursively solve the problem on G . For the second case, we enumerate which subset of
w(a, b), w(b, c) and w(c, a) are odd (there should be at least two of them) and keep the corresponding
edges in G′ only if they meet the parity condition. This way, we will create 4 sub-problems, where each
sub-problem is an exact triangle instance with target value 1.
35
Overall, we will create O(log n) #AE-Exact-Tri instances as initially W = nO(1) .
Remark 9.3. Lemma 9.2 implies that #AE-Neg-Tri subcubically reduces to #AE-Exact-Tri. As shown
in [AFW20], #Exact-Tri reduces to #Neg-Tri, and the same reduction works from #AE-Exact-Tri to #AE-Neg-Tri.
Thus, #AE-Neg-Tri and #AE-Exact-Tri are subcubically equivalent. By Theorem 4.2, they are also subcu-
bically equivalent to AE-Exact-Tri.
Theorem 9.4. #AE-Exact-Tri for graphs with integer weights in [±nα ] for any constant α is in (N ∩
e (3+ω)/2 )]. The same bound holds for #AE-Neg-Tri.
coN)TIME[O(n
Proof. Suppose we are given an instance of #AE-Exact-Tri. Without loss of generality, we assume the
instance is a weighted graph G with node partitions A, B, and C and weight function w, and we would like
to compute for every a ∈ A, c ∈ C, the number of b ∈ B such that w(a, b) + w(b, c) + w(a, c) = 0. We will
actually more generally count the number of negative-weight, zero-weight, and positive-weight triangles
through each edge in A × C.
Both the prover and the verifier run the reduction in Lemma 9.2 on G to get #AE-Exact-Tri instances
(1) (O(log n))
G< , . . . , G< . They also similarly run the reduction in Lemma 9.2 on G but with all edge weights
(1) (O(log n))
negated to get #AE-Exact-Tri instances G> , . . . , G> . Without loss of generality, we can assume
these #AE-Exact-Tri instances all have target value 0. By construction in Lemma 9.2, all these graphs have
the same vertex set A ∪ B ∪ C. Also, let q be a constant to be fixed later.
(1) (O(log n)) (1) (O(log n))
The prover provides the following for each of the graphs G, G< , . . . , G< , G> , . . . , G>
(we refer to it by G′ ):
• A prime p in the interval [n1−q , Cn1−q log n] for a large enough constant C.
The prime p is supposed to be such that the number of triangles in G′ that are zero mod p but are nonzero
otherwise is at most O(n2+q ). Such a prime is guaranteed to exist since each triangle that has nonzero
weight in [−nα , nα ] is zero mod at most log(3nα )/ log(n1−q ) primes in the interval [n1−q , Cn1−q log n].
n3 log(3nα )
The interval contains at least n1−q primes, so some prime must give rise to at most log(n 1−q )·n1−q =
2+q
O(n ) fake zero triangles.
• A set R of O(n2+q ) triangles. These are supposed to be the triangles in G′ that are nonzero but are zero
mod p.
The verifier first checks that R contains only triangles whose weights are nonzero but are zero mod p.
This takes O(|R|) = O(n2+q ) time. Then it counts in Õ(n1−q+ω ) time (using [AGM97]) the number of
triangles that are zero mod p through each edge (a, c). Call it tG′ (a, c). For each edge (a, c), the verifier
subtracts the number of triangles through it in R from tG′ (a, c). Now, notice that tG′ (a, c) must be an upper
bound on the number of zero triangles through (a, c); also, if R contains all the triangles whose weights
are nonzero but are zero mod p as it is supposed to, tG′ (a, c) will be equal to the number of zero triangles
through (a, c).
Now, the verifier applies the second partP of Lemma 9.2 to sum up the corresponding counts. For every
edge (a, c) ∈ A × C, it will get T< (a, c) = i tG(i) (a, c), which is supposed to be the number of negative-
P <
weight triangles through (a, c); T> (a, c) = i tG(i) (a, c), which is supposed to be the number of positive-
>
weight triangles through (a, c); and tG (a, c), which is supposed to be the number of zero-weight triangles
through (a, c).
e ω ) time.
Then the algorithm counts the number of triangles s(a, c) through each edge (a, c) in O(n
36
Finally, the algorithm verifies s(a, c) = T< (a, c) + T> (a, c) + tG (a, c) and then outputs tG (a, c) for
every edge (a, c).
The correctness of the algorithm relies on the following: Suppose s< (a, c), s= (a, c), s> (a, c) are the
actual number of negative-weight, zero-weight, and positive-weight triangles through each edge (a, c). Then
by previous discussion, the algorithm is sure that s< (a, c) ≤ T< (a, c), s= (a, c) ≤ tG (a, c) and s> (a, c) ≤
T> (a, c), and trivially s(a, c) = s< (a, c) + s= (a, c) + s> (a, c). These two combined with s(a, c) =
T< (a, c) + T> (a, c) + tG (a, c) implies s< (a, c) = T< (a, c), s= (a, c) = tG (a, c) and s> (a, c) = T> (a, c).
The running time is O(ne 2+q + n1−q+ω ) and is minimized for q = (ω − 1)/2. The verifier’s running
e
time is thus O(n (3+ω)/2 ).
The above algorithm clearly also works for #AE-Neg-Tri.
Similarly, we can design a sub-quadratic time co-nondeterministic algorithm for #All-Nums-3SUM.

Theorem 9.5. #All-Nums-3SUM for size-n sets A, B, C of integers in [±nα ] for any constant α is in
e 1.5 )].
(N ∩ coN)TIME[O(n
Proof sketch. The proof is essentially the same as the proof of Theorem 9.4, with the following modifica-
tions.
Without loss of generality, we assume #All-Nums-3SUM asks to count the number of (a, b) ∈ A × B
where a + b + c = 0 for every fixed c ∈ C.
Then, similar to Lemma 9.2, we can reduce counting the number of (a, b) ∈ A × B where a + b + c < 0
and counting the number of (a, b) ∈ A×B where a+b+c > 0 to O(log n) instances of #All-Nums-3SUM.
For each of these instances on (A′ , B ′ , C ′ ), the prover provides a prime p in the interval [n1+q , Cn1+q log n]
for a large enough constant C and a constant q to be fixed. It also provides a set of R of O(n3 /n1+q ) =
O(n2−q ) triples (a, b, c), which are supposed to be the triples in A′ × B ′ × C ′ whose sums are nonzero but
are zero mod p.
For each of these instances, the verifier checks that R contains only triples whose sum is nonzero but is
e
zero mod p. Then it counts in O(p) e 1+q ) time (using FFT) the number of triples (a, b, c) whose sum
= O(n
is zero mod p for every c ∈ C ′ , and then subtracts the corresponding count in R from this count.
Similar to Theorem 9.4, by summing up the counts in all these instances for every c ∈ C, and checking
whether that equals |A||B|, the verifier will be sure to get the correct count of zero-sum triples involving
each c ∈ C.
e 1+q + n2−q ), which is O(n
The running time of the verifier is O(n e 1.5 ) by setting q = 0.5.

9.2 Co-Nondeterministic Algorithms for Counting Problems with Real Inputs

The co-nondeterministic algorithms in Section 9.1 heavily rely on the idea of modulo a prime p, which
will no longer be possible if the input numbers are reals. In this section, we study algorithms for #AE-Real-Exact-Tri,
#AE-Real-Neg-Tri and #All-Nums-Real-3SUM.
For problems with real inputs in the nondeterministic model, we assume the verifier has access to a
“Reasonable” Real RAM model, which was discussed in [CVX22]. In such models, only a restricted subset
of operations are allowed on the real-valued inputs, while any operations in the Word RAM model with
O(log n)-bit words are allowed for the integer parts of the computation. In particular, our algorithms work
in the Real RAM with low-degree predicates model. Also, if we slightly change the definition of Exact-Tri
to finding a triangle (a, b, c) such that w(a, b) + w(b, c) = w(a, c) (and similarly for Neg-Tri), our algo-
rithms will work in the Real RAM with 4-linear comparisons model. See [CVX22] for more details about
“Reasonable” Real RAM models.
37
Our ideas for proving equivalence between counting and detection problems can be used to obtain new
nondeterministic algorithms for counting problems with real inputs, as we show in this subsection.
We start with the following co-nondeterministic algorithm for Equality-Product and Dominance-Product.
Lemma 9.6. Equality-Product between an n1 × n2 matrix A and an n2 × n3 matrix B, where we only

need to determine the results on a given subset X ⊆ [n1 ] × [n3 ], is in
h i
(N ∩ coN)TIME O e |X|ns2 + M (n1 , n2−s , n3 )
2
time for any s ∈ [0, 1]. The same bound holds for Dominance-Product.
Proof. Without loss of generality, we can assume all entries of A and B are integers in O(n1 n2 + n2 n3 ), by
replacing each entry by its rank.
For every (i, j) ∈ [n1 ] × [n3 ], we use c= (i, j) to denote the number of k where Aik = Bkj , c< (i, j) to
denote the number of k where Aik < Bkj and c> (i, j) to denote the number of k where Aik > Bkj . Instead
of only computing c= (i, j) for (i, j) ∈ X, we will more generally compute c< (i, j) and c> (i, j) as well.
By known reductions from Dominance-Product to Equality-Product [LUWG19, Vas15], we can cre-
ate O(log n) instances of Equality-Product on matrices of the same dimensions, and use the sum of result-
ing values on entry (i, j) over these O(log n) instances to compute c< (i, j). It similarly holds for c> (i, j).
For each of the O(log n) instances of Equality-Product (the O(log n) instances generated above, and
the original instance), the prover provides the following (say the instance is on matrices A′ , B ′ ):
• A prime p in the interval [n1−s 1−s

2 , Cn2 log(n2 )] for a large enough constant C.
The prime p is supposed to be such that the number of triples (i, k, j) where (i, j) ∈ X, k ∈ [n2 ] and
′ while A′ ≡ B ′ (mod p) is at most O(|X|n
A′ik 6= Bkj e s ). Such a prime is guaranteed to exist since for
ik kj 2
e
every triple (i, k, j) with A′ 6= B ′ , A′ and B ′ are congruent mod at most O(1) primes in the interval
ik kj ik kj
[n1−s
2 , Cn
1−s
2 log(n
1−s
2 )]. The interval contains at least n2 primes, so some prime must give rise to at
most Oe |X|n 2 e
= O(|X|n s ) fake zero triangles.
1−s
n2 2
e
• A set R of O(|X|n s ) triples (i, k, j) where (i, j) ∈ X and k ∈ [n ]. These are supposed to be the triples
2
2
where Aik 6= Bkj while A′ik ≡ Bkj
′ ′ ′ (mod p).
The verifier is similar to the verifier in the proof of Theorem 9.4. For each Equality-Product instance,
after checking that the set R only contains triples (i, k, j) where (i, j) ∈ X, k ∈ [n2 ] and A′ik 6= Bkj ′
while A′ ≡ B ′ (mod p) in O(|R|) e time, it computes C ′ , the number of k ∈ [n2 ] such that A′ ≡ B ′
ik kj ij ik kj
e (n1 , pn2 , n3 )) time, by packing p instances of matrix
(mod p), for each pair of (i, j) ∈ [n1 ] × [n3 ] in O(M
multiplications of dimensions n1 × n2 × n3 together. Similar as before, by subtracting the number of triples
involving (i, j) in R from Cij ′ , C ′ becomes an upper bound on the number of k such that A′ = B ′ .
ij ik kj
Finally, by checking that the sum of Cij ′ over all the Equality-Product instances equals n for every
2
′ equals exactly the number of k such that A′ = B ′ in each of
(i, j) ∈ X, the verifier will be sure that Cij ik kj
the instances. In particular, it can compute c= (i, j), c< (i, j) and c> (i, j) for every (i, j) ∈ X.
e
The running time of the verifier is O(|R|+M e
(n1 , pn2 , n3 )) = O(|X|n s +M (n , n2−s , n )), as desired.
1 2 3
2

Next, we are ready to present our co-nondeterministic algorithm for #AE-Real-Exact-Tri and #AE-Real-Neg-Tri.
38
6+ω
e
Theorem 9.7. #AE-Real-Exact-Tri for n-node graphs is in (N ∩ coN)TIME[O(n 3 )]. The same bound
holds for #AE-Real-Neg-Tri.
Proof. Without loss of generality, we assume the instance is a weighted graph G with node partitions A, B,
and C and weight function w, and we would like to compute for every a ∈ A, c ∈ C, the number of b ∈ B
such that w(a, b) + w(b, c) + w(a, c) = 0. Also, let q be a constant to be fixed later.
The prover provides the following:
e q ).
• A subset R ⊆ [n] of size O(n
For every (a, c) ∈ A × C, let Lac = {w(a, b) + w(b, c) : b ∈ R}. Let pac be the index of the pre-
decessor of −w(a, c) (including −w(a, c)) in Lac , i.e., it is arg maxb∈R,w(a,b)+w(b,c)≤−w(a,c) (w(a, b) +
w(b, c)), and let sac be the index of the successor of −w(a, c) (excluding −w(a, c)) in Lac , i.e., it is
arg minb∈R,w(a,b)+w(b,c)>−w(a,c) (w(a, b) + w(b, c)). The set R is supposed to be that, if −w(a, c) 6∈ Lac ,
then the number of b ∈ B where w(a, pac ) + w(pac , c) < w(a, b) + w(b, c) < w(a, sac ) + w(sac , c) is
O(n1−q ). Such R exists because a random R satisfies these properties with high probability.
• For every b0 ∈ R, the prover uses the protocol in Lemma 9.6 to create outputs for the purpose of counting
|{w(a, b) − w(a, b0 ) = w(b0 , c) − w(b, c) : b ∈ B}| and |{w(a, b) − w(a, b0 ) ≤ w(b0 , c) − w(b, c) : b ∈ B}|,
where X is the set of (a, c) such that b0 = pac or b0 = sac .
• Finally, for every (a, c) with −w(a, c) 6∈ Lac , it provides a list ℓac ⊆ [n], which is supposed to contain
all indices b with w(a, pac ) + w(pac , c) < w(a, b) + w(b, c) < w(a, sac ) + w(sac , c). Note that |ℓac | =
O(n1−q ) by the choice of R.
e 2 |R|) = O(n
The verifier does the following. First, it computes Lac , pac , sac in O(n e 2+q ) time. Next, it
uses the algorithm in Lemma 9.6 to correctly count, for every b0 ∈ R, the values of
|{w(a, b) − w(a, b0 ) = w(b0 , c) − w(b, c) : b ∈ B}|
and
|{w(a, b) − w(a, b0 ) ≤ w(b0 , c) − w(b, c) : b ∈ B}|
for (a, c) such that b0 = pac or b0 = sac . Let xi be the size of X for the i-th call of Lemma 9.6. The running
time of the i-th call of Lemma 9.6 can be bounded by
√
Oe |X|ns + M (n, n2−s , n) = O e |X|ns + n1−s nω = O(n e ω+1 2 xi + nω ),
q P
1+ω
by setting ns = min n, n|X| . Then we notice that i xi = O(n2 ). Therefore, by convexity, the
running time can be bounded as
s !
X ω+1 √ ω+1 n 2 ω+3+q
e n 2 xi + n ω = O
O e |R| · n 2 + |R| · nω = O(ne 2 + nω+q ).
|R|
i
For some (a, c), if w(a, pac ) + w(pac , c) = −w(a, c), then by Fredman’s trick,
|{w(a, b) − w(a, b0 ) = w(b0 , c) − w(b, c) : b ∈ B}|

= |{w(a, b) + w(b, c) = w(a, b0 ) + w(b0 , c) : b ∈ B}|
= |{w(a, b) + w(b, c) + w(a, c) = 0 : b ∈ B}|
39
for b0 = pac , which is exactly the count we seek. Otherwise, the algorithm computes the number of b where
w(a, pac ) + w(pac , c) < w(a, b) + w(b, c) < w(a, sac ) + w(sac , c) via
|{w(a, b) − w(a, sac ) ≤ w(sac , c) − w(b, c) : b ∈ B}|

− |{w(a, b) − w(a, sac ) = w(sac , c) − w(b, c) : b ∈ B}|
− |{w(a, b) − w(a, pac ) ≤ w(pac , c) − w(b, c) : b ∈ B}| ,
where all three counts are computed earlier. Next, the verifier checks that the length of ℓac equals this count,
and for every b ∈ ℓac , the verifier checks w(a, pac ) + w(pac , c) < w(a, b) + w(b, c) < w(a, sac ) + w(sac , c).
If these checks pass, then ℓac contains exactly the set of b where w(a, pac )+w(pac , c) < w(a, b)+w(b, c) <
w(a, sac ) + w(sac , c). By the definition of pac and sac , it must be the case that w(a, pac ) + w(pac , c) <
−w(a, c) < w(a, sac ) + w(sac , c). Therefore, by reading the list ℓac and count how many b ∈ ℓac has
w(a, b) + w(b, c) = −w(a, c), the verifier can correctly compute the number of exact triangles through edge
(a, c). Overall, the cost of this step is the total length of ℓac , which is O(n3−q ).
e 2+q + n ω+3+q
The running time of the verifier is O(n + nω+q + n3−q ). By setting q = 3−ω
3 , it becomes
2
6+ω
e
O(n 3 ).
To adapt the algorithm to #AE-Real-Neg-Tri, for every (a, c), we also count the number of b such that
w(a, b) + w(b, c) < w(a, pac ) + w(pac , c) via
|{w(a, b) − w(a, pac ) ≤ w(pac , c) − w(b, c) : b ∈ B}|

− |{w(a, b) − w(a, pac ) = w(pac , c) − w(b, c) : b ∈ B}| .
If −w(a, c) ∈ Lac , then w(a, pac ) + w(pac , c) = w(a, c), so the above count is exactly the number of
negative triangles through (a, c). Otherwise, every b ∈ B with w(a, b) + w(b, c) < w(a, pac ) + w(pac , c) <
−w(a, c) forms a negative triangle with (a, c). All other negative triangles can be found via searching ℓac .
Before we present our co-nondeterministic algorithm for #All-Nums-Real-3SUM, we first introduce

the following lemma.
Lemma 9.8. Given an n1 × n2 matrix A, an n3 × n2 matrix B, and a subset X ⊆ [n1 ] × [n3 ], counting the
number of (k, ℓ) where Aik = Bjℓ for every (i, j) ∈ X is in
h i
(N ∩ coN)TIME O e |X|ns2 + M (n1 , n2−s , n3 )
2
time for any s ∈ [0, 1]. Consequently, counting the number of (k, ℓ) where Aik ≤ Bjℓ for every (i, j) ∈ X
is also in h i
(N ∩ coN)TIME O e |X|ns + M (n1 , n2−s , n3 )
2 2
time for any s ∈ [0, 1].
Proof. First, we can assume all entries of A and B are integers bounded by O(n1 n2 + n2 n3 ), by replacing
each entry by its rank.
We enumerate b1 , b2 ∈ {0} ∪ [⌊log(n2 )⌋]. If the occurrence of a number on the i-th row of A has a 1 in
its binary representation on the bit corresponding to 2b1 , we keep one copy of this number on the i-th row of
A; otherwise, we drop this number. Similarly, if the occurrence of a number on the j-th row of B has a 1 in
its binary representation on the bit corresponding to 2b2 , we keep one copy of this number on the j-th row
40
of B; otherwise, we drop this number. We then solve the original problem on these two modified matrices.
Finally, we can sum up the results obtained on these modified matrices, weighted by 2b1 +b2 . This way, we
can assume all numbers on the i-th row of A are distinct for any i, and all numbers on the j-th row of B are
distinct for any j.
• A function h : [O(n1 n2 + n2 n3 )] → [n2 ].

The function is supposed to be that, for every i ∈ [n1 ], each value in [n2 ] matches at most O(1)e numbers
e
in the multiset {h(Aik ) : k ∈ [n2 ]}. Similarly, for every j ∈ [n3 ], each value in [n2 ] matches at most O(1)
numbers in the multiset {h(Bjk ) : k ∈ [n2 ]}. Such a function exists because a random function satisfies
these constraints with high probability.
e 2 ) matrix, where the columns of it is indexed by [n2 ] × [O(1)]
• Two matrices A′ , B ′ . A′ is an n1 × O(n e ×
e e ′
[O(1)]. For every i ∈ [n1 ], k ∈ [n2 ], s2 ∈ [O(1)], we set Ai,(h(Aik ),s1 ,s2 ) to Aik , where Aik is the s1 -th
number on row i that get mapped to h(Aik ). Similarly, B ′ is an n3 × O(n e 2 ) matrix, where the columns of
e e e
it is indexed by [n2 ] × [O(1)] × [O(1)]. For every j ∈ [n3 ], k ∈ [n2 ], s1 ∈ [O(1)], ′
we set Bj,(h(Bjk ),s1 ,s2 )
to Bjk , where Bjk is the s2 -th number on row j that get mapped to h(Bjk ). All other entries of A′ and B ′
are set to values distinct from any other values.
• Use the protocol in Lemma 9.6 to create outputs for the purpose of computing the equality product between
A′ and B ′T with the output set X.
The verifier does the following. First, it checks that h, A′ , B ′ are valid. Then it runs the algorithm in
Lemma 9.6 to compute, for every (i, j) ∈ X, the number of (k, s1 , s2 ) where A′i,(k,s1,s2 ) = Bj,(k,s′
1 ,s2 )
.
Note that this is exactly the number of (k, ℓ) where Aik = Bjℓ . The running time of the algorithm is same
as that of Lemma 9.6.
If we instead want to count the number of (k, ℓ) where Aik ≤ Bjℓ for every (i, j) ∈ X, we can use the
idea that reduces Dominance-Product to Equality-Product [LUWG19, Vas15] to create O(1) e instances
e
of the previous problem, so it only incurs an additional O(1) factor.
Given an All-Nums-3SUM instance on size-n sets A, B, C, we can first sort A and B and then divide
them to consecutive sub-lists A1 , . . . , An/d and B1 , . . . , Bn/d of size d. It is a well-known observation that,
in order to determine whether some c ∈ C is in a 3SUM solution, it suffices to search for c in Ai + Bj
for O(n/d) pairs of (i, j) ∈ [n/d]2 . For #All-Nums-Real-3SUM, the observation is still true: For every
c ∈ C, it suffices to count the number of c in Ai + Bj for O(n/d) pairs of (i, j) ∈ [n/d]2 . We can thus
determinitically reduce #All-Nums-Real-3SUM to the following problem (see its non-counting version in,
e.g., [Cha20]).
Problem 9.9. We are given two real (n/d) × d matrices A and B, and a set Cij of real numbers for every
P j). For every c 2∈ Cij , we are asked to count the number of (k, ℓ) such that Ai,k + Bj,ℓ = c. Additionally
(i,
i,j |Cij | = O(n /d).
3ω+3
e ω+3 )].
Theorem 9.10. #All-Nums-Real-3SUM for size-n sets is in (N ∩ coN)TIME[O(n
Proof. We first deterministically reduce #All-Nums-Real-3SUM to Problem 9.9. The proof then proceeds
similarly to the proof of Theorem 9.7.
41
e
• A subset R ⊆ [d] × [d] of size O(r).
For every i, j, let Lij = {Aik + Bjℓ : (k, ℓ) ∈ R}. Let (pkijc , pℓijc ) be the index of the predecessor of c
in Lij (including c) and let (skijc , sℓijc ) be the index of the successor of c in Lij (excluding c). The set
R is supposed to be that, for some c ∈ Cij and c 6∈ Lij , the number of (k, ℓ) with Ai,pkijc + Bj,pℓijc <
Aik + Bjℓ < Ai,skijc + Bj,sℓijc is O(d2 /r). Such a set exists because a random R satisfies these properties
with high probability.
• For every (k0 , ℓ0 ) ∈ R, it uses the protocol in Lemma 9.8 to create outputs for the purpose of counting
|{Aik − Aik0 = Bjℓ0 − Bjℓ }| and |{Aik − Aik0 ≤ Bjℓ0 − Bjℓ }|, where X is the set of (i, j) such that
there exists c ∈ Cij with (pkijc , pℓijc ) = (k0 , ℓ0 ) or (skijc , sℓijc ) = (k0 , ℓ0 ).
• Finally, for every (i, j) and c ∈ Cij with c 6∈ Lij , it provides a list ℓℓijc ⊆ [d] × [d], which is supposed
to contain all pairs (k, ℓ) with Ai,pkijc + Bj,pℓijc < Ai,k + Bj,ℓ < Ai,skijc + Bj,sℓijc . Note that |ℓℓijc | =
O(d2 /r) by the choice of R.
The verifier is almost identical to the verifier in Theorem 9.7, so we omit its details for conciseness. The
running time of the verifier is
 
2 X X
Oe d · |Cij | + xi ds + M
n 2−s n 
,d , ,
r d d
i,j i
1
where xi is the size of X in the i-th call of Lemma 9.8. By picking d = n 3−s , the running time can be
simplified to
 
2
X X
e n
3−s s 2−s
O · |Cij | + xi n 3−s + n 3−s ·ω  .
r
i,j i
P P 5−2s
We know i xi = O( i,j |Cij |) = O(n2 /d) = O(n 3−s ), and we call Lemma 9.8 a total of O(|R|) =
O(r) times. Thus, we can further upper bound the running time by
7−2s !
n 3−s 5−s 2−s
e
O + n 3−s + r · n 3−s ·ω .
r
3 3ω+3
By setting s = 2ω−3 e ω+3 ).
and r = n ω+3 , the running time becomes O(n
ω
Remark 9.11. The above approach also leads to new consequences in a certain unrealistic model of com-
putation: an unrestricted Real RAM, supporting standard arithmetic operation on real numbers with un-
bounded precision, but without the floor function. With the floor function, it is known that the model
enables PSPACE-hard problems to be solved in polynomial time [Sch79]. In a recent paper [CVX22], it
was noted that even without the floor function, the model may still be unreasonably powerful; for example,
there is a truly subcubic time algorithm for APSP for integer input under this model. But the question of
whether there are similarly subcubic algorithms for APSP and other related problems for real input was not
answered.
42
Our proof of Theorem 9.7 implies a truly subcubic randomized time algorithm for #AE-Real-Exact-Tri
in this unrestricted Real RAM without the floor function. This is because we use nondeterminism mainly
to generate all the witnesses. But using large numbers, we can do standard matrix product and have all the
witnesses represented as a long bit vector. When witnesses are needed for an output entry, we can generate
them one by one using a most-significant-bit operation, which can be simulated by binary search. Small
adaptation of the proof also shows a truly subcubic randomized time algorithm for the real-valued version
of #Min-Plus-Product. Similarly, Theorem 9.10 implies a truly subquadratic randomized time algorithm
for #All-Nums-Real-3SUM in the unrestricted Real RAM without the floor function.
9.3 Quantum Algorithms for Counting Problems

Our equivalences between counting and detection problems immediately imply faster quantum algo-
rithms for counting problems. For instance, we obtain the following
e 2−ε ) time quantum algorithm for #3SUM for some ε > 0.
Corollary 9.12. There exists an O(n
e
Proof. It is known that 3SUM can be solved in O(n) quantum time (see e.g. [AL20]). Applying Theo-
rem 8.10 finishes the proof.
9.4 Discussion
#CNF-SAT asks to count the number of satisfying assignments to a CNF formula, and it is considered
harder than CNF-SAT: the counting version #SETH of the Strong Exponential Time Hypothesis (SETH)
[IPZ01, CIP10] is considered even more believable than SETH (see [CM16]). Williams’ [Wil05] reduction
from CNF-SAT to OV preserves the counts, and thus is also a fine-grained reduction from #CNF-SAT to
#OV. Similar to the situation for CNF-SAT, there’s no known fine-grained reduction from #OV to OV.
Our techniques do not yet give equivalences between the decision and counting variants of CNF-SAT
and OV. If such equivalences do not exist, this would indicate that OV and CNF-SAT are different from the
other core problems in FGC.
Such an indication was already observed by Carmosino et al. [CGI+ 16] who studied the nondeterminis-
tic and co-nondeterministic complexity of fine-grained problems. They formulated NSETH that asserts that
there is no O((2 − ε)n ) time nondeterministic algorithm which can verify that a given CNF formula has
no satisfying assignment. They also exhibited nondeterministic algorithms for verifying the YES and NO
solutions of Exact-Tri, APSP in truly subcubic time and 3SUM in truly subquadratic time, and concluded
that if NSETH holds, then there can be no deterministic fine-grained reduction from CNF-SAT or OV to
any of Exact-Tri, APSP or 3SUM.
Because of our efficient nondeterministic algorithms for #Exact-Tri, #Neg-Tri and #3SUM, we then
get that under NSETH, there can be no deterministic fine-grained reductions from CNF-SAT or OV to
#Exact-Tri, #APSP or #3SUM.
Recently, Akmal, Chen, Jin, Raj and Williams [ACJ+ 22] showed, among other things, that #Exact-Tri
e 2 ) time Merlin-Arthur protocol. Their protocol crucially uses polynomial identity testing, and
has an O(n
hence it is not known how to derandomize it and make it nondeterministic. Our results yield a truly subcubic
nondeterministic protocol.
10 BSG Theorems Revisited

In Section 5, we have seen how our techniques may replace some of the previous uses of the BSG The-
orem in algorithmic applications. In this section, we show how our techniques can actually prove variants
43
of the BSG Theorem itself.
We begin with a quick review of the BSG Theorem. Many different versions of the theorem can be
found in the literature, and the following is one version that is easy to state:
Theorem 10.1. (BSG Theorem) Given subsets A, B, and C of size n of an abelian group, and a parameter
s, if |{(a, b) ∈ A × B : a + b ∈ C}| ≥ n2 /s, then there exist subsets A′ ⊆ A and B ′ ⊆ B both of size
Ω(n/s), such that
|A′ + B ′ | = O(s5 n).
The earliest version of the theorem, with super-exponential factors in s, was obtained by Balog and
Szemerédi [BS94], via the regularity lemma. Gowers [Gow01] was the first to obtain a version with poly-
nomial dependency on s. The version stated above was proved by Balog [Bal07] and Sudakov, Szemerédi
and Vu [SSV05]. Although the proof is not long and does not need advanced tools, it is clever and not easy
to think of; see [TV06, Lov17, Vio11] for various different expositions.
Chan and Lewenstein [CL15] gave algorithmic applications using the following variant which we will
call the “BSG Covering Theorem” (it was called the “BSG Corollary” in their paper). Instead of extracting a
single pair of large subsets (A′ , B ′ ), the goal is to construct a cover by multiple pairs of subsets (A(i) , B (i) ):
Theorem 10.2. (BSG Covering) Given subsets A, B, and C of size n of an abelian group, and a parameter
s, there exist a collection of ℓ = O(s) subsets A(1) , . . . , A(ℓ) ⊆ A and B (1) , . . . , B (ℓ) ⊆ B, and a set R of
O(n2 /s) pairs in A × B, such that
S
(i) {(a, b) ∈ A × B : a + b ∈ C} ⊆ R ∪ λ (A(λ) × B (λ) ), and
P
(ii) |A(λ) + B (λ) | = O(s5 n) for each λ (and so λ (|A(λ) + B (λ) |) = O(s6 n)).
The BSG Covering Theorem is not implied by the BSG Theorem as stated, but the known proofs by
Balog [Bal07] and Sudakov et al. [SSV05] established an extension of the BSG Theorem that involves an
input graph, and repeated applications of this theorem indeed provide multiple pairs of subsets satisfying
the stated properties.
10.1 A New Simpler Version

We will now show how our techniques, combined with some new extra ideas, can derive a version of the
BSG Covering Theorem where the O(s6 n) bound is weakened to O(s e 2 n3/2 ). Although the new bound is
superlinear in n, the lower polynomial dependency on s actually compensates to yield improved results in
some algorithmic applications of the Covering Theorem. In particular, O(s e 2 n3/2 ) is better when s ≫ n1/8 .
A key advantage of the new proof is its simplicity: it constructs a cover directly, instead of repeatedly
extracting subsets one at a time (thus, it avoids the need to extend the BSG Theorem with an input graph,
and thereby simplifies the algorithm considerably).
We focus on the setting where each input set A is of the form {(i, ai ) : i ∈ [n]} for a sequence
of integers or reals a1 , . . . , an . We call such a set an indexed set. This case turns out to be sufficient
for applications involving integer input (because known hashing-based techniques used in reductions from
3SUM to 3SUM-Convolution (e.g. [KPP16]) can map integer sets to indexed sets—3SUM-Convolution
is just 3SUM for indexed sets). Focusing on indexed sets makes the proof more intuitive, and also makes the
construction more efficient. (In Appendix C, we present a variant of the proof for general sets in an abelian
group.)
Note that in the theorem stated below, we work with “monochromatic” difference sets of the form A−A,
instead of bichromatic sum sets A + B. This form is actually more general, since given A and B, we can
44
reset A to A ∪ (−B). The reduction does not go the other way, since knowing that |A(λ) + B (λ) | is small
does not mean |(A(λ) ∪ (−B (λ) )) − (A(λ) ∪ (−B (λ) ))| is small. (The proofs for the known O(s6 n) bound
work only for the bichromatic sum sets but not for monochromatic difference sets, whereas Gower’s earlier
proof works for monochromatic difference sets.)
Theorem 10.3. (Simpler BSG Covering) Given indexed sets A and C of size n and a parameter s, there
e 3 ) subsets A(1) , . . . , A(ℓ) ⊆ A, and a set R of O(n
exist a collection of ℓ = O(s e 2 /s) pairs in A × A, such
that
S
(i) {(a, b) ∈ A × A : a − b ∈ C} ⊆ R ∪ λ (A(λ) × A(λ) ), and
P e 2 n3/2 ).
(λ) − A(λ) | = O(s
(ii) λ |A
e 2 ) Las Vegas randomized time.
The A(λ) ’s and R can be constructed in O(n
Proof. Let A = {(i, ai ) : i ∈ [n]} and C = {(k, ck ) : k ∈ [n]}. As a preprocessing step, we sort the
e 2 ) total time. Let Wk = {i ∈ [n] : ai+k − ai = ck }.
multiset {ai+k − ai : i ∈ [n]} for each i, in O(n
• Few-witnesses case. For each k with |Wk | ≤ n/s, add {((i + k, ai+k ), (i, ai )) : i ∈ Wk } to R. The
e · n/s).
number of pairs added to R is O(n · n/s). The running time of this step is also O(n
• Many-witnesses case. Pick a random subset H ⊆ [±n] of size c0 s log n for a sufficiently large constant
c0 . Let L(h) be the multiset {ai+h − ai : i ∈ [n]}. Let F (h) be the elements of frequency more than n/r
e
in L(h) . Note that |F (h) | ≤ r. These can be computed in O(sn) time.
– Low-frequency case. For each h ∈ H and i ∈ [n], if ai+h − ai 6∈ F (h) , we examine each of the at most
n/r indices j with ai+h − ai = aj+h − aj and add ((j, aj ), (i, ai )) to R. The number of pairs added
e
to R is O(sn e 2 /s) by choosing r := s2 . The running time of this step is also bounded by
· n/r) = O(n
e 2 /s).
O(n
– High-frequency case. For each h ∈ H and f ∈ F (h) , add the following subset to the collection:
A(h,f ) = {(i, ai ) ∈ A : ai+h − ai = f }.
e · r) = O(s
The number of subsets is O(s e 3 ). The running time of this step is
 
X
O e
|A(h,f ) | = O(sn).
h∈H, f ∈F (h)
Correctness. To verify (i), consider a pair ((i + k, ai+k ), (i, ai )) ∈ A × A with ai+k − ai = ck . If
|Wk | ≤ n/s, then ((i + k, ai+k ), (i, ai )) ∈ R due to the “few-witnesses” case. So assume |Wk | > n/s.
Then H hits Wk − i w.h.p., so there exists h ∈ H with i + h ∈ Wk , i.e., ai+h+k − ai+h = ck . Since
ai+k − ai = ck , we have ai+h+k − ai+k = ai+h − ai (by Fredman’s trick). Let f = ai+h − ai . If f 6∈ F (h) ,
then ((i + k, ai+k ), (i, ai )) ∈ R due to the “low-frequency” case. If f ∈ F (h) , then ((i + k, ai+k ), (i, ai )) ∈
A(h,f ) × A(h,f ) due to the “high-frequency” case.
Thus far, the proof ideas are similar to what we have seen before. However, to verify (ii),
P we will propose
a new probabilistic argument. Consider a fixed h. We want to bound the sum S (h) := f ∈F (h) |A(h,f ) −
A(h,f ) |. This is equivalent to bounding the number of triples (k, c, f ) such that f ∈ F (h) and ∃i with
ai+k − ai = c and ai+h − ai = ai+k+h − ai+k = f . Note that we also have ai+k+h − ai+h = c (by
Fredman’s trick). Let Yk,c = {i : ai+k − ai = c} and q be a parameter. Then S (h) is upper-bounded by the
number of triples (k, c, f ) satisfying
45
1. |Yk,c| > q and f ∈ F (h) , or
2. |Yk,c| ≤ q and ∃i with i ∈ Yk,c and i + h ∈ Yk,c and f = ai+h − ai .

P e 2 n2 /q). On the other
Since k,c |Yk,c| = O(n2 ), the number of triples of type 1 is O((n2 /q) · r) = O(s
hand, the expected number of triples of type 2 for a random h ∈ [±n] is
 
X
O |Yk,c | · |Yk,c |/n = O(n2 · q/n) = O(nq).
k,c: |Yk,c |≤q
e 3/2 ) by choosing q = s√n.

e 2 n2 /q + nq) = O(sn
Hence, E[S (h) ] = O(s P P
e 2 n3/2 ). Thus, the probability that the
The expected total sum h∈H f ∈F (h) |A(h,f ) − A(h,f ) | is O(s
desired bounds are not met is less than an arbitrarily small constant (by Markov’s inequality and a union
bound).
Note that the algorithm can be converted to Las Vegas, because we can verify correctness of the con-
struction by examining all O(n2 ) pairs and computing all difference sets A(h,f ) − A(h,f ) using known
e 2 + s2 n3/2 ) = O(n
output-sensitive algorithms [CH02, CL15, BFN22] in total time O(n e 2 ) (we may assume
1/4
s < n , for otherwise the theorem is trivial).
10.2 Application: Improved 3SUM in Preprocessed Universes

As one immediate application, we can solve 3SUM with preprocessed universe, improving Chan and
Lewenstein’s previous solution which required O(ne 13/7 ) query time [CL15], and also improving Corol-
lary 5.3 regardless of the value of ω: the query algorithm does not use fast matrix multiplication but uses
FFT instead, though randomization is now needed in the preprocessing algorithm.
e 2 ) Las Vegas randomized time,
Corollary 10.4. We can preprocess sets A, B, and C of n integers in O(n
so that given any subsets A ⊆ A, B ⊆ B, and C ⊆ C, we can solve All-Nums-3SUM on (A′ , B ′ , C ′ ) in
′ ′ ′
e 11/6 ) time.
O(n
Proof. Kopelowitz, Pettie and Porat [KPP16] gave a simple randomized reduction from 3SUM to O(log n)
instances of 3SUM-Convolution via hashing. The same approach works in the preprocessed universe set-
ting, and transform the input into O(log n) instances where A, B, and C are indexed sets.
During preprocessing, we apply Theorem 10.3 to A ∪ (−B), producing subsets A(λ) and a set R of
pairs.
During a query with given subsets A′ ⊆ A, B ′ ⊆ B, and C ′ ⊆ C, we first examine each pair (a, −b) ∈
e 2 /s) time.
R and check whether a ∈ A′ , b ∈ B ′ , and a + b ∈ C ′ . This takes O(n
Next, for each λ, we compute (A(λ) ∩ A′ ) + ((−A(λ) ) ∩ B ′ ) by known FFT-based algorithms [CH02,
CL15, BFN22]; the running time is near-linear in the output size, which is bounded by |A(λ) − A(λ) |. For
each output value c, we check whether c ∈ C ′ .
e 2 /s + s2 n3/2 ). Choosing s = n1/6 yields the theorem.
The total query time is O(n
We remark that the same O(n e 11/6 ) bound holds for a slightly more general case when C ′ is an arbitrary
set of n integers, i.e., the preprocessing does not need the set C. This is because in the proof of Theorem 10.3,
e 2 )-time preprocessing step is independent of C, and the rest of the construction takes O(n
the O(n e 2 /s + sn)
time. In contrast, Chan and Lewenstein’s paper obtained a weaker O(n e 19/10 ) bound for the same case
without C.
46
10.3 Reinterpreting Gower’s Version
In this section, we show that Gower’s proof [Gow01] (see also a related recent proof by Schoen [Sch15])
can also be modified to construct a cover directly.8 Gower’s proof requires more clever arguments; our new
presentation highlights the similarities and differences with the proof of Theorem 10.3. This variant of the
proof will be needed in a later application in Section 11 to conditional lower bounds—so ideas from additive
combinatorics will be useful after all!
In the following, we let popA (x) (the popularity of x) denote the number of pairs (a, b) ∈ A × A with
x = a − b; in other words, popA (x) = |{a ∈ A : a − x ∈ A}|.
e 3)
Theorem 10.5. Given indexed sets A and C of size n and a parameter s, there exist a collection of ℓ = O(s
(1) (ℓ) e /s) pairs in A × A, such that
subsets A , . . . , A ⊆ A, and a set R of O(n 2
S
e 6 n) for each λ (and so P |A(λ) − A(λ) | = O(s
(ii) |A(λ) − A(λ) | = O(s e 9 n)).
λ
e 2 ) Las Vegas randomized time.

Proof. We follow the proof in Theorem 10.3 but modify the handling of the “high-frequency” case. Let t
be a parameter. Define
G(h,f ) = {(a, b) ∈ A(h,f ) × A(h,f ) : popA (a − b) ≤ n/t}

Z (h,f ) = {a ∈ A(h,f ) : degG(h,f ) (a) > |A(h,f ) |/4}.
Here, degG(h,f ) (a) refers to the degree of a in G(h,f ) when viewed as a graph. For each h ∈ H and f ∈ F (h) ,
add Z (h,f ) × A(h,f ) and A(h,f ) × Z (h,f ) to R. For each h ∈ H and f ∈ F (h) , instead of adding the subset
A(h,f ) , we add the subset Ae(h,f ) := A(h,f ) \ Z (h,f ) to the collection.
Correctness. To analyze this modified construction, first observe that every pair previously covered by
A(h,f ) × A(h,f ) is now covered by A e(h,f ) × A e(h,f ) or by the extra pairs added to R (i.e., Z (h,f ) × A(h,f ) or
A(h,f ) × Z (h,f ) ).
We(h,fbound the expected number of extra pairs Padded to R. Consider a fixed h. Note that |Z (h,f ) | ≤
) P P
O |A|G(h,f ) |/4
|
, and so f |Z (h,f ) ||A(h,f ) | = O f |G
(h,f ) | . The sum
f |G
(h,f ) | is bounded by the
number of triples (i, j, f ) such that ai+h − ai = aj+h − aj = f and popA ((i − j, ai − aj )) ≤ n/t. This is
bounded by the number of pairs (i, j) such that popA ((i − j, ai − aj )) ≤ n/t and ai − aj = ai+h − aj+h
(by Fredman’s trick). For a fixed (i, j) with popA ((i − j, ai − aj )) ≤ n/t, the number of h’s with ai − aj =
ai+h − aj+h is at most n/t, and so the probability that ai − aj = ai+h − aj+h for a random h ∈ [±n] is
O(1/t). It follows that  
X
Eh  |Z (h,f ) ||A(h,f ) | = O(n2 · 1/t).
f
e · n2 /t) = O(n
Consequently, the expected number of extra pairs added to R is O(s e 2 /s) by setting t := s2 .
8
There are multiple different exposition of Gower’s and subsequent proofs of the BSG Theorem in the literature. For example,
some presentations [Bal07, SSV05, TV06, Vio11] cleanly separate the algebraic from the combinatorial components, by reducing
the problem to some combinatorial lemma about graphs (counting paths of length 2 or 3 or 4). But these versions of the proof do not
achieve our goal of computing a cover directly and efficiently. Our reinterpretation is nontrivial and requires examining Gower’s
proof from the right perspective.
47
Finally, we consider a fixed h and fixed f ∈ F (h) and provide an upper bound on |A e(h,f ) − A
e(h,f ) |.
For each c ∈ A e(h,f ) − A e(h,f ) , pick a lexicographically smallest (a, b) ∈ A e(h,f ) × A
e(h,f ) with c = a − b.
Consider all y ∈ A (h,f ) with (a, y), (b, y) 6∈ G (h,f ) ; the number of such y’s is at least |A(h,f ) | − |A(h,f ) |/4 −
|A (h,f ) |/4 = Ω(|A (h,f ) |) = Ω(n/r) (since |A (h,f ) | > n/r for f ∈ F (h) ). For each such y, examine each
(a′ , a′′ ) ∈ A × A with a − y = a′ − a′′ and each (b′ , b′′ ) ∈ A × A with b − y = b′ − b′′ , and mark the
quadruple (a′ , a′′ , b′ , b′′ ). Since (a, y), (b, y) 6∈ G(h,f ) , there are at least n/t choices of (a′ , a′′ ) and at least
n/t choices of (b′ , b′′ ) for each such y. Letting Q be the number of quadruples marked, we obtain

Q = Ω |A e(h,f ) − A e(h,f ) | · (n/r) · (n/t)2 .
On the other hand, each quadruple (a′ , a′′ , b′ , b′′ ), is marked once, since it uniquely determines the element
c = (a′ − a′′ ) − (b′ − b′′ ), from which (a, b) is uniquely determined and y = a − (a′ − a′′ ) is uniquely
determined. Thus, Q = O(n4 ). We conclude that

e(h,f ) e(h,f ) n4 e 6 n).
|A −A | = O = O(rt2 n) = O(s
(n/r) · (n/t)2
Construction time. One could naively construct the sets Z (h,f ) and A e(h,f ) in O(P |A(h,f ) |2 ) time,
h,f
but a faster way is to use random sampling. Given any value c, we can approximate popA (c) with additive
error δn/t w.h.p. by taking a random subset A′ ⊆ A of O((1/δ2 )t log n) = O(t) e elements and computing
|{a ∈ A : a − c ∈ A}| · |A|/|A | (by a standard Chernoff bound). We will not construct G(h,f ) explicitly.
′ ′
Instead, given any (a, b), we can test for membership in G(h,f ) in O(t) e time. Furthermore, given a, we
can approximate degG(h,f ) (a) with additive error δ|A (h,f ) | w.h.p. by taking a random subset A′′ ⊆ A(h,f )
2
of O((1/δ ) log n) = O(1)e elements and computing |{y ∈ A′′ : (a, y) ∈ G(h,f ) }| · |A(h,f ) |/|A′′ |. This
way, Z (h,f ) (and e(h,f ) ) can be generated in O(t|A
A e (h,f ) |) time for each h and f . The total time
e P thus(h,f ) e e 3
bound is O(t h,f |A |) = O(tsn) = O(s n), which is dominated by other costs (we may assume that
1/6
s < n , for otherwise the theorem is trivial). Due to these approximations, our earlier analysis needs small
adjustments in the constant factors, but is otherwise the same.
As before, the algorithm can be converted to Las Vegas in O(n e 2 ) additional time.
One important advantage of the above proof is that the running time is actually subquadratic, excluding
e 2 )-time preprocessing step, which is needed only in the “few-witnesses” case (ignoring the conver-
the O(n
sion to Las Vegas). In particular, we immediately obtain subquadratic running time for the following variant
of the theorem, which requires only the “many-witnesses” case (where we reset r and t to sŝ instead of s2 ).
This variant will be useful later.
Theorem 10.6. Given an indexed set A of size n and parameters s and ŝ, there exist a collection of ℓ =
e 2 ŝ) subsets A(1) , . . . , A(ℓ) ⊆ A, and a set R of O(n
O(s e 2 /ŝ) pairs in A × A, such that
S
(i) {(a, b) ∈ A × A : popA (a − b) > n/s} ⊆ R ∪ λ (A(λ) × A(λ) ), and
e 3 ŝ3 n) for each λ (and so P |A(λ) − A(λ) | = O(s
(ii) |A(λ) − A(λ) | = O(s e 5 ŝ4 n)).
λ
e 2 /ŝ + s2 ŝn) Monte Carlo randomized time.

Although we are able to reinterpret Gower’s proof, we are unable to modify the proof by Balog [Bal07]
or Sudakov et al. [SSV05] to achieve similar subquadratic construction time.
48
11 Lower Bounds for Min-Equality Convolution
In this section, we prove conditional lower bounds for the Min-Equal-Convolution problem under
Strong APSP Hypothesis, the u-dir-APSP Hypothesis, and the Strong Min-Plus Convolution Hypothesis.
The lower bound under the Strong APSP Hypothesis or u-dir-APSP Hypothesis follows just by combining
Corollary 3.5 with the known reduction from u-dir-APSP to Min-Witness-Eq-Prod [CVX21] (and notic-
ing that Min-Witness-Eq-Prod is easier than Min-Equal-Prod), and then using known ideas for reducing
matrix product problems to convolution problems (more specifically, the unpublished reduction from BMM
to pattern-to-text Hamming distances, attributed to Indyk – see e.g. [GU18]). The lower bound under the
Strong Min-Plus Convolution Hypothesis is more delicate: interestingly, we will combine ideas that we
have developed for conditional lower bounds for intermediate matrix product problems, with one of our new
versions of the BSG Theorem from the previous section.
Theorem 11.1. Under the Strong APSP Hypothesis, Min-Equal-Convolution for length n arrays requires
n1+1/6−o(1) time.
Proof. First, by Corollary 3.5, u-dir-APSP requires n7/3−o(1) time under the Strong Integer-APSP Hy-
pothesis. Zwick’s algorithm [Zwi02] can be seen as a reduction from u-dir-APSP to O(1) e instances of
α α
Min-Plus-Product between n × n matrices and n × n matrices with weights in [n 1−α ] for various values
of α ∈ [0, 1]. As shown in [CVX21], each instance of Min-Plus-Product in this form can be reduced to an
instance of Min-Witness-Eq-Prod for O(n) × O(n) matrices (they only stated the reduction for α = ρ for
a particular value of ρ, but their proof works for any α ∈ [0, 1]). Therefore, Min-Witness-Eq-Prod requires
n7/3−o(1) time under the Strong Integer-APSP Hypothesis.
We can easily reduce a Min-Witness-Eq-Prod instance to a Min-Equal-Prod instance. Suppose the
input of a Min-Witness-Eq-Prod instance is A, B. Assume all entries are in [2n] without loss of generality.
We can create two n × n matrices A′ and B ′ , where A′ik = Aik + 2nk and Bkj ′ = B + 2nk, then the
kj
Min-Witness Equality product between A and B can be computed in O(n e 2 ) time given the Min-Equality
product between A′ and B ′ .
Finally, we reduce Min-Equal-Prod to Min-Equal-Convolution, following the strategy of the unpub-
lished reduction from Boolean matrix multiplication to pattern-to-text Hamming distances, attributed to
Indyk, see e.g. [GU18].
Let A and B be the inputs of Min-Equal-Prod. W.l.o.g., we can assume all entries of A and B are
integers in [2n2 ]. We first create two length 2n2 arrays a and b, where initially all entries of a and b are ∞.
For every (i, k) ∈ [n] × [n], we set a(n+1)(i−1)+k to nAik + k − 1; for every (k, j) ∈ [n] × [n], we set bjn−k
to nBkj + k − 1.
Suppose the Min-Equality product between A and B is C and the Min-Equality convolution between a
and b is c, we will show that Cij = ⌊c(n+1)(i−1)+jn /n⌋ (and Cij = ∞ if c(n+1)(i−1)+jn is ∞), which will
complete the reduction.
To show the equality, first notice that

min a(n+1)(i−1)+k : k ∈ [n] ∧ a(n+1)(i−1)+k = bjn−k
(2)
= min {nAik + k − 1 : k ∈ [n] ∧ Aik = Bkj }
contributes to the minimization of c(n+1)(i−1)+jn . Also, no other terms less than ∞ can contribute: suppose
there exists some x such that ax = by < ∞ and x + y = (n + 1)(i − 1) + jn, then ax mod n must match
by mod n. Thus, there must exist i′ , j ′ , k′ ∈ [n] such that x = (n + 1)(i′ − 1) + k′ and y = j ′ n − k′ ,
49
so (n + 1)(i′ − 1) + j ′ n = (n + 1)(i − 1) + jn. Then it must be the case that i = i′ and j = j ′ , so ax
corresponds to one of the terms in Equation (2).
The reduction in the proof of Theorem 11.1 from u-dir-APSP to Min-Equal-Convolution also easily
imply the following:
Theorem 11.2. Under the u-dir-APSP Hypothesis, Min-Equal-Convolution for length n arrays requires
n1+ρ/2−o(1) time, where ρ is the constant satisfying ω(1, ρ, 1) = 1 + 2ρ, or n1.25−o(1) time if ω = 2.
We finally show the lower bound of Min-Equal-Convolution under the Strong Min-Plus Convolution
Hypothesis.
Theorem 11.3. Under the Strong Min-Plus Convolution Hypothesis, Min-Equal-Convolution for length n
arrays requires n1+1/11−o(1) time.
Proof. We will show that if Min-Equal-Convolution for length n arrays has an O(ne 1+δ ) time algorithm for
some δ ≥ 0, then Min-Plus-Convolution for length n arrays of entries that are bounded by O(n) has an
e 2− 1−11δ
O(n 21 ) time randomized algorithm.
Let A and B be the input arrays of a Min-Plus-Convolution instance. Let t be a parameter to be fixed
later, and let g = ⌈n/t⌉. Similar to the proof of Theorem 3.2, we can assume (Ai mod g) < g/2 and
(Bi mod g) < g/2 for each i ∈ [n]. Also, for each i, we can write Ai as A′i g + A′′i , for 0 ≤ A′i ≤ t and
0 ≤ A′′i < g/2. Similarly, we can write Bi as Bi′ g + Bi′′ .
e
We first compute the Min-Plus convolution C ′ of A′ and B ′ in O(tn) time. Let Wk = {i ∈ [n] : k − i ∈
[n] ∧ Ck = Ai + Bk−i }. Suppose we can compute C , which is defined as Ck′′ = mini∈Wk (A′′i + Bk−i
′ ′ ′ ′′ ′′ ),
then we can compute the Min-Plus convolution C of A and B as Ck = Ck′ g + Ck′′ .

We then compute Ck′′ by two methods depending on whether |Wk | is greater than n/s or not, for some
parameter s to be determined.
e 2 /ŝ+s5 ŝ4 n2 /t)
Claim 11.4. For any parameter ŝ, we can compute Ck′′ for every k where |Wk | > n/s in O(n
time.
Proof. We create the following indexed set A of size O(n):

(i, A′i ) : i ∈ [n] ∪ {(i + n, ∞) : i ∈ [n]} ∪ (3n + 1 − i, −Bi′ ) : i ∈ [n] .
Note that for any k ∈ [n], |Wk | = popA ((k − 3n − 1, Ck′ )). Then we apply Theorem 10.6 with the index
e 2 ŝ) subsets A(1) , . . . , A(ℓ) , and a set R of O(n
set A and parameters s, ŝ to find a collection of ℓ = O(s e 2 /ŝ)
e 2 /ŝ + s2 ŝn) randomized time. Furthermore, Theorem 10.6 guarantees that
pairs in A × A in O(n
S
(i) {(a, b) ∈ A × A : popA (a − b) > n/s} ⊆ R ∪ λ (A(λ) × A(λ) ). This further means that, for every k
where |Wk | > n/s,
[
(i, A′i ), (3n + 1 − (k − i), −Bk−i
′
) : i ∈ Wk ⊆ R ∪ (A(λ) × A(λ) ).
λ
P (λ) e 5 ŝ4 n).

− A(λ) | = O(s
(ii) λ |A
50
Then we first enumerate ((i1 , v1 ), (i2 , v2 )) ∈ R. If this pair corresponds to some A′i and Bj′ (i.e., this pair has
i1 = i, i2 = 3n + 1 − j), we use A′′i + Bj′′ to update Ci+j e
′′ if A′ + B ′ = C ′ . This takes O(|R|) e 2 /ŝ)
= O(n
i j i+j
time.
For each λ ∈ [ℓ], we consider the possible witnesses in A(λ) × A(λ) . We prepare a map f from A(λ)
to [g] ∪ {∞} as follows: if a ∈ A(λ) corresponds to some A′i (i.e., a = (i, A′i )), we set f (a) = A′′i ; if a
corresponds to some Bj′ (i.e., a = (3n + 1 − j, −Bj′ )), we set f (a) = Bj′′ ; otherwise, we set f (a) = ∞.
Then we compute the following Min-Plus “convolution” C (λ) :
Cc(λ) = min (f (a) + f (b)).

(a,b)∈A(λ) ×A(λ)
a−b=c
By known techniques for solving Min-Plus convolution with small integer weights [AGM97], we can instead
e
solve a normal convolution with weights bounded by 2O(g) . More specifically, let h(a) = M f (a) if f (a) 6=
∞ and f (a) = 0 otherwise, where M = |A(λ) | + 1. Then it suffices to compute the following for every c:
X
h(a) · h(b).
(a,b)∈A(λ) ×A(λ)
a−b=c
By known output-sensitive algorithms [CH02, CL15, BFN22], it takes O(|A e (λ) − A(λ) |) arithmetic oper-
ations to compute the above convolution, and each arithmetic operation takes O(g) e time. Thus, it takes
e (λ) (λ)
O(g|A − A |) time to compute C . (λ)
(λ)
After we compute C (λ) for every λ, we use the value of C(k−3n−1,C ′ ) to update Ck′′ for every k, λ.
k
e 2 /ŝ + s2 ŝn + g P |A(λ) − A(λ) |) = O(n
Overall, the running time is O(n e 2 /ŝ + s5 ŝ4 ng) = O(n
e 2 /ŝ +
i
5 4 2
s ŝ n /t).
Next we show the following algorithm for the rest values of k where |Wk | is small. Recall that we
e 1+δ ) time algorithm.
assumed Min-Equal-Convolution for length n arrays has an O(n
Claim 11.5. We can compute Ck′′ for every k where |Wk | ≤ n/s in O(e n · n1+δ ) time as long as t√s =
√ s
O( n).
Proof. Let I√ ⊆ [n] be a random subset of indices for which each index is kept in I independently with
probability √ns . Similarly let J ⊆ [n] be such a random subset as well. With high probability, |I|, |J| =
√
O( sn).
In the sparse Min-Plus convolution between the two sparse arrays A′I and BJ′ , we need to compute a
length n array D where Dk = mini∈I,k−i∈J (A′i + Bj′ ).
For every k ∈ [n] and i ∈ Wk , the probability that i ∈ I and k − i ∈ J is ns . Thus, for any particular
i ∈ Wk , the probability that i is the unique witness for Dk in the sparse Min-Plus convolution between A′I
e n)
and BJ′ is ns · (1 − ns )|Wk |−1 , which is Θ( ns ) if |Wk | ≤ n/s. Thus, if we keep sampling I and J for O( s
times, all indices in Wk will be the unique witness for Dk in at least one time with high probability, as in
standard sampling techniques (see e.g. [AGMN92, Sei95]).
Suppose i is indeed the unique witness for Dk , then we can find i by repeatedly computing some in-
stances of sparse Min-Plus convolutions. More specifically, in the p-th round, let I (p) be I but only keeping
the indices whose p-th bit in the binary representation is 1. Say the sparse Min-Plus convolution between
(p)
A′I (p) and BJ′ is D (p) . Then if Dk = Dk , then we know the p-th bit of i is 1, and otherwise it is 0. Thus,
we can recover i after O(log n) rounds. After we have i, we can use A′′i + Bk−i ′′ to update Ck′′ .
51
Therefore, it remains to show how to compute the sparse Min-Plus convolution between A′I and BJ′ for
√
|I|, |J| = O( sn).
Let F : [t] × [t] → [n] be a random function (independent to the choice of I, J). We then create two
arrays X, Y each of length O(n) as follows. Initially, all entries in X and Y are set to some distinct values
out side of [t] × [t]. For every i ∈ I and y ∈ [t], we set Xi+F (A′i ,y) to (A′i , y); for every j ∈ J and x ∈ [t],
we set Yj+n−F (x,Bj′ ) to (x, Bj′ ). Suppose all entries are only set at most once. Then we compute the Min-
Equality convolution Z between X and Y , where we compare two pairs (x, y) and (x′ , y ′ ) by comparing
x + y and x′ + y ′ and breaking ties arbitrarily. Then the sum of the two integers in the pair Zk+n equals
min (A′i + y) = min (A′i + Bj′ ).

(i,j,x,y)∈I×J×[t]×[t] (i,j)∈I×J
i+F (A′i ,y)+j+n−F (x,Bj′ )=k+n i+j=k
(A′i ,y)=(x,Bj′ )
Thus, computing a Min-Equal-Convolution instance gives the result of sparse Min-Plus convolution be-
tween A′I and BJ′ .
We then remove the assumption that each entry of X and Y is only set once by standard techniques.
Consider a fixed entry Xq . For every (x, y) ∈ t, we set Xq to (x, y) if and only if q − F (x, y) ∈ I and
A′q−F (x,y) = x. Since F (x, y) is sampled from [n] uniformly at random, the probability that we set Xq
|{i∈I:A′i =x}|
to (x, y) is at most n . Summing over all x, y, the expected number of times that Xq is set is
√ √ √
|I|t t√ s
O( n ) = O( n ) = O(1) since t s = O( n). Since the values of F (x, y) are independent for different
(x, y), by Chernoff bound, we conclude that Xq is set only O(log n) times with high probability. Similarly,
all indices in Y are set only O(log n) times with high probability. Thus, we can create O(log n) arrays
(a)
X (a) , where Xp equals the value of Xp when we attempt to set it the a-th time. We can similarly create
O(log n) arrays Y (b) . Then it suffices to compute the Min-Equal-Convolution between X (a) and Y (b) for
every (a, b) ∈ [O(log n)]2 .
Overall, the running time is O(e n · n1+δ ) as long as t√s = O(√n), assuming Min-Equal-Convolution
s
e 1+δ ) time algorithm.
for length n arrays has an O(n
e 2 /ŝ + s5 ŝ4 n2 /t + n · n1+δ ) time
By Claim 11.4 and Claim 11.5, we can compute C ′′ and thus C in O(n s
√ √ 10−5δ 1+10δ 1−11δ
e 2− 1−11δ
as long as t s = O( n). We can set t = n 21 , s = n 21 and ŝ = n 21 to get the O(n 21 )
randomized running time.
Note that in order for the proof for Theorem 11.3 to work, the more difficult BSG covering of Theo-
rem 10.6 that builds on Gower’s proof [Gow01] is necessary. If we instead use the simpler BSG covering of
Theorem 10.3, we will have an sO(1) n2.5 /t term from Claim 11.4, which cannot give subquadratic running
√ √
time considering the t s = O( n) requirement in Claim 11.5.
12 Acknowledgement
We would like to thank Lijie Chen for suggesting the quantum #3SUM problem.
References
[Abr87] Karl Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039–1051, 1987.
5
52
[ACJ+ 22] Shyan Akmal, Lijie Chen, Ce Jin, Malvika Raj, and Ryan Williams. Improved Merlin-Arthur
protocols for central problems in fine-grained complexity. In Proc. 13th Innovations in Theo-
retical Computer Science Conference (ITCS), pages 3:1–3:25, 2022. 43
[ACLL14] Amihood Amir, Timothy M. Chan, Moshe Lewenstein, and Noa Lewenstein. On hardness
of jumbled indexing. In Proc. 41st International Colloquium on Automata, Languages, and
Programming (ICALP), pages 114–125, 2014. 6, 11
[AD16] Amir Abboud and Søren Dahlgaard. Popular conjectures as a barrier for dynamic planar graph
algorithms. In Proc. 57th Annual IEEE Symposium on Foundations of Computer Science
(FOCS), pages 477–486, 2016. 7, 26
[AFW20] Amir Abboud, Shon Feller, and Oren Weimann. On the fine-grained complexity of parity
problems. In Proc. 47th International Colloquium on Automata, Languages, and Program-
ming (ICALP), pages 5:1–5:19, 2020. 9, 34, 36
[AGM97] Noga Alon, Zvi Galil, and Oded Margalit. On the exponent of the all pairs shortest path
problem. J. Comput. Syst. Sci., 54(2):255–262, 1997. Preliminary version in FOCS 1991. 3,
6, 7, 36, 51
[AGMN92] Noga Alon, Zvi Galil, Oded Margalit, and Moni Naor. Witnesses for Boolean matrix multi-
plication and for shortest paths. In Proc. 33rd IEEE Symposium on Foundations of Computer
Science (FOCS), pages 417–426, 1992. 15, 16, 51
[AL20] Andris Ambainis and Nikita Larka. Quantum algorithms for computational geometry prob-
lems. In Proc. 15th Conference on the Theory of Quantum Computation, Communication and
Cryptography (TQC), pages 9:1–9:10, 2020. 9, 43
[AR18] Udit Agarwal and Vijaya Ramachandran. Fine-grained complexity for sparse graphs. In Proc.
50th Annual ACM Symposium on Theory of Computing (STOC), pages 239–252, 2018. 6
[AV14] Amir Abboud and Virginia Vassilevska Williams. Popular conjectures imply strong lower
bounds for dynamic problems. In Proc. 55th IEEE Symposium on Foundations of Computer
Science (FOCS), pages 434–443, 2014. 7
[AV21] Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix
multiplication. In Proc. 32nd ACM-SIAM Symposium on Discrete Algorithms (SODA), pages
522–539, 2021. 1
[AVY15] Amir Abboud, Virginia Vassilevska Williams, and Huacheng Yu. Matching triangles and bas-
ing hardness on an extremely popular conjecture. In Proc. 47th Annual ACM on Symposium
on Theory of Computing (STOC), pages 41–50, 2015. 23
[Bal07] Antal Balog. Many additive quadruples. In Additive Combinatorics, volume 43 of CRM Proc.
Lecture Notes, pages 39–49. Amer. Math. Soc., 2007. 44, 47, 48
[BBB19] Enric Boix-Adserà, Matthew S. Brennan, and Guy Bresler. The average-case complexity of
counting cliques in Erdős-Rényi hypergraphs. In Proc. 60th IEEE Annual Symposium on
Foundations of Computer Science (FOCS), pages 1256–1280, 2019. 34
53
[BCD+ 14] David Bremner, Timothy M. Chan, Erik D. Demaine, Jeff Erickson, Ferran Hurtado, John
Iacono, Stefan Langerman, Mihai Pǎtraşcu, and Perouz Taslakian. Necklaces, convolutions,
and X + Y . Algorithmica, 69(2):294–314, 2014. Preliminary version in ESA 2006. 7, 31
[BDP08] Ilya Baran, Erik D. Demaine, and Mihai Patrascu. Subquadratic algorithms for 3SUM. Algo-
rithmica, 50(4):584–596, 2008. Preliminary version in WADS 2005. 34
[BFN22] Karl Bringmann, Nick Fischer, and Vasileios Nakos. Deterministic and Las Vegas algorithms
for sparse nonnegative convolution. In Proc. 33rd Annual ACM-SIAM Symposium on Discrete
Algorithms (SODA), pages 3069–3090, 2022. 46, 51
[BGSV16] Karl Bringmann, Fabrizio Grandoni, Barna Saha, and Virginia Vassilevska Williams. Truly
sub-cubic algorithms for language edit distance and RNA-folding via fast bounded-difference
min-plus product. In Proc. 57th IEEE Annual Symposium on Foundations of Computer Sci-
ence (FOCS), pages 375–384, 2016. 2, 4, 21, 63
[BIS17] Arturs Backurs, Piotr Indyk, and Ludwig Schmidt. Better approximations for tree sparsity
in nearly-linear time. In Proc. 28th Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA), pages 2215–2229, 2017. 2
[BKS18] Markus Bläser, Balagopal Komarath, and Karteek Sreenivasaiah. Graph pattern polynomi-
als. In Proc. 38th IARCS Annual Conference on Foundations of Software Technology and
Theoretical Computer Science (FSTTCS), pages 18:1–18:13, 2018. 2
[BRSV17] Marshall Ball, Alon Rosen, Manuel Sabin, and Prashant Nalini Vasudevan. Average-case fine-
grained hardness. In Proc. 49th Annual ACM Symposium on Theory of Computing (STOC),
pages 483–496, 2017. 34
[BRSV18] Marshall Ball, Alon Rosen, Manuel Sabin, and Prashant Nalini Vasudevan. Proofs of work
from worst-case assumptions. In Proc. 38th Annual International Cryptology Conference
(CRYPTO), pages 789–819, 2018. 34
[BS94] Antal Balog and Endre Szemerédi. A statistical theorem of set addition. Combinatorica,
14:263–268, 1994. 44
[BW12] Nikhil Bansal and Ryan Williams. Regularity lemmas and combinatorial algorithms. Theory
Comput., 8(1):69–94, 2012. 10
[CDL+ 14] Timothy M. Chan, Stephane Durocher, Kasper Green Larsen, Jason Morrison, and Bryan T.
Wilkinson. Linear-space data structures for range mode query in arrays. Theory Comput.
Syst., 55(4):719–741, 2014. 7, 24, 25
[CDM17] Radu Curticapean, Holger Dell, and Dániel Marx. Homomorphisms are a good basis for
counting small subgraphs. In Proc. 49th Annual ACM SIGACT Symposium on Theory of
Computing (STOC), pages 210–223, 2017. 2
[CDSW15] Timothy M. Chan, Stephane Durocher, Matthew Skala, and Bryan T. Wilkinson. Linear-space
data structures for range minority query in arrays. Algorithmica, 72(4):901–913, 2015. 25
54
[CDX22] Shucheng Chi, Ran Duan, and Tianle Xie. Faster algorithms for bounded-difference min-plus
product. In Proc. 33rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages
1435–1447, 2022. 2, 4, 21, 63
[CDXZ22] Shucheng Chi, Ran Duan, Tianle Xie, and Tianyi Zhang. Faster min-plus product for mono-
tone instances. In Proc. 54th Annual ACM Symposium on Theory of Computing (STOC), pages
1529–1542, 2022. 2, 11, 21, 63
[CGI+ 16] Marco L. Carmosino, Jiawei Gao, Russell Impagliazzo, Ivan Mihajlin, Ramamohan Paturi,
and Stefan Schneider. Nondeterministic extensions of the strong exponential time hypothesis
and consequences for non-reducibility. In Proc. 2016 ACM Conference on Innovations in
Theoretical Computer Science (ITCS), pages 261–270, 2016. 35, 43
[CH02] Richard Cole and Ramesh Hariharan. Verifying candidate matches in sparse and wildcard
matching. In Proc. 34th Annual ACM Symposium on Theory of Computing (STOC), pages
592–601, 2002. 46, 51
[CH20] Timothy M. Chan and Qizheng He. Reducing 3SUM to convolution-3SUM. In Proc. SIAM
Symposium on Simplicity in Algorithms (SOSA), pages 1–7, 2020. 20, 34
[Cha10] Timothy M. Chan. More algorithms for all-pairs shortest paths in weighted graphs. SIAM J.
Comput., 39(5):2075–2089, 2010. 4
[Cha20] Timothy M. Chan. More logarithmic-factor speedups for 3SUM, (median,+)-convolution,

and some geometric 3SUM-hard problems. ACM Trans. Algorithms, 16(1):7:1–7:23, 2020.
Preliminary version in SODA 2018. 4, 41
[CIP10] Chris Calabro, Russell Impagliazzo, and Ramamohan Paturi. On the exact complexity of
evaluating quantified k-CNF. In Proc. 5th International Symposium on Parameterized and
Exact Computation (IPEC), pages 50–59, 2010. 43
[CKL07] Artur Czumaj, Miroslaw Kowaluk, and Andrzej Lingas. Faster algorithms for finding lowest
common ancestors in directed acyclic graphs. Theor. Comput. Sci., 380(1-2):37–46, 2007. 1,
5
[CL09] Artur Czumaj and Andrzej Lingas. Finding a heaviest vertex-weighted triangle is not harder
than matrix multiplication. SIAM J. Comput., 39(2):431–444, 2009. 1
[CL15] Timothy M. Chan and Moshe Lewenstein. Clustered integer 3SUM via additive combina-
torics. In Proc. 47th Annual ACM Symposium on Theory of Computing (STOC), pages 31–40,
2015. 2, 10, 11, 18, 20, 21, 44, 46, 51, 65
[CM16] Radu Curticapean and Dániel Marx. Tight conditional lower bounds for counting perfect
matchings on graphs of bounded treewidth, cliquewidth, and genus. In Proc. 27th Annual
ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1650–1669, 2016. 43
[CMWW19] Marek Cygan, Marcin Mucha, Karol W˛egrzycki, and Michał Włodarczyk. On problems equiv-
alent to (min,+)-convolution. ACM Trans. Algorithms, 15(1):1–25, 2019. 2
55
[CVX21] Timothy M. Chan, Virginia Vassilevska Williams, and Yinzhan Xu. Algorithms, Reductions
and Equivalences for Small Weight Variants of All-Pairs Shortest Paths. In Proc. 48th Interna-
tional Colloquium on Automata, Languages, and Programming (ICALP), pages 47:1–47:21,
2021. 6, 7, 9, 10, 16, 22, 23, 24, 26, 27, 28, 29, 30, 49
[CVX22] Timothy M. Chan, Virginia Vassilevska Williams, and Yinzhan Xu. Hardness for triangle
problems under even more believable hypotheses: Reductions from real APSP, real 3SUM,
and OV. In Proc. 54th Annual ACM Symposium on Theory of Computing (STOC), pages
1501–1514, 2022. 6, 8, 37, 42
[CW21] Timothy M. Chan and R. Ryan Williams. Deterministic APSP, orthogonal vectors, and more:
Quickly derandomizing Razborov-Smolensky. ACM Trans. Algorithms, 17(1):2:1–2:14, 2021.
Preliminary version in SODA 2016. 23
[CY14] Keren Cohen and Raphael Yuster. On minimum witnesses for Boolean matrix multiplication.
Algorithmica, 69(2):431–442, 2014. 1, 5
[DJW19] Ran Duan, Ce Jin, and Hongxun Wu. Faster algorithms for all pairs non-decreasing paths
problem. In Proc. 46th International Colloquium on Automata, Languages, and Programming
(ICALP), pages 48:1–48:13, 2019. 4
[DL21] Holger Dell and John Lapinskas. Fine-grained reductions from approximate counting to deci-
sion. ACM Trans. Comput. Theory, 13(2):8:1–8:24, 2021. 2
[DLM20] Holger Dell, John Lapinskas, and Kitty Meeks. Approximately counting and sampling small
witnesses using a colourful decision oracle. In Proc. 31st ACM-SIAM Symposium on Discrete
Algorithms (SODA), pages 2201–2211, 2020. 2
[DLV20] Mina Dalirrooyfard, Andrea Lincoln, and Virginia Vassilevska Williams. New techniques
for proving fine-grained average-case hardness. In Proc. 61st IEEE Annual Symposium on
Foundations of Computer Science (FOCS), pages 774–785, 2020. 34
[DP09] Ran Duan and Seth Pettie. Fast algorithms for (max, min)-matrix multiplication and bottle-
neck shortest paths. In Proc. 20th Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA), pages 384–391, 2009. 4
[FM71] Michael J. Fischer and Albert R. Meyer. Boolean matrix multiplication and transitive closure.
In Proc. 12th Annual Symposium on Switching and Automata Theory (SWAT), pages 129–131,
1971. 8
[Fre76] Michael L. Fredman. New bounds on the complexity of the shortest path problem. SIAM J.
Comput., 5(1):83–89, 1976. 4
[GH22] Younan Gao and Meng He. Faster Path Queries in Colored Trees via Sparse Matrix Multi-
plication and Min-Plus Product. In Proc. 30th Annual European Symposium on Algorithms
(ESA), pages 59:1–59:15, 2022. 7, 24, 25
[GJ21] Pawel Gawrychowski and Wojciech Janczewski. Conditional lower bounds for variants of
dynamic LIS. CoRR, abs/2102.11797, 2021. 26
56
[Gow01] William Timothy Gowers. A new proof of Szemerédi’s theorem. Geom. Funct. Anal., 11:465–
588, 2001. 44, 47, 52, 65
[GP18] Allan Grønlund and Seth Pettie. Threesomes, degenerates, and love triangles. J. ACM,
65(4):22:1–22:25, 2018. Preliminary version in FOCS 2014. 4
[GPVX21] Yuzhou Gu, Adam Polak, Virginia Vassilevska Williams, and Yinzhan Xu. Faster monotone
min-plus product, range mode, and single source replacement paths. In Proc. 48th Interna-
tional Colloquium on Automata, Languages, and Programming (ICALP), pages 75:1–75:20,
2021. 2, 7, 21, 63
[GR20] Oded Goldreich and Guy N. Rothblum. Worst-Case to Average-Case Reductions for Sub-
classes of P, pages 249–295. Springer, 2020. 34
[GU18] Paweł Gawrychowski and Przemysław Uznański. Towards unified approximate pattern match-
ing for Hamming and L1 distance. In Proc. 45th International Colloquium on Automata,
Languages, and Programming (ICALP), pages 62:1–62:13, 2018. 8, 49
[HKNS15] Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranurak.
Unifying and strengthening hardness for dynamic problems via the online matrix-vector multi-
plication conjecture. In Proc. 47th Annual ACM Symposium on Theory of Computing (STOC),
pages 21–30, 2015. 2, 7, 26
[HU17] Chloe Ching-Yun Hsu and Chris Umans. On multidimensional and monotone k-SUM. In
Proc. 42nd International Symposium on Mathematical Foundations of Computer Science
(MFCS), pages 50:1–50:13, 2017. 11
[IPZ01] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly
exponential complexity? J. Comput. Syst. Sci., 63(4):512–530, 2001. 43
[JSV04] Mark Jerrum, Alistair Sinclair, and Eric Vigoda. A polynomial-time approximation algorithm
for the permanent of a matrix with nonnegative entries. J. ACM, 51(4):671–697, 2004. 2
[JX22] Ce Jin and Yinzhan Xu. Tight dynamic problem lower bounds from generalized BMM and
OMv. In Proc. 54th Annual ACM Symposium on Theory of Computing (STOC), pages 1515–
1528, 2022. 7, 24, 25
[KL05] Miroslaw Kowaluk and Andrzej Lingas. LCA queries in directed acyclic graphs. In Proc.
32nd International Colloquium on Automata, Languages and Programming (ICALP), pages
241–248, 2005. 1, 5
[KL21] Miroslaw Kowaluk and Andrzej Lingas. Quantum and approximation algorithms for maxi-
mum witnesses of Boolean matrix products. In Proc. 7th Annual International Conference on
Algorithms and Discrete Applied Mathematics (CALDAM), pages 440–451, 2021. 1, 5
[KMS05] Danny Krizanc, Pat Morin, and Michiel H. M. Smid. Range mode and range median queries
on lists and trees. Nord. J. Comput., 12(1):1–17, 2005. 24, 25
[Kos89] S. Rao Kosaraju. Efficient tree pattern matching. In Proc. 30th Annual Symposium on Foun-
dations of Computer Science (FOCS), pages 178–183, 1989. 5
57
[KPP16] Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Higher lower bounds from the 3SUM conjec-
ture. In Proc. 27th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1272–1287,
2016. 44, 46
[KPS17] Marvin Künnemann, Ramamohan Paturi, and Stefan Schneider. On the fine-grained complex-
ity of one-dimensional dynamic programming. In Proc. 44th International Colloquium on
Automata, Languages, and Programming (ICALP), pages 21:1–21:15, 2017. 2
[LLV19] Rio LaVigne, Andrea Lincoln, and Virginia Vassilevska Williams. Public-key cryptogra-
phy in the fine-grained setting. In Proc. 39th Annual International Cryptology Conference
(CRYPTO), pages 605–635, 2019. 34
[Lov17] Shachar Lovett. Additive combinatorics and its applications in theoretical computer science.
Theory Comput., 8:1–55, 2017. 44
[LPV20] Andrea Lincoln, Adam Polak, and Virginia Vassilevska Williams. Monochromatic triangles,
intermediate matrix products, and convolutions. In Proc. 11th Innovations in Theoretical
Computer Science Conference (ITCS), volume 151, pages 53:1–53:18, 2020. 1, 3, 5, 6, 7, 61
[LU18] François Le Gall and Florent Urrutia. Improved rectangular matrix multiplication using pow-
ers of the Coppersmith-Winograd tensor. In Proc. 29th ACM-SIAM Symposium on Discrete
Algorithms (SODA), pages 1029–1046, 2018. 1, 20, 23, 27
[LUWG19] Karim Labib, Przemyslaw Uznanski, and Daniel Wolleb-Graf. Hamming distance complete-
ness. In Proc. 30th Annual Symposium on Combinatorial Pattern Matching (CPM), volume
128, page 14, 2019. 4, 38, 41
[LWW18] Andrea Lincoln, Virginia Vassilevska Williams, and R. Ryan Williams. Tight hardness for
shortest cycles and paths in sparse graphs. In Proc. 29th Annual ACM-SIAM Symposium on
Discrete Algorithms (SODA), pages 1236–1252, 2018. 6
[Mat91] Jiří Matoušek. Computing dominances in E n . Inf. Process. Lett., 38(5):277–278, 1991. 4, 14,
17, 23, 30
[Mer78] Ralph C. Merkle. Secure communications over insecure channels. Commun. ACM, 21(4):294–
299, 1978. 34
[NP85] Jaroslav Nešetřil and Svatopluk Poljak. On the complexity of the subgraph problem. Com-
ment. Math. Univ. Carol., 026(2):415–419, 1985. 60
[Păt10] Mihai Pătraşcu. Towards polynomial lower bounds for dynamic problems. In Proc. 42nd
Annual ACM Symposium on Theory of Computing (STOC), pages 603–610, 2010. 9, 34
[Sch79] Arnold Schönhage. On the power of random access machines. In Proc. 6th Colloquium on
Automata, Languages and Programming (ICALP), pages 520–529, 1979. 42
[Sch15] Tomasz Schoen. New bounds in Balog–Szemerédi–Gowers Theorem. Combinatorica,

35:695–701, 2015. 47, 65
[Sei95] R. Seidel. On the all-pairs-shortest-path problem in unweighted undirected graphs. J. Comput.

Syst. Sci., 51(3):400–403, 1995. 1, 3, 15, 16, 51
58
[SSV05] Benny Sudakov, Endre Szemerédi, and Van Vu. On a question of Erdös and Moser. Duke
Math. J., 129:129–155, 2005. 44, 47, 48
[SYZ11] Asaf Shapira, Raphael Yuster, and Uri Zwick. All-pairs bottleneck paths in vertex weighted
graphs. Algorithmica, 59(4):621–633, 2011. 1, 5
[SZ99] Avi Shoshan and Uri Zwick. All pairs shortest paths in undirected graphs with integer weights.
In Proc. 40th Annual Symposium on Foundations of Computer Science (FOCS), pages 605–
614, 1999. 6
[Tak98] Tadao Takaoka. Subcubic cost algorithms for the all pairs shortest path problem. Algorithmica,
20(3):309–318, 1998. 4
[TV06] Terence Tao and Van Vu. Additive Combinatorics. Cambridge University Press, 2006. 44, 47
[Val79] Leslie G. Valiant. The complexity of computing the permanent. Theor. Comput. Sci., 8:189–
201, 1979. 2
[Vas10] Virginia Vassilevska Williams. Nondecreasing paths in a weighted graph or: How to optimally
read a train schedule. ACM Trans. Algorithms, 6(4):70:1–70:24, 2010. 4
[Vas15] Virginia Vassilevska Williams. Problem 2 on problem set 2 of CS367, October 15, 2015. 4,
38, 41
[Vas18] Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity.
In Proceedings of the ICM, volume 3, pages 3431–3472. World Scientific, 2018. 8
[Vio11] Emanuele Viola. Selected Results in Additive Combinatorics: An Exposition. Number 3 in

Graduate Surveys. Theory of Computing Library, 2011. 44, 47
[VW09] Virginia Vassilevska and Ryan Williams. Finding, minimizing, and counting weighted sub-
graphs. In Proc. 41st Annual ACM Symposium on Theory of Computing (STOC), pages 455–
464, 2009. 20
[VW10] Virginia Vassilevska Williams and Ryan Williams. Subcubic equivalences between path, ma-
trix and triangle problems. In Proc. 51st IEEE Symposium on Foundations of Computer Sci-
ence (FOCS), pages 645–654, 2010. 1
[VW13] Virginia Vassilevska Williams and Ryan Williams. Finding, minimizing, and counting
weighted subgraphs. SIAM J. Comput., 42(3):831–854, 2013. Preliminary version in STOC
2009. 9, 31, 34
[VW18] Virginia Vassilevska Williams and R. Ryan Williams. Subcubic equivalences between path,
matrix, and triangle problems. J. ACM, 65(5):27:1–27:38, 2018. Preliminary version in FOCS
2010. 1, 2, 9, 17, 30, 33, 34, 60, 61, 62
[VWWY15] Virginia Vassilevska Williams, Joshua R. Wang, Richard Ryan Williams, and Huacheng Yu.
Finding four-node subgraphs in triangle time. In Proc. 26th Annual ACM-SIAM Symposium
on Discrete Algorithms (SODA), pages 1671–1680, 2015. 2
59
[VWY07] Virginia Vassilevska, Ryan Williams, and Raphael Yuster. All-pairs bottleneck paths for gen-
eral graphs in truly sub-cubic time. In Proc. 39th Annual ACM Symposium on Theory of
Computing (STOC), pages 585–589, 2007. 4
[VWY10] Virginia Vassilevska, Ryan Williams, and Raphael Yuster. Finding heaviest H-subgraphs in
real weighted graphs, with applications. ACM Trans. Algorithms, 6(3):44:1–44:23, 2010. 1, 5
[VX20a] Virginia Vassilevska Williams and Yinzhan Xu. Monochromatic triangles, triangle listing and
APSP. In Proc. 61st IEEE Symposium on Foundations of Computer Science (FOCS), pages
786–797, 2020. 1, 5, 6
[VX20b] Virginia Vassilevska Williams and Yinzhan Xu. Truly subcubic min-plus product for less
structured matrices, with applications. In Proc. 31st ACM-SIAM Symposium on Discrete Al-
gorithms (SODA), pages 12–29, 2020. 2, 7, 21, 24, 25, 63
[Wil05] Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications.
Theor. Comput. Sci., 348(2-3):357–365, 2005. Preliminary version in ICALP 2004. 43
[Wil18] R. Ryan Williams. Faster all-pairs shortest paths via circuit complexity. SIAM J. Comput.,
47(5):1965–1985, 2018. Preliminary version in STOC 2014. 3, 4, 23, 34
[Yus09] Raphael Yuster. Efficient algorithms on sets of permutations, dominance, and real-weighted
APSP. In Proc. 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages
950–957, 2009. 14, 23
[Zwi99] Uri Zwick. All pairs lightest shortest paths. In Proc. 31st Annual ACM Symposium on Theory
of Computing (STOC), pages 61–69, 1999. 7
[Zwi02] Uri Zwick. All pairs shortest paths using bridging sets and rectangular matrix multiplication.
J. ACM, 49(3):289–317, 2002. 1, 3, 5, 23, 28, 49
A Still More Equivalences Between Counting and Detection Problems

A.1 Exact k-Clique and Minimum k-Clique
Let G be the input graph of an Exact-k-Clique or #Exact-k-Clique instance, and let w be the weight
function of the graph. For every set I ⊆ V (G) of size k − 1, we use WI to denote the set of j where I ∪ {j}
forms a k-clique whose edge weights sum up to the required value t.
Theorem A.1. If Exact-k-Clique for n-node graphs has an O(nk−ε ) time algorithm for some ε > 0, then
′
#Exact-k-Clique for n-node graphs has an O(nk−ε ) time algorithm for some ε′ > 0
Proof. Similar as before, by well-known techniques [VW18], given a #Exact-k-Clique instance on a graph
G with n nodes and a target t, we can use the O(nk−ε ) time algorithm for Exact-k-Clique to list up to n0.99
′′
witnesses for every set I of k − 1 nodes, in O(nk−ε ) time for some ε′′ > 0.
Then we enumerate all possible subsets J ⊆ V (G) of size k − 3. For each J, we can reduce the problem
of counting witnesses for sets I ⊇ J of size k − 1 to a #AE-Exact-Tri instance (G′ , t′ ) in a standard
′
way [NP85]. The set of nodes
′
Pof G corresponds to V (G) \ J. Let′ the edge weight
P between i1 and i2
be w (i1 , i2 ) = 2w(i1 , i2 ) + j∈J (w(j, i1 ) + w(j, i2 )). Also, let t be 2t − 2 j1 ,j2 ∈J w(j1 , j2 ). It is
j1 <j2
60
not difficult to verify that (i1 , i2 , i3 ) forms an exact triangle in G′ if and only if J ∪ {i1 , i2 , i3 } forms an
exact k-clique in G. Thus, the number of witnesses of (i1 , i2 ) in G′ equals the number of witnesses of
J ∪ {i1 , i2 } in G. We then proceed similar to the proof of Theorem 4.2. If the number of witnesses of
′′
(i1 , i2 ) in G′ is less than n0.99 , then we have already listed all of their witnesses in the O(nk−ε ) time step.
e 0.01 ) that intersects with each of
For (i1 , i2 ) that has at least n0.99 witnesses, we can find a set S of size O(n
e 2.99
Wi1 ,i2 in O(n ) time, and then apply Lemma 4.1 to compute the witness count for these (i1 , i2 ) pairs in
e
O(|S| · n(3+ω)/2 ) ≤ O(n2.70 ) time.
Finally, summing up WI for every distinct set I of k − 1 nodes gives the total exact triangle count of
G. The overall running time for the #Exact-k-Clique instance is thus O(n e k−ε′′ + nk−3 · (n2.99 + n2.70 )) =
e k−min{ε ,0.01} ).
O(n
′′

We can similarly show that #Min-k-Clique reduces to Min-k-Clique. Since the proof is essentially the
same, we omit the proof of the following theorem for conciseness.
Theorem A.2. If Min-k-Clique for n-node graphs has an O(nk−ε ) time algorithm for some ε > 0, then
′
#Min-k-Clique for n-node graphs has an O(nk−ε ) time algorithm for some ε′ > 0
We then reduce Min-k-Clique to #Min-k-Clique.
Theorem A.3. If #Min-k-Clique for n-node graphs has an O(nk−ε ) time algorithm for some ε > 0, then
′
Min-k-Clique for n-node graphs has an O(nk−ε ) time algorithm for some ε′ > 0
Proof. Let G be the input graph for a Min-k-Clique instance. Without loss of generality, we can assume G
is a k-partite graph on node parts V1 ∪ · · · ∪ Vk .
We first multiply all the edge weights of G by a large enough number M ≥ 10k · nk . Then for each
i ∈ [k], vi ∈ Vi , we add vi · ni−1 to all edges adjacent to vi . After these transformations, there will be a
unique minimum weight k-clique in the graph. Furthermore, this k-clique must also be a minimum weight
k-clique in the original graph. We denote this minimum weight k-clique by (u1 , . . . , uk ) ∈ V1 × · · · × Vk .
Then for each i ∈ [k] and each p ∈ [⌈log(n)⌉], we do the following. Let G(i,p) be a copy of the graph
(after the weight changes), and we duplicate all nodes v ∈ Vi whose p-th bit in its binary representation is
1. We use the assumed #Min-k-Clique algorithm the count the number of minimum weight k-cliques in
in G(i,p) . If the number of minimum weight k-clique in G(i,p) is 2, then we know the p-th bit of ui is 1;
otherwise, the p-th bit of ui is 0.
After all k⌈log(n)⌉ rounds, we can recover (u1 , . . . , uk ), and thus compute the weight of the minimum
weight k-clique in the original graph.
A.2 Monochromatic Convolution

Theorem A.4. If Mono-Convolution for length n arrays has an O(n1.5−ε ) time algorithm for some ε > 0,
′
then #Mono-Convolution for length n arrays has an O(n1.5−ε ) time randomized algorithm for some ε′ > 0
Proof. If Mono-Convolution has an O(n1.5−ε ) time algorithm, then 3SUM has a truly subquadratic time
algorithm [LPV20], and consequently All-Nums-3SUM has a truly subquadratic time algorithm [VW18].
′′
Furthermore, by Theorem 8.10, #All-Nums-3SUM has an O(n2−ε ) time algorithm for some ε′′ > 0.
Thus, it suffices to reduce #Mono-Convolution to #All-Nums-3SUM. Lincoln, Polak, and Vassilevska
W. [LPV20] showed that a truly subquadratic time algorithm for All-Nums-3SUM implies a truly sub-n1.5
time algorithm for Mono-Convolution. It is not difficult to check that their reduction preserves the number
of solutions, and thus works for the counting versions as well.
61
A.3 All-Pairs Shortest Paths
Theorem A.5. If APSP for n-node graphs with positive edge weights has an O(n3−ε ) time algorithm
′
for some ε > 0, then #mod U APSP for n-node graphs with positive edge weights has an O(n3−ε ) time
e
algorithm for some ε′ > 0, for any O(1)-bit integer U ≥ 2.
e
Proof. Let (A, A′ ) and (B, B ′ ) be two pairs of n × n matrices where the entries of A′ and B ′ are O(1)-bit
integers. We define two “funny” matrix products (one of them was defined in the proof of Theorem 5.6).
If (C, C ′ ) = (A, A′ ) ⊕ (B, B ′ ), then Cij = min(Aij , Bij ), and Cij
′ = [A = C ]A′ + [B = C ]B ′ ,
ij ij ij ij ij ij
′ ′ e 2
where [·] denotes the indicator function. Clearly, we can compute (A, A ) ⊕ (B, B ) in O(n ) time.
Claim A.6. Suppose APSP on n-node graphs with positive edge weights has an O(n3−ε ) time algorithm
′′
for some ε > 0, then we can compute (A, A′ ) ⊗ (B, B ′ ) in O(n3−ε ) time for some ε′′ > 0 where the entries
e
of A′ and B ′ are O(1)-bit integers.
Proof. If APSP on n-node graphs with positive edge weights has an O(n3−ε ) time algorithm, then so
does Min-Plus-Product [VW18]. By Theorem 8.2, #Min-Plus-Product also has a truly subcubic time
algorithm. It remains to reduce computing (A, A′ ) ⊗ (B, B ′ ) to #Min-Plus-Product.
(p) (p)
For p ∈ [O(log n)], let A(p) be the matrix where Aij = Aij if the p-th bit of A′ij is 1, and Aij = ∞
otherwise. We can similarly define B (p) . Also, let J be the n × n matrix whose entries are all 1. It is then
not difficult to verify that
M
(A, A′ ) ⊗ (B, B ′ ) = (A(p) , 2p J) ⊗ (B (q) , 2q J).
p,q
To compute each term in the above “sum”, say (C (p,q), C ′(p,q) ) = (A(p) , 2p J) ⊗ (B (q) , 2q J), we first use
′(p,q)
the assumed Min-Plus-Product algorithm to compute C (p,q) = A(p) ⋆ B (q) in O(n3−ε ) time. Then Cij
is exactly the number of witnesses for C (p,q) in the previous Min-Plus product, multiplied by 2p+q , so we
can use the truly subcubic time algorithm for #Min-Plus-Product to compute C ′(p,q) .
As showed in the proof of Theorem 5.6, #APSP reduces to O(log n) instances of the funny matrix
product; also, now we can mod all entries in A′ , B ′ by U after each funny matrix product to keep them
e
O(1)-bit e 3−ε′′ ) time algorithm assuming APSP can
integers. Thus, by Claim A.6, #mod U APSP has an O(n
be solved in truly subcubic time.
Theorem A.7. If #mod c APSP for n-node graphs with positive edge weights has an O(n3−ε ) time algorithm
e
for some ε > 0, where c ≥ 2 is some O(1)-bit integer, then APSP for n-node graphs with positive edge
3−ε ′
weights has an O(n ) time algorithm for some ε′ > 0,
Proof. Suppose there is an O(n3−ε ) time algorithm for #mod c APSP for n-node graphs positive edge
weights, then there is also an O(n3−ε ) time algorithm for counting the number of witnesses modulo c
for Min-Plus-Product for n × n matrices, by following the standard reduction from Min-Plus-Product to
APSP [VW18].
e 3−ε ) time algo-
Then by essentially the same proof to the proof of Theorem 8.3, there exists an O(n
rithm for Min-Plus-Product. Finally, there is a truly subcubic time algorithm for APSP since APSP and
Min-Plus-Product are subcubically equivalent [VW18].
62
B Bounded-Difference or Monotone Min-Plus Product from the Triangle
Decomposition Theorem
In this appendix, we note that our Triangle Decomposition Theorem (Theorem 5.1) implies a truly subcu-
bic algorithm for bounded-difference Min-Plus product. Existing algorithms [BGSV16, VX20b, GPVX21,
CDX22, CDXZ22] are faster, but our presentation is simpler, and so is interesting from the pedagogical
perspective in our opinion. In a way, the Triangle Decomposition Theorem clarifies conceptually why a
subcubic algorithm is possible (if we don’t care too much about optimizing the exponent in the running
time).
Theorem B.1. There is a truly subcubic algorithm for computing the Min-Plus product of two integer n × n
matrices A and B satisfying the bounded difference property, i.e., |Ai,k+1 − Aik | ≤ c0 for all i, k and
|Bk+1,j − Bkj | ≤ c0 for all k, j for some constant c0 .
Proof. Let C = A ⋆ B denote the matrix that we want to compute. Let ℓ be a parameter, and let D be the set
of all indices in [n] that are divisible by ℓ. Let ℓ′ = 2c0 ℓ + 1. We first compute C̃ij = mink∈D (⌈Aik /ℓ′ ⌉ +
⌈Bkj /ℓ′ ⌉) for every i, j ∈ [n]. By brute force, this takes O(n3 /ℓ) time. Note that C̃ij ≥ Cij /ℓ′ . For each
(r) (r)
r ∈ [ℓ], i, j ∈ [n] and k ∈ D, let Aik = Ai,k+r − ⌈Aik /ℓ′ ⌉ ℓ′ and Bkj = Bk+r,j − ⌈Bkj /ℓ′ ⌉ ℓ′ . Note that
(r) (r)
Aik , Bkj ∈ [±O(c0 ℓ)].
To compute C = A ⋆ B, we use the following formula:
Cij = min (Ai,k+r + Bk+r,j )

k∈D, r∈[ℓ]
!
′ (r) (r)
= min (C̃ij + ∆)ℓ + min (Aik + Bkj ) .
∆∈{0,1,2} r∈[ℓ], k∈D: ⌈Aik /ℓ′ ⌉+⌈Bkj /ℓ′ ⌉=C̃ij +∆
The reason for restricting the range of ∆ to {0, 1, 2} is this: if ⌈Aik /ℓ′ ⌉ + ⌈Bkj /ℓ′ ⌉ ≥ C̃ij + 3, then
Aik /ℓ′ + Bkj /ℓ′ ≥ C̃ij + 1, and so Ai,k+r + Bk+r,j ≥ Aik + Bkj − 2c0 ℓ > C̃ij ℓ′ ; but we know that the
true value of Cij is at most C̃ij ℓ′ .
To evaluate the above formula, let’s fix ∆ ∈ {0, 1, 2}. We want to compute
′ (r) (r)
Cij := min (Aik + Bkj ).
r∈[ℓ], k∈D: ⌈Aik /ℓ′ ⌉+⌈Bkj /ℓ′ ⌉=C̃ij +∆
′ to ∞. Consider the tripartite graph with nodes {u : i ∈ [n]}, {x : k ∈ D}, and {v : j ∈

Initialize Cij i k j
[n]}, where ui xk has weight ⌈Aik /ℓ′ ⌉, and xk vj has weight ⌈Bkj /ℓ′ ⌉, and ui vj has weight −C̃ij −∆. Apply
Theorem 5.1 to get subgraphs G(λ) and a set R in O(n e 3 /s + s2 n2 ) time.
We first examine each triangle ui xk vj ∈ R and each r ∈ [ℓ] and reset Cij ′ to A(r) + B (r) if it is smaller
ik kj
e · (n3 /ℓ)/s) = O(n
than the current value. This takes O(ℓ e 3 /s) time.
Next, for each λ and each r ∈ [ℓ], we compute mink∈D: ui xk ,xk vj ∈G(λ) (A′ik + Bkj ′ ) and reset C ′ to this
ij
value if it is smaller, for every ui vj ∈ G(λ) . This reduces to a Min-Plus product instance on integers in
e 0 ℓnω ) time for each λ and r. The total time is O(s
[±O(c0 ℓ)] and takes O(c e 3 · ℓ · c0 ℓnω ) = O(c
e 0 ℓ2 s3 nω ).
e 3 /ℓ + n3 /s + c0 ℓ2 s3 nω ). Setting ℓ = s = n(3−ω)/6 gives a bound of
The overall running time is O(n
e
O(n (15+ω)/6 ) for constant c0 (which is improvable by using rectangular matrix multiplication).
63
The above algorithm easily extends to the case where we allow O(n2−δ ) exceptional pairs (i, k) to
violate the bounded difference property |Ai,k+1 − Aik | ≤ c0 , and similarly O(n2−δ ) exceptional pairs (k, j)
to violate the bounded difference property |Bk+1,j − Bkj | ≤ c0 . We just need to add an extra cost of
e 3−δ + n3 /ℓ + n3 /s + c0 ℓ2 s3 nω ) remains truly subcubic for an
O(ℓn2−δ · n). The total time bound O(ℓn
appropriate choice of ℓ and s.
The case of matrices with monotone rows/columns and integer entries in [n] easily reduce to the bounded
difference case with a nonconstant c0 = nδ and O(n2−δ ) exceptional pairs. So, we also get a truly subcubic
algorithm (with an appropriate choice of δ and a slightly worse final exponent) for the monotone case.
C More on BSG
In this appendix, we show that the proof of Theorem 10.3 can be modified to hold not just for indexed
sets, but also for arbitrary subsets A and C of an abelian group, if we ignore the construction time. This
requires a more clever argument in the “low-frequency” case. (The proof of Theorem 10.5 can also be
modified in a similar way.)
Theorem C.1. (Simpler BSG Covering) Given subsets A and C of size n of an abelian group and a
e 3 ) subsets A(1) , . . . , A(ℓ) ⊆ A, and a set R of O(n
parameter s, there exist a collection of ℓ = O(s e 2 /s) pairs
in A × A, such that
S
P
e 2 n3/2 ).
(λ) − A(λ) | = O(s
(ii) λ |A
Proof.
• Few-witnesses case. First add {(a, b) ∈ A × A : a − b ∈ C, popA (a − b) ≤ n/s} to R. The number of

e · n/s).
pairs added to R is O(n
• Many-witnesses case. Let F = {h : popA (h) > n/r}. Then |F | = O(rn), since the total popularity is
n2 . By adding extra elements to F , we may assume that |F | = Θ(rn). Pick a random subset H ⊆ F of
size c0 sr log n for a sufficiently large constant c0 .
– Low-frequency case. Add the following to R:
{(a, b) ∈ A × A : |{(a′ , b′ ) ∈ A × A : a − a′ = b − b′ 6∈ F }| > n/(2s)}.
Since for each (a, a′ ) with a − a′ 6∈ F , there are at most n/r choices of (b, b′ ) satisfying a − a′ = b − b′ ,
the number of pairs added to R is O( e n2 ·n/r ) = O(n
e 2 /s) by choosing r := s2 .
n/(2s)
– High-frequency case. For each h ∈ H, add the following subset to the collection:
A(h) = {a ∈ A : a − h ∈ A}.
e
The number of subsets is O(sr) e 3 ).
= O(s
Correctness. To verify (i), consider a fixed pair (a, b) ∈ A × A with a − b ∈ C. If popA (a − b) ≤ n/s,
then (a, b) ∈ R due to the “few-witnesses” case. So assume popA (a − b) > n/s. Furthermore, assume that
|{(a′ , b′ ) ∈ A × A : a − a′ = b − b′ 6∈ F }| ≤ n/(2s), for otherwise (a, b) ∈ R due to the “low-frequency”
64
case. There are at least n/s pairs (a′ , b′ ) ∈ A × A with a′ − b′ = a − b, which also satisfy a − a′ = b − b′
by Fredman’s trick. Among them, there are at least n/(2s) pairs (a′ , b′ ) ∈ A × A with a − a′ = b − b′ ∈ F .
For h = a − a′ = b − b′ , we have (a, b) ∈ A(h) × A(h) . So, the probability that (a, b) ∈ A(h) × A(h) for a
S
random h ∈ F is Ω( n/(2s)
rn ) = Ω(1/(sr)). Thus, (a, b) ∈ h∈H (A
(h) × A(h) ) w.h.p. for a random subset
H ⊂ F of size c0 sr log n.
To verify (ii), consider a fixed h. The set A(h) − A(h) is equal to {c : ∃a, b, a′ , b′ ∈ A, c = a −
b, a − h = a′ , b − h = b′ }. Let Yc = {a ∈ A : a − c ∈ A}. Then A(h) − A(h) is contained in
{c : ∃a ∈ Yc and a − h ∈ Yc }. The expected size of A(h) − A(h) for a random h ∈ F is thus bounded by
X p √
min{|Yc | · |Yc |/|F |, 1} ≤ O(n2 / |F |) = O(n3/2 / r),
c
P
since c |Yc | = O(n2 ).
P e √ e 2 n3/2 ).
The expected total sum h∈H |A(h) − A(h) | is bounded by O(sr · n3/2 / r) = O(s
Chan and Lewenstein [CL15] posed the following interesting combinatorial question:
P {(a, b) ∈ A×B : a+b ∈ C}

Given sets A, B, C in an abelian group of size n, we want to cover
by bicliques A(λ) × B (λ) , so as to minimize the cost function λ |A(λ) + B (λ) |. Prove worst-
case bounds on the minimum cost as a function of n.
They observed that Theorem 10.2 implies an O(n13/7 ) upper bound for this problem. Theorem C.1 im-
plies an improved O(ne 11/6 ) upper bound (as we can use “singleton” bicliques to cover R and choose s to
minimize O(ne 2 /s + s2 n3/2 )).
If we just want an analog of the BSG Theorem that extracts a single subset rather than constructs a cover
(in the style of Theorem 10.1), the proof of the above theorem becomes even simpler (the “few-witnesses”
and “low-frequency” cases may be skipped):
Theorem C.2. (Simpler BSG) Given subsets A and C of size n of an abelian group and a parameter s, if
|{(a, b) ∈ A × A : a − b ∈ C}| ≥ n2 /s, then there exists a subset A′ ⊆ A of size Ω(n/s), such that
|A′ − A′ | = O(s1/2 n3/2 ).

P
Proof. Let F = {x : popA (x) > n/(2s)}. Then |F | ≥ n/(2s), because otherwise, c∈C popA (c) <
|F |n + n2 /(2s) < n2 /s, contradicting the stated assumption.
Pick a random h ∈ F and let A(h) = {a ∈ A : a−h ∈ A} as before. Then |A(h) | = popA (h) > n/(2s).
Bypthe same argument as before, the expected size of A(h) − A(h) for a random h ∈ F is bounded by
O(n2 / |F |), which is now O(s1/2 n3/2 ).
The above O(s1/2 n3/2 ) bound is better than the known O(s5 n) bound (Theorem 10.1) when s ≫ n1/9 ,
though the latter holds only for the bichromatic sum set setting.9
Along the same lines, we can obtain the following combinatorial result from the Triangle Decomposition
Theorem, which might be of independent interest:
9
Bounds of the form O(sO(1) n) were also known in the setting of monochromatic difference sets, but direct comparisons
require care: many previous work such as [Gow01, Sch15] started from a related but different assumption, namely, that the energy
|{(a, b, a′ , b′ ) ∈ A × A × A × A : a + b = a′ + b′ }| is at least n3 /s′ for some parameter s′ . This parameter s′ is not identical to
the parameter s in Theorem C.2, though they are related polynomially (or quadratically).
65
Theorem C.3. Given a real-weighted tripartite graph G with n nodes, and given a parameter s, if G
e 3 /s4 ) triangles (and thus
contains Ω(n3 /s) zero-weight triangles, then there exists a subgraph G′ with Ω(n
e 2 4 ′
Ω(n /s ) edges) such that all triangles in G are zero-weight triangles.
e 3 /s′ ), which can be made less than
Proof. Apply Theorem 5.1 with s replaced by s′ . The size of R is O(n
3 ′ e (λ) e 3 /s) · (1/s3 )).
n /(2s) for some choice of s = Θ(s). Then there exists G with |Triangles(G(λ) )| = Ω((n

66

Frederick Trick

Uploaded by

Copyright:

Available Formats

Frederick Trick

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Frederick Trick

Uploaded by

Copyright:

Available Formats

Fredman’s Trick Meets Dominance Product: Fine-Grained

Complexity of Unweighted APSP, 3SUM Counting, and More

tmc@illinois.edu virgi@mit.edu xyzhan@mit.edu

* Supported by NSF Grant CCF-2224271.

3 Conditional Lower Bounds for Problems with Intermediate Complexity:

4 Equivalences Between Counting and Detection Problems:

5 Alternative to BSG: A Triangle Decomposition Theorem and Its Applications 18

6 More Lower Bounds under the Strong APSP Hypothesis 22

7 Lower Bounds under the u-dir-APSP Hypothesis 27

8 More Equivalences Between Counting and Detection Problems 29

10 BSG Theorems Revisited 43

11 Lower Bounds for Min-Equality Convolution 49

A Still More Equivalences Between Counting and Detection Problems 60

B Bounded-Difference or Monotone Min-Plus Product from the Triangle Decomposition Theo-

Q2: Does Min-Witness-Prod require superquadratic time, even if ω = 2?

1.1 Summary of Our Contributions and New Tool

Aik − Aiℓ ≤ Bℓj − Bkj .

Lower Bounds Hypotheses Strong Min-Plus

1.3 Equivalence of Counting and Detection

1.4 New Variants of the BSG Theorem with Algorithmic Applications

1.5 Paper Organization

Problem 2.1 (Min-Plus-Product). Given an n1 × n2 matrix A and an n2 × n3 matrix B, compute the

3 Conditional Lower Bounds for Problems with Intermediate Complexity:

Eij := min (A′ik + Bkj

3.2 The Key Reduction

Theorem 3.2. For any r, s, t with s ≤ n2 and t ≤ ℓ,

It suffices to describe how to compute C ′′ . We divide into two cases:

Proof. By Theorem 3.2,

The reduction from AE-Exact-Tri to #AE-Exact-Tri is trivial.

5 Alternative to BSG: A Triangle Decomposition Theorem and Its Applica-

5.1 Application 1: Exact Triangle in Preprocessed Universes

5.2 Application 2: 3SUM in Preprocessed Universes

5.3 Application 3: A Deterministic 3SUM Algorithm for Bounded Monotone Sets

5.4 Application 4: A Truly Subquartic #APSP Algorithm

6 More Lower Bounds under the Strong APSP Hypothesis

6.1 Min-Witness Product

6.2 All-Pairs Shortest Lightest Paths

• For each k ∈ [x] and u ∈ [y], create a node w1 [k, u].

• For each k ∈ [x] and v ∈ [y], create a node w3 [k, v].

• For each j ∈ [n], create a node t[j].

6.3 Batched Range Mode

6.4 Dynamic Shortest Paths in Unweighted Planar Graphs

Lemma 6.11. For any α, β ≤ 1,

6.5 Min-Witness Equality Product

Consequently, for any t ≤ n3−ω ,

7 Lower Bounds under the u-dir-APSP Hypothesis

e 2+ρ−Ω(ε) ), since ω(1, ρ + β, 1) ≤ ω(1, ρ, 1) + ω(1,σ,1)−ω(1,ρ,1) β = 1 + 2ρ + κβ, by convexity

7.2 All-Pairs Shortest Lightest Paths

7.3 Batched Range Mode

7.5 An Equivalence Result

8 More Equivalences Between Counting and Detection Problems

Proof. Given a #Min-Plus-Product instance on n × n matrices A, B, we first list up to n0.99 elements in

Proof. Let A, B be two n × n matrices of a Min-Plus-Product instance. Let A′ be another n × n matrix

8.2 3SUM Convolution and Min-Plus Convolution

fall in I can be expressed as

As in Remark 4.3, Theorem 8.7 implies that 3SUM-Convolution is subquadratically equivalent to

We then show a reduction from Min-Plus-Convolution to #Min-Plus-Convolution.

Plus convolution between A′ and B ′ . Clearly, for any k ∈ [n], C2k−1

8.3 All-Numbers 3SUM