3.1 A Simple \( \Omega (N^{1/2}) \) Lower Bound Due to Bender and Ron
The distribution over sparse digraphs we now describe corresponds to the distribution
\( \mathscr{G}_2 \) defined in Section 4 of [
4]; we denote this distribution by
\( \mathbb {BR}_{\mathrm{simple}} := \mathbb {BR}_{\mathrm{simple}}(N, d). \) A graph
\( {\boldsymbol {G}} \) drawn from
\( \mathbb {BR}_{\mathrm{simple}} \) is generated by randomly partitioning the
N vertices
\( \lbrace 1,\dots ,N\rbrace \) into two equal-size subsets
\( \boldsymbol {S}_1 \) and
\( \boldsymbol {S}_2 \) and taking
d random directed perfect matchings from
\( \boldsymbol {S}_1 \) to
\( \boldsymbol {S}_2 \) and
d random directed perfect matchings from
\( \boldsymbol {S}_2 \) to
\( \boldsymbol {S}_1 \) as the edges of
\( {\boldsymbol {G}}. \) A straightforward probabilistic analysis (see Lemma 5 of [
4]) shows that for any constant
\( d \ge 128 \) , a random graph
\( {\boldsymbol {G}} \sim \mathbb {BR}_{\mathrm{simple}} \) is
\( \varepsilon \) -far from acyclic for
\( \varepsilon =1/16 \) . To complete the lower bound, it remains to argue that any deterministic algorithm
\( \mathcal {A} \) that makes
\( o(N^{1/2}) \) queries finds a directed cycle in
\( {\boldsymbol {G}} \sim \mathbb {BR}_{\mathrm{simple}} \) with probability
\( o_N(1) \) . This is implied by the following stronger property: with probability
\( 1-o_N(1) \) over the choice of
\( {\boldsymbol {G}} \sim \mathbb {BR}_{\mathrm{simple}} \) , no deterministic algorithm that makes
\( o(N^{1/2}) \) queries receives in response to a query a vertex it has previously observed, either as input to or output from a query.
This property follows from a standard birthday-paradox-type argument, i.e., the fact that a sequence of
\( o(N^{1/2}) \) uniform samples from an
N-element set samples the same element twice with probability
\( o_N(1) \) .
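The construction and the birthday bound above can be made concrete with a short sketch. It is an illustration only: the function names `sample_br_simple` and `birthday_collision_bound` are hypothetical, parallel edges arising from different matchings are simply kept, and the collision bound is the standard union bound \( \binom{q}{2}/N \) rather than anything taken from [4].

```python
import random


def sample_br_simple(n, d, seed=None):
    """Sample a digraph in the style of BR_simple(N, d).

    Returns (out, s1, s2), where out maps each vertex to its list of
    outneighbors. Assumption: parallel edges from different matchings are kept.
    """
    if n % 2 != 0:
        raise ValueError("n must be even so that the two sides have equal size")
    rng = random.Random(seed)
    vertices = list(range(1, n + 1))
    rng.shuffle(vertices)  # random partition into two equal-size subsets
    s1, s2 = vertices[: n // 2], vertices[n // 2:]
    out = {v: [] for v in vertices}
    # d random directed perfect matchings in each direction.
    for src, dst in ((s1, s2), (s2, s1)):
        for _ in range(d):
            targets = dst[:]
            rng.shuffle(targets)  # a uniformly random bijection src -> dst
            for u, v in zip(src, targets):
                out[u].append(v)
    return out, set(s1), set(s2)


def birthday_collision_bound(q, n):
    """Union bound on the probability that q uniform samples from an
    n-element set contain a repeated element: at most C(q, 2) / n."""
    return min(1.0, q * (q - 1) / (2 * n))
```

For \( q = o(N^{1/2}) \) the bound returned by `birthday_collision_bound` tends to 0, which is the quantitative content of the birthday paradox argument.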
3.3 Our Construction and a Sketch of Our Main Ideas
We now give an informal description of the distribution
\( \mathbb {BR}:= \mathbb {BR}(N,d) \) that we analyze (a detailed description is given in Section
4). This distribution is a modified version of a construction proposed by Bender and Ron in [
4].
Each graph in the support of
\( \mathbb {BR} \) has
3N vertices, and each vertex has outdegree either
d or 0. A graph
\( {\boldsymbol {G}} \) drawn from
\( \mathbb {BR} \) is obtained as follows:
N vertices are randomly selected and designated as
blue vertices, and the remaining
2N vertices are designated as
red vertices. Red vertices are randomly partitioned into
L many layers
\( R_1,\dots ,R_L \) , each containing
\( W=2N/L \) vertices.
Each blue vertex is assigned
d outneighbors by choosing each one uniformly at random from the blue vertices and the first half of the layers of the red vertices. Each red vertex in layer
\( R_i \) (
\( i \lt L \) ) is assigned
d outneighbors by choosing each one uniformly from the
W vertices in
\( R_{i+1}. \) For a visual example, refer to Figure
1. A straightforward probabilistic argument (given in Section
4.2) shows that with probability
\( 1-o_N(1) \) a random
\( {\boldsymbol {G}} \sim \mathbb {BR} \) is
\( \varepsilon \) -far from acyclic, so the main challenge is to show that it is hard to find a directed cycle in a graph drawn from this distribution.
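A minimal sampler for the informal description above might look as follows. This is a sketch under stated assumptions, not the formal definition from Section 4: it assumes that \( L \) is even and divides \( 2N \), that outneighbors are drawn independently and with replacement, and that a blue vertex draws uniformly from the union of the blue vertices and the red vertices in layers \( R_1,\dots ,R_{L/2} \). The name `sample_br` and its parameters are hypothetical.

```python
import random


def sample_br(n, d, num_layers, seed=None):
    """Sample a digraph following the informal description of BR(N, d).

    Returns (out, blue, layers): out maps each of the 3n vertices to its list
    of outneighbors, blue is the set of blue vertices, and layers[i] holds the
    red vertices of layer R_{i+1}.
    """
    if num_layers % 2 != 0 or (2 * n) % num_layers != 0:
        raise ValueError("assumed: num_layers is even and divides 2n")
    rng = random.Random(seed)
    vertices = list(range(3 * n))
    rng.shuffle(vertices)
    blue, red = vertices[:n], vertices[n:]  # n blue, 2n red
    width = 2 * n // num_layers  # W = 2N / L
    layers = [red[i * width:(i + 1) * width] for i in range(num_layers)]

    # Blue vertices point into the blue part or the first half of the layers.
    first_half = [v for layer in layers[: num_layers // 2] for v in layer]
    blue_pool = blue + first_half
    out = {v: [] for v in vertices}
    for u in blue:
        out[u] = [rng.choice(blue_pool) for _ in range(d)]

    # A red vertex in layer i < L points into the next layer; the last layer
    # keeps outdegree 0.
    for i in range(num_layers - 1):
        for u in layers[i]:
            out[u] = [rng.choice(layers[i + 1]) for _ in range(d)]
    return out, set(blue), layers
```

Because red edges only move forward through the layers and the last layer has no outgoing edges, any directed cycle in the sampled graph stays inside the blue vertices.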
We give some intuition behind the construction of graphs in
\( \mathbb {BR} \) . Note that every cycle in
\( {\boldsymbol {G}} \) consists entirely of blue vertices. Thus, a cycle-finding algorithm may want to “avoid wandering into the red region.” This, however, is difficult to do because the local neighborhood of a typical vertex “looks the same” whether it is blue or red (note that an algorithm under the adjacency list model of course never receives explicit information about whether any particular vertex is blue or red). For example, the simple random walk approach sketched at the beginning of the previous subsection will not work for
\( {\boldsymbol {G}} \sim \mathbb {BR} \) : even if the random walk starts at a blue vertex, after
\( O(1) \) steps on average it will reach a red vertex and will have no chance of completing a cycle. Given that an algorithm needs
\( \Omega (N^{1/2}) \) queries to find a cycle even if it is given the set of blue vertices (since the blue part of
\( {\boldsymbol {G}}\sim \mathbb {BR} \) is very similar to graphs drawn from
\( \mathbb {BR}_{\mathrm{simple}} \) described in Section
3.1), it is natural to hope for an
\( \omega (N^{1/2}) \) lower bound using the distribution
\( \mathbb {BR} \) .
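The \( O(1) \) estimate for the random walk can be checked with a short heuristic calculation. It assumes that each outneighbor of a blue vertex is uniform over the union of the \( N \) blue vertices and the red vertices in the first \( L/2 \) layers, and that successive steps are independent, which ignores the possibility that the walk revisits a vertex. Let \( T \) denote the number of steps a walk started at a blue vertex takes until it first lands on a red vertex.

```latex
\[
\Pr[\text{next vertex is blue}]
  = \frac{N}{N + \frac{L}{2}\cdot\frac{2N}{L}}
  = \frac{N}{2N}
  = \frac{1}{2},
\qquad
\Pr[T = k] = 2^{-k} \quad (k \ge 1),
\qquad
\mathbb{E}[T] = \sum_{k \ge 1} k\, 2^{-k} = 2 .
\]
```

Under these assumptions the walk leaves the blue region after two steps in expectation, and since red vertices have no edges back to blue vertices, it can never return.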
There are two challenges in obtaining an
\( \omega (N^{1/2}) \) lower bound using
\( \mathbb {BR} \) . First, as discussed in the previous subsection (which applies not only to
\( \mathbb {BR} \) but to any distribution), an
\( \omega (N^{1/2}) \) -query algorithm may experience many collisions and hence potentially obtain a significant amount of information about
\( {\boldsymbol {G}} \) . The second challenge is specific to
\( \mathbb {BR} \) . Despite the intuition, “wandering into the red region” may actually provide useful information about
\( {\boldsymbol {G}} \) when done strategically (see Section
8 for two attacks on
\( \mathbb {BR} \) based on exploring the red region; they together imply that one cannot hope to obtain a lower bound better than
\( N^{13/18} \) using
\( \mathbb {BR} \) ). Given that many algorithmic strategies are possible, how can one argue that
every algorithm that does not make too many queries is unlikely to find a cycle?
To explain the intuition that underlies our lower bound, we first note that for a query on vertex u to reveal a cycle, it must be the case that u is blue and there is a directed path from one of its outneighbors to u in the current “knowledge graph” of the algorithm (the knowledge graph consists of all edges found so far, and a vertex v is an ancestor of u if the knowledge graph contains a directed path from v to u). As a result, we focus on the maximum number of ancestors among all blue vertices in the current knowledge graph, because the probability that an algorithm discovers a cycle when it queries a vertex is proportional to the number of ancestors of that vertex in the knowledge graph. Our proof, at a high level, shows that this crucial quantity cannot grow too fast.
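The notions of knowledge graph and ancestor can be illustrated with a small data structure. This is a sketch only: the class name `KnowledgeGraph` and its methods are hypothetical, vertex colors are not tracked, and ancestors are recomputed by a backward breadth-first search on each call rather than maintained incrementally.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """Edges discovered so far, stored as reverse adjacency sets."""

    def __init__(self):
        # parents[v] = set of vertices u such that the edge (u, v) is known
        self.parents = defaultdict(set)

    def ancestors(self, u):
        """Return all vertices with a directed path to u along known edges."""
        found = set()
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for p in self.parents.get(x, ()):
                if p not in found:
                    found.add(p)
                    queue.append(p)
        return found

    def record_query(self, u, outneighbors):
        """Add the edges (u, w) for each returned outneighbor w.

        Returns True if the query reveals a cycle through u, i.e., some
        outneighbor is u itself or already has a known path to u.
        """
        anc = self.ancestors(u)
        closes_cycle = any(w == u or w in anc for w in outneighbors)
        for w in outneighbors:
            self.parents[w].add(u)
        return closes_cycle
```

For example, after recording the edges \( (1,2) \) and \( (2,3) \), vertex 3 has ancestors 1 and 2, so a query on vertex 3 that returns vertex 1 as an outneighbor closes a cycle.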
A key notion behind our analysis is the division of a sequence of queries made by an algorithm into distinct
epochs. Roughly speaking, an epoch ends either when a collision occurs (i.e., one of the outneighbors of the vertex queried is a vertex that the algorithm has seen before, either as a query vertex or as an outneighbor of a query vertex), or when “too many” queries have been made since the end of the previous epoch. We introduce the notion of epochs in Section
5 and bound the number of epochs that occur in the execution of any algorithm that makes at most
\( {\overline{Q}} \) queries (Lemma
2). We also bound the number of
blue surprise epochs: these are epochs that end because the vertex
u queried is blue and has a blue outneighbor
v that the algorithm has seen before (Lemma
3). We pay special attention to such epochs because with the discovery of
\( (u,v) \) , all ancestors of
u become ancestors of
v and the number of ancestors of
v may grow rapidly.
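The rough description of epochs above can be sketched in code. The formal definition appears in Section 5, so several details here are assumptions: the length threshold `max_len` stands in for the unspecified “too many” queries, the queried vertex itself counts as seen when its outneighbors are checked, and colors are ignored, so blue surprise epochs are not distinguished from other collisions. The function name `split_into_epochs` is hypothetical.

```python
def split_into_epochs(query_log, max_len):
    """Split a query log into epochs.

    query_log is a list of (u, outneighbors) pairs in query order. An epoch
    ends when a query returns a previously seen vertex ("collision") or when
    it reaches max_len queries ("length"). Returns a list of
    (queried_vertices, reason) pairs; a trailing partial epoch is labeled
    "unfinished".
    """
    seen = set()
    epochs = []
    current = []
    for u, outs in query_log:
        seen.add(u)  # assumption: the query vertex counts as seen
        collision = any(w in seen for w in outs)
        seen.update(outs)
        current.append(u)
        if collision:
            epochs.append((current, "collision"))
            current = []
        elif len(current) >= max_len:
            epochs.append((current, "length"))
            current = []
    if current:
        epochs.append((current, "unfinished"))
    return epochs
```

With `max_len = 3`, a log whose third query returns a vertex seen earlier ends its first epoch with a collision, and the next three collision-free queries form a second epoch that ends by length.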
Next, in Section
6 we show that during an epoch of any algorithm, regardless of outcomes of previous epochs, the vertices queried are unlikely to contain a path of blue vertices of length more than
\( 4 \log N \) . We do so by analyzing the information that an algorithm has about the connected components of the knowledge graph at any point in its execution, and arguing that the distribution of colors of unqueried children of the knowledge graph is close to a “naive distribution” against which no algorithm can succeed in constructing a long blue path with high probability. We use this argument to prove Lemma
4, which is at the heart of our lower bound argument. In particular, Lemma
4 implies that during an epoch that is not a blue surprise epoch (or during a blue surprise epoch but ignoring the blue-blue collision edge found at the end of the epoch), the maximum number of ancestors of blue vertices in the knowledge graph can increase by no more than
\( 4\log N \) .
Finally, in Section
7, we combine Lemmas
2,
3, and
4 to bound the maximum number of ancestors of blue vertices in the knowledge graph during the execution of any
\( {\overline{Q}} \) -query algorithm on
\( {\boldsymbol {G}}\sim \mathbb {BR} \) . This is used to show that every such algorithm finds a cycle in
\( {\boldsymbol {G}}\sim \mathbb {BR} \) with probability
\( o_N(1) \) .
To simplify the presentation, we introduce in Section
5.1 an augmented query model, called the
color revelation model, in which more information is provided to the algorithm than in the standard model. Specifically, at the end of each epoch the query algorithm is provided with the color of every vertex it has previously seen. All of the results discussed above are proved under this model, and our lower bound carries over to the standard model because any algorithm for the standard model can be simulated in the color revelation model by simply ignoring the additional information.