
A Generalization of Self-Improving Algorithms

Published: 11 October 2022

Abstract

Ailon et al. [SICOMP’11] proposed self-improving algorithms for sorting and Delaunay triangulation (DT) when the input instances \(x_1,\ldots ,x_n\) follow some unknown product distribution. That is, \(x_i\) is drawn independently from a fixed unknown distribution \(\mathcal {D}_i\). After spending \(O(n^{1+\varepsilon })\) time in a learning phase, the subsequent expected running time is \(O((n+H)/\varepsilon)\), where \(H \in \lbrace H_S, H_{DT}\rbrace\), and \(H_S\) and \(H_{DT}\) are the entropies of the distributions of the sorting and DT outputs, respectively. In this article, we allow dependence among the \(x_i\)’s under the group product distribution. There is a hidden partition of \([1,n]\) into groups; the \(x_i\)’s in the \(k\)th group are fixed unknown functions of the same hidden variable \(u_k\); and the \(u_k\)’s are drawn from an unknown product distribution. We describe self-improving algorithms for sorting and DT under this model when the functions that map \(u_k\) to the \(x_i\)’s are well-behaved. After an \(O(\mathrm{poly}(n))\)-time training phase, we achieve \(O(n + H_S)\) and \(O(n\alpha (n) + H_{DT})\) expected running times for sorting and DT, respectively, where \(\alpha (\cdot)\) is the inverse Ackermann function.

A Appendix

It remains to prove Lemma 3.4, which states that we can efficiently compute an approximate partition of \([1,n]\) for DT almost surely. We prove Lemma 3.4 in the following subsections.

A.1 Learning an Approximate Partition for DT – Proof of Lemma 3.4

Recall that the functions \(h_{i,k}^x\)’s and \(h_{i,k}^y\)’s are bivariate polynomials of degrees at most some constant \(d_0\), and that the distribution of every \(u_k\) is \(d_0d_1\)-generic, where \(d_1 = 2(d_0^2/2 + d_0)^{16}\).
Lemma A.1.
Every input coordinate \(\xi\) satisfies exactly one of the following conditions.
(i)
\(\xi\) is a constant input coordinate.
(ii)
For all \(c \in \mathbb {R}\), \(\Pr [\xi =c ]=0\).
Proof.
Without loss of generality, assume that \(\xi =h_{i,k}^x(u_k)\). If \(h_{i,k}^x\) is a constant function, then (i) is true. Suppose in the following that \(h_{i,k}^x\) is not a constant function and we show that (ii) holds. Take an arbitrary value \(c \in \mathbb {R}\). Because \(h_{i,k}^x\) is not a constant function and by Definition 3.1(a), \(h_{i,k}^x-c\) is a non-zero bivariate polynomial of degree at most \(d_0\). By Definition 3.1(b), the distribution of \(u_k\) is \(d_0d_1\)-generic, which by Definition 3.2 implies that \(\Pr [[h_{i,k}^x-c](u_k)=0]=0\). This implies that \(\Pr [h_{i,k}^x(u_k)=c]=0\), i.e., \(\Pr [\xi =c]=0\).□
Every pair of non-constant input coordinates must be either coupled or uncoupled as explained in the following result. We defer its proof to Appendix A.2.
Lemma A.2.
For every pair of non-constant input coordinates \(\xi _1\) and \(\xi _2\), exactly one of the following conditions is satisfied.
(i)
There exists a non-zero bivariate polynomial \(f\) of degree at most \(d_1\) such that \(f(\xi _1,\xi _2)\equiv 0\). We say that \(\xi _1\) and \(\xi _2\) are coupled in this case.
(ii)
The distribution of \((\xi _1,\xi _2)\) is \(d_1\)-generic. In this case, \(\xi _1\) and \(\xi _2\) are said to be uncoupled.
In particular, if \(\xi _1\) and \(\xi _2\) are in different groups in \(\mathcal {G}\), then \(\xi _1\) and \(\xi _2\) must be uncoupled.
We state two properties of every triple of input coordinates in the following two lemmas.
Lemma A.3.
Assume \(p,q\in \mathbb {R}\) and \(a=g_1(p,q),b=g_2(p,q),c=g_3(p,q)\), where \(g_1,g_2,g_3\) are bivariate polynomials of degree at most \(d_0\). Then, there exists a non-zero trivariate polynomial \(f\) with degree at most \(d_1\) such that \(f(a,b,c)\equiv 0\). As an immediate corollary, for every triple of input coordinates \(\xi _1\), \(\xi _2\), and \(\xi _3\) that are in the same group in \(\mathcal {G}\), there exists a non-zero trivariate polynomial \(f\) with degree at most \(d_1\) such that \(f(\xi _1,\xi _2,\xi _3)\equiv 0\).
Lemma A.4.
For every triple of non-constant input coordinates \(\xi _1\), \(\xi _2\), and \(\xi _3\) that are pairwise uncoupled and are not all in the same group in \(\mathcal {G}\), the distribution of \((\xi _1,\xi _2,\xi _3)\) is \(d_1\)-generic.
The proofs of Lemmas A.3 and A.4 are deferred to Appendices A.3 and A.4, respectively. The following lemma is the main tool for learning an approximate partition. We defer its proof to Appendix A.5.
Lemma A.5.
Let \(m\) and \(d\) be two positive integer constants. Let \(\xi _1,\ldots ,\xi _m\) be \(m\) input coordinates. Suppose that exactly one of the following conditions is satisfied.
(i)
There exists a non-zero \(m\)-variate polynomial \(f\) of degree at most \(d\) such that \(f(\xi _1,\ldots ,\xi _m)\equiv 0\).
(ii)
The distribution of \((\xi _1,\ldots ,\xi _m)\) is \(d\)-generic.
Then, using \(\kappa ={m+d\choose m}\) input instances and \(O(\kappa ^3)\) time, we can determine almost surely whether condition (i) or (ii) is satisfied.
By combining Lemmas A.1–A.5, we can perform three kinds of tests on input coordinates as stated in the following result.
Lemma A.6.
(i)
For every input coordinate \(\xi\), using \({1+1\choose 1}=2\) input instances, we can determine almost surely whether \(\xi\) is a constant input coordinate in \(O(1)\) time.
(ii)
For every pair of non-constant input coordinates \(\xi _1\) and \(\xi _2\), using \({2+d_1\choose 2}\) input instances, we can determine almost surely whether \(\xi _1\) and \(\xi _2\) are coupled or uncoupled in \(O(d_1^6)\) time.
(iii)
For every triple of non-constant input coordinates \(\xi _1\), \(\xi _2\), and \(\xi _3\) that are pairwise uncoupled, using \({3+d_1\choose 3}\) input instances, we can determine almost surely whether \(\xi _1\), \(\xi _2\), and \(\xi _3\) are in the same group in \(\mathcal {G}\) in \(O(d_1^9)\) time.
Proof.
Consider (i). By Lemma A.1, either \(\xi\) is a constant input coordinate or \(\Pr [\xi =c]=0\) for all \(c \in \mathbb {R}\). The former case is equivalent to the existence of a non-zero univariate polynomial \(f\) of degree 1 such that \(f(\xi) \equiv 0\), and the latter case is equivalent to the distribution of \(\xi\) being 1-generic. As a result, by Lemma A.5, we can distinguish between these two cases almost surely using \({1+1\choose 1} = 2\) input instances in \(O(1)\) time.
The correctness of (ii) follows directly from Lemmas A.2 and A.5.
Consider (iii). Exactly one of the following two possibilities holds.
If \(\xi _1\), \(\xi _2\), and \(\xi _3\) are in the same group in \(\mathcal {G}\), then by Lemma A.3, there exists a non-zero trivariate polynomial \(f\) of degree at most \(d_1\) such that \(f(\xi _1,\xi _2,\xi _3)\equiv 0\).
If \(\xi _1\), \(\xi _2\), and \(\xi _3\) are not all in the same group in \(\mathcal {G}\), then by Lemma A.4, the distribution of \((\xi _1,\xi _2,\xi _3)\) is \(d_1\)-generic.
By Lemma A.5, these two possibilities can be distinguished almost surely using \({3+d_1\choose 3}\) instances in \(O(d_1^9)\) time.□
We are ready to show that we can construct an approximate partition of \([1,n]\) efficiently.
Proof of Lemma 3.4
We construct an approximate partition \(\mathcal {G}^{\prime }\) as stated in Definition 3.3. First, we apply Lemma A.6(i) to decide for each input coordinate whether it is a constant input coordinate or not. This step requires \(O(n)\) input instances and \(O(n)\) time.
Afterwards, we can pick out the constant input points. The indices of these constant input points form \(G_0^{\prime }\) in \(\mathcal {G}^{\prime }\). For \(j \ge 1\), we initialize each \(G^{\prime }_j\) to contain a distinct index in \([1,n]\!\setminus \!G_0^{\prime }\). This gives the initial \(\mathcal {G}^{\prime } = (G_0^{\prime },G_1^{\prime },G_2^{\prime },\ldots)\). Properties (a) and (b) in Definition 3.3 are satisfied, but property (c) may be violated. We show below how to merge groups in \(\mathcal {G^{\prime }}\) step by step so that properties (a) and (b) are preserved and property (c) is satisfied in the end. Note that \(G_0^{\prime }\) will not change throughout the successive merges.
There are two phases in the merging process.
Take two non-constant input coordinates \(\xi _1\) and \(\xi _2\) that are in distinct groups in the current \(\mathcal {G}^{\prime }\). We apply Lemma A.6(ii) to test whether \(\xi _1\) and \(\xi _2\) are coupled. If the test reveals that \(\xi _1\) and \(\xi _2\) are coupled, by Lemma A.2, \(\xi _1\) and \(\xi _2\) must be in the same group in \(\mathcal {G}\), and therefore, we update \(\mathcal {G}^{\prime }\) by merging the groups in \(\mathcal {G}^{\prime }\) that contain \(\xi _1\) and \(\xi _2\). We repeat this step until no two groups of \(\mathcal {G}^{\prime }\) can be merged anymore. This completes phase one. We require \(O(1)\) input instances and \(O(1)\) time to test a pair of input coordinates. The rest can be done using a disjoint union-find data structure [11]. So a total of \(O(n^2)\) input instances and \(O(n^2\alpha (n))\) time are needed in phase one.
Take a triple of non-constant input coordinates \(\xi _1\), \(\xi _2\), and \(\xi _3\) that are in three distinct groups in the current \(\mathcal {G}^{\prime }\). Thanks to phase one, \(\xi _1\), \(\xi _2\), and \(\xi _3\) are pairwise uncoupled as they are in distinct groups in \(\mathcal {G}^{\prime }\). Therefore, Lemma A.6(iii) is applicable and we use it to decide whether \(\xi _1\), \(\xi _2\), and \(\xi _3\) are in the same group in \(\mathcal {G}\). If so, we update \(\mathcal {G}^{\prime }\) by merging the groups in \(\mathcal {G}^{\prime }\) that contain \(\xi _1\), \(\xi _2\), and \(\xi _3\). We repeat this step until no three groups of \(\mathcal {G}^{\prime }\) can be merged anymore. This completes phase two. We require \(O(1)\) instances and \(O(1)\) time to test a triple of non-constant input coordinates. So a total of \(O(n^3)\) instances and \(O(n^3\alpha (n))\) time are needed in phase two.
Clearly, properties (a) and (b) in Definition 3.3 are preserved throughout. Suppose for the sake of contradiction that there are at least three groups \(G^{\prime }_i\), \(G^{\prime }_j\), and \(G^{\prime }_l\) in \(\mathcal {G}^{\prime }\), where \(i\), \(j\) and \(l\) are positive, that are contained in the same group \(G_k \in \mathcal {G}\) after phase two. Since the input points in \(G^{\prime }_i\), \(G^{\prime }_j\), and \(G^{\prime }_l\) are not constant input points, we can find three non-constant input coordinates \(\xi _1\), \(\xi _2\), and \(\xi _3\) in \(G^{\prime }_i\), \(G^{\prime }_j\), and \(G^{\prime }_l\), respectively. But then, the application of Lemma A.6(iii) to \(\xi _1\), \(\xi _2\), and \(\xi _3\) in phase two must have told us to merge \(G^{\prime }_i\), \(G^{\prime }_j\), and \(G^{\prime }_l\), a contradiction. This shows that property (c) in Definition 3.3 is satisfied after phase two.□
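The bookkeeping of the two merging phases can be sketched in code. The following is a minimal, illustrative sketch (not the paper’s implementation): `is_constant`, `are_coupled`, and `same_group_triple` are hypothetical callbacks standing in for the almost-surely-correct tests of Lemma A.6(i)–(iii), and the coordinates are assumed to be indexed from 0 to \(n-1\).

```python
from itertools import combinations

class UnionFind:
    """Disjoint-set structure with path compression and union by rank [11]."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

def learn_partition(coords, is_constant, are_coupled, same_group_triple):
    """Recover (G0', G1', G2', ...): constant coordinates go to G0'; the
    rest start as singletons and are merged in the two phases above."""
    g0 = [i for i in coords if is_constant(i)]
    rest = [i for i in coords if not is_constant(i)]
    uf = UnionFind(len(coords))
    # Phase one: merge the groups of every coupled pair.
    for i, j in combinations(rest, 2):
        if uf.find(i) != uf.find(j) and are_coupled(i, j):
            uf.union(i, j)
    # Phase two: merge the groups of every same-group triple that still
    # lies in three distinct groups.
    for i, j, k in combinations(rest, 3):
        if len({uf.find(i), uf.find(j), uf.find(k)}) == 3 \
                and same_group_triple(i, j, k):
            uf.union(i, j)
            uf.union(j, k)
    groups = {}
    for i in rest:
        groups.setdefault(uf.find(i), set()).add(i)
    return g0, list(groups.values())
```

With callbacks driven by a ground-truth partition, the sketch recovers the hidden groups up to the guarantee established above: a true group may remain split into at most two groups of \(\mathcal {G}^{\prime }\).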

A.2 Proof of Lemma A.2

There are two cases depending on whether \(\xi _1\) and \(\xi _2\) are in the same group in \(\mathcal {G}\).
Case 1: \(\xi _1\) and \(\xi _2\) are in different groups in \(\mathcal {G}\). We show that the distribution of \((\xi _1,\xi _2)\) is \(d_1\)-generic. Take an arbitrary non-zero bivariate polynomial \(f(\xi _1,\xi _2)\) of degree at most \(d_1\). We can express \(f(\xi _1,\xi _2)\) as \(f_{0}(\xi _1) \cdot \xi _2^{d_1}+f_{1}(\xi _1) \cdot \xi _2^{d_1-1}+\ldots +f_{d_1-1}(\xi _1) \cdot \xi _2+f_{d_1}(\xi _1)\), where \(f_i(\xi _1)\) is a polynomial in \(\xi _1\) with degree at most \(i\). Since \(f\) is not the zero polynomial, there exists \(j \in [0,d_1]\) such that \(f_j(\xi _1)\) is not the zero polynomial whereas \(f_i(\xi _1)\) is the zero polynomial for all \(i \in [0,j-1]\). Let \(R\) be the set of roots of the equation \(f_j(\xi _1) = 0\). By the fundamental theorem of algebra, \(|R| \le j \le d_1\).
Since \(\xi _1\) is a non-constant input coordinate by assumption, Lemma A.1 implies that \(\Pr [\xi _1 = c] = 0\) for all \(c \in \mathbb {R}\). Therefore,
\begin{equation} \Pr [\xi _1 \in R] = 0. \tag{9} \end{equation}
Fix an arbitrary \(r \in \mathbb {R}\!\setminus \!R\). Then, \(f(r,\xi _2)\) becomes a polynomial \(\hat{f}(\xi _2)\) of degree at most \(d_1\). Also, \(\hat{f}(\xi _2)\) is not the zero polynomial because the coefficient \(f_j(r)\) of the monomial \(\xi _2^{d_1-j}\) is non-zero by our choice of \(r \in \mathbb {R}\!\setminus \!R\). The equation \(\hat{f}(\xi _2) = 0\) has at most \(d_1\) roots by the fundamental theorem of algebra. Since \(\xi _1\) and \(\xi _2\) are in different groups in \(\mathcal {G}\) by the case assumption, \(\xi _2\) is independent from \(\xi _1\). Since \(\xi _2\) is a non-constant input coordinate, Lemma A.1 implies that
\begin{equation} \forall \, r \in \mathbb {R}\!\setminus \!R, \quad \Pr [f(\xi _1,\xi _2) = 0 \, | \, \xi _1 = r] = 0. \tag{10} \end{equation}
Let \(p : \mathbb {R}\rightarrow \mathbb {R}\) be the probability density function of the distribution of \(\xi _1\). By (9) and (10), we have
\begin{equation*} \Pr [f(\xi _1,\xi _2) = 0] = \int _{R} \Pr [f(\xi _1,\xi _2) = 0 \mid \xi _1 = r]\, p(r)\, dr + \int _{\mathbb {R}\setminus R} \Pr [f(\xi _1,\xi _2) = 0 \mid \xi _1 = r]\, p(r)\, dr \le \Pr [\xi _1 \in R] + 0 = 0. \end{equation*}
Hence, the distribution of \((\xi _1,\xi _2)\) is \(d_1\)-generic by Definition 3.2.
Case 2: \(\xi _1\) and \(\xi _2\) are in the same group in \(\mathcal {G}\). Conditions (i) and (ii) of the lemma cannot both be true. Therefore, it suffices to prove that the distribution of \((\xi _1,\xi _2)\) is \(d_1\)-generic under the assumption that (i) does not hold.
Without loss of generality, assume that \(\xi _1\) and \(\xi _2\) are in the group \(G_k \in \mathcal {G}\). Therefore, \(\xi _1=h(u_k)\) and \(\xi _2=h^{\prime }(u_k)\) for some bivariate polynomials \(h\) and \(h^{\prime }\) of degrees at most \(d_0\). Take an arbitrary non-zero bivariate polynomial \(f(\xi _1,\xi _2)\) of degree at most \(d_1\). Substituting \(\xi _1 = h(u_k)\) and \(\xi _2 = h^{\prime }(u_k)\) into \(f(\xi _1,\xi _2)\), we obtain a polynomial \(\hat{f}(u_k)\) with degree at most \(d_0d_1\). Moreover, \(\hat{f}(u_k)\) is not the zero polynomial—otherwise, \(f(\xi _1,\xi _2) \equiv 0\), contradicting our assumption that (i) does not hold. Then, since the distribution of \(u_k\) is \(d_0d_1\)-generic by Definition 3.1(b), we conclude by Definition 3.2 that \(\Pr [ \hat{f}(u_k)=0 ]=0\) and therefore \(\Pr [ f(\xi _1,\xi _2)=0 ]=0\).

A.3 Proof of Lemma A.3

The corollary part of this lemma is trivial. We only need to prove the first part of the lemma.
Our proof makes use of terminologies about algebraic varieties and ideals (e.g., [13]).
Denote by \(\mathbb {R}[z_1,\ldots ,z_d]\) the ring whose elements are sums of monomials in \(z_1,\ldots ,z_d\) with real coefficients. Define three polynomials \(f_1,f_2,f_3\) in \(\mathbb {R}[p,q,r,s,t]\) as follows:
\begin{equation*} f_1 := r - g_1(p,q), \qquad f_2 := s - g_2(p,q), \qquad f_3 := t - g_3(p,q). \end{equation*}
Let \(\mathcal {J}\) be the ideal generated by \(f_1\), \(f_2\), and \(f_3\). That is,
\begin{equation*} \mathcal {J} := \left\lbrace \sum \limits _{i=1}^{3} \alpha _i f_i : \alpha _1,\alpha _2,\alpha _3 \in \mathbb {R}[p,q,r,s,t] \right\rbrace . \end{equation*}
Let \(\mathcal {B}\) be the unique reduced Gröbner basis of \(\mathcal {J}\) with respect to the lex order \(p \gt q \gt r \gt s \gt t\). (The uniqueness is given in [13, p. 92, Prop 6]. A basis of \(\mathcal {J}\) is any set \(f^{\prime }_1,\ldots ,f^{\prime }_m\) of polynomials such that \(\mathcal {J}=\lbrace \sum _{i=1}^{m}\alpha _if^{\prime }_i:\alpha _1,\ldots ,\alpha _m\in \mathbb {R}[p,q,r,s,t]\rbrace\), and a Gröbner basis of \(\mathcal {J}\) is a basis of \(\mathcal {J}\) with a special property [13, p. 77, Def 5], and a reduced Gröbner basis requires further properties [13, p. 92, Def 5].) The degrees of \(f_1\), \(f_2\), and \(f_3\) are at most \(d_0\) because the degrees of \(g_1\), \(g_2\), and \(g_3\) are at most \(d_0\) by assumption. By the result of Dubé [15], the degrees of polynomials in \(\mathcal {B}\) are bounded by \(2((d_0^2/2) + d_0)^{16} = d_1\).
Let \(\mathcal {J}_2\) be the second elimination ideal of \(\mathcal {J}\) with respect to the same lex order \(p \gt q \gt r \gt s \gt t\); formally, \(\mathcal {J}_2=\mathcal {J} \cap \mathbb {R}[r,s,t]\). Applying the Elimination Theorem [13, p. 116, Thm 2], \(\mathcal {B}_2 = \mathcal {B} \cap \mathbb {R}[r,s,t]\) is a Gröbner basis of \(\mathcal {J}_2\).
Assume for the time being that \(\mathcal {J}_2\) does not consist of the zero polynomial alone. This implies the existence of a non-zero polynomial in \(\mathcal {B}_2\), denoted by \(f\). The degree of \(f\) is at most \(d_1\) because \(f \in \mathcal {B}_2\subseteq \mathcal {B}\) and the degree of every polynomial in \(\mathcal {B}\) is at most \(d_1\).
Because \(f_1\), \(f_2\), and \(f_3\) form a basis of \(\mathcal {J}\) and \(f(r,s,t)\in \mathcal {J}\), there exist \(\alpha _1,\alpha _2,\alpha _3 \in \mathbb {R}[p,q,r,s,t]\) such that \(f(r,s,t) = \sum _{i=1}^3 \alpha _i(p,q,r,s,t)\, f_i(p,q,r,s,t)\). By the definitions of \(f_1\), \(f_2\), and \(f_3\), for \(i \in [1,3]\), \(f_i(p,q,a,b,c) = 0\) for all \(p,q \in \mathbb {R}\). Together, \(f(a,b,c) = 0\) for every \(p,q\in \mathbb {R}\), thereby establishing the correctness of the lemma.
What remains to be proved is that \(\mathcal {J}_2\) does not consist of the zero polynomial alone. Let \(\mathbb {C}\) be the set of complex numbers. Let \(\mathbb {C}[z_1,\ldots ,z_d]\) denote the ring whose elements are sums of monomials in \(z_1,\ldots ,z_d\) with complex coefficients. Consider the ideal \({\mathbb {J}}\) generated by \(f_1\), \(f_2\) and \(f_3\) in \(\mathbb {C}[p,q,r,s,t]\). That is, \({\mathbb {J}} = \bigl \lbrace \sum _{i=1}^3 \beta _if_i : \beta _1,\beta _2,\beta _3 \in \mathbb {C}[p,q,r,s,t] \bigr \rbrace\). Let \({\mathbb {J}}_2\) be the second elimination ideal of \({\mathbb {J}}\), i.e., \({\mathbb {J}}_2={\mathbb {J}}\cap \mathbb {C}[r,s,t]\). It is known that if we compute the reduced Gröbner basis of \(\mathcal {J}\) using Buchberger’s algorithm [13], the result is also the reduced Gröbner basis of \({\mathbb {J}}\). Accordingly, \(\mathcal {J}_2\) consists of the zero polynomial alone if and only if \({\mathbb {J}}_2\) consists of the zero polynomial alone. Thus, it reduces to showing that \({\mathbb {J}}_2\) does not consist of the zero polynomial alone.
Let \(V({\mathbb {J}})=\lbrace (u,v,x,y,z) \in \mathbb {C}^5 : \forall \, f \in {\mathbb {J}}, \, f(u,v,x,y,z)=0 \rbrace\), i.e., the subset of \(\mathbb {C}^5\) at which all polynomials in \({\mathbb {J}}\) vanish. Similarly, let \(V({\mathbb {J}}_2)= \lbrace (x,y,z) \in \mathbb {C}^3 : \forall \, f \in {\mathbb {J}}_2, \, f(x,y,z)=0 \rbrace\). Define the projection \(\varphi : \mathbb {C}^5 \rightarrow \mathbb {C}^3\) such that \(\varphi (u,v,x,y,z) = (x,y,z)\). Then, \(\varphi (V({\mathbb {J}}))\) is the image of \(V({\mathbb {J}})\) under \(\varphi\). Let \(\overline{\varphi (V({\mathbb {J}}))}\) be the Zariski closure of \(\varphi (V({\mathbb {J}}))\), i.e., the smallest affine algebraic variety containing \(\varphi (V({\mathbb {J}}))\) [13].
Because \({\mathbb {J}}\) is generated by \(r - g_1(p,q)\), \(s -g_2(p,q)\), and \(t-g_3(p,q)\), we have \(r=g_1(p,q)\), \(s=g_2(p,q)\), and \(t=g_3(p,q)\) for every element \((p,q,r,s,t) \in V({\mathbb {J}})\). In other words, once \(p\) and \(q\) are fixed, the values of \(r\), \(s\) and \(t\) are completely determined. Hence, \(V({\mathbb {J}})\) is isomorphic to \(\mathbb {C}^2\), which implies that \(\dim (V({\mathbb {J}}))=\dim (\mathbb {C}^2)=2\).
By the Closure Theorem [13, p. 125, Thm 3], \(V({\mathbb {J}}_2)=\overline{\varphi (V({\mathbb {J}}))}\). Therefore, \(\dim (V({\mathbb {J}}_2))=\dim (\overline{\varphi (V({\mathbb {J}}))})\). It is also known that \(\dim (\overline{\varphi (V({\mathbb {J}}))}) \le \dim (V({\mathbb {J}}))\) [13].
Altogether, \(\dim (V({\mathbb {J}}_2))\le \dim (V({\mathbb {J}}))=2\). Therefore, \(V({\mathbb {J}}_2)\ne \mathbb {C}^3\) as \(\dim (\mathbb {C}^3)=3\). This completes the proof because if \({\mathbb {J}}_2\) consisted of the zero polynomial alone, then \(V({\mathbb {J}}_2)\) would be equal to \(\mathbb {C}^3\).
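For intuition, here is a hand-worked toy instance of the elimination argument in this subsection; the polynomials \(g_1\), \(g_2\), \(g_3\) and the relation \(f\) below are our own example, not taken from the paper.

```python
import random

# Toy instance of Lemma A.3 (our own example, not from the paper):
#   g1(p, q) = p + q,   g2(p, q) = p * q,   g3(p, q) = p - q,
# so a = p + q, b = p * q, c = p - q.  Eliminating p and q by hand:
# a + c = 2p and a - c = 2q, hence 4b = 4pq = a**2 - c**2, giving the
# non-zero trivariate relation f(a, b, c) = a**2 - c**2 - 4b.

def f(a, b, c):
    return a**2 - c**2 - 4*b

# Sanity check: f vanishes identically on the image of (g1, g2, g3),
# up to floating-point rounding.
random.seed(0)
for _ in range(1000):
    p, q = random.uniform(-10, 10), random.uniform(-10, 10)
    a, b, c = p + q, p * q, p - q
    assert abs(f(a, b, c)) < 1e-8 * (1 + a*a + c*c + 4*abs(b))
```

Computing such a relation in general is exactly what the Gröbner-basis elimination above accomplishes; the toy relation here has degree 2, comfortably below the bound \(d_1\).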

A.4 Proof of Lemma A.4

As \(\xi _1\), \(\xi _2\), and \(\xi _3\) are not all in the same group in \(\mathcal {G}\), we can assume without loss of generality that \(\xi _3\) is in a group in \(\mathcal {G}\) different from those to which \(\xi _1\) and \(\xi _2\) belong. By Definition 3.2, we need to prove that for every non-zero trivariate polynomial \(f\) in \(\xi _1\), \(\xi _2\), and \(\xi _3\) that has real coefficients and degree at most \(d_1\), \(\Pr [f({\xi }_1,\xi _2,\xi _3) = 0 ]= 0\).
Express \(f\) as \(f_0(\xi _1,\xi _2) \cdot \xi _3^{d_1}+f_1(\xi _1,\xi _2) \cdot \xi _3^{d_1-1}+\ldots +f_{d_1-1}(\xi _1,\xi _2) \cdot \xi _3+f_{d_1}(\xi _1,\xi _2)\), where \(f_i(\xi _1,\xi _2) \in \mathbb {R}[\xi _1,\xi _2]\) and \(f_i\) has degree at most \(i\). Since \(f\) is a non-zero polynomial, there exists \(j \in [0,d_1]\) such that \(f_j\) is a non-zero polynomial and \(f_i\) is the zero polynomial for \(i \in [0,j-1]\).
Denote by \(R\) the set of roots of the equation \(f_j(\xi _1,\xi _2) = 0\). By the assumption of the lemma, \(\xi _1\) and \(\xi _2\) are uncoupled, which means that the distribution of \((\xi _1,\xi _2)\) is \(d_1\)-generic. Thus, it follows from Definition 3.2 that \(\Pr [ f_j(\xi _1,\xi _2) = 0] = 0\). In other words,
\begin{equation} \Pr [ (\xi _1,\xi _2) \in R] = 0. \tag{11} \end{equation}
Fix an arbitrary point \((r,s) \in \mathbb {R}^2\!\setminus \!R\). Then, \(f(r,s,\xi _3)\) is a polynomial \(g(\xi _3)\) in \(\xi _3\) with degree at most \(d_1\). Moreover, \(g(\xi _3)\) is not the zero polynomial because the coefficient \(f_j(r,s)\) of the monomial \(\xi _3^{d_1-j}\) is non-zero by our choice of \((r,s) \in \mathbb {R}^2\!\setminus \!R\). By the fundamental theorem of algebra, there are at most \(d_1\) roots to the equation \(g(\xi _3) = 0\). Since \(\xi _3\) is not in the same group as \(\xi _1\) or \(\xi _2\), the distribution of \(\xi _3\) is independent from that of \((\xi _1,\xi _2)\). As \(\xi _3\) is a non-constant input coordinate, the probability of \(\xi _3\) being a root of \(g(\xi _3)= 0\) is zero. Hence,
\begin{equation} \forall \, (r,s) \in \mathbb {R}^2\!\setminus \!R, \quad \Pr [ f(\xi _1,\xi _2,\xi _3) = 0 \, | \, \xi _1 = r, \xi _2 = s ] = 0. \tag{12} \end{equation}
Let \(p : \mathbb {R}^2 \rightarrow \mathbb {R}\) be the joint probability density function of the distribution of \((\xi _1,\xi _2)\). By (11) and (12), we have
\begin{equation*} \Pr [f(\xi _1,\xi _2,\xi _3) = 0] = \int _{R} \Pr [f(\xi _1,\xi _2,\xi _3) = 0 \mid \xi _1 = r, \xi _2 = s]\, p(r,s)\, dr\, ds + \int _{\mathbb {R}^2\setminus R} \Pr [f(\xi _1,\xi _2,\xi _3) = 0 \mid \xi _1 = r, \xi _2 = s]\, p(r,s)\, dr\, ds \le \Pr [(\xi _1,\xi _2) \in R] + 0 = 0. \end{equation*}
Hence, the distribution of \((\xi _1,\xi _2,\xi _3)\) is \(d_1\)-generic by Definition 3.2, establishing the lemma.

A.5 Proof of Lemma A.5

We first define an extension of a vector in \(\mathbb {R}^m\) as follows. Given a positive integer \(d\) and a vector \((r_1,\ldots ,r_m)\in \mathbb {R}^m\), we extend the vector to a longer one that consists of all possible monomials in \(r_1,\ldots ,r_m\) whose degrees are at most \(d\). Let \(\kappa = {{m+d}\choose {m}}\). There are \(\kappa\) such monomials, and we list them in lexicographical order, i.e., \(r_1^{d_1} \cdots r_m^{d_m} \lt r_1^{d^{\prime }_1} \cdots r_m^{d^{\prime }_m}\) if and only if there exists \(j \in [1,m]\) such that \(d_i = d^{\prime }_i\) for \(i \in [1,j-1]\) and \(d_j \lt d^{\prime }_j\). We use \({\mathcal {E}}_d(r_1,\ldots ,r_m)\) to denote the extended vector of \((r_1,\ldots ,r_m)\) as defined above.
Let \(\xi _1,\ldots ,\xi _m\) be the input coordinates that we are interested in. Draw a sample of \(\kappa\) input instances. For \(i \in [1,\kappa ]\), let \((\xi _1^{(i)},\ldots , \xi _m^{(i)})\) denote the instance of \((\xi _1,\ldots ,\xi _m)\) in the \(i\)th input instance drawn. For \(i \in [1,\kappa ]\), let \(q_i = {\mathcal {E}}_{d}(\xi _1^{(i)}, \ldots , \xi _m^{(i)})\). We show that we can use \(q_1,\ldots ,q_\kappa\) to decide whether condition (i) or (ii) below is satisfied by \(\xi _1,\xi _2,\ldots ,\xi _m\), assuming that exactly one of conditions (i) and (ii) is satisfied.
Condition (i): There exists a non-zero \(m\)-variate polynomial \(f\) of degree at most \(d\) such that \(f(\xi _1,\ldots ,\xi _m)\equiv 0\).
Condition (ii): The distribution of \((\xi _1,\ldots ,\xi _m)\) is \(d\)-generic.
Our method is to test the linear dependence of \(q_1,\ldots ,q_{\kappa }\) by running Gaussian elimination on the \(\kappa \times \kappa\) matrix whose columns are \(q_1,\ldots ,q_{\kappa }\). This takes \(O(\kappa ^3)\) time. If \(q_1,\ldots ,q_{\kappa }\) are found to be linearly dependent, we report that condition (i) is satisfied. Otherwise, we report that condition (ii) is satisfied. To show that our answer is correct almost surely, it suffices to prove the following statements:
If condition (i) holds, then \(q_1,\ldots ,q_{\kappa }\) are linearly dependent.
If condition (ii) holds, then \(q_1,\ldots ,q_{\kappa }\) are linearly independent almost surely.
Suppose that condition (i) holds. Then, there is a non-zero \(m\)-variate polynomial \(f\) of degree at most \(d\) such that \(f(\xi _1,\ldots ,\xi _m)\equiv 0\). Observe that \(f(\xi _1,\ldots ,\xi _m)\) is equal to the inner product \(\langle {\mathcal {E}}_{d}(\xi _1,\ldots ,\xi _m),a \rangle\) for some non-zero vector \(a\in \mathbb {R}^\kappa\). It follows that for \(i \in [1,\kappa ]\), \(\langle q_i,a\rangle = f(\xi _1^{(i)},\ldots ,\xi _m^{(i)}) = 0\). In other words, the vectors \(q_1,\ldots ,q_\kappa\) in \(\mathbb {R}^{\kappa }\) are all orthogonal to \(a\). Since the orthogonal complement of \(a\) has dimension \(\kappa -1\), the vectors \(q_1, \ldots , q_{\kappa }\) must be linearly dependent.
Suppose that condition (ii) holds. We prove by induction that \(q_{1},\ldots ,q_{j}\) are linearly independent almost surely for \(j = 1, 2, \ldots ,\kappa\).
Consider the base case of \(j = 1\). The singleton \(\lbrace q_1\rbrace\) is linearly independent because \(q_1 \not= (0,\ldots ,0)\); indeed, the first entry of \(q_1\), which corresponds to the constant monomial, is always 1.
Assume the induction hypothesis for some \(j=k \lt \kappa\). Consider \(j=k+1\). Since \(k \lt \kappa\), the \(k\) vectors \(q_1, \ldots , q_k\) cannot span \(\mathbb {R}^{\kappa }\). Therefore, there exists a non-zero vector \(a\) that is orthogonal to \(q_i\) for all \(i \in [1,k]\). Consider the equation \(\langle {\mathcal {E}}_{d}(\xi _1, \ldots , \xi _m), a \rangle =0\). We can write this equation as \(f({\xi }_1,\ldots ,\xi _m) = 0\), where \(f\) is a sum of monomials in \({\xi }_1,\ldots ,\xi _m\) with coefficients equal to the corresponding entries in \(a\). Since \(a\) is a non-zero vector, \(f\) is a non-zero \(m\)-variate polynomial of degree at most \(d\). By condition (ii), the distribution of \((\xi _1,\ldots ,\xi _m)\) is \(d\)-generic, which gives \(\Pr [f({\xi }_1,\ldots ,\xi _m) = 0 ]= 0\) according to Definition 3.2. Therefore, \(\Pr [\langle q_{k+1}, a \rangle = 0] = \Pr [f({\xi }^{(k+1)}_1,\ldots ,\xi ^{(k+1)}_m) = 0] = 0\).
By our choice of \(a\), we have \(\langle q_i, a \rangle = 0\) for \(i \in [1,k]\). This implies that \(\langle q_{k+1}, a \rangle = 0\) if \(q_{k+1}\) is equal to some linear combination of \(q_1,\ldots ,q_k\). Therefore, \(q_{k+1}\) is not a linear combination of \(q_1,\ldots ,q_{k}\) almost surely because \(\Pr [\langle q_{k+1}, a \rangle = 0] = 0\). Together with the induction hypothesis that \(q_1,\ldots ,q_k\) are linearly independent almost surely, we conclude that \(q_1,\ldots ,q_{k+1}\) are linearly independent almost surely.
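The test in this proof can be sketched as follows. This is our own illustrative sketch: `extend` implements the extension \({\mathcal {E}}_d\), and `is_dependent` runs Gaussian elimination on the \(\kappa \times \kappa\) matrix. Exact rational arithmetic replaces the paper’s real-valued samples so that a zero pivot genuinely certifies rank deficiency; with floating point one would instead compare pivots against a small tolerance.

```python
import itertools
import math
from fractions import Fraction

def extend(point, d):
    """E_d(point): all monomials of total degree at most d in the coordinates
    of `point`, listed in lexicographic order of the exponent vectors."""
    m = len(point)
    exps = sorted(e for e in itertools.product(range(d + 1), repeat=m)
                  if sum(e) <= d)
    return [math.prod(x ** k for x, k in zip(point, e)) for e in exps]

def is_dependent(samples, d):
    """Return True iff the extended sample vectors are linearly dependent,
    i.e., condition (i) of Lemma A.5 is reported; otherwise report (ii)."""
    rows = [[Fraction(v) for v in extend(s, d)] for s in samples]
    kappa = len(rows[0])
    assert len(rows) == kappa, "draw exactly kappa = C(m+d, m) samples"
    # Gaussian elimination on the kappa x kappa matrix, O(kappa^3) steps.
    for col in range(kappa):
        piv = next((r for r in range(col, kappa) if rows[r][col] != 0), None)
        if piv is None:
            return True                      # rank-deficient: dependent
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(col + 1, kappa):
            fac = rows[r][col] / rows[col][col]
            for c in range(col, kappa):
                rows[r][c] -= fac * rows[col][c]
    return False                             # full rank: independent
```

For instance, with \(m=2\) and \(d=2\) we have \(\kappa = 6\): six samples of a coupled pair satisfying \(\xi _2 = \xi _1^2\) are reported dependent, while six generic points are reported independent.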

A.6 Example: Construction of SplitT(Qi) Boils Down to O(|Qi|) Nearest Common Ancestor Queries in SplitT(P), as Claimed in the Proof of Lemma 3.10

We use an example to demonstrate the following result used by Lemma 3.10: Computing \(\mathit {SplitT}(Q_i)\) boils down to \(O(|Q_i|)\) nca queries in \(\mathit {SplitT}(P)\).
Our example is shown in Figure 1. The leftmost picture shows \(\mathit {SplitT}(P)\), where the nodes labeled \(b_1,\ldots ,b_6\) are in \(Q_i\). Let \(a_i\) denote the nca of \(b_i\) and \(b_{i+1}\) for \(1\le i\le 5\).
Fig. 1. Computing \(\mathit {SplitT}(Q_i)\) boils down to \(O(|Q_i|)\) nca queries in \(\mathit {SplitT}(P)\).
We can construct \(\mathit {SplitT}(Q_i)\) incrementally after the ancestors \(a_1,\ldots ,a_5\) are preprocessed.
In the first step, simply insert \(b_1\) into \(\mathit {SplitT}(Q_i)\).
In the second step, insert \(a_1\) and \(b_2\) into \(\mathit {SplitT}(Q_i)\). Set the left child of \(a_1\) to be \(b_1\). (The right child of \(a_1\) is not yet determined and will be assigned later; it may not be \(b_2\).)
Here, we point out that our algorithm uses a stack \(S\) to store those nodes in \(\lbrace a_1,\ldots ,a_5\rbrace\) that are on the path from the root of \(\mathit {SplitT}(P)\) to \(b_i\) after the \(i\)th step. So far, \(S=(a_1)\). In the third step, as \(a_2\) is deeper than \(a_1\), node \(a_2\) is pushed into \(S\). Set the left child of \(a_2\) to be \(b_2\). (The right child of \(a_2\) is not yet determined.)
In the fourth step, as \(a_3\) is deeper than \(a_2\), node \(a_3\) is pushed into \(S\). Set the left child of \(a_3\) to be \(b_3\). (The right child of \(a_3\) is not yet determined.)
In the fifth step, as \(a_4\) is closer to the root than \(a_3\), node \(a_3\) is popped from \(S\) and its right child is set to be \(b_4\). Moreover, \(a_4\) is still closer to the root than \(a_2\), so node \(a_2\) is popped from \(S\) and its right child is set to be \(a_3\). Then, \(a_4\) is pushed into \(S\), and the left child of \(a_4\) is set to be \(a_2\).
In the sixth step, as \(a_5\) is closer to the root than \(a_4\), node \(a_4\) is popped from \(S\) and its right child is set to be \(b_5\). Moreover, \(a_5\) is still closer to the root than \(a_1\), so node \(a_1\) is popped from \(S\) and its right child is set to be \(a_4\). Then, \(a_5\) is pushed into \(S\), and the left child of \(a_5\) is set to be \(a_1\).
Finally, the right child of \(a_5\) is set to be \(b_6\), and \(a_5\) is chosen as the root of \(\mathit {SplitT}(Q_i)\).
Designing the full algorithm from these steps is straightforward, and the algorithm clearly runs in \(O(|Q_i|)\) time.
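The steps above generalize to the following stack-based routine, sketched under the assumption that each \(a_i\) and its depth in \(\mathit {SplitT}(P)\) have already been obtained by an nca query; `Node`, `build_split_tree`, and the concrete depth values in the test are our own illustrative names.

```python
class Node:
    """Binary tree node; internal nodes carry their depth in SplitT(P)."""
    def __init__(self, depth=None, label=None):
        self.depth = depth
        self.label = label
        self.left = self.right = None

def build_split_tree(leaves, nca_depths):
    """Build SplitT(Q_i) from the leaves b_1, ..., b_m (left to right) and
    the depths in SplitT(P) of a_j = nca(b_j, b_{j+1}): one nca query per
    consecutive leaf pair, hence O(|Q_i|) queries and O(|Q_i|) time."""
    stack = []                               # ncas on the path to the current leaf
    for i in range(len(leaves) - 1):
        node = Node(depth=nca_depths[i])     # node for a_{i+1}
        hang = leaves[i]                     # subtree finished at this leaf
        # Pop ncas that are deeper than the new nca; each popped node
        # receives the finished subtree as its right child.
        while stack and stack[-1].depth > node.depth:
            top = stack.pop()
            top.right = hang
            hang = top
        node.left = hang
        stack.append(node)
    # Attach the last leaf and close off the remaining stack entries.
    hang = leaves[-1]
    while stack:
        top = stack.pop()
        top.right = hang
        hang = top
    return hang                              # root of SplitT(Q_i)
```

With depths chosen as in the figure (so that \(a_3\) is deepest and \(a_5\) shallowest), the routine reproduces the tree assembled in the six steps above, with \(a_5\) as the root.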

References

[1]
N. Ailon, B. Chazelle, K. Clarkson, D. Liu, W. Mulzer, and C. Seshadhri. 2011. Self-improving algorithms. SIAM Journal on Computing 40, 2 (2011), 350–375.
[2]
K. Buchin and W. Mulzer. 2011. Delaunay triangulations in O(Sort(N)) time and more. Journal of the ACM 58, 2 (2011), 6:1–6:27.
[3]
P. B. Callahan and S. R. Kosaraju. 1995. A decomposition of multidimensional point sets with applications to \(K\)-nearest-neighbors and \(N\)-body potential fields. Journal of the ACM 42, 1 (1995), 67–90.
[4]
T. M. Chan. 2016. A simpler linear-time algorithm for intersecting two convex polyhedra in three dimensions. Discrete & Computational Geometry 56, 4 (2016), 860–865.
[5]
B. Chazelle. 1992. An optimal algorithm for intersecting three-dimensional convex polyhedra. SIAM Journal on Computing 21, 4 (1992), 671–696.
[6]
B. Chazelle, O. Devillers, F. Hurtado, M. Mora, V. Sacristan, and M. Teillaud. 2002. Splitting a Delaunay triangulation in linear time. Algorithmica 34, 1 (2002), 39–46.
[7]
S.-W. Cheng, M.-K. Chiu, K. Jin, and M. T. Wong. 2020. A generalization of self-improving algorithms. In Proceedings of the 36th Symposium on Computational Geometry. 29:1–29:13.
[8]
S.-W. Cheng, K. Jin, and L. Yan. 2020. Extensions of self-improving sorters. Algorithmica 82, 1 (2020), 88–106.
[9]
K. L. Clarkson, W. Mulzer, and C. Seshadhri. 2014. Self-improving algorithms for coordinatewise maxima and convex hulls. SIAM Journal on Computing 43, 2 (2014), 617–653.
[10]
K. L. Clarkson and K. Varadarajan. 2007. Improved approximation algorithms for geometric set cover. Discrete & Computational Geometry 37, 1 (2007), 43–58.
[11]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. 2009. Introduction to Algorithms. MIT Press.
[12]
T. M. Cover and J. A. Thomas. 2006. Elements of Information Theory (2nd ed.). Wiley-Interscience, New York.
[13]
D. Cox, J. Little, and D. O’Shea. 2007. Ideals, Varieties, and Algorithms. Springer, New York, NY.
[14]
J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. 1989. Making data structures persistent. Journal of Computer and System Sciences 38, 1 (1989), 86–124.
[15]
T. W. Dubé. 1990. The structure of polynomial ideals and Gröbner bases. SIAM Journal on Computing 19, 4 (1990), 750–773.
[16]
M. L. Fredman. 1975. Two applications of a probabilistic search technique: Sorting \(X+Y\) and building balanced search trees. In Proceedings of the 7th ACM Symposium on Theory of Computing. 240–244.
[17]
M. L. Fredman. 1976. How good is the information theory bound in sorting? Theoretical Computer Science 1, 4 (1976), 355–361.
[18]
W. Hoeffding. 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 301 (1963), 13–30. Retrieved from http://www.jstor.org/stable/2282952.
[19]
J. Iacono. 2004. Expected asymptotically optimal planar point location. Computational Geometry: Theory and Applications 29, 1 (2004), 19–22.
[20]
J. H. Kim. 1996. On increasing subsequences of random permutations. Journal of Combinatorial Theory, Series A 76, 1 (1996), 148–155.
[21]
D. G. Kirkpatrick. 1983. Optimal search in planar subdivisions. SIAM Journal on Computing 12, 1 (1983), 28–35.
[22]
D. H. Lehmer. 1960. Teaching combinatorial tricks to a computer. In Proceedings of Symposia in Applied Mathematics, Combinatorial Analysis, Vol. 10. American Mathematical Society, 179–193.
[23]
K. Mehlhorn. 1975. Nearly optimal binary search trees. Acta Informatica 5, 4 (1975), 287–295.
[24]
R. E. Tarjan. 1979. Applications of path compression on balanced trees. Journal of the ACM 26, 4 (1979), 690–715.


Published In

ACM Transactions on Algorithms  Volume 18, Issue 3
July 2022
314 pages
ISSN:1549-6325
EISSN:1549-6333
DOI:10.1145/3561945
  • Editor:
  • Edith Cohen

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 October 2022
Online AM: 27 April 2022
Accepted: 11 April 2022
Revised: 25 November 2021
Received: 20 August 2020
Published in TALG Volume 18, Issue 3


Author Tags

  1. Expected running time
  2. entropy
  3. sorting
  4. Delaunay triangulation

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Research Grants Council, Hong Kong, China
  • ERC
