
Metric Dimension and Resolvability of Jaccard Spaces

Manuel E. Lladser Department of Applied Mathematics, University of Colorado, Boulder, USA Corresponding author: manuel.lladser@colorado.edu Alexander J. Paradise Department of Applied Mathematics, University of Colorado, Boulder, USA

Abstract

A subset of points in a metric space is said to resolve it if each point in the space is uniquely characterized by its distances to the points in the subset. In particular, resolving sets can be used to represent points in abstract metric spaces as Euclidean vectors. Importantly, due to the triangle inequality, points close by in the space are represented as vectors with similar coordinates, which may find applications in classification problems of symbolic objects under suitably chosen metrics. In this manuscript, we address the resolvability of Jaccard spaces, i.e., metric spaces of the form $(2^{X},\text{Jac})$, where $2^{X}$ is the power set of a finite set $X$, and Jac is the Jaccard distance between subsets of $X$. Specifically, for different $a,b\in 2^{X}$, $\text{Jac}(a,b)=|a\Delta b|/|a\cup b|$, where $|\cdot|$ denotes size (i.e., cardinality) and $\Delta$ denotes the symmetric difference of sets. We combine probabilistic and linear algebra arguments to construct resolving sets of $(2^{X},\text{Jac})$ that are nearly optimal (i.e., of nearly minimal size) and that resolve the space with high probability. In particular, we show that the metric dimension of $(2^{X},\text{Jac})$, i.e., the minimum size of a resolving set of this space, is $\Theta(|X|/\ln|X|)$. In addition, we show that a much smaller subset of $2^{X}$ suffices to resolve, with high probability, all different pairs of subsets of $X$ of cardinality at most $\sqrt{|X|}/\ln|X|$, up to a constant factor.

Keywords. Jaccard distance, metric dimension, metric space, multilateration, resolving set

1 Introduction

A metric space is an ordered pair of the form $(X,d)$, where $X$ is a nonempty set and $d:X\times X\to\mathbb{R}$ is a function such that $d(x,y)=d(y,x)\geq 0$, $d(x,y)=0$ if and only if $x=y$, and $d(x,y)\leq d(x,z)+d(z,y)$, for all $x,y,z\in X$. In particular, $d$ is non-negative, symmetric, and satisfies the triangle inequality. We say the metric space is finite when $|X|<+\infty$.

Resolvability extends the concept of trilateration of the plane to general metric spaces; in particular, it applies to the vertex sets of connected graphs endowed with the shortest-path distance, which is where the concept originated [5, 21, 8]. In a metric space $(X,d)$, a non-empty set $R=\{r_{i}:i\in I\}\subset X$, with $I=\big{\{}1,\ldots,|R|\big{\}}$, is said to resolve it when the transformation

d(x|R):=\big{(}d(x,r_{i})\big{)}_{i\in I},\text{ for each }x\in X, \tag{1}

is one-to-one. In particular, $d(\cdot|R)$ uniquely encodes points in $X$ as $|R|$-dimensional real vectors; and, owing to the triangle inequality, which gives $|d(x,r)-d(y,r)|\leq d(x,y)$ for all $x,y\in X$ and $r\in R$, nearby points in $X$ are encoded as vectors with similar coordinates. Resolving sets thus enable sound embeddings of metric spaces into Euclidean ones, which can be useful for generating numerical features of symbolic objects in statistical and machine learning tasks like regression or classification [24, 20].

One can think of a resolving set as a collection of “landmarks” in a metric space that uniquely identify the “location” of any point in that space by its distance to those landmarks. In that regard, resolvability serves as a form of multilateration of the space, similar to trilateration, although more than three landmarks may be needed to resolve a given metric space.

Irrespective of the metric space, resolving sets always exist, although they are never unique in non-trivial settings. This is because $X$ always resolves $(X,d)$ , and if $R$ resolves $(X,d)$ and $S\supset R$ , then $S$ also resolves it. So, finding a resolving set is straightforward. In contrast, finding a resolving set with the smallest possible size is usually challenging; in fact, it is an NP-complete problem in arbitrary finite metric spaces [11, 6]. Minimizing the size of a resolving set is nonetheless crucial to embedding the points in $X$ into a low-dimensional Euclidean space using transformations of the form (1). This motivates the notion of metric dimension, which is the size of the smallest resolving set of a metric space $(X,d)$ , denoted from now on as $\beta(X,d)$ .
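The definitions of resolvability and metric dimension can be made concrete with a brute-force check on a small finite metric space. The Python sketch below is illustrative only: the helper names `resolves` and `metric_dimension` are our own, and the exhaustive search takes exponential time, in line with the hardness result just cited.

```python
from itertools import combinations


def resolves(points, d, R):
    """Return True if x -> (d(x, r))_{r in R}, the map in (1), is one-to-one on points."""
    seen = set()
    for x in points:
        code = tuple(d(x, r) for r in R)
        if code in seen:
            return False  # two points share the same distance vector
        seen.add(code)
    return True


def metric_dimension(points, d):
    """Size of a smallest resolving set, found by exhaustive search (exponential time)."""
    for size in range(1, len(points) + 1):
        for R in combinations(points, size):
            if resolves(points, d, R):
                return size
    raise ValueError("points must be non-empty")


# Example: the path graph 0-1-2-3-4 with shortest-path distance |i - j|.
path = list(range(5))
dist = lambda i, j: abs(i - j)
print(resolves(path, dist, [0]))     # True: an endpoint resolves a path
print(resolves(path, dist, [2]))     # False: 1 and 3 are both at distance 1 from 2
print(metric_dimension(path, dist))  # 1
```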

For a concise overview of resolvability and metric dimension in the context of graph theory, see [22]. For more comprehensive reviews of these and related concepts, see [23, 14].

A very limited number of studies have addressed the resolvability of non-graphical metric spaces in the literature [16, 2], as most efforts have focused on finite graphs [4]. Nevertheless, spaces with metric dimensions 1 or 2 have been characterized under general topological assumptions [16, 2]. It is also known that the metric dimension of a $k$-dimensional subspace of $\mathbb{R}^{n}$ with respect to the Euclidean distance is $(k+1)$; in particular, $(\mathbb{R}^{n},\|\cdot\|_{2})$ has metric dimension $(n+1)$ [2]. The hypersphere $(\mathbb{S}^{n},\|\cdot\|_{2})$ also has metric dimension $(n+1)$. Additionally, the metric dimension of the hyperbolic space $\mathbb{H}^{n}$ with respect to the metric $d(x,y):=\int_{x}^{y}dx/x_{n}$, for all $x,y\in\mathbb{H}^{n}$, is $(n+1)$ [2]. Likewise, the metric dimension of the $n$-dimensional unit ball $\mathbb{B}^{n}$ with respect to the metric $d(x,y):=\int_{x}^{y}2\,|dx|/(1-\|x\|^{2})$, with $x,y\in\mathbb{B}^{n}$, is $(n+1)$ [2].

In contrast, the systematic study of the resolvability and metric dimension of non-graphical finite metric spaces is essentially unexplored. In this paper, we study the resolvability of finite Jaccard metric spaces, i.e., metric spaces of the form $(2^{X},{\text{Jac}})$, where $2^{X}$ denotes the power set of a finite set $X$, and Jac is the Jaccard distance between subsets of $X$ [10]. Namely, for all $a,b\in 2^{X}$,

{\text{Jac}}(a,b):=\begin{cases}\frac{|a\Delta b|}{|a\cup b|},&a\neq b;\\ 0,&a=b.\end{cases}

Jac is a metric on $2^{X}$ [7, 13]. (In the literature, for distinct $a,b\in 2^{X}$, the quantity $1-{\text{Jac}}(a,b)=|a\cap b|/|a\cup b|$ is referred to as the Jaccard similarity. This index is widely used in fields such as information retrieval, data mining, and natural language processing, among many others.)
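As a sanity check of the definition, the following Python sketch computes Jac exactly with rational arithmetic and verifies the triangle inequality by brute force over $2^{X}$ for a three-element $X$. The function names `jac` and `power_set` are ours.

```python
from fractions import Fraction
from itertools import chain, combinations


def jac(a, b):
    """Jaccard distance |a Δ b| / |a ∪ b| for a != b, and 0 for a == b."""
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return Fraction(0)
    return Fraction(len(a ^ b), len(a | b))  # a != b, so a ∪ b is non-empty


def power_set(X):
    """All subsets of X, as frozensets."""
    items = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


subsets = power_set({1, 2, 3})
triangle_ok = all(jac(a, b) <= jac(a, c) + jac(c, b)
                  for a in subsets for b in subsets for c in subsets)
print(triangle_ok)              # True
print(jac({1}, {1, 2}))         # 1/2
print(1 - jac({1, 2}, {2, 3}))  # Jaccard similarity: 1/3
```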

Given that $X$ is finite in our setting, we may, in principle, estimate $\beta(2^{X},{\text{Jac}})$ and find non-trivial resolving sets with the so-called Information Content Heuristic (ICH) [9]. In a general setting, the input of this algorithm is the (symmetric) distance matrix between all pairs of points in a metric space, and the output is a subset of columns (i.e., of points) that resolves it, which is determined greedily through an entropy maximization procedure. Unfortunately, in the context of Jaccard spaces, the ICH is infeasible even for moderate values of $|X|$ because of its $O(2^{3|X|})$ time complexity, i.e., cubic in the number $2^{|X|}$ of points.

Nevertheless, besides being of theoretical interest, learning to resolve Jaccard spaces optimally or nearly optimally may find applications in, e.g., lexicon-based approaches to natural language processing (NLP). In the most basic implementation of this idea, $X$ would be the set of all words in a language, and sentences would be represented as subsets of $X$ (a.k.a. bags of words). The Jaccard distance is then a natural way to assess the similarity of sentences based on the words used, and a resolving set would induce a numerical encoding of sentences, mapping sentences with similar word content into vectors with similar coordinates and potentially providing low-dimensional feature vectors to learn to classify or regress sentences based on their lexicon [17].

1.1 Main Results

In what remains of this manuscript, $X$ is assumed to be a finite non-empty set.

In this section, we outline our key findings, with expanded statements and proofs provided in Section 2.

From now on, the Jaccard distance is the reference metric on $2^{X}$; in particular, statements like “$R$ resolves $2^{X}$” mean that “$R\subset 2^{X}$ resolves $(2^{X},{\text{Jac}})$.” We also say that $R$ resolves $a,b\in 2^{X}$ when there exists $r\in R$ such that ${\text{Jac}}(a,r)\neq{\text{Jac}}(b,r)$.

We first provide a necessary condition for a set $R$ to resolve $2^{X}$ .

Proposition 1.1.

If $R$ resolves $2^{X}$ then $R$ separates the distinct elements of ${X}$ , and it covers all but possibly one element in $X$ .

The proof of the proposition can be found in Section 2.1. We note that these properties are necessary but not sufficient. For instance, if ${X}=\{1,2,3,4\}$ and $R=\big{\{}\{1,2\},\{1,3\},\{1,4\}\big{\}}$ then $R$ separates different elements in ${X}$ and also covers it. Nevertheless, $R$ is not resolving because ${\text{Jac}}(\{1\}|R)=(1/2,1/2,1/2)={\text{Jac}}({X}|R)$ . This counterexample can be easily generalized to sets $X$ of arbitrary size.
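The counterexample above can be checked mechanically. The following Python sketch (with our own helper `jac`, computing the Jaccard distance exactly) confirms that $R$ separates and covers $X$ while ${\text{Jac}}(\{1\}|R)={\text{Jac}}(X|R)$.

```python
from fractions import Fraction


def jac(a, b):
    """Exact Jaccard distance between finite sets (0 when a == b)."""
    a, b = frozenset(a), frozenset(b)
    return Fraction(0) if a == b else Fraction(len(a ^ b), len(a | b))


X = {1, 2, 3, 4}
R = [{1, 2}, {1, 3}, {1, 4}]

# Every pair of distinct elements of X is split by some r in R.
separates = all(any((x in r) != (y in r) for r in R)
                for x in X for y in X if x != y)
covers = set().union(*R) == X

code_singleton = tuple(jac({1}, r) for r in R)  # Jac({1}|R)
code_full = tuple(jac(X, r) for r in R)         # Jac(X|R)
print(separates, covers, code_singleton == code_full)  # True True True
```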

Next, we provide a lower bound on the size of any resolving subset of $2^{X}$ ; in particular, this is also a lower bound for $\beta(2^{X},{\text{Jac}})$ .

Proposition 1.2.

If $R$ resolves $2^{X}$ then

|R|\geq\frac{|X|(\ln 2)\left(1+o(1)\right)-2\ln\left(|X|/2\right)-1}{\ln\left(|X|/2+1\right)}\sim\frac{|X|\ln 2}{\ln|X|}.

The proof of the proposition can be found in Section 2.2.

To state our two main results, we require the following definition.

Definition 1.1.

A random $r\in 2^{X}$ is said to have a Binomial $(X,1/2)$ distribution, in which case we write $r\sim\text{Binomial}(X,1/2)$ , when ${\mathbb{P}}(x\in r)=1/2$ for each $x\in X$ , and the events $[x\in r]$ with $x\in X$ are independent.

Clearly, if $r\sim\text{Binomial}(X,1/2)$ then $|r|\sim\text{Binomial}(|X|,1/2)$ ; namely, ${\mathbb{P}}\big{(}|r|=k\big{)}=\frac{1}{2^{|X|}}{|X|\choose k}$ for $k=0,\ldots,|X|$ .

Theorem 1.1.

If $k\geq\frac{2\ln(2e)|X|}{\ln(|X|/2)}$ and $r_{1},\ldots,r_{k}\sim\text{Binomial}(X,1/2)$ are independent and identically distributed (i.i.d.), then, for each $x\in X$ , $R:=\big{\{}\emptyset,\{x\},X\setminus\{x\},r_{1},\ldots,r_{k}\big{\}}$ resolves $2^{X}$ , with overwhelmingly high probability, as $|X|\to\infty$ .

The proof of the theorem can be found in Section 2.5 and relies on auxiliary results in Sections 2.3 and 2.4.
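The random construction in Theorem 1.1 is straightforward to sample. The following Python sketch is an illustration under our own naming (`theorem_1_1_candidate` is hypothetical); it takes $k$ to be the smallest integer allowed by the theorem, assumes $|X|\geq 3$ so that $\ln(|X|/2)>0$, and returns a list, which may contain repeated sets.

```python
import math
import random


def theorem_1_1_candidate(X, x, rng=random):
    """Sample R = [∅, {x}, X minus {x}, r_1, ..., r_k] with r_t i.i.d. Binomial(X, 1/2).

    k is the smallest integer with k >= 2 ln(2e) |X| / ln(|X| / 2); assumes |X| >= 3.
    """
    items = sorted(X)
    n = len(items)
    k = math.ceil(2 * math.log(2 * math.e) * n / math.log(n / 2))
    # Each element enters r_t independently with probability 1/2.
    rs = [frozenset(y for y in items if rng.random() < 0.5) for _ in range(k)]
    full = frozenset(items)
    return [frozenset(), frozenset({x}), full - {x}] + rs, k


R, k = theorem_1_1_candidate(range(20), 0, random.Random(1))
print(k, len(R))  # 30 33
```

Since the theorem is asymptotic, a single sample for small $|X|$ need not resolve $2^{X}$; the sketch only shows how the candidate set is formed.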

In conjunction, Proposition 1.2 and Theorem 1.1 imply that

\frac{(\ln 2)|X|}{\ln(|X|/2)}\left(1+o(1)\right)\leq\beta(2^{X},{\text{Jac}})\leq\frac{2\ln(2e)|X|}{\ln(|X|/2)}\left(1+o(1)\right),

which characterizes the metric dimension of $2^{X}$ with respect to the Jaccard distance within a factor of $\frac{2\ln(2e)}{\ln 2}\approx 4.9$. In particular, we can assert the following.

Corollary 1.1.

$\beta(2^{X},{\text{Jac}})=\Theta\left(\frac{|X|}{\ln|X|}\right)$ , as $|X|\to\infty$ .
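For a concrete sense of the gap between the two bounds, the following Python lines evaluate the ratio of the leading constants, $\frac{2\ln(2e)}{\ln 2}$, and the leading terms of both bounds for a given $|X|$ (dropping the $1+o(1)$ factors, which is an approximation on our part; `leading_bounds` is our own name).

```python
import math

# Ratio of the leading constants in the upper and lower bounds.
ratio = 2 * math.log(2 * math.e) / math.log(2)
print(round(ratio, 3))  # 4.885


def leading_bounds(n):
    """Leading terms of the lower and upper bounds on beta(2^X, Jac) for |X| = n."""
    lower = math.log(2) * n / math.log(n / 2)
    upper = 2 * math.log(2 * math.e) * n / math.log(n / 2)
    return lower, upper


lo, hi = leading_bounds(10_000)
print(round(lo), round(hi))
```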

It turns out that, for any $x\in X$, the set $\big{\{}\emptyset,\{x\},X\setminus\{x\}\big{\}}$ resolves all pairs of subsets of $X$ with different cardinalities (see Lemma 2.1 ahead). So the crux of the proof of Theorem 1.1 lies in showing that, with overwhelmingly high probability when $|X|$ is large, the sets in $\{r_{1},\ldots,r_{k}\}$ resolve all possible pairs of distinct $a,b\in 2^{X}$ of equal size. We demonstrate this in Section 2.5.

In the context of potential NLP applications outlined in the Introduction, it is unclear whether the highly likely resolving set proposed in Theorem 1.1 is of any practical value for distinguishing between bags-of-words of different cardinalities. This is because the numerical encoding in (1) based on this set might differentiate such pairs solely based on the presence or absence of a single word or token, which seems too coarse for practical use in NLP classification (or regression) problems. Our following result addresses this issue by proposing a less contrived set, which is likely to resolve all pairs of bags-of-words of different cardinalities. Its proof can be found in Section 2.6.

Theorem 1.2.

Let $\epsilon>0$ . If $k\geq(4+\epsilon)\sqrt{|X|}$ and $r_{1},\ldots,r_{k}\sim\text{Binomial}(X,1/2)$ are i.i.d., then $R:=\big{\{}r_{1},r_{1}^{c},\ldots,r_{k},r_{k}^{c}\big{\}}$ resolves all pairs of subsets of $X$ of different size, with overwhelmingly high probability, as $|X|\to\infty$ .
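Sampling the set in Theorem 1.2 only requires pairing each random subset with its complement. A minimal Python sketch, with a hypothetical helper name and $k$ set to the smallest integer allowed by the theorem:

```python
import math
import random


def theorem_1_2_candidate(X, eps, rng=random):
    """Return ([r_1, r_1^c, ..., r_k, r_k^c], k) with k = ceil((4 + eps) sqrt(|X|))."""
    items = sorted(X)
    full = frozenset(items)
    k = math.ceil((4 + eps) * math.sqrt(len(items)))
    R = []
    for _ in range(k):
        r = frozenset(y for y in items if rng.random() < 0.5)  # r ~ Binomial(X, 1/2)
        R += [r, full - r]  # keep r together with its complement
    return R, k


R2, k = theorem_1_2_candidate(range(100), 0.5, random.Random(0))
print(k, len(R2))  # 45 90
```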

As expected, the lower bound for the size of the set $R$ in Theorem 1.2 is asymptotically negligible compared to the one in Theorem 1.1; after all, the former set is only required to resolve pairs of subsets of $X$ with different cardinalities, which, as explained earlier, can be accomplished using just three subsets of $X$ (i.e., the empty set, and any singleton and its complement). Nevertheless, in practical situations (for instance, when representing social media posts as bags-of-words), a random pair of posts would more often than not be associated with bags-of-words of different cardinalities. In particular, in terms of the numerical encoding in (1), Theorem 1.2 suggests that $O(\sqrt{|X|})$ Jaccard distances, as opposed to $\Theta\left(|X|/\ln|X|\right)$, should suffice in practice to encode posts effectively when the reference lexicon $X$ is sufficiently large. Our next result makes this intuition precise at the expense of limiting the size of the bags-of-words one wishes to resolve.

Corollary 1.2.

Let $0<\epsilon<1$ . If $k\geq(4+\epsilon)\sqrt{|X|}$ and $r_{1},\ldots,r_{k}\sim\text{Binomial}(X,1/2)$ are i.i.d., then the set $R:=\big{\{}r_{1},r_{1}^{c},\ldots,r_{k},r_{k}^{c}\big{\}}$ resolves all different pairs of subsets of $X$ of size at most $\frac{(1-\epsilon)(\ln\pi)\sqrt{|X|}}{\ln|X|}$ , with overwhelmingly high probability, as $|X|\to\infty$ .
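To put Corollary 1.2 in perspective, the following arithmetic (a Python sketch; the lexicon size $|X|=10^{6}$ and $\epsilon=0.1$ are arbitrary illustrative choices, and `corollary_1_2_parameters` is our own name) computes the smallest admissible $k$, the resulting size $|R|=2k$, and the admissible bag-of-words size: roughly 8,200 reference sets and bags of up to about 74 words.

```python
import math


def corollary_1_2_parameters(n, eps):
    """Smallest k, |R| = 2k, and the size threshold (1 - eps) ln(pi) sqrt(n) / ln(n)."""
    k = math.ceil((4 + eps) * math.sqrt(n))
    max_size = (1 - eps) * math.log(math.pi) * math.sqrt(n) / math.log(n)
    return k, 2 * k, max_size


k, size_R, max_size = corollary_1_2_parameters(10**6, 0.1)
print(k, size_R, math.floor(max_size))
```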

2 Technical Results and Proofs

2.1 Necessary Conditions for Resolvability

In this section, we prove Proposition 1.1. Specifically, suppose that $R$ resolves $2^{X}$. We show next that the following properties apply:

(i)

For all $x_{1},x_{2}\in{X}$ with $x_{1}\neq x_{2}$ , there exists $r\in R$ such that either $x_{1}\in r$ and $x_{2}\notin r$ , or $x_{1}\notin r$ and $x_{2}\in r$ .
(ii)

If $\emptyset\notin R$ , then $R$ covers ${X}$ , i.e., $\bigcup_{r\in R}r={X}$ .
(iii)

If $\emptyset\in R$ , then $R$ covers ${X}$ , or there exists $x\in{X}$ such that $\bigcup_{r\in R}r={X}\setminus\{x\}$ .

To show the property (i), suppose by contradiction that there are distinct $x_{1},x_{2}\in{X}$ such that, for each $r\in R$ , $\{x_{1},x_{2}\}\subset r$ or $\{x_{1},x_{2}\}\subset{X}\setminus r$ . In the first case: ${\text{Jac}}(\{x_{1}\},r)=1-1/|r|={\text{Jac}}(\{x_{2}\},r)$ , and in the second case: ${\text{Jac}}(\{x_{1}\},r)=1={\text{Jac}}(\{x_{2}\},r)$ . In either case, $R$ could not possibly be resolving, which shows the first property.

To show the property (ii), suppose that there is $x\in{X}$ which does not belong to any of the sets in $R$. Since $\emptyset\notin R$, every $r\in R$ is non-empty, so ${\text{Jac}}(\{x\},r)=1={\text{Jac}}(\emptyset,r)$ for each $r\in R$, which is not possible because $R$ resolves $\{x\}$ and $\emptyset$. This shows the second property.

Finally, to show the property (iii), suppose there are distinct $x_{1},x_{2}\in{X}$ which do not belong to any of the sets in $R$ . Then, for each $r\in R$ , ${\text{Jac}}(\{x_{1}\},r)=1={\text{Jac}}(\{x_{2}\},r)$ , which is not possible and completes the proof of the proposition.

2.2 Metric Dimension lower bound

In this section, we prove Proposition 1.2.

Suppose that $R$ resolves $2^{X}$. If $c,r\subset X$ with $c\cup r\neq\emptyset$, then by the Inclusion-Exclusion Principle, ${\text{Jac}}(c,r)=1-\frac{|c\cap r|}{|c|+|r|-|c\cap r|}$. In particular, for fixed $r$ and $|c|=n$, ${\text{Jac}}(c,r)$ depends on $c$ only through $|c\cap r|\in\{0,\ldots,n\}$, so the range of ${\text{Jac}}(\cdot|R)$, when restricted to sets $c$ such that $|c|=n$, has size at most $(n+1)^{|R|}$. Hence, due to the Pigeonhole Principle, we must have $\binom{|{X}|}{n}\leq(n+1)^{|R|}$, i.e.:

|R|\geq\max\limits_{0<n<|{X}|}\frac{\ln{|{X}|\choose n}}{\ln(n+1)}\geq\frac{\ln{|{X}|\choose\lfloor|X|/2\rfloor}}{\ln\big{(}|X|/2+1\big{)}}. \tag{2}

The right-most lower bound above should be a reasonable estimate of the best one obtainable from the Pigeonhole Principle, because $\ln(n+1)$ is a slowly increasing function of $n$, and ${|{X}|\choose n}$, with $0<n<|X|$, is maximized at $n=\lfloor|X|/2\rfloor$ (equivalently, $n=\lceil|X|/2\rceil$). To make the last numerator above more explicit, we use that [12, Exercise 24, §1.2.5]:

\frac{n^{n}}{e^{n-1}}\leq n!\leq\frac{n^{n+1}}{e^{n-1}},\text{ for }n\geq 1.

In particular, if $n=\lfloor|X|/2\rfloor$ then

\begin{aligned}\ln\binom{|X|}{\lfloor|X|/2\rfloor}&\geq|X|\ln|X|-(n+1)\ln(n)-\big{(}|X|-n+1\big{)}\ln\big{(}|X|-n\big{)}-1\\
&=|X|\left\{\ln 2+\frac{n}{|X|}\ln\left(\frac{|X|}{2n}\right)+\frac{|X|-n}{|X|}\ln\left(\frac{|X|}{2|X|-2n}\right)\right\}-\ln\Big{(}n\big{(}|X|-n\big{)}\Big{)}-1\\
&\geq|X|\big{\{}\ln 2+o(1)\big{\}}-2\ln\left(\frac{|X|}{2}\right)-1.\end{aligned}

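In the last step, we used that $n\big(|X|-n\big)\leq(|X|/2)^{2}$ and that the last two terms within the curly braces vanish as $|X|\to\infty$. The proposition is now direct from (2).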
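The lower bound (2) and the estimates used above for its numerator are easy to evaluate numerically. The Python sketch below (helper names are ours; floating point with small tolerances, so a sanity check rather than a proof) computes the maximum in (2) and its right-most relaxation, and spot-checks the factorial bounds from [12] together with the first inequality and the identity in the preceding display.

```python
import math


def pigeonhole_bound(N):
    """Middle term of (2): max over 0 < n < N of ln C(N, n) / ln(n + 1); needs N >= 2."""
    return max(math.log(math.comb(N, n)) / math.log(n + 1) for n in range(1, N))


def central_bound(N):
    """Right-most term of (2): ln C(N, floor(N/2)) / ln(N/2 + 1)."""
    return math.log(math.comb(N, N // 2)) / math.log(N / 2 + 1)


def factorial_bounds_hold(n, tol=1e-9):
    """n^n / e^(n-1) <= n! <= n^(n+1) / e^(n-1), compared on a log scale."""
    log_fact = math.lgamma(n + 1)
    return (n * math.log(n) - (n - 1) <= log_fact + tol
            and log_fact <= (n + 1) * math.log(n) - (n - 1) + tol)


def chain_holds(N):
    """First inequality and middle identity of the display, with n = floor(N/2)."""
    n = N // 2
    exact = math.log(math.comb(N, n))
    line1 = N * math.log(N) - (n + 1) * math.log(n) - (N - n + 1) * math.log(N - n) - 1
    line2 = (N * (math.log(2) + (n / N) * math.log(N / (2 * n))
                  + ((N - n) / N) * math.log(N / (2 * N - 2 * n)))
             - math.log(n * (N - n)) - 1)
    return exact >= line1 and math.isclose(line1, line2, rel_tol=1e-9, abs_tol=1e-9)


print(all(factorial_bounds_hold(n) for n in range(1, 1000)))  # True
print(all(chain_holds(N) for N in range(2, 1000)))            # True
for N in (4, 10, 100):
    # |R| is an integer, so every resolving R has |R| >= ceil(pigeonhole_bound(N)).
    print(N, math.ceil(pigeonhole_bound(N)), round(central_bound(N), 2))
```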

2.3 Resolving Subsets of $X$ of Different Cardinalities

Lemma 2.1.

For all $x\in X$ and all $a,b\in 2^{X}$ , if $|a|\neq|b|$ then $a$ and $b$ are resolved by $R=\big{\{}\emptyset,\{x\},X\setminus\{x\}\big{\}}$ .

Proof.

Without any loss of generality assume that $|X|>1$ . Fix an $x\in X$ and note that for each $c\in 2^{X}$ :

\begin{aligned}{\text{Jac}}(c,\{x\})&=1-\begin{cases}\frac{1}{|c|},&x\in c;\\ 0,&x\notin c;\end{cases}\\
{\text{Jac}}(c,X\setminus\{x\})&=1-\begin{cases}\frac{|c|-1}{|X|},&x\in c;\\ \frac{|c|}{|X|-1},&x\notin c.\end{cases}\end{aligned}

Define $R:=\big{\{}\emptyset,\{x\},X\setminus\{x\}\big{\}}$. Consider $a,b\in 2^{X}$ such that $|a|\neq|b|$, and suppose that ${\text{Jac}}(a,r)={\text{Jac}}(b,r)$ for all $r\in R$. In particular, $a$ cannot be empty; otherwise, ${\text{Jac}}(b,\emptyset)={\text{Jac}}(a,\emptyset)=0$, implying that $b=\emptyset$ because Jac is a metric. However, the latter is not possible because $|a|\neq|b|$. Likewise, $b$ cannot be empty.

Moreover, if $x\in a$, then ${\text{Jac}}(b,\{x\})={\text{Jac}}(a,\{x\})=1-|a|^{-1}<1$. In particular, $x$ must be in $b$, as otherwise ${\text{Jac}}(b,\{x\})=1$, which is not possible. But then $1-|b|^{-1}=1-|a|^{-1}$, i.e., $|b|=|a|$, which is not possible either. By symmetry, the case $x\in b$ leads to the same contradiction. Finally, if $x\notin a$ and $x\notin b$ then, because ${\text{Jac}}(a,X\setminus\{x\})={\text{Jac}}(b,X\setminus\{x\})$, we must have $1-\frac{|a|}{|X|-1}=1-\frac{|b|}{|X|-1}$, i.e., $|a|=|b|$, which is again not possible. Hence, there has to be an $r\in R$ such that ${\text{Jac}}(a,r)\neq{\text{Jac}}(b,r)$, implying that $R$ resolves $a$ and $b$, which completes the proof of the lemma. ∎
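Lemma 2.1 can also be confirmed exhaustively for small ground sets. In the Python sketch below (helper names are ours), sets of different sizes are resolved by $R$ exactly when no distance vector ${\text{Jac}}(c|R)$ is shared by two sets of different cardinality.

```python
from fractions import Fraction
from itertools import combinations


def jac(a, b):
    """Exact Jaccard distance (0 when a == b)."""
    return Fraction(0) if a == b else Fraction(len(a ^ b), len(a | b))


def lemma_2_1_holds(n):
    """Check Lemma 2.1 for X = {0, ..., n-1} and every choice of x in X."""
    X = frozenset(range(n))
    subsets = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    for x in X:
        R = [frozenset(), frozenset({x}), X - {x}]
        sizes_by_code = {}
        for c in subsets:
            code = tuple(jac(c, r) for r in R)
            sizes_by_code.setdefault(code, set()).add(len(c))
        # A code shared by sets of different sizes would be an unresolved pair.
        if any(len(sizes) > 1 for sizes in sizes_by_code.values()):
            return False
    return True


print(all(lemma_2_1_holds(n) for n in range(1, 9)))  # True
```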

2.4 Inner product Characterization of Equidistant Sets

Two sets $a,b\in 2^{X}$ are said to be equidistant from an $r\in 2^{X}$ when ${\text{Jac}}(a,r)={\text{Jac}}(b,r)$. In this case, $r$ is not useful for resolving $a$ and $b$ when $a\neq b$, and we say that $a$ and $b$ collide in terms of their Jaccard distance to $r$.

In this section, we characterize collisions in linear algebra terms by representing subsets of $X$ as binary vectors. We note that linear algebra characterizations have been used to study the metric dimension of Hypercube graphs [3] and Hamming graphs [15].

In what follows, we represent elements in $2^{X}$ as binary vectors of dimension $|X|$ . Namely, for $a\in 2^{X}$ , $a(x)=1$ when $x\in a$ , and $a(x)=0$ when $x\notin a$ . (For instance, ${X}$ is represented by a vector of all ones, whereas $\emptyset$ by a vector of all zeros.) Additionally, for $r\in 2^{X}$ and $z\in\mathbb{R}^{|{X}|}$ , $\langle r,z\rangle$ denotes the inner product between the binary vector associated with $r$ and the vector $z$ . Namely:

\langle r,z\rangle:=\sum_{x\in{X}}r(x)\cdot z(x).

In what follows, we use product notation to denote set intersections. Namely, if $a,b\in 2^{X}$ then $ab:=(a\cap b)$ .

The next result characterizes equidistant sets in terms of inner products. This characterization will be used in Section 2.5.1, in the proof of Theorem 1.1, to assess the probability that two different subsets of $X$ , of the same size, collide in terms of their distance to a random subset of $X$ .

Lemma 2.2.

Let $a,b,r\in 2^{X}$ and define the vector $z:=\big{(}|r|+|b|\big{)}\,a-\big{(}|r|+|a|\big{)}\,b$ . If ${\text{Jac}}(a,r)={\text{Jac}}(b,r)$ then $\langle r,z\rangle=0$ . Conversely, if $r\neq\emptyset$ and $\langle r,z\rangle=0$ then ${\text{Jac}}(a,r)={\text{Jac}}(b,r)$ .

Proof.

We show first that

\langle r,z\rangle=\big{(}|r|+|b|\big{)}\cdot|ar|-\big{(}|r|+|a|\big{)}\cdot|br|. \tag{3}

For this, observe that $|ar|=\langle a,r\rangle$ and $|br|=\langle b,r\rangle$ ; from which the identity in equation (3) is immediate due to the bilinearity of inner products.

Since $\langle\emptyset,z\rangle=0$, to complete the proof, it suffices to show that if $r\neq\emptyset$ then ${\text{Jac}}(a,r)={\text{Jac}}(b,r)$ if and only if $\langle r,z\rangle=0$. For this, note that $1-{\text{Jac}}(c,r)=|cr|/|c\cup r|$ and $|c\cup r|=|c|+|r|-\langle c,r\rangle$, for all $c\in 2^{X}$. In particular, cross-multiplying (the denominators are positive because $r\neq\emptyset$), ${\text{Jac}}(a,r)={\text{Jac}}(b,r)$ is equivalent to $\langle a,r\rangle\big(|b|+|r|-\langle b,r\rangle\big)=\langle b,r\rangle\big(|a|+|r|-\langle a,r\rangle\big)$, i.e., to $(|r|+|b|)\,\langle a,r\rangle-(|r|+|a|)\,\langle b,r\rangle=0$, that is, $\langle z,r\rangle=0$ due to the bilinearity of inner products. ∎
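A brute-force check of Lemma 2.2 over all triples $(a,b,r)$ for a small ground set also shows why the hypothesis $r\neq\emptyset$ is needed for the converse. In the Python sketch below (helper names are ours), $\langle r,z\rangle$ is computed directly from the binary-vector representation rather than through (3).

```python
from fractions import Fraction
from itertools import product


def jac(a, b):
    """Exact Jaccard distance (0 when a == b)."""
    return Fraction(0) if a == b else Fraction(len(a ^ b), len(a | b))


def inner_r_z(a, b, r, n):
    """<r, z> with z = (|r| + |b|) a - (|r| + |a|) b, using 0/1 indicator vectors."""
    z = [(len(r) + len(b)) * (x in a) - (len(r) + len(a)) * (x in b) for x in range(n)]
    return sum(z[x] for x in range(n) if x in r)


def lemma_2_2_holds(n):
    subsets = [frozenset(x for x in range(n) if mask >> x & 1) for mask in range(2 ** n)]
    for a, b, r in product(subsets, repeat=3):
        collide = jac(a, r) == jac(b, r)
        orthogonal = inner_r_z(a, b, r, n) == 0
        if collide and not orthogonal:
            return False  # forward implication fails
        if r and orthogonal and not collide:
            return False  # converse fails for a non-empty r
    return True


print(lemma_2_2_holds(4))  # True
# With r = ∅ the converse can fail: <∅, z> = 0, yet Jac(∅, ∅) = 0 != 1 = Jac({0}, ∅).
empty, zero = frozenset(), frozenset({0})
print(inner_r_z(empty, zero, empty, 1), jac(empty, empty), jac(zero, empty))  # 0 0 1
```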

We also want an inner product characterization of sets $a$ and $b$ that collide not only in terms of their Jaccard distance to a set $r$ but also in terms of their distance to $r^{c}$, the complement of $r$. Our next result provides a necessary condition for both collisions to occur. This characterization is used in Section 2.6.1 to show Theorem 1.2.

Corollary 2.1.

Let $a,b,r\in 2^{X}$. If ${\text{Jac}}(a,r)={\text{Jac}}(b,r)$ and ${\text{Jac}}(a,r^{c})={\text{Jac}}(b,r^{c})$ then $\big{(}|r^{c}|-|r|\big{)}\cdot\big{(}|br|-|ar|\big{)}=|r^{c}|\cdot\big{(}|b|-|a|\big{)}$.

Proof.

If ${\text{Jac}}(a,r)={\text{Jac}}(b,r)$ and ${\text{Jac}}(a,r^{c})={\text{Jac}}(b,r^{c})$ then Lemma 2.2 implies that $\langle r,z_{1}\rangle=0$ and $\langle r^{c},z_{2}\rangle=0$ , where $z_{1}:=\big{(}|r|+|b|\big{)}\,a-\big{(}|r|+|a|\big{)}\,b$ and $z_{2}:=\big{(}|r^{c}|+|b|\big{)}\,a-\big{(}|r^{c}|+|a|\big{)}\,b$ . Hence, due to the identity in equation (3), we have that

\begin{aligned}0&=\langle r,z_{1}\rangle+\langle r^{c},z_{2}\rangle\\
&=\big{(}|r|+|b|\big{)}\cdot|ar|-\big{(}|r|+|a|\big{)}\cdot|br|+\big{(}|r^{c}|+|b|\big{)}\cdot|r^{c}a|-\big{(}|r^{c}|+|a|\big{)}\cdot|r^{c}b|\\
&=|r|\cdot\big{(}|ar|-|br|\big{)}+|r^{c}|\cdot\big{\{}|r^{c}a|-|r^{c}b|\big{\}}+|ar|\cdot|b|-|br|\cdot|a|+\big{\{}|r^{c}a|\cdot|b|-|r^{c}b|\cdot|a|\big{\}}.\end{aligned} \tag{4}

But $|r^{c}a|=|a|-|ar|$ and $|r^{c}b|=|b|-|br|$; in particular, we may rewrite the expressions within the curly braces above as follows: $|r^{c}a|-|r^{c}b|=\big{(}|br|-|ar|\big{)}+|a|-|b|$, and $|r^{c}a|\cdot|b|-|r^{c}b|\cdot|a|=|a|\cdot|br|-|b|\cdot|ar|$. Finally, substituting these two expressions back into equation (4) and cancelling terms, we obtain that

0=\big{(}|r^{c}|-|r|\big{)}\cdot\big{(}|br|-|ar|\big{)}-|r^{c}|\cdot\big{(}|b|-|a|\big{)},

from which the Corollary follows. ∎
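The necessary condition in Corollary 2.1 can be spot-checked in the same brute-force way. The following Python sketch (names are ours) enumerates all $(a,b,r)$ for a small $X$ and verifies the identity whenever both collisions occur.

```python
from fractions import Fraction
from itertools import product


def jac(a, b):
    """Exact Jaccard distance (0 when a == b)."""
    return Fraction(0) if a == b else Fraction(len(a ^ b), len(a | b))


def corollary_2_1_holds(n):
    X = frozenset(range(n))
    subsets = [frozenset(x for x in range(n) if mask >> x & 1) for mask in range(2 ** n)]
    for a, b, r in product(subsets, repeat=3):
        rc = X - r  # complement of r
        if jac(a, r) == jac(b, r) and jac(a, rc) == jac(b, rc):
            lhs = (len(rc) - len(r)) * (len(b & r) - len(a & r))
            rhs = len(rc) * (len(b) - len(a))
            if lhs != rhs:
                return False
    return True


print(corollary_2_1_holds(4), corollary_2_1_holds(5))  # True True
```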

$|X|$	1	2	3	4	5	6	7	8	9	10	11	12	13	14
$|R|$	1	2	2	3	3	4	5	5	6	6	7	7	8	8
$\frac{1}{|R|}\sum_{r\in R}|r|$	1	1.5	2.0	2.33	2.66	3.5	4.4	3.87	4.3	5.8	5.9	6	6.4	7.4

Table 1: Upper bounds for $\beta(2^{X},{\text{Jac}})$ obtained by the ICH for a range of sizes of $X$. The middle row is the size of the resolving set $R$ found by the ICH, and the bottom row is the average size of the sets in $R$, which is often approximately equal to $|X|/2$.

2.5 Resolving Pairs of Subsets of $X$ of Equal Size

In this section, we prove Theorem 1.1 using the probabilistic method [1].

In what follows, $k\geq 1$ and $r_{1},\ldots,r_{k}$ are i.i.d. $\text{Binomial}(X,1/2)$ random subsets of $X$. In particular, $\mathbb{E}|r_{i}|=|X|/2$, which is consistent with the experimental results displayed in Table 1 and guided our choice of the parameter $1/2$ in the Binomial distribution.

Define

R_{1}:=\{r_{1},\ldots,r_{k}\}.

In accordance with the probabilistic method, and to obtain an upper bound on the metric dimension of $2^{X}$, we aim to find a $k$ such that the probability that $R_{1}$ does not resolve all distinct pairs $a,b\in 2^{X}$ of equal size is strictly less than one. If we can find such a $k$, then some realization of $R_{1}$, i.e., a set $R\subset 2^{X}$ with $|R|\leq k$, resolves all different pairs of subsets of $X$ of equal size. In particular, due to Lemma 2.1, we could then assert that $\beta(2^{X},{\text{Jac}})\leq k+3$. The challenge is to find $k$ as small as possible, so that $k+3$ is a tight upper bound for the metric dimension of $2^{X}$, and such that the following probability

\Sigma_{1}:={\mathbb{P}}\left(\exists\,a,b\in 2^{X},\text{ with }|a|=|b|\text{ but }a\neq b,\text{ such that }\forall r\in R_{1}:{\text{Jac}}(a,r)={\text{Jac}}(b,r)\right) \tag{5}

becomes asymptotically negligible as $|X|\to\infty$ . Theorem 1.1 identifies a $k$ meeting this criterion.

2.5.1 Sizing the probability $\Sigma_{1}$

In this section, we identify a $k$ in terms of $|X|$ , of the same order of magnitude as the asymptotic lower bound for $\beta(2^{X},{\text{Jac}})$ in Proposition 1.2, such that $\Sigma_{1}=o(1)$ .

Suppose there exist $a,b\in 2^{X}$ such that $|a|=|b|$, $a\neq b$, and ${\text{Jac}}(a,r)={\text{Jac}}(b,r)$ for all $r\in R_{1}$. Since $|a|=|b|$, the vector $z$ in Lemma 2.2 reduces to $\big{(}|r|+|a|\big{)}(a-b)$; hence, per Lemma 2.2, for each $r\in R_{1}$, $\big{(}|r|+|a|\big{)}\cdot\langle a-b,r\rangle=0$. But $|a|>0$, because $|a|=|b|$ and $a\neq b$. So $\langle a-b,r\rangle=0$, or equivalently:

\langle z,r\rangle=0,\text{ with }z:=(ab^{c}-a^{c}b).

But observe that $z\in\mathcal{Z}$ , where

\mathcal{Z}:=\left\{z\in\big{\{}0,\pm 1\big{\}}^{|X|}\text{ such that }\sum_{x\in X}z(x)=0\text{ and }\sum_{x\in X}|z(x)|\geq 2\right\}.

Consequently:

\Sigma_{1}\leq{\mathbb{P}}\big{(}\exists\,z\in\mathcal{Z}\text{ such that }\forall r\in R_{1}:\langle z,r\rangle=0\big{)}.

To bound the probability on the right-hand side above, consider a $z\in\mathcal{Z}$ and the sets $I:=\{x\in X\text{ such that }z(x)=+1\}$ and $J:=\{x\in X\text{ such that }z(x)=-1\}$. Observe that $I$ and $J$ are non-empty, disjoint, and of the same size; let $i$ denote their common cardinality. Note that $1\leq i\leq\lfloor|X|/2\rfloor$, and that $|I\cap r_{1}|,\ldots,|I\cap r_{k}|,|J\cap r_{1}|,\ldots,|J\cap r_{k}|$ are i.i.d. $\text{Binomial}(i,1/2)$ random variables. Further, since $\langle z,r_{t}\rangle=|I\cap r_{t}|-|J\cap r_{t}|$, the variables $\langle z,r_{1}\rangle,\ldots,\langle z,r_{k}\rangle$ are also i.i.d. As a result, by Vandermonde's identity:

{\mathbb{P}}\big{(}\langle z,r_{t}\rangle=0\big{)}=\sum_{j=0}^{i}{i\choose j}^{2}\left(\frac{1}{2}\right)^{2i}=\left(\frac{1}{2}\right)^{2i}\sum_{j=0}^{i}{i\choose j}{i\choose i-j}=\left(\frac{1}{2}\right)^{2i}{2i\choose i},

and

\Sigma_{1}\leq\sum_{i=1}^{\lfloor|X|/2\rfloor}{|X|\choose i,i,|X|-2i}\left\{{2i\choose i}\left(\frac{1}{2}\right)^{2i}\right\}^{k}. \tag{6}
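Because only the $2i$ coordinates in the support of $z$ matter, the collision probability used in (6) can be confirmed by exact enumeration for small $i$. A Python sketch (the function name is ours):

```python
from fractions import Fraction
from itertools import product
from math import comb


def collision_probability(i):
    """P(<z, r> = 0) for z with i entries +1 and i entries -1, r ~ Binomial(X, 1/2)."""
    hits = 0
    for bits in product((0, 1), repeat=2 * i):  # membership of the 2i support points in r
        if sum(bits[:i]) == sum(bits[i:]):       # |I ∩ r| == |J ∩ r|
            hits += 1
    return Fraction(hits, 4 ** i)


for i in range(1, 8):
    assert collision_probability(i) == Fraction(comb(2 * i, i), 4 ** i)
print(collision_probability(1), collision_probability(2))  # 1/2 3/8
```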

But

{|X|\choose i,i,|X|-2i}\leq\frac{|X|^{2i}}{(i!)^{2}}=O\left(\frac{\big{(}|X|e/i\big{)}^{2i}}{i}\right),

where the big-O follows directly from Stirling’s formula. On the other hand, Stirling’s formula also implies that ${2i\choose i}\big{(}\frac{1}{2}\big{)}^{2i}\sim\frac{1}{\sqrt{i\pi}}$. However, to rigorously replace ${2i\choose i}\big{(}\frac{1}{2}\big{)}^{2i}$ by $\frac{1}{\sqrt{i\pi}}$ in equation (6), one needs a non-asymptotic relationship between these two sequences. To this end, we use the following refinement of Stirling’s formula [19]:

\exp\left\{\frac{1}{12i+1}\right\}<\frac{i!}{\sqrt{2\pi}\,i^{i+1/2}\,e^{-i}}<\exp\left\{\frac{1}{12i}\right\},\text{ for all }i\geq 1.

In particular,

{2i\choose i}\Big{(}\frac{1}{2}\Big{)}^{2i}\leq\frac{\exp\Big{\{}\frac{1-36i}{24i(12i+1)}\Big{\}}}{\sqrt{i\pi}}\leq\frac{1}{\sqrt{i\pi}},

and from the inequality in equation (6) we see that

\Sigma_{1}\leq\frac{|X|\big{(}|X|-1\big{)}}{2^{k}}+O\left(\frac{1}{\pi^{k/2}}\sum_{i=2}^{\lfloor|X|/2\rfloor}\frac{\big{(}|X|e/i\big{)}^{2i}}{i^{k/2}}\cdot\frac{1}{i}\right). \tag{7}

The following result will let us handle the big-O term above.

Lemma 2.3.

If $k\geq\frac{2|X|\ln(2e)}{\ln(|X|/2)}$ then $\frac{(|X|e/i)^{2i}}{i^{k/2}}=O(1)$ , uniformly for all $|X|$ large enough and $2\leq i\leq\lfloor|X|/2\rfloor$ .

Proof.

It suffices to show that

k\geq\frac{4i\ln\big{(}|X|e/i\big{)}}{\ln(i)}, \tag{8}

for all $|X|$ large enough and $2\leq i\leq\lfloor|X|/2\rfloor$. For this, consider the function $f(\tau):=4\tau\ln\big{(}|X|e/\tau\big{)}/\ln(\tau)$, for $2\leq\tau\leq|X|/2$, and note that

f^{\prime}(\tau)=4\frac{\ln(|X|e)\ln(\tau)-\ln^{2}(\tau)-\ln(|X|e)}{\big{\{}\ln(\tau)\big{\}}^{2}}=4\frac{\ln\left(\frac{\tau_{0}}{\tau}\right)\cdot\ln\left(\frac{\tau}{\tau_{1}}\right)}{\big{\{}\ln(\tau)\big{\}}^{2}}, \tag{9}

where the second identity assumes that $|X|>e^{3}$ , in which case

\begin{aligned}\tau_{0}&:=\exp\left\{\frac{1+\ln|X|}{2}\left(1-\sqrt{1-\frac{4}{1+\ln|X|}}\right)\right\}=\exp\left\{1+O\left(\frac{1}{\ln|X|}\right)\right\};\\
\tau_{1}&:=\exp\left\{\frac{1+\ln|X|}{2}\left(1+\sqrt{1-\frac{4}{1+\ln|X|}}\right)\right\}=\exp\left\{\ln|X|+O\left(\frac{1}{\ln|X|}\right)\right\}.\end{aligned}

In particular, $\tau_{0}\sim e$ and $\tau_{1}\sim|X|$. Thus, as long as $|X|$ is large enough, $2<\tau_{0}<|X|/2<\tau_{1}$, and equation (9) implies that $f(\tau)$ is decreasing for $\tau\in[2,\tau_{0}]$ and increasing for $\tau\in[\tau_{0},|X|/2]$, which in turn implies that $f(\tau)$ is maximized at $\tau=2$ or $\tau=|X|/2$. Since $f(2)=O(\ln|X|)$, whereas $f\big{(}|X|/2\big{)}=\frac{2|X|\ln(2e)}{\ln(|X|/2)}$ grows like $|X|/\ln|X|$, $f$ is maximized at $\tau=|X|/2$ for all $|X|$ large enough; in particular, the inequality in equation (8) is satisfied when $k\geq f\big{(}|X|/2\big{)}$, which shows the lemma. ∎
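The shape of $f$ can be checked numerically on the integer range $2\leq i\leq\lfloor|X|/2\rfloor$. The Python sketch below (our own helper names) finds the maximizer; it also illustrates why the lemma needs $|X|$ large enough, since for $|X|=30$ the maximum over this range is attained at $i=2$ rather than at $|X|/2$.

```python
import math


def f(tau, n):
    """f(tau) = 4 tau ln(n e / tau) / ln(tau), the right-hand side of (8)."""
    return 4 * tau * math.log(n * math.e / tau) / math.log(tau)


def argmax_f(n):
    """Integer i in [2, floor(n/2)] maximizing f(i)."""
    return max(range(2, n // 2 + 1), key=lambda i: f(i, n))


for n in (30, 100, 1000, 10_000):
    print(n, argmax_f(n))
# For large n the maximum sits at n/2, where f(n/2) = 2 n ln(2e) / ln(n/2).
print(math.isclose(f(500, 1000), 2 * 1000 * math.log(2 * math.e) / math.log(500)))  # True
```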

Finally, due to equation (7), if $k$ satisfies the condition in Lemma 2.3 then

\Sigma_{1}=O\left(\frac{|X|^{2}}{2^{k}}\right)+O\left(\frac{1}{\pi^{k/2}}\sum_{i=2}^{\lfloor|X|/2\rfloor}\frac{1}{i}\right)=o(1)+O\left(\frac{\ln|X|}{\pi^{k/2}}\right)=o(1),

where, for the middle identity, we have used that the harmonic series grows logarithmically with the number of terms. This completes the proof of Theorem 1.1.
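Two numerical checks may help calibrate this argument. The Python sketch below (helper names are ours; floating point, so illustrative rather than rigorous) verifies the Stirling-type bound on ${2i\choose i}(1/2)^{2i}$ used above for $1\leq i\leq 300$, and evaluates the right-hand side of (6) on a log scale for $|X|=200$ with $k$ at the threshold of Theorem 1.1.

```python
import math


def central_ratio(i):
    """C(2i, i) / 4^i, computed from exact integers."""
    return math.comb(2 * i, i) / 4 ** i


def middle_bound(i):
    """exp{(1 - 36 i) / (24 i (12 i + 1))} / sqrt(i pi)."""
    return math.exp((1 - 36 * i) / (24 * i * (12 * i + 1))) / math.sqrt(i * math.pi)


def sigma1_rhs(n, k):
    """Right-hand side of (6): sum_i multinomial(n; i, i, n - 2i) * (C(2i, i) / 4^i)^k."""
    total = 0.0
    for i in range(1, n // 2 + 1):
        log_multinomial = (math.lgamma(n + 1) - 2 * math.lgamma(i + 1)
                           - math.lgamma(n - 2 * i + 1))
        log_p = math.log(math.comb(2 * i, i)) - 2 * i * math.log(2)
        total += math.exp(log_multinomial + k * log_p)  # tiny terms underflow to 0.0
    return total


robbins_ok = all(central_ratio(i) <= middle_bound(i) <= 1 / math.sqrt(i * math.pi)
                 for i in range(1, 301))
print(robbins_ok)  # True

n = 200
k = math.ceil(2 * math.log(2 * math.e) * n / math.log(n / 2))
print(k)                         # 148
print(sigma1_rhs(n, k) < 1e-30)  # True
print(sigma1_rhs(n, 1) > 1)      # True: one random set is far from enough
```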

2.6 Resolving Subsets of $X$ of Different Size

In this section, we prove Theorem 1.2.

Having characterized the asymptotic order of the metric dimension of $(2^{X},{\text{Jac}})$, i.e., the asymptotically optimal size of resolving sets for $2^{X}$, in this final section we show how to resolve all pairs of subsets of $X$ of different size.

For this, consider the problem of resolving all distinct $a,b\in 2^{X}$ such that $|a|<|b|$ , using a set of the form

R_{2}=\{r_{1},r_{1}^{c},\ldots,r_{k},r_{k}^{c}\}, \tag{10}

where $r_{1},\ldots,r_{k}$ are i.i.d. with a $\text{Binomial}(X,1/2)$ distribution. It follows that

{\mathbb{P}}\big{(}R_{2}\text{ does not resolve all pairs }a,b\in 2^{X}\text{ such that }|a|\neq|b|\text{ and }|a|,|b|\leq|X|/2\big{)}\leq\Sigma_{2},

where

\Sigma_{2}:={\mathbb{P}}\left(\exists\,a,b\in 2^{X},\text{ with }|a|<|b|\leq\frac{|X|}{2},\text{ such that }\forall r\in R_{2}:{\text{Jac}}(a,r)={\text{Jac}}(b,r)\right). \tag{11}

2.6.1 Sizing the probability $\Sigma_{2}$

Suppose that $a,b,r\in 2^{X}$ are such that $|a|<|b|$, ${\text{Jac}}(a,r)={\text{Jac}}(b,r)$, and ${\text{Jac}}(a,r^{c})={\text{Jac}}(b,r^{c})$; in particular, per Corollary 2.1, $|r^{c}|\cdot\big{(}|b|-|a|\big{)}=\big{(}|r^{c}|-|r|\big{)}\cdot\big{(}|br|-|ar|\big{)}$. But $|r^{c}|=|X|-|r|$ (so that $|r^{c}|-|r|=|X|-2|r|$), $|b|-|a|=|a^{c}b|-|b^{c}a|$, and $|br|-|ar|=|a^{c}br|-|b^{c}ar|$. So, after dividing by $|X|$, we may rewrite the last identity equivalently as follows:

\big{(}|a^{c}b|-|b^{c}a|\big{)}\cdot\left(1-\frac{|r|}{|{X}|}\right)=\big{(}|a^{c}br|-|b^{c}ar|\big{)}\cdot\left(1-\frac{2|r|}{|{X}|}\right).

If we define $\Delta_{c}(r):=|c|-2|cr|$ for each $c\in 2^{X}$ then, because $\Delta_{v}(r)-\Delta_{u}(r)=2\big(|ur|-|vr|\big)-\big(|u|-|v|\big)$, the above identity is equivalent to

\frac{\Delta_{X}(r)}{|X|}\cdot\frac{\Delta_{v}(r)-\Delta_{u}(r)}{|u|-|v|}=1, \tag{12}

where $u:=a^{c}b$ and $v:=b^{c}a$ are disjoint subsets of $X$ such that $|u|>|v|$. Notably, for $r\sim\text{Binomial}(X,1/2)$, the probability of the above event depends only on $|u|$, $|v|$, and $|X|$, and not on the specific identity of $u$ and $v$, beyond the constraints $uv=\emptyset$ and $|u|>|v|$. We may therefore define $\rho(i,j,X)$ as the probability of the event in (12) when $(u,v)\in 2^{X}\times 2^{X}$ are such that $uv=\emptyset$, $|u|=i>j=|v|$, and $r\sim\text{Binomial}(X,1/2)$.
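The passage from the consequence of Corollary 2.1 to (12) is purely algebraic. The following brute-force Python sketch checks, for a small ground set $X=\{0,\ldots,n-1\}$ with subsets encoded as bitmasks, that the two identities agree on every triple $(a,b,r)$ with $|a|<|b|$. It only tests this rewriting, not Corollary 2.1 itself, and the helper names are hypothetical.

```python
def popcount(m: int) -> int:
    """Cardinality of the subset encoded by the bitmask m."""
    return bin(m).count("1")


def delta(c: int, r: int) -> int:
    """Delta_c(r) = |c| - 2|c r| for subsets encoded as bitmasks."""
    return popcount(c) - 2 * popcount(c & r)


def corollary_identity(a: int, b: int, r: int, n: int) -> bool:
    """|r^c| (|b| - |a|) == (|r^c| - |r|) (|b r| - |a r|)."""
    s = popcount(r)
    t = n - s
    return t * (popcount(b) - popcount(a)) == (t - s) * (popcount(b & r) - popcount(a & r))


def identity_12(a: int, b: int, r: int, n: int) -> bool:
    """(12) with denominators cleared: Delta_X(r) (Delta_v(r) - Delta_u(r)) == |X| (|u| - |v|)."""
    full = (1 << n) - 1
    u, v = b & ~a, a & ~b  # u = a^c b, v = b^c a
    return delta(full, r) * (delta(v, r) - delta(u, r)) == n * (popcount(u) - popcount(v))


def check_equivalence(n: int) -> int:
    """Compare both identities on all a, b, r with |a| < |b|; return how many triples satisfy them."""
    both = 0
    for a in range(1 << n):
        for b in range(1 << n):
            if popcount(a) >= popcount(b):
                continue
            for r in range(1 << n):
                holds = corollary_identity(a, b, r, n)
                assert holds == identity_12(a, b, r, n), (a, b, r)
                both += holds
    return both


if __name__ == "__main__":
    print(check_equivalence(6))
```

For instance, with $|X|=6$, $a=\{3\}$, $b=r=\{0,1\}$, both sides of the Corollary 2.1 identity equal $4$ and both sides of (12), after clearing denominators, equal $6$.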

It follows from the above discussion that if

\mathcal{P}:=\left\{(u,v)\in 2^{X}\times 2^{X}\text{ such that }uv=\emptyset\text{ and }|v|<|u|\right\},

then

\Sigma_{2}\leq{\mathbb{P}}\left(\exists\,(u,v)\in\mathcal{P}\text{ such that }\forall t\in\{1,\ldots,k\}:\frac{\Delta_{X}(r_{t})}{|X|}\cdot\frac{\Delta_{v}(r_{t})-\Delta_{u}(r_{t})}{|u|-|v|}=1\right).

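Moreover, for any given $(u,v)\in\mathcal{P}$, the random vectors $\big{(}\Delta_{u}(r_{t}),\Delta_{v}(r_{t}),\Delta_{X}(r_{t})\big{)}$, with $t=1,\ldots,k$, are i.i.d. As a result, by a union bound over $\mathcal{P}$, and since there are ${|X|\choose i,j,|X|-i-j}$ pairs $(u,v)\in\mathcal{P}$ with $|u|=i$ and $|v|=j$: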

\Sigma_{2}\leq\sum_{i=1}^{|X|}\sum_{j}{|X|\choose i,j,|X|-i-j}\,\rho^{k}\big{(}i,j,X\big{)},

(13)

where the index $j$ in the inner sum ranges over $0\leq j<i$ with $i+j\leq|X|$.
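Because $|ur|$, $|vr|$ and $|r\setminus(u\cup v)|$ are independent with $\text{Binomial}(i,1/2)$, $\text{Binomial}(j,1/2)$ and $\text{Binomial}(|X|-i-j,1/2)$ distributions, $\rho(i,j,X)$ can be computed exactly by summing over these three counts. The Python sketch below does this, cross-checks the result against direct enumeration of $r$, and evaluates the right-hand side of (13) for small $|X|$, writing the multinomial coefficient as ${|X|\choose i}{|X|-i\choose j}$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def rho_exact(i: int, j: int, n: int) -> Fraction:
    """Exact rho(i, j, X) with |X| = n, |u| = i > j = |v| and u, v disjoint.

    p = |u r|, q = |v r| and w = (number of elements of r outside u ∪ v) are
    independent Binomial(i, 1/2), Binomial(j, 1/2), Binomial(n - i - j, 1/2) counts.
    """
    m = n - i - j
    hits = 0
    for p in range(i + 1):
        for q in range(j + 1):
            for w in range(m + 1):
                delta_X = n - 2 * (p + q + w)
                delta_u, delta_v = i - 2 * p, j - 2 * q
                # Event (12), with denominators cleared.
                if delta_X * (delta_v - delta_u) == (i - j) * n:
                    hits += comb(i, p) * comb(j, q) * comb(m, w)
    return Fraction(hits, 2 ** n)


def rho_brute(i: int, j: int, n: int) -> Fraction:
    """Same probability by enumerating all 2^n subsets r, with u = {0..i-1}, v = {i..i+j-1}."""
    u, v = set(range(i)), set(range(i, i + j))
    hits = 0
    for mask in range(1 << n):
        r = {x for x in range(n) if (mask >> x) & 1}
        delta_X = n - 2 * len(r)
        delta_u = i - 2 * len(u & r)
        delta_v = j - 2 * len(v & r)
        if delta_X * (delta_v - delta_u) == (i - j) * n:
            hits += 1
    return Fraction(hits, 2 ** n)


def union_bound_13(n: int, k: int) -> Fraction:
    """Right-hand side of (13): sum over 1 <= i <= n and 0 <= j < i with i + j <= n."""
    total = Fraction(0)
    for i in range(1, n + 1):
        for j in range(min(i, n - i + 1)):
            pairs = comb(n, i) * comb(n - i, j)  # disjoint (u, v) with |u| = i, |v| = j
            total += pairs * rho_exact(i, j, n) ** k
    return total


if __name__ == "__main__":
    print(float(union_bound_13(10, 8)))
```

With $k=0$ the right-hand side of (13) simply counts the pairs in $\mathcal{P}$, which for $|X|=6$ is $(3^{6}-141)/2=294$.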

Lemma 2.4.

If $1\leq i\leq|X|$ and $0\leq j<i$ , with $(i+j)\leq|X|$ , then $\rho(i,j,X)\leq 4\exp\left(-\frac{\sqrt{|X|}}{2}\right)$ .

Proof.

Let $(u,v)\in\mathcal{P}$ be such that $|u|=i$ and $j=|v|$ ; in particular, $i>j$ . Then, for each $\tau>0$ :

\begin{aligned}
\rho(i,j,X)&={\mathbb{P}}\left(\Delta_{X}(r)\cdot\big{(}\Delta_{v}(r)-\Delta_{u}(r)\big{)}=(i-j)\cdot|X|\right)\\
&\leq{\mathbb{P}}\left(|\Delta_{X}(r)|\geq\sqrt{2\tau|X|}\right)+{\mathbb{P}}\left(|\Delta_{u}(r)-\Delta_{v}(r)|\geq(i-j)\sqrt{\frac{|X|}{2\tau}}\right)\\
&\leq 2\left\{\exp\left(-\tau\right)+\exp\left(-\frac{(i-j)^{2}|X|}{4(i+j)\tau}\right)\right\},
\end{aligned}

where, for the last inequality, we have used Hoeffding's inequality twice: first, $\Delta_{X}(r)$ is a sum of $|X|$ independent random variables uniformly distributed on $\{-1,+1\}$; second, $\big{(}\Delta_{u}(r)-\Delta_{v}(r)\big{)}/2$ has the same distribution as $\sum_{\ell=1}^{i+j}\big{(}Z_{\ell}-\mathbb{E}(Z_{\ell})\big{)}$, where $Z_{1},\ldots,Z_{i+j}$ are independent random variables, with $Z_{\ell}\sim\text{Bernoulli}(1/2)$ for $1\leq\ell\leq i$, and $(-Z_{\ell})\sim\text{Bernoulli}(1/2)$ for $i<\ell\leq i+j$. Therefore, by selecting

\tau:=\frac{i-j}{2}\sqrt{\frac{|X|}{i+j}}

(14)

we obtain that

\rho(i,j,X)\leq 4e^{-\tau}.

(15)

But

\tau=\sqrt{i}\cdot\frac{1-j/i}{\sqrt{1+j/i}}\cdot\frac{\sqrt{|X|}}{2}\geq\frac{\sqrt{|X|}}{2},

(16)

because the first factor above is an increasing function of $i$ , whereas the second factor is a decreasing function of $j/i$ . The lemma is now a direct consequence of the inequalities in (15)-(16). ∎
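Inequality (15) can be checked numerically for small $|X|$. The sketch below recomputes $\rho(i,j,X)$ exactly, as a sum over three independent binomial counts, and compares it with $4e^{-\tau}$ for $\tau$ as in (14), over all admissible $(i,j)$. It is an illustration of (15) only, and the helper names are hypothetical.

```python
import math
from fractions import Fraction
from math import comb


def rho_exact(i: int, j: int, n: int) -> Fraction:
    """Exact rho(i, j, X) with |X| = n via independent binomial counts p, q, w."""
    m = n - i - j
    hits = 0
    for p in range(i + 1):          # p = |u r|
        for q in range(j + 1):      # q = |v r|
            for w in range(m + 1):  # w = elements of r outside u ∪ v
                delta_X = n - 2 * (p + q + w)
                if delta_X * ((j - 2 * q) - (i - 2 * p)) == (i - j) * n:
                    hits += comb(i, p) * comb(j, q) * comb(m, w)
    return Fraction(hits, 2 ** n)


def tau_14(i: int, j: int, n: int) -> float:
    """tau = ((i - j) / 2) * sqrt(n / (i + j)), as in (14)."""
    return (i - j) / 2 * math.sqrt(n / (i + j))


def violations_of_15(n: int) -> list:
    """Admissible (i, j) with rho(i, j, X) > 4 exp(-tau); Hoeffding's inequality says none exist."""
    bad = []
    for i in range(1, n + 1):
        for j in range(min(i, n - i + 1)):
            if float(rho_exact(i, j, n)) > 4 * math.exp(-tau_14(i, j, n)):
                bad.append((i, j))
    return bad


if __name__ == "__main__":
    n = 12
    for i in range(1, n + 1):
        for j in range(min(i, n - i + 1)):
            print(i, j, float(rho_exact(i, j, n)), 4 * math.exp(-tau_14(i, j, n)))
```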

Remark 2.1.

The choice of $\tau$ in (14) is locally optimal when $\tau\geq 1$, which is a necessary condition for the upper bound in (15) to be non-trivial. (Of course, non-triviality actually requires $\tau>2\ln 2$ which, based on (16), is guaranteed as soon as $|X|\geq 8$.) Indeed, the proof of Lemma 2.4 shows that $\rho(i,j,X)\leq 2f(t)$ for every $t>0$, where $f(t):=e^{-t}+e^{-\tau^{2}/t}$ and $\tau$ is given by (14). But note that $f^{\prime}(t)=\left(g(\tau^{2}/t)-g(t)\right)/t$, with $g(t):=te^{-t}$ for $t>0$; hence $t=\tau$ is a critical point of $f(t)$. Moreover, since $f^{\prime\prime}(\tau)=2\,e^{-\tau}\left(1-\tau^{-1}\right)$, $t=\tau$ is a local minimum when $\tau>1$. Finally, when $\tau=1$, $f^{\prime\prime}(1)=f^{\prime\prime\prime}(1)=0$ but $f^{\prime\prime\prime\prime}(1)=2e^{-1}>0$; hence $t=\tau$ is a local minimum of $f(t)$ whenever $\tau\geq 1$.
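The claims in Remark 2.1 about $f(t)=e^{-t}+e^{-\tau^{2}/t}$ lend themselves to a quick numerical check. This Python sketch compares a central finite-difference estimate of $f^{\prime\prime}(\tau)$ with $2e^{-\tau}(1-\tau^{-1})$, and verifies on a grid that no value of $f$ falls below $f(\tau)$ for a few values of $\tau\geq 1$. The grid range, step count, and helper names are arbitrary choices made for illustration.

```python
import math


def f(t: float, tau: float) -> float:
    """f(t) = e^{-t} + e^{-tau^2 / t} from Remark 2.1 (t > 0)."""
    return math.exp(-t) + math.exp(-tau * tau / t)


def f2_at_tau_numeric(tau: float, h: float = 1e-4) -> float:
    """Central finite-difference estimate of f''(tau)."""
    return (f(tau + h, tau) - 2.0 * f(tau, tau) + f(tau - h, tau)) / (h * h)


def f2_at_tau_formula(tau: float) -> float:
    """Closed form 2 e^{-tau} (1 - 1/tau) for f'' evaluated at t = tau."""
    return 2.0 * math.exp(-tau) * (1.0 - 1.0 / tau)


def tau_beats_grid(tau: float, lo: float = 0.05, hi: float = 20.0, steps: int = 20_000) -> bool:
    """True if f(tau) <= f(t), up to rounding, at every grid point t in [lo, hi]."""
    f_tau = f(tau, tau)
    return all(f(lo + (hi - lo) * s / steps, tau) >= f_tau - 1e-12 for s in range(steps + 1))


if __name__ == "__main__":
    for tau in (0.5, 1.0, 2.0, 5.0):
        print(tau, f2_at_tau_numeric(tau), f2_at_tau_formula(tau), tau_beats_grid(tau))
```

For $\tau<1$ the same formula gives $f^{\prime\prime}(\tau)<0$, so $t=\tau$ is then a local maximum, consistent with the restriction $\tau\geq 1$ in the remark.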

Let $\rho_{X}:=4\exp\left(-\sqrt{|X|}/2\right)$ denote the upper bound for $\rho(i,j,X)$ given in Lemma 2.4. It follows from (13) that

\begin{aligned}
\Sigma_{2}&\leq\rho^{k}_{X}\sum_{i=1}^{|X|}\sum_{j}{|X|\choose i,j,|X|-i-j}\\
&\leq\rho^{k}_{X}\sum_{i=1}^{|X|}\sum_{j}\frac{|X|^{i+j}}{i!\,j!}\\
&\leq\rho^{k}_{X}\left(\sum_{i=0}^{|X|}\frac{|X|^{i}}{i!}\right)^{2}\\
&=\rho^{k}_{X}e^{2|X|}\,\left(\frac{\Gamma\left(|X|+1,|X|\right)}{|X|!}\right)^{2},
\end{aligned}

where, for an integer $n>0$ and $x\in\mathbb{R}$, $\Gamma(n,x):=(n-1)!\,e^{-x}\sum_{i=0}^{n-1}\frac{x^{i}}{i!}=\int_{x}^{\infty}t^{n-1}e^{-t}\,dt$ is the (upper) incomplete Gamma function. Finally, due to [18, Proposition 2.7], $\Gamma(|X|+1,|X|)=O\big{(}|X|^{|X|}e^{-|X|}\big{)}$. Consequently,

\Sigma_{2}=O\left(\frac{\rho_{X}^{k}\,|X|^{2|X|}}{(|X|!)^{2}}\right)=O\left(\frac{\rho_{X}^{k}e^{2|X|}}{|X|}\right)=O\left(\frac{e^{2|X|+k\ln(\rho_{X})}}{|X|}\right),

where we have used Stirling's approximation and the exp-log transform. In particular, for any $\epsilon<1$, if we select $k$ so that $2|X|+k\ln(\rho_{X})\leq\epsilon\ln|X|$, for instance, $k=\left\lceil\frac{4|X|-2\epsilon\ln|X|}{\sqrt{|X|}-4\ln 2}\right\rceil\sim 4\sqrt{|X|}$, then $\Sigma_{2}=o(1)$, which completes the proof of Theorem 1.2.
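To see the choice of $k$ at work, the following Python sketch evaluates $k=\lceil(4|X|-2\epsilon\ln|X|)/(\sqrt{|X|}-4\ln 2)\rceil$, the exponent $2|X|+k\ln(\rho_{X})$, and the ratio $k/(4\sqrt{|X|})$ for a few sizes of $X$. Here $\ln(\rho_{X})=\ln 4-\sqrt{|X|}/2$ is computed directly to avoid floating-point underflow; the function names are hypothetical.

```python
import math


def log_rho_X(n: int) -> float:
    """ln of the Lemma 2.4 bound rho_X = 4 exp(-sqrt(n)/2), computed without underflow."""
    return math.log(4.0) - math.sqrt(n) / 2.0


def k_choice(n: int, eps: float) -> int:
    """k = ceil((4n - 2 eps ln n) / (sqrt(n) - 4 ln 2)); the denominator is positive iff n >= 8."""
    denom = math.sqrt(n) - 4.0 * math.log(2.0)
    if denom <= 0:
        raise ValueError("the formula for k requires |X| >= 8")
    return math.ceil((4.0 * n - 2.0 * eps * math.log(n)) / denom)


def sigma2_exponent(n: int, k: int) -> float:
    """The exponent 2|X| + k ln(rho_X) in the O-bound on Sigma_2."""
    return 2.0 * n + k * log_rho_X(n)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        k = k_choice(n, eps=0.5)
        print(n, k, k / (4 * math.sqrt(n)), sigma2_exponent(n, k), 0.5 * math.log(n))
```

Since $\ln(\rho_{X})=-\tfrac{1}{2}\big(\sqrt{|X|}-4\ln 2\big)$, the ceiling guarantees $2|X|+k\ln(\rho_{X})\leq\epsilon\ln|X|$, up to rounding, which is what the bound on $\Sigma_{2}$ needs.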

2.7 Resolving Comparatively Small Subsets of $X$

In this section, we prove Corollary 1.2, which follows from arguments already used in the proofs of Theorems 1.1 and 1.2. For this, let $0<\epsilon<1$, and let $1\leq W\leq(1-\epsilon)(\ln\pi)\sqrt{|X|}/\ln|X|$ be an integer.

To prove the corollary, we reconsider the set $R_{2}$ in (10), now with $k\geq(4+\epsilon)\sqrt{|X|}$. By separating pairs $a,b\in 2^{X}$ with $|a|=|b|$ from those with $|a|\neq|b|$, we now find that

{\mathbb{P}}\left(\exists\,a,b\in 2^{X}\text{ with }a\neq b\text{ and }|a|,|b|\leq W\text{ such that }\forall r\in R_{2}:{\text{Jac}}(a,r)={\text{Jac}}(b,r)\right)\leq\Sigma_{2}+\Sigma_{3},

(17)

where $\Sigma_{2}$ is the double-sum in (13), and $\Sigma_{3}$ is a truncated version of the summation in (6). Specifically

\Sigma_{3}:=\sum_{i=1}^{W}{|X|\choose i,i,|X|-2i}\left\{{2i\choose i}\left(\frac{1}{2}\right)^{2i}\right\}^{k}.

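From the discussion in Section 2.6.1, we already know that $\Sigma_{2}=o(1)$, because the bound obtained there decreases in $k$ and $(4+\epsilon)\sqrt{|X|}$ exceeds the choice of $k$ made there for all sufficiently large $|X|$. On the other hand, from the discussion in Section 2.5.1 that led to (7), we can say that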

\Sigma_{3}=O\left(\sum_{i=1}^{W}\frac{|X|^{2i}}{(i!)^{2}\,(i\pi)^{k/2}}\right).

As a result

\Sigma_{3}=O\left(\frac{|X|^{2W}}{\pi^{k/2}}\sum_{i=1}^{W}\frac{1}{(i!)^{2}}\right)=O\left(\frac{|X|^{2W}}{\pi^{k/2}}\right)=O\left(\frac{|X|^{2W}}{\pi^{2\sqrt{|X|}}}\right)=O\left(\pi^{-2\epsilon\sqrt{|X|}}\right),

where, for the last two asymptotic bounds, we have used the constraints on $k$ and $W$. The corollary is now a direct consequence of the inequality in (17).
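The step from $\Sigma_{3}$ to its $O$-bound rests, we assume, on two elementary inequalities: ${|X|\choose i,i,|X|-2i}\leq|X|^{2i}/(i!)^{2}$ and ${2i\choose i}4^{-i}\leq 1/\sqrt{\pi i}$, a standard bound on central binomial coefficients. The Python sketch below checks both for small parameters and compares $\Sigma_{3}$ with the resulting termwise bound; the helper names and parameter values are hypothetical.

```python
import math


def central_ratio(i: int) -> float:
    """C(2i, i) / 4^i, the probability raised to the k-th power in Sigma_3."""
    return math.comb(2 * i, i) / 4 ** i


def multinomial_iin(n: int, i: int) -> int:
    """Multinomial coefficient n! / (i! i! (n - 2i)!)."""
    return math.comb(n, i) * math.comb(n - i, i)


def sigma3(n: int, W: int, k: int) -> float:
    """Sigma_3 = sum_{i=1}^{W} multinomial(n; i, i, n - 2i) * (C(2i, i) / 4^i)^k."""
    return sum(multinomial_iin(n, i) * central_ratio(i) ** k for i in range(1, W + 1))


def sigma3_bound(n: int, W: int, k: int) -> float:
    """Termwise upper bound sum_{i=1}^{W} n^{2i} / ((i!)^2 (pi i)^{k/2})."""
    return sum(
        n ** (2 * i) / (math.factorial(i) ** 2 * (math.pi * i) ** (k / 2))
        for i in range(1, W + 1)
    )


if __name__ == "__main__":
    n, W, k = 100, 3, 40
    print(sigma3(n, W, k), sigma3_bound(n, W, k))
```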

Acknowledgments. This work was partially funded by the NSF grant No. 1836914.

References

[1] N. Alon and J. H. Spencer, The Probabilistic Method, 2nd edn., Wiley, 2004.
[2] S. Bau and A. F. Beardon, The metric dimension of metric spaces, Comput. Methods Funct. Theory 13 (2013), 295–305.
[3] A. F. Beardon, Resolving the Hypercube, Discrete Applied Mathematics 161 (2013), 1882–1887.
[4] G. Chartrand et al., Resolvability in graphs and the metric dimension of a graph, Discrete Applied Mathematics 105 (2000), no. 1, 99–113.
[5] P. Erdős, F. Harary, and W. T. Tutte, On the dimension of a graph, Mathematika 12 (1965), no. 2, 118–122.
[6] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness, WH Freeman and Company, New York, 1979.
[7] G. Gilbert, Distance between sets, Nature 239 (1972), 174.
[8] F. Harary and R. A. Melter, On the metric dimension of a graph, Ars Combinatoria 2 (1976), 191–195.
[9] M. Hauptmann, R. Schmied, and C. Viehmann, Approximation complexity of metric dimension problem, Journal of Discrete Algorithms 14 (2012), 214–222.
[10] P. Jaccard, Étude comparative de la distribution florale dans une portion des Alpes et du Jura, Bull. Société Vaudoise des Sciences Naturelles 37 (1901), no. 142, 547–579.
[11] R. M. Karp, Reducibility among combinatorial problems, Complexity of Computer Computations, Springer, 1972, 85–103.
[12] D. E. Knuth, The Art of Computer Programming, Vol. 1: Fundamental Algorithms, 3rd edn., Addison-Wesley, 1997.
[13] S. Kosub, A note on the triangle inequality for the Jaccard distance, Pattern Recognition Letters 120 (2019), 36–38.
[14] D. Kuziak and I. G. Yero, Metric dimension related parameters in graphs: A survey on combinatorial, computational and applied results, arXiv preprint arXiv:2107.04877 (2021).
[15] L. Laird et al., Resolvability of Hamming graphs, SIAM Journal on Discrete Mathematics 34 (2020), no. 4, 2063–2081.
[16] G. Murphy, A metric basis characterization of Euclidean space, Pac. J. Math. 60 (1975), 159–163.
[17] A. Paradise, Quantitative encoding of bags-of-words for sentiment and sarcasm detection in textual data, Master’s thesis, The University of Colorado, 2024.
[18] I. Pinelis, Exact lower and upper bounds on the incomplete gamma function, Mathematical Inequalities & Applications 23 (2020), no. 4, 1261–1278.
[19] H. Robbins, A remark on Stirling’s formula, The American Mathematical Monthly 62 (1955), no. 1, 26–29.
[20] P. E. Ruth and M. E. Lladser, Levenshtein graphs: Resolvability, automorphisms & determining sets, Discrete Mathematics 346 (2023), no. 5, 113310.
[21] P. J. Slater, Leaves of trees, Congressus Numerantium 14 (1975), 549–559.
[22] R. C. Tillquist, R. M. Frongillo, and M. E. Lladser, Metric Dimension, Scholarpedia 14 (2019), no. 10, 53881. Revision #190769.
[23] R. C. Tillquist, R. M. Frongillo, and M. E. Lladser, Getting the lay of the land in discrete space: A survey of metric dimension and its applications, SIAM Review 65 (2023), no. 4, 919–962.
[24] R. C. Tillquist and M. E. Lladser, Low-dimensional representation of genomic sequences, Journal of Mathematical Biology 79 (2019), no. 1, 1–29.

