Search | arXiv e-print repository

Efficiently Computing Similarities to Private Datasets

Authors: Arturs Backurs, Zinan Lin, Sepideh Mahabadi, Sandeep Silwal, Jakub Tarnawski

Abstract: Many methods in differentially private model training rely on computing the similarity between a query point (such as public or synthetic data) and private data. We abstract out this common subroutine and study the following fundamental algorithmic problem: Given a similarity function $f$ and a large high-dimensional private dataset $X \subset \mathbb{R}^d$, output a differentially private (DP) data structure which approximates $\sum_{x \in X} f(x,y)$ for any query $y$. We consider the cases where $f$ is a kernel function, such as $f(x,y) = e^{-\|x-y\|_2^2/σ^2}$ (also known as DP kernel density estimation), or a distance function such as $f(x,y) = \|x-y\|_2$, among others. Our theoretical results improve upon prior work and give better privacy-utility trade-offs as well as faster query times for a wide range of kernels and distance functions. The unifying approach behind our results is leveraging `low-dimensional structures' present in the specific functions $f$ that we study, using tools such as provable dimensionality reduction, approximation theory, and one-dimensional decomposition of the functions. Our algorithms empirically exhibit improved query times and accuracy over prior state of the art. We also present an application to DP classification. Our experiments demonstrate that the simple methodology of classifying based on average similarity is orders of magnitude faster than prior DP-SGD based approaches for comparable accuracy.

Submitted 13 March, 2024; originally announced March 2024.

Comments: To appear at ICLR 2024
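
As a rough illustration of the quantity studied in the entry above, the following sketch computes $\sum_{x \in X} f(x,y)$ exactly for the Gaussian kernel and then answers a single query with Laplace noise. This is a naive baseline, not the paper's data structure. The function names are hypothetical, and the noise scale assumes add/remove-one-point neighboring datasets, under which the sum changes by at most 1 because each kernel value lies in (0, 1]. The noise covers one query only; answering many queries this way would compose privacy loss.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """f(x, y) = exp(-||x - y||_2^2 / sigma^2), a value in (0, 1]."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)


def similarity_sum(X, y, sigma):
    """Exact (non-private) sum of kernel similarities between y and all of X."""
    return sum(gaussian_kernel(x, y, sigma) for x in X)


def noisy_similarity_sum(X, y, sigma, epsilon, rng=random):
    """Single-query Laplace mechanism (assumption: sensitivity 1).

    A Laplace(0, b) sample is drawn as the difference of two independent
    exponential samples with rate 1/b, here with b = 1/epsilon.
    """
    rate = epsilon  # rate = 1 / b with b = 1 / epsilon
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return similarity_sum(X, y, sigma) + noise


if __name__ == "__main__":
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    y = (0.0, 0.0)
    print(similarity_sum(X, y, sigma=1.0))
    print(noisy_similarity_sum(X, y, sigma=1.0, epsilon=1.0))
```

The exact sum takes time linear in the size of $X$ per query, which is the cost the faster query times in the abstract are measured against.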

arXiv:2403.01749

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Authors: Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

Abstract: Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalable solution. However, existing methods necessitate DP finetuning of large language models (LLMs) on private data to generate DP synthetic data. This approach is not viable for proprietary LLMs (e.g., GPT-3.5) and also demands considerable computational resources for open-source LLMs. Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models. In this work, we propose an augmented PE algorithm, named Aug-PE, that applies to the complex setting of text. We use API access to an LLM and generate DP synthetic text without any model training. We conduct comprehensive experiments on three benchmark datasets. Our results demonstrate that Aug-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines. This underscores the feasibility of relying solely on API access of LLMs to produce high-quality DP synthetic texts, thereby facilitating more accessible routes to privacy-preserving LLM applications. Our code and data are available at https://github.com/AI-secure/aug-pe.

Submitted 23 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: ICML'24 Spotlight

arXiv:2310.16960

Privately Aligning Language Models with Reinforcement Learning

Authors: Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim

Abstract: Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction-following models such as ChatGPT. In this work, we initiate the study of privacy-preserving alignment of LLMs through Differential Privacy (DP) in conjunction with RL. Following the influential work of Ziegler et al. (2020), we study two dominant paradigms: (i) alignment via RL without human in the loop (e.g., positive review generation) and (ii) alignment via RL from human feedback (RLHF) (e.g., summarization in a human-preferred way). We give a new DP framework to achieve alignment via RL, and prove its correctness. Our experimental results validate the effectiveness of our approach, offering competitive utility while ensuring strong privacy protections.

Submitted 3 May, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted at ICLR 2024

arXiv:2212.01539

Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

Authors: Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian

Abstract: Differentially private deep learning has recently witnessed advances in computational efficiency and privacy-utility trade-off. We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of \emph{group-wise clipping}. To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization. This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest. While per-layer clipping with constant thresholds tends to underperform standard flat clipping, per-layer clipping with adaptive thresholds matches or outperforms flat clipping under given training epoch constraints, hence attaining similar or better task performance within less wall time. To explore the limits of scaling (pretrained) models in differentially private deep learning, we privately fine-tune the 175 billion-parameter GPT-3. We bypass scaling challenges associated with clipping gradients that are distributed across multiple devices with \emph{per-device clipping} that clips the gradient of each model piece separately on its host device. Privately fine-tuning GPT-3 with per-device clipping achieves a task performance at $ε=1$ better than what is attainable by non-privately fine-tuning the largest GPT-2 on a summarization task.

Submitted 3 December, 2022; originally announced December 2022.

Comments: 25 pages

arXiv:2206.04301

Unveiling Transformers with LEGO: a synthetic reasoning task

Authors: Yi Zhang, Arturs Backurs, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Tal Wagner

Abstract: We propose a synthetic reasoning task, LEGO (Learning Equality and Group Operations), that encapsulates the problem of following a chain of reasoning, and we study how the Transformer architectures learn this task. We pay special attention to data effects such as pretraining (on seemingly unrelated NLP tasks) and dataset composition (e.g., differing chain length at training and test time), as well as architectural variants such as weight-tied layers or adding convolutional components. We study how the trained models eventually succeed at the task, and in particular, we manage to understand some of the attention heads as well as how the information flows in the network. In particular, we have identified a novel \emph{association} pattern that globally attends only to identical tokens. Based on these observations we propose a hypothesis that here pretraining helps for LEGO tasks due to certain structured attention patterns, and we experimentally verify this hypothesis. We also observe that in some data regimes the trained transformer finds "shortcut" solutions to follow the chain of reasoning, which impedes the model's robustness, and moreover we propose ways to prevent it. Motivated by our findings on structured attention patterns, we propose the LEGO attention module, a drop-in replacement for vanilla attention heads. This architectural change significantly reduces FLOPs and maintains or even \emph{improves} the model's performance at large-scale pretraining.

Submitted 17 February, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

arXiv:2206.01838

Differentially Private Model Compression

Authors: Fatemehsadat Mireshghallah, Arturs Backurs, Huseyin A Inan, Lukas Wutschitz, Janardhan Kulkarni

Abstract: Recent papers have shown that large pre-trained language models (LLMs) such as BERT and GPT-2 can be fine-tuned on private data to achieve performance comparable to non-private models for many downstream Natural Language Processing (NLP) tasks while simultaneously guaranteeing differential privacy. The inference cost of these models -- which consist of hundreds of millions of parameters -- however, can be prohibitively large. Hence, often in practice, LLMs are compressed before they are deployed in specific applications. In this paper, we initiate the study of differentially private model compression and propose frameworks for achieving 50% sparsity levels while maintaining nearly full performance. We demonstrate these ideas on standard GLUE benchmarks using BERT models, setting benchmarks for future research on this topic.

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2110.06500

Differentially Private Fine-tuning of Language Models

Authors: Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

Abstract: We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $ε = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. On the DART dataset, privately fine-tuned GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget of $ε = 6.8$, $δ = 10^{-5}$) whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.

Submitted 14 July, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: ICLR 2022. Code available at https://github.com/huseyinatahaninan/Differentially-Private-Fine-tuning-of-Language-Models

arXiv:2102.08341

Faster Kernel Matrix Algebra via Density Estimation

Authors: Arturs Backurs, Piotr Indyk, Cameron Musco, Tal Wagner

Abstract: We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix $K \in \mathbb{R}^{n \times n}$ corresponding to $n$ points $x_1,\ldots,x_n \in \mathbb{R}^d$. In particular, we consider estimating the sum of kernel matrix entries, along with its top eigenvalue and eigenvector. We show that the sum of matrix entries can be estimated to $1+ε$ relative error in time $sublinear$ in $n$ and linear in $d$ for many popular kernels, including the Gaussian, exponential, and rational quadratic kernels. For these kernels, we also show that the top eigenvalue (and an approximate eigenvector) can be approximated to $1+ε$ relative error in time $subquadratic$ in $n$ and linear in $d$. Our algorithms represent significant advances in the best known runtimes for these problems. They leverage the positive definiteness of the kernel matrix, along with a recent line of work on efficient kernel density estimation.

Submitted 17 June, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

arXiv:2101.08248

Data-to-text Generation by Splicing Together Nearest Neighbors

Authors: Sam Wiseman, Arturs Backurs, Karl Stratos

Abstract: We propose to tackle data-to-text generation tasks by directly splicing together retrieved segments of text from "neighbor" source-target pairs. Unlike recent work that conditions on retrieved neighbors but generates text token-by-token, left-to-right, we learn a policy that directly manipulates segments of neighbor text, by inserting or replacing them in partially constructed generations. Standard techniques for training such a policy require an oracle derivation for each generation, and we prove that finding the shortest such derivation can be reduced to parsing under a particular weighted context-free grammar. We find that policies learned in this way perform on par with strong baselines in terms of automatic and human evaluation, but allow for more interpretable and controllable generation.

Submitted 28 October, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

Comments: EMNLP 2021; figures updated/improved

arXiv:2010.14181

Impossibility Results for Grammar-Compressed Linear Algebra

Authors: Amir Abboud, Arturs Backurs, Karl Bringmann, Marvin Künnemann

Abstract: To handle vast amounts of data, it is natural and popular to compress vectors and matrices. When we compress a vector from size $N$ down to size $n \ll N$, it certainly makes it easier to store and transmit efficiently, but does it also make it easier to process? In this paper we consider lossless compression schemes, and ask if we can run our computations on the compressed data as efficiently as if the original data was that small. That is, if an operation has time complexity $T(\rm{inputsize})$, can we perform it on the compressed representation in time $T(n)$ rather than $T(N)$? We consider the most basic linear algebra operations: inner product, matrix-vector multiplication, and matrix multiplication. In particular, given two compressed vectors, can we compute their inner product in time $O(n)$? Or perhaps we must decompress first and then multiply, spending $Ω(N)$ time? The answer depends on the compression scheme. While for simple ones such as Run-Length-Encoding (RLE) the inner product can be done in $O(n)$ time, we prove that this is impossible for compressions from a richer class: essentially $n^2$ or even larger runtimes are needed in the worst case (under complexity assumptions). This is the class of grammar-compressions containing most popular methods such as the Lempel-Ziv family. These schemes are more compressing than the simple RLE, but alas, we prove that performing computations on them is much harder.

Submitted 27 October, 2020; originally announced October 2020.

Comments: NeurIPS'20, 20 pages
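
The claim above that an RLE inner product takes $O(n)$ time can be sketched with a two-pointer walk over the runs. This is an illustrative sketch rather than code from the paper. It assumes each vector is a list of hypothetical `(value, run_length)` pairs with positive run lengths, and that both vectors decompress to the same length.

```python
def rle_inner_product(u, v):
    """Inner product of two run-length-encoded vectors without decompressing.

    u and v are lists of (value, run_length) pairs (assumed representation).
    Each loop iteration finishes at least one run, so the number of
    iterations is at most len(u) + len(v).
    """
    i = j = 0
    rem_u = u[0][1] if u else 0  # entries left in the current run of u
    rem_v = v[0][1] if v else 0  # entries left in the current run of v
    total = 0
    while i < len(u) and j < len(v):
        overlap = min(rem_u, rem_v)  # positions where both runs are constant
        total += u[i][0] * v[j][0] * overlap
        rem_u -= overlap
        rem_v -= overlap
        if rem_u == 0:
            i += 1
            if i < len(u):
                rem_u = u[i][1]
        if rem_v == 0:
            j += 1
            if j < len(v):
                rem_v = v[j][1]
    return total


if __name__ == "__main__":
    u = [(2, 3), (5, 1)]  # decompresses to [2, 2, 2, 5]
    v = [(1, 1), (4, 3)]  # decompresses to [1, 4, 4, 4]
    print(rle_inner_product(u, v))  # 38
```

The same walk does not carry over to grammar compression, where a short grammar can expand into a long string with no constant runs to skip over.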

arXiv:2008.13374

Active Local Learning

Authors: Arturs Backurs, Avrim Blum, Neha Gupta

Abstract: In this work we consider active local learning: given a query point $x$, and active access to an unlabeled training set $S$, output the prediction $h(x)$ of a near-optimal $h \in H$ using significantly fewer labels than would be needed to actually learn $h$ fully. In particular, the number of label queries should be independent of the complexity of $H$, and the function $h$ should be well-defined, independent of $x$. This immediately also implies an algorithm for distance estimation: estimating the value $opt(H)$ from many fewer labels than needed to actually learn a near-optimal $h \in H$, by running local learning on a few random query points and computing the average error. For the hypothesis class consisting of functions supported on the interval $[0,1]$ with Lipschitz constant bounded by $L$, we present an algorithm that makes $O(({1 / ε^6}) \log(1/ε))$ label queries from an unlabeled pool of $O(({L / ε^4})\log(1/ε))$ samples. It estimates the distance to the best hypothesis in the class to an additive error of $ε$ for an arbitrary underlying distribution. We further generalize our algorithm to more than one dimension. We emphasize that the number of labels used is independent of the complexity of the hypothesis class which depends on $L$. Furthermore, we give an algorithm to locally estimate the values of a near-optimal function at a few query points of interest with number of labels independent of $L$.
We also consider the related problem of approximating the minimum error that can be achieved by the Nadaraya-Watson estimator under a linear diagonal transformation with eigenvalues coming from a small range. For a $d$-dimensional pointset of size $N$, our algorithm achieves an additive approximation of $ε$, makes $\tilde{O}({d}/{ε^2})$ queries and runs in $\tilde{O}({d^2}/{ε^{d+4}}+{dN}/{ε^2})$ time.

Submitted 3 September, 2020; v1 submitted 31 August, 2020; originally announced August 2020.

Comments: Published at COLT 2020

arXiv:2008.10577

Fast and Simple Modular Subset Sum

Authors: Kyriakos Axiotis, Arturs Backurs, Karl Bringmann, Ce Jin, Vasileios Nakos, Christos Tzamos, Hongxun Wu

Abstract: We revisit the Subset Sum problem over the finite cyclic group $\mathbb{Z}_m$ for some given integer $m$. A series of recent works has provided near-optimal algorithms for this problem under the Strong Exponential Time Hypothesis. Koiliaris and Xu (SODA'17, TALG'19) gave a deterministic algorithm running in time $\tilde{O}(m^{5/4})$, which was later improved to $O(m \log^7 m)$ randomized time by Axiotis et al. (SODA'19). In this work, we present two simple algorithms for the Modular Subset Sum problem running in near-linear time in $m$, both efficiently implementing Bellman's iteration over $\mathbb{Z}_m$. The first one is a randomized algorithm running in time $O(m \log^2 m)$, that is based solely on rolling hash and an elementary data-structure for prefix sums; to illustrate its simplicity we provide a short and efficient implementation of the algorithm in Python. Our second solution is a deterministic algorithm running in time $O(m\ \mathrm{polylog}\ m)$, that uses dynamic data structures for string manipulation. We further show that the techniques developed in this work can also lead to simple algorithms for the All Pairs Non-Decreasing Paths Problem (APNP) on undirected graphs, matching the near-optimal running time of $\tilde{O}(n^2)$ provided in the recent work of Duan et al. (ICALP'19).

Submitted 30 October, 2020; v1 submitted 24 August, 2020; originally announced August 2020.

Comments: accepted at SOSA'21

arXiv:2004.05494

Submodular Clustering in Low Dimensions

Authors: Arturs Backurs, Sariel Har-Peled

Abstract: We study a clustering problem where the goal is to maximize the coverage of the input points by $k$ chosen centers. Specifically, given a set of $n$ points $P \subseteq \mathbb{R}^d$, the goal is to pick $k$ centers $C \subseteq \mathbb{R}^d$ that maximize the service $ \sum_{p \in P}\mathsf{\varphi}\bigl( \mathsf{d}(p,C) \bigr) $ to the points $P$, where $\mathsf{d}(p,C)$ is the distance of $p$ to its nearest center in $C$, and $\mathsf{\varphi}$ is a non-increasing service function $\mathsf{\varphi} : \mathbb{R}^+ \to \mathbb{R}^+$. This includes problems of placing $k$ base stations as to maximize the total bandwidth to the clients -- indeed, the closer the client is to its nearest base station, the more data it can send/receive, and the target is to place $k$ base stations so that the total bandwidth is maximized. We provide an $n^{\varepsilon^{-O(d)}}$ time algorithm for this problem that achieves a $(1-\varepsilon)$-approximation. Notably, the runtime does not depend on the parameter $k$ and it works for an arbitrary non-increasing service function $\mathsf{\varphi} : \mathbb{R}^+ \to \mathbb{R}^+$.

Submitted 11 April, 2020; originally announced April 2020.

Comments: To appear in SWAT 20

arXiv:1910.04126

Scalable Nearest Neighbor Search for Optimal Transport

Authors: Arturs Backurs, Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner

Abstract: The Optimal Transport (a.k.a. Wasserstein) distance is an increasingly popular similarity measure for rich data domains, such as images or text documents. This raises the necessity for fast nearest neighbor search algorithms according to this distance, which poses a substantial computational bottleneck on massive datasets. In this work we introduce Flowtree, a fast and accurate approximation algorithm for the Wasserstein-$1$ distance. We formally analyze its approximation factor and running time. We perform extensive experimental evaluation of nearest neighbor search algorithms in the $W_1$ distance on real-world datasets. Our results show that compared to previous state of the art, Flowtree achieves up to $7.4$ times faster running time.

Submitted 28 September, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: ICML 2020

arXiv:1902.03519

Scalable Fair Clustering

Authors: Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, Tal Wagner

Abstract: We study the fair variant of the classic $k$-median problem introduced by Chierichetti et al. [2017]. In the standard $k$-median problem, given an input pointset $P$, the goal is to find $k$ centers $C$ and assign each input point to one of the centers in $C$ such that the average distance of points to their cluster center is minimized. In the fair variant of $k$-median, the points are colored, and the goal is to minimize the same average distance objective while ensuring that all clusters have an "approximately equal" number of points of each color. Chierichetti et al. proposed a two-phase algorithm for fair $k$-clustering. In the first step, the pointset is partitioned into subsets called fairlets that satisfy the fairness requirement and approximately preserve the $k$-median objective. In the second step, fairlets are merged into $k$ clusters by one of the existing $k$-median algorithms. The running time of this algorithm is dominated by the first step, which takes super-quadratic time. In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time. Our algorithm additionally allows for finer control over the balance of resulting clusters than the original work. We complement our theoretical bounds with empirical evaluation.

Submitted 10 June, 2019; v1 submitted 9 February, 2019; originally announced February 2019.

Comments: ICML 2019

arXiv:1808.08494

Towards Tight Approximation Bounds for Graph Diameter and Eccentricities

Authors: Arturs Backurs, Liam Roditty, Gilad Segal, Virginia Vassilevska Williams, Nicole Wein

Abstract: Among the most important graph parameters is the Diameter, the largest distance between any two vertices. There are no known very efficient algorithms for computing the Diameter exactly. Thus, much research has been devoted to how fast this parameter can be approximated. Chechik et al. showed that the diameter can be approximated within a multiplicative factor of $3/2$ in $\tilde{O}(m^{3/2})$ time. Furthermore, Roditty and Vassilevska W. showed that unless the Strong Exponential Time Hypothesis (SETH) fails, no $O(n^{2-ε})$ time algorithm can achieve an approximation factor better than $3/2$ in sparse graphs. Thus the above algorithm is essentially optimal for sparse graphs for approximation factors less than $3/2$. It was, however, completely plausible that a $3/2$-approximation is possible in linear time. In this work we conditionally rule out such a possibility by showing that unless SETH fails no $O(m^{3/2-ε})$ time algorithm can achieve an approximation factor better than $5/3$. Another fundamental set of graph parameters are the Eccentricities. The Eccentricity of a vertex $v$ is the distance between $v$ and the farthest vertex from $v$. Chechik et al. showed that the Eccentricities of all vertices can be approximated within a factor of $5/3$ in $\tilde{O}(m^{3/2})$ time and Abboud et al. showed that no $O(n^{2-ε})$ algorithm can achieve better than $5/3$ approximation in sparse graphs. We show that the runtime of the $5/3$ approximation algorithm is also optimal under SETH.
We also show that no near-linear time algorithm can achieve a better than $2$ approximation for the Eccentricities and that this is essentially tight: we give an algorithm that approximates Eccentricities within a $2+δ$ factor in $\tilde{O}(m/δ)$ time for any $0<δ<1$. This beats all Eccentricity algorithms in Cairo et al.

Submitted 29 March, 2021; v1 submitted 25 August, 2018; originally announced August 2018.

Comments: Revised to implement referee comments

arXiv:1807.04825 [pdf, ps, other]

Fast Modular Subset Sum using Linear Sketching

Authors: Kyriakos Axiotis, Arturs Backurs, Christos Tzamos

Abstract: Given $n$ positive integers, the Modular Subset Sum problem asks if a subset adds up to a given target $t$ modulo a given integer $m$. This is a natural generalization of the Subset Sum problem (where $m=+\infty$) with ties to additive combinatorics and cryptography. Recently, in [Bringmann, SODA'17] and [Koiliaris and Xu, SODA'17], efficient algorithms have been developed for the non-modular case, running in near-linear pseudo-polynomial time. For the modular case, however, the best known algorithm by Koiliaris and Xu [Koiliaris and Xu, SODA'17] runs in time $\tilde{O}(m^{5/4})$. In this paper, we present an algorithm running in time $\tilde{O}(m)$, which matches a recent conditional lower bound of [Abboud et al.'17] based on the Strong Exponential Time Hypothesis. Interestingly, in contrast to most previous results on Subset Sum, our algorithm does not use the Fast Fourier Transform. Instead, it is able to simulate the "textbook" Dynamic Programming algorithm much faster, using ideas from linear sketching. This is one of the first applications of sketching-based techniques to obtain fast algorithms for combinatorial problems in an offline setting.

Submitted 12 July, 2018; originally announced July 2018.
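
For orientation, the "textbook" dynamic program that the abstract above refers to can be sketched as follows. This is only the simple baseline running in roughly O(nm) time, not the sketching-based algorithm of the paper; the function name `modular_subset_sum` and the convention that the empty subset is allowed are assumptions made for this illustration.

```python
def modular_subset_sum(nums, m, t):
    """Baseline DP: can some subset of nums sum to t modulo m?

    Assumption for this sketch: the empty subset counts, so residue 0
    is always reachable.
    """
    reachable = {0}  # residues attainable using the numbers seen so far
    for a in nums:
        # Each number is either skipped (keep r) or taken (add a mod m).
        reachable |= {(r + a) % m for r in reachable}
    return t % m in reachable
```

For example, `modular_subset_sum([3, 5], 7, 1)` is true because 3 + 5 = 8, which is 1 modulo 7.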

arXiv:1803.00796 [pdf, other]

doi 10.1109/FOCS.2017.26

Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-And-Solve

Authors: Amir Abboud, Arturs Backurs, Karl Bringmann, Marvin Künnemann

Abstract: Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given a compression of size $n$ of data that originally has size $N$, and we want to solve a problem with time complexity $T(\cdot)$. The naive strategy of "decompress-and-solve" gives time $T(N)$, whereas "the gold standard" is time $T(n)$: to analyze the compression as efficiently as if the original data was small. We restrict our attention to data in the form of a string (text, files, genomes, etc.) and study the most ubiquitous tasks. While the challenge might seem to depend heavily on the specific compression scheme, most methods of practical relevance (Lempel-Ziv-family, dictionary methods, and others) can be unified under the elegant notion of Grammar Compressions. A vast literature, across many disciplines, established this as an influential notion for Algorithm design. We introduce a framework for proving (conditional) lower bounds in this field, allowing us to assess whether decompress-and-solve can be improved, and by how much. Our main results are:
- The $O(nN\sqrt{\log{N/n}})$ bound for LCS and the $O(\min\{N \log N, nM\})$ bound for Pattern Matching with Wildcards are optimal up to $N^{o(1)}$ factors, under the Strong Exponential Time Hypothesis. (Here, $M$ denotes the uncompressed length of the compressed pattern.)
- Decompress-and-solve is essentially optimal for Context-Free Grammar Parsing and RNA Folding, under the $k$-Clique conjecture.
- We give an algorithm showing that decompress-and-solve is not optimal for Disjointness.

Submitted 2 March, 2018; originally announced March 2018.

Comments: Presented at FOCS'17. Full version. 63 pages

ACM Class: F.2.2

arXiv:1704.02958 [pdf, ps, other]

On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks

Authors: Arturs Backurs, Piotr Indyk, Ludwig Schmidt

Abstract: Empirical risk minimization (ERM) is ubiquitous in machine learning and underlies most supervised learning methods. While there has been a large body of work on algorithms for various ERM problems, the exact computational complexity of ERM is still not understood. We address this issue for multiple popular ERM problems including kernel SVMs, kernel ridge regression, and training the final layer of a neural network. In particular, we give conditional hardness results for these problems based on complexity-theoretic assumptions such as the Strong Exponential Time Hypothesis. Under these assumptions, we show that there are no algorithms that solve the aforementioned ERM problems to high accuracy in sub-quadratic time. We also give similar hardness results for computing the gradient of the empirical loss, which is the main computational burden in many non-convex learning tasks.

Submitted 10 April, 2017; originally announced April 2017.

arXiv:1607.04229 [pdf, other]

Improving Viterbi is Hard: Better Runtimes Imply Faster Clique Algorithms

Authors: Arturs Backurs, Christos Tzamos

Abstract: The classic algorithm of Viterbi computes the most likely path in a Hidden Markov Model (HMM) that results in a given sequence of observations. It runs in time $O(Tn^2)$ given a sequence of $T$ observations from a HMM with $n$ states. Despite significant interest in the problem and prolonged effort by different communities, no known algorithm achieves more than a polylogarithmic speedup. In this paper, we explain this difficulty by providing matching conditional lower bounds. We show that the Viterbi algorithm runtime is optimal up to subpolynomial factors even when the number of distinct observations is small. Our lower bounds are based on assumptions that the best known algorithms for the All-Pairs Shortest Paths problem (APSP) and for the Max-Weight $k$-Clique problem in edge-weighted graphs are essentially tight. Finally, using a recent algorithm by Green Larsen and Williams for online Boolean matrix-vector multiplication, we get a $2^{Ω(\sqrt {\log n})}$ speedup for the Viterbi algorithm when there are few distinct transition probabilities in the HMM.

Submitted 3 November, 2016; v1 submitted 14 July, 2016; originally announced July 2016.
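
As a point of reference for the $O(Tn^2)$ bound in the abstract above, the classic Viterbi recurrence can be sketched as below. This is the standard textbook algorithm, not the faster variant from the paper; the function name `viterbi`, the helper `_log`, and the list-of-lists representation of the HMM are assumptions made for this illustration.

```python
import math


def _log(p):
    # Log-probability; zero probability maps to -inf instead of raising.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(init, trans, emit, observations):
    """Most likely state path of an HMM with n states, in O(T * n^2) time.

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of observing symbol o in state s
    """
    n = len(init)
    # score[s] = best log-probability of any path ending in state s so far.
    score = [_log(init[s]) + _log(emit[s][observations[0]]) for s in range(n)]
    back = []  # back[k][s] = best predecessor of s at step k + 1
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            # The inner max over n predecessors gives the n^2 factor per step.
            best = max(range(n), key=lambda r: prev[r] + _log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans[best][s]) + _log(emit[s][obs]))
        back.append(pointers)
    # Recover the path by following back-pointers from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Working in log-space avoids numerical underflow on long observation sequences; the two nested loops over states are what the conditional lower bounds in the paper suggest cannot be substantially improved in general.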

arXiv:1602.05837 [pdf, ps, other]

Tight Hardness Results for Maximum Weight Rectangles

Authors: Arturs Backurs, Nishanth Dikkala, Christos Tzamos

Abstract: Given $n$ weighted points (positive or negative) in $d$ dimensions, what is the axis-aligned box which maximizes the total weight of the points it contains? The best known algorithm for this problem is based on a reduction to a related problem, the Weighted Depth problem [T. M. Chan, FOCS'13], and runs in time $O(n^d)$. It was conjectured [Barbay et al., CCCG'13] that this runtime is tight up to subpolynomial factors. We answer this conjecture affirmatively by providing a matching conditional lower bound. We also provide conditional lower bounds for the special case when points are arranged in a grid (a well studied problem known as Maximum Subarray problem) as well as for other related problems. All our lower bounds are based on assumptions that the best known algorithms for the All-Pairs Shortest Paths problem (APSP) and for the Max-Weight k-Clique problem in edge-weighted graphs are essentially optimal.

Submitted 2 March, 2016; v1 submitted 18 February, 2016; originally announced February 2016.

arXiv:1511.07070 [pdf, ps, other]

Which Regular Expression Patterns are Hard to Match?

Authors: Arturs Backurs, Piotr Indyk

Abstract: Regular expressions constitute a fundamental notion in formal language theory and are frequently used in computer science to define search patterns. A classic algorithm for these problems constructs and simulates a non-deterministic finite automaton corresponding to the expression, resulting in an $O(mn)$ running time (where $m$ is the length of the pattern and $n$ is the length of the text). This running time can be improved slightly (by a polylogarithmic factor), but no significantly faster solutions are known. At the same time, much faster algorithms exist for various special cases of regular expressions, including dictionary matching, wildcard matching, subset matching, word break problem etc. In this paper, we show that the complexity of regular expression matching can be characterized based on its depth (when interpreted as a formula). Our results hold for expressions involving concatenation, OR, Kleene star and Kleene plus. For regular expressions of depth two (involving any combination of the above operators), we show the following dichotomy: matching and membership testing can be solved in near-linear time, except for "concatenations of stars", which cannot be solved in strongly sub-quadratic time assuming the Strong Exponential Time Hypothesis (SETH). For regular expressions of depth three the picture is more complex. Nevertheless, we show that all problems can either be solved in strongly sub-quadratic time, or cannot be solved in strongly sub-quadratic time assuming SETH.
An intriguing special case of membership testing involves regular expressions of the form "a star of an OR of concatenations", e.g., $[a|ab|bc]^*$. This corresponds to the so-called word break problem, for which a dynamic programming algorithm with a runtime of (roughly) $O(n\sqrt{m})$ is known. We show that the latter bound is not tight and improve the runtime to $O(nm^{0.44\ldots})$.

Submitted 26 September, 2016; v1 submitted 22 November, 2015; originally announced November 2015.
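
To make the word break problem from the abstract above concrete, here is a minimal sketch of a straightforward dynamic program for membership in a pattern of the form $[w_1|w_2|\ldots]^*$. It is a naive baseline, neither the roughly $O(n\sqrt{m})$ algorithm nor the improved one from the paper; the function name `word_break` is an assumption made for this illustration.

```python
def word_break(text, words):
    """Decide whether text is a concatenation of zero or more of the words.

    reachable[i] is True when text[:i] can be split into dictionary words.
    """
    n = len(text)
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty prefix matches zero repetitions
    for i in range(n):
        if not reachable[i]:
            continue
        for w in words:
            # Extend a valid split ending at i by one more word.
            if w and text.startswith(w, i):
                reachable[i + len(w)] = True
    return reachable[n]
```

With the example pattern from the abstract, `word_break("abbc", ["a", "ab", "bc"])` is true (split as "ab" followed by "bc"), while `word_break("ac", ["a", "ab", "bc"])` is false.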

arXiv:1510.04622 [pdf, other]

Subtree Isomorphism Revisited

Authors: Amir Abboud, Arturs Backurs, Thomas Dueholm Hansen, Virginia Vassilevska Williams, Or Zamir

Abstract: The Subtree Isomorphism problem asks whether a given tree is contained in another given tree. The problem is of fundamental importance and has been studied since the 1960s. For some variants, e.g., ordered trees, near-linear time algorithms are known, but for the general case truly subquadratic algorithms remain elusive. Our first result is a reduction from the Orthogonal Vectors problem to Subtree Isomorphism, showing that a truly subquadratic algorithm for the latter refutes the Strong Exponential Time Hypothesis (SETH). In light of this conditional lower bound, we focus on natural special cases for which no truly subquadratic algorithms are known. We classify these cases against the quadratic barrier, showing in particular that:
-- Even for binary, rooted trees, a truly subquadratic algorithm refutes SETH.
-- Even for rooted trees of depth $O(\log\log{n})$, where $n$ is the total number of vertices, a truly subquadratic algorithm refutes SETH.
-- For every constant $d$, there is a constant $ε_d>0$ and a randomized, truly subquadratic algorithm for degree-$d$ rooted trees of depth at most $(1+ ε_d) \log_{d}{n}$. In particular, there is an $O(\min\{ 2.85^h ,n^2 \})$ algorithm for binary trees of depth $h$.
Our reductions utilize new "tree gadgets" that are likely useful for future SETH-based lower bounds for problems on trees. Our upper bounds apply a folklore result from randomized decision tree complexity.

Submitted 15 October, 2015; originally announced October 2015.

arXiv:1504.01431 [pdf, ps, other]

If the Current Clique Algorithms are Optimal, so is Valiant's Parser

Authors: Amir Abboud, Arturs Backurs, Virginia Vassilevska Williams

Abstract: The CFG recognition problem is: given a context-free grammar $\mathcal{G}$ and a string $w$ of length $n$, decide if $w$ can be obtained from $\mathcal{G}$. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in $O(n^ω)$ time, where $ω<2.373$ is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic $O(n^3/\log^3{n})$ complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time $O(|\mathcal{G}|\cdot n^{3-\varepsilon})$ can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be $|\mathcal{G}|=Ω(n^6)$. Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if there are $k$ that form a clique.
Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14).

Submitted 5 November, 2015; v1 submitted 6 April, 2015; originally announced April 2015.

arXiv:1504.01076 [pdf, ps, other]

Nearly-optimal bounds for sparse recovery in generic norms, with applications to $k$-median sketching

Authors: Arturs Backurs, Piotr Indyk, Eric Price, Ilya Razenshteyn, David P. Woodruff

Abstract: We initiate the study of trade-offs between sparsity and the number of measurements in sparse recovery schemes for generic norms. Specifically, for a norm $\|\cdot\|$, sparsity parameter $k$, approximation factor $K>0$, and probability of failure $P>0$, we ask: what is the minimal value of $m$ so that there is a distribution over $m \times n$ matrices $A$ with the property that for any $x$, given $Ax$, we can recover a $k$-sparse approximation to $x$ in the given norm with probability at least $1-P$? We give a partial answer to this problem, by showing that for norms that admit efficient linear sketches, the optimal number of measurements $m$ is closely related to the doubling dimension of the metric induced by the norm $\|\cdot\|$ on the set of all $k$-sparse vectors. By applying our result to specific norms, we cast known measurement bounds in our general framework (for the $\ell_p$ norms, $p \in [1,2]$) as well as provide new, measurement-efficient schemes (for the Earth-Mover Distance norm). The latter result directly implies more succinct linear sketches for the well-studied planar $k$-median clustering problem. Finally, our lower bound for the doubling dimension of the EMD norm enables us to address the open question of [Frahling-Sohler, STOC'05] about the space complexity of clustering problems in the dynamic streaming model.

Submitted 4 April, 2015; originally announced April 2015.

Comments: 29 pages

arXiv:1501.07053 [pdf, ps, other]

Quadratic-Time Hardness of LCS and other Sequence Similarity Measures

Authors: Amir Abboud, Arturs Backurs, Virginia Vassilevska Williams

Abstract: Two important similarity measures between sequences are the longest common subsequence (LCS) and the dynamic time warping distance (DTWD). The computations of these measures for two given sequences are central tasks in a variety of applications. Simple dynamic programming algorithms solve these tasks in $O(n^2)$ time, and despite an extensive amount of research, no algorithms with significantly better worst case upper bounds are known. In this paper, we show that an $O(n^{2-ε})$ time algorithm, for some $ε>0$, for computing the LCS or the DTWD of two sequences of length $n$ over a constant size alphabet, refutes the popular Strong Exponential Time Hypothesis (SETH). Moreover, we show that computing the LCS of $k$ strings over an alphabet of size $O(k)$ cannot be done in $O(n^{k-ε})$ time, for any $ε>0$, under SETH. Finally, we also address the time complexity of approximating the DTWD of two strings in truly subquadratic time.

Submitted 29 January, 2015; v1 submitted 28 January, 2015; originally announced January 2015.

arXiv:1412.0348 [pdf, ps, other]

Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false)

Authors: Arturs Backurs, Piotr Indyk

Abstract: The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions or substitutions of symbols needed to transform one string into another. The problem of computing the edit distance between two strings is a classical computational task, with a well-known algorithm based on dynamic programming. Unfortunately, all known algorithms for this problem run in nearly quadratic time. In this paper we provide evidence that the near-quadratic running time bounds known for the problem of computing edit distance might be tight. Specifically, we show that, if the edit distance can be computed in time $O(n^{2-δ})$ for some constant $δ>0$, then the satisfiability of conjunctive normal form formulas with $N$ variables and $M$ clauses can be solved in time $M^{O(1)} 2^{(1-ε)N}$ for a constant $ε>0$. The latter result would violate the Strong Exponential Time Hypothesis, which postulates that such algorithms do not exist.

Submitted 15 August, 2017; v1 submitted 30 November, 2014; originally announced December 2014.

Comments: STOC'15
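
The well-known quadratic-time dynamic program mentioned in the abstract above can be sketched as follows. This is the standard textbook recurrence, shown only to illustrate the $O(n^2)$ baseline that the hardness result concerns; the function name `edit_distance` is an assumption made for this illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b in O(len(a) * len(b)) time."""
    # prev[j] = distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]
```

Only two rows of the table are kept at a time, so the memory use is linear in the length of `b`, while the running time remains quadratic.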

arXiv:1302.4625 [pdf, ps, other]

On the sum of $L1$ influences

Authors: Artūrs Bačkurs, Mohammad Bavarian

Abstract: For a function $f$ over the discrete cube, the total $L_1$ influence of $f$ is defined as $\sum_{i=1}^n \|\partial_i f\|_1$, where $\partial_i f$ denotes the discrete derivative of $f$ in the direction $i$. In this work, we show that the total $L_1$ influence of a $[-1,1]$-valued function $f$ can be upper bounded by a polynomial in the degree of $f$, resolving affirmatively an open problem of Aaronson and Ambainis (ITCS 2011). The main challenge here is that the $L_1$ influences do not admit an easy Fourier analytic representation. In our proof, we overcome this problem by introducing a new analytic quantity $\mathcal I_p(f)$, relating this new quantity to the total $L_1$ influence of $f$. This new quantity, which roughly corresponds to an average of the total $L_1$ influences of some ensemble of functions related to $f$, has the benefit of being much easier to analyze, allowing us to resolve the problem of Aaronson and Ambainis. We also give an application of the theorem to graph theory, and discuss the connection between the study of bounded functions over the cube and the quantum query complexity of partial functions where Aaronson and Ambainis encountered this question.

Submitted 12 April, 2014; v1 submitted 19 February, 2013; originally announced February 2013.

Comments: Proceedings of CCC (2014)

arXiv:1112.3337 [pdf, ps, other]

Search by quantum walks on two-dimensional grid without amplitude amplification

Authors: Andris Ambainis, Arturs Backurs, Nikolajs Nahimovs, Raitis Ozols, Alexander Rivosh

Abstract: We study search by quantum walk on a finite two-dimensional grid. The algorithm of Ambainis, Kempe, Rivosh (quant-ph/0402107) takes $O(\sqrt{N \log N})$ steps and finds a marked location with probability $O(1/\log N)$ for a grid of size $\sqrt{N} \times \sqrt{N}$. This probability is small, thus amplitude amplification is needed to achieve $Θ(1)$ success probability. The amplitude amplification adds an additional $O(\sqrt{\log N})$ factor to the number of steps, making it $O(\sqrt{N} \log N)$. In this paper, we show that despite a small probability to find a marked location, the probability to be within an $O(\sqrt{N})$ neighbourhood (at an $O(\sqrt[4]{N})$ distance) of the marked location is $Θ(1)$. This allows us to skip the amplitude amplification step and leads to an $O(\sqrt{\log N})$ speed-up. We describe the results of numerical experiments supporting this idea, and we prove this fact analytically.

Submitted 14 December, 2011; originally announced December 2011.

Comments: 22 pages, 3 figures

arXiv:1112.3330 [pdf, other]

Quantum strategies are better than classical in almost any XOR game

Authors: Andris Ambainis, Arturs Backurs, Kaspars Balodis, Dmitry Kravcenko, Raitis Ozols, Juris Smotrovs, Madars Virza

Abstract: We initiate a study of random instances of nonlocal games. We show that quantum strategies are better than classical for almost any 2-player XOR game. More precisely, for large n, the entangled value of a random 2-player XOR game with n questions to every player is at least 1.21... times the classical value, for 1-o(1) fraction of all 2-player XOR games.

Submitted 14 December, 2011; originally announced December 2011.

Comments: 22 pages, 1 figure

Showing 1–30 of 30 results for author: Backurs, A