
Bandits and Experts in Metric Spaces

Published: 31 May 2019

Abstract

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is well understood, bandit problems with large strategy sets are still a topic of active investigation, motivated by practical applications, such as online auctions and web advertisement. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions that enable the design of efficient solutions.
In this work, we study a general setting for the multi-armed bandit problem in which the strategies form a metric space and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the Lipschitz MAB problem, and we present a complete solution for it: for every metric space, we define an isometry invariant that lower-bounds the performance of any Lipschitz MAB algorithm on that space, and we present an algorithm that comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions. We also address the full-feedback (“best expert”) version of the problem, where after every round the payoffs from all arms are revealed.
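To make the setting concrete, the sketch below shows the classical uniform-discretization baseline for a Lipschitz MAB instance on the metric space ([0,1], |x − y|); it is an illustrative baseline, not the adaptive algorithm developed in this paper. The Lipschitz condition |μ(x) − μ(y)| ≤ D(x, y) guarantees that the best of K evenly spaced arms is within roughly 1/K of the global optimum, so a standard UCB1 index over those arms suffices; choosing K ≈ T^{1/3} balances discretization error against the cost of estimating K arms. The payoff function `mu` and the specific choice of K below are assumptions made for the example.

```python
# A minimal sketch (an assumption-laden baseline, not this paper's algorithm):
# uniform discretization + UCB1 for a Lipschitz MAB instance on ([0,1], |x-y|).
import math
import random

def uniform_discretization_ucb(T, mu):
    """Play T rounds; `mu` is an (assumed 1-Lipschitz) mean-payoff function.

    K evenly spaced arms incur ~1/K discretization error per round, while
    learning K arms costs ~sqrt(K*T); K ~ T^(1/3) balances the two terms.
    """
    K = max(1, round(T ** (1.0 / 3.0)))
    arms = [i / (K - 1) if K > 1 else 0.5 for i in range(K)]
    counts, sums, total = [0] * K, [0.0] * K, 0.0
    for t in range(1, T + 1):
        if t <= K:
            i = t - 1  # initialization: play each arm once
        else:
            # UCB1 index: empirical mean plus a confidence radius
            i = max(range(K), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        reward = 1.0 if random.random() < mu(arms[i]) else 0.0  # Bernoulli payoff
        counts[i] += 1
        sums[i] += reward
        total += reward
    return total / T  # average payoff achieved

# Hypothetical 1-Lipschitz payoff function peaking at x = 0.3 (optimum 0.8).
mu = lambda x: 0.8 - 0.5 * abs(x - 0.3)
print(uniform_discretization_ucb(T=100_000, mu=mu))
```

The algorithm in the paper improves on this fixed-grid baseline by adaptively refining the discretization in high-payoff regions of the metric space, which is what yields the per-metric-space guarantees described in the abstract.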



Published In

Journal of the ACM, Volume 66, Issue 4
Networking, Computational Complexity, Design and Analysis of Algorithms, Real Computation, Algorithms, Online Algorithms and Computer-aided Verification
August 2019, 299 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/3338848
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 31 May 2019
Accepted: 01 November 2018
Revised: 01 May 2018
Received: 01 November 2015
Published in JACM Volume 66, Issue 4


Author Tags

  1. Lipschitz-continuity
  2. Multi-armed bandits
  3. covering dimension
  4. metric spaces
  5. online learning
  6. regret

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 428
  • Downloads (last 6 weeks): 57
Reflects downloads up to 10 Nov 2024

Cited By

  • The Role of Transparency in Repeated First-Price Auctions with Unknown Valuations. Proceedings of the 56th Annual ACM Symposium on Theory of Computing (2024), 225-236. DOI: 10.1145/3618260.3649658
  • Certified Multifidelity Zeroth-Order Optimization. SIAM/ASA Journal on Uncertainty Quantification 12, 4 (2024), 1135-1164. DOI: 10.1137/23M1591086
  • Quantifying the Merits of Network-Assist Online Learning in Optimizing Network Protocols. 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), 1-10. DOI: 10.1109/IWQoS61813.2024.10682895
  • Node and relevant data selection in distributed predictive analytics: A query-centric approach. Journal of Network and Computer Applications 232 (2024), 104029. DOI: 10.1016/j.jnca.2024.104029
  • Sequential query prediction based on multi-armed bandits with ensemble of transformer experts and immediate feedback. Data Mining and Knowledge Discovery 38, 6 (2024), 3758-3782. DOI: 10.1007/s10618-024-01057-4
  • Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing. SSRN Electronic Journal (2023). DOI: 10.2139/ssrn.4568525
  • An efficient algorithm for fair multi-agent multi-armed bandit with low regret. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (2023), 8159-8167. DOI: 10.1609/aaai.v37i7.25985
  • An Online Inference-Aided Incentive Framework for Information Elicitation Without Verification. IEEE Journal on Selected Areas in Communications 41, 4 (2023), 1167-1185. DOI: 10.1109/JSAC.2023.3242706
  • The Power of Age-based Reward in Fresh Information Acquisition. IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, 1-10. DOI: 10.1109/INFOCOM53939.2023.10229008
  • Starlet: Network defense resource allocation with multi-armed bandits for cloud-edge crowd sensing in IoT. Digital Communications and Networks (2023). DOI: 10.1016/j.dcan.2023.03.009
