
Bandits and Experts in Metric Spaces

Published: 31 May 2019

Abstract

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is well understood, bandit problems with large strategy sets are still a topic of active investigation, motivated by practical applications, such as online auctions and web advertisement. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions that enable the design of efficient solutions.
In this work, we study a general setting for the multi-armed bandit problem in which the strategies form a metric space and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the Lipschitz MAB problem, and we present a complete solution for it: for every metric space, we define an isometry invariant that lower-bounds the performance of any Lipschitz MAB algorithm on that space, and we present an algorithm that comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions. We also address the full-feedback (“best expert”) version of the problem, where after every round the payoffs from all arms are revealed.
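To make the setting concrete, the sketch below shows the classical uniform-discretization baseline for a Lipschitz MAB instance on the metric space ([0,1], |x − y|); it is an illustrative baseline, not the adaptive algorithm developed in this paper. The Lipschitz condition |μ(x) − μ(y)| ≤ D(x, y) guarantees that the best of K evenly spaced arms is within roughly 1/K of the global optimum, so a standard UCB1 index over those arms suffices; choosing K ≈ T^{1/3} balances discretization error against the cost of estimating K arms. The payoff function `mu` and the specific choice of K below are assumptions made for the example.

```python
# A minimal sketch (an assumption-laden baseline, not this paper's algorithm):
# uniform discretization + UCB1 for a Lipschitz MAB instance on ([0,1], |x-y|).
import math
import random

def uniform_discretization_ucb(T, mu):
    """Play T rounds; `mu` is an (assumed 1-Lipschitz) mean-payoff function.

    K evenly spaced arms incur ~1/K discretization error per round, while
    learning K arms costs ~sqrt(K*T); K ~ T^(1/3) balances the two terms.
    """
    K = max(1, round(T ** (1.0 / 3.0)))
    arms = [i / (K - 1) if K > 1 else 0.5 for i in range(K)]
    counts, sums, total = [0] * K, [0.0] * K, 0.0
    for t in range(1, T + 1):
        if t <= K:
            i = t - 1  # initialization: play each arm once
        else:
            # UCB1 index: empirical mean plus a confidence radius
            i = max(range(K), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        reward = 1.0 if random.random() < mu(arms[i]) else 0.0  # Bernoulli payoff
        counts[i] += 1
        sums[i] += reward
        total += reward
    return total / T  # average payoff achieved

# Hypothetical 1-Lipschitz payoff function peaking at x = 0.3 (optimum 0.8).
mu = lambda x: 0.8 - 0.5 * abs(x - 0.3)
print(uniform_discretization_ucb(T=100_000, mu=mu))
```

The algorithm in the paper improves on this fixed-grid baseline by adaptively refining the discretization in high-payoff regions of the metric space, which is what yields the per-metric-space guarantees described in the abstract.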



Published In

Journal of the ACM, Volume 66, Issue 4
Networking, Computational Complexity, Design and Analysis of Algorithms, Real Computation, Algorithms, Online Algorithms and Computer-aided Verification
August 2019, 299 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/3338848
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 31 May 2019
Accepted: 01 November 2018
Revised: 01 May 2018
Received: 01 November 2015
Published in JACM Volume 66, Issue 4


Author Tags

  1. Lipschitz-continuity
  2. Multi-armed bandits
  3. covering dimension
  4. metric spaces
  5. online learning
  6. regret

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 428
  • Downloads (last 6 weeks): 57
Reflects downloads up to 10 Nov 2024

Cited By

  • The Role of Transparency in Repeated First-Price Auctions with Unknown Valuations. Proceedings of the 56th Annual ACM Symposium on Theory of Computing (2024), 225-236. DOI: 10.1145/3618260.3649658
  • Certified Multifidelity Zeroth-Order Optimization. SIAM/ASA Journal on Uncertainty Quantification 12, 4 (2024), 1135-1164. DOI: 10.1137/23M1591086
  • Quantifying the Merits of Network-Assist Online Learning in Optimizing Network Protocols. 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), 1-10. DOI: 10.1109/IWQoS61813.2024.10682895
  • Node and relevant data selection in distributed predictive analytics: A query-centric approach. Journal of Network and Computer Applications 232 (2024), 104029. DOI: 10.1016/j.jnca.2024.104029
  • Sequential query prediction based on multi-armed bandits with ensemble of transformer experts and immediate feedback. Data Mining and Knowledge Discovery 38, 6 (2024), 3758-3782. DOI: 10.1007/s10618-024-01057-4
  • Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing. SSRN Electronic Journal (2023). DOI: 10.2139/ssrn.4568525
  • An efficient algorithm for fair multi-agent multi-armed bandit with low regret. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (2023), 8159-8167. DOI: 10.1609/aaai.v37i7.25985
  • An Online Inference-Aided Incentive Framework for Information Elicitation Without Verification. IEEE Journal on Selected Areas in Communications 41, 4 (2023), 1167-1185. DOI: 10.1109/JSAC.2023.3242706
  • The Power of Age-based Reward in Fresh Information Acquisition. IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, 1-10. DOI: 10.1109/INFOCOM53939.2023.10229008
  • Starlet: Network defense resource allocation with multi-armed bandits for cloud-edge crowd sensing in IoT. Digital Communications and Networks (2023). DOI: 10.1016/j.dcan.2023.03.009
