DOI: 10.1145/2020408.2020445
Research article

Learning to trade off between exploration and exploitation in multiclass bandit prediction

Published: 21 August 2011

Abstract

We study multi-class bandit prediction, an online learning problem in which the learner receives only partial feedback in each trial, indicating whether the predicted class label is correct. Trading off exploration against exploitation is a well-known technique for online learning with incomplete feedback (i.e., the bandit setup). Banditron [8], a multi-class online learning algorithm for the bandit setting, maximizes run-time gain by balancing exploration and exploitation with a fixed tradeoff parameter. The performance of Banditron can be quite sensitive to the choice of this parameter, so effective algorithms that tune it automatically are desirable. In this paper, we propose three learning strategies that automatically adjust the tradeoff parameter for Banditron. An extensive empirical study on multiple real-world data sets verifies the efficacy of the proposed approaches in learning the exploration vs. exploitation tradeoff parameter.
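For context, one round of the Banditron algorithm [8] that the paper builds on can be sketched as follows. This is an illustrative sketch, not the authors' code: `gamma` plays the role of the fixed exploration vs. exploitation tradeoff parameter that the paper proposes to tune automatically, and the update follows the importance-weighted Perceptron-style rule of Kakade et al. [8].

```python
import numpy as np

def banditron_step(W, x, y_true, gamma, rng):
    """One round of Banditron with fixed exploration rate gamma.

    W      : (K, d) weight matrix, one row per class.
    x      : d-dimensional feature vector.
    y_true : hidden correct label; the learner only observes whether
             its guess was correct (bandit feedback).
    """
    K = W.shape[0]
    scores = W @ x
    y_hat = int(np.argmax(scores))  # exploitation (greedy) choice

    # Exploration/exploitation mixture: predict y_hat with probability
    # (1 - gamma), otherwise a uniformly random label.
    probs = np.full(K, gamma / K)
    probs[y_hat] += 1.0 - gamma
    y_pred = int(rng.choice(K, p=probs))

    # Bandit feedback: only the correctness of y_pred is revealed.
    correct = (y_pred == y_true)

    # Importance-weighted unbiased update: reward the sampled label
    # (scaled by 1/probability) if correct, penalize the greedy label.
    U = np.zeros_like(W)
    if correct:
        U[y_pred] += x / probs[y_pred]
    U[y_hat] -= x
    return W + U, y_pred, correct
```

With `gamma = 0` the algorithm is purely greedy; larger values spend more rounds exploring. The paper's contribution is to adapt this single parameter online rather than fixing it in advance.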

References

[1]
Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2--3):235--256, 2002.
[2]
Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire. An optimal high probability algorithm for the contextual bandit problem. CoRR, abs/1002.4058, 2010.
[3]
Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[4]
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001.
[5]
Eyal Even-Dar, Shie Mannor, and Yishay Mansour. PAC bounds for multi-armed bandit and Markov decision processes. In COLT '02: Proceedings of the 15th Annual Conference on Computational Learning Theory, pages 255--270, 2002.
[6]
Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research, 7:1079--1105, 2006.
[7]
A. Frank and A. Asuncion. UCI machine learning repository, 2010.
[8]
Sham M. Kakade, Shai Shalev-Shwartz, and Ambuj Tewari. Efficient bandit algorithms for online multiclass prediction. In ICML 2008: Proceedings of the 25th International Conference on Machine Learning, pages 440--447, 2008.
[9]
John Langford and Tong Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In NIPS 2007: Proceedings of the 20th Annual Conference on Neural Information Processing Systems, 2007.
[10]
D. D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361--397, 2004.
[11]
Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In WWW '10: Proceedings of the 19th International Conference on World Wide Web, pages 661--670, New York, NY, USA, 2010. ACM.
[12]
Wei Li, Xuerui Wang, Ruofei Zhang, Ying Cui, Jianchang Mao, and Rong Jin. Exploitation and exploration in a performance based contextual advertising system. In KDD 2010: Knowledge Discovery and Data Mining, pages 27--36, 2010.
[13]
Shie Mannor and John N. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5:623--648, 2004.
[14]
Herbert Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58:527--535, 1952.
[15]
Herbert Robbins. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., 58(5):527--535, 1952.
[16]
F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386--408, 1958.
[17]
Joannès Vermorel and Mehryar Mohri. Multi-armed bandit algorithms and empirical evaluation. In European Conference on Machine Learning, pages 437--448. Springer, 2005.
[18]
Shijun Wang, Rong Jin, and Hamed Valizadegan. A potential-based framework for online multi-class learning with partial feedback. In AISTATS 2010: Artificial Intelligence and Statistics, 2010.
[19]
C. Watkins. Learning from Delayed Rewards. PhD thesis, University of Cambridge, 1989.



Published In

KDD '11: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2011, 1446 pages
ISBN: 9781450308137
DOI: 10.1145/2020408

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. bandit feedback
  2. exploration vs. exploitation
  3. multi-class classification
  4. online learning


Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%


Cited By

  • (2021) Multi-Step Look-Ahead Optimization Methods for Dynamic Pricing With Demand Learning. IEEE Access, 9:88478--88497. DOI: 10.1109/ACCESS.2021.3087577
  • (2021) Novel pricing strategies for revenue maximization and demand learning using an exploration--exploitation framework. Soft Computing, 25(17):11711--11733. DOI: 10.1007/s00500-021-06047-y
  • (2018) Word sense disambiguation using hybrid swarm intelligence approach. PLOS ONE, 13(12):e0208695. DOI: 10.1371/journal.pone.0208695
  • (2012) Incremental learning using partial feedback for gesture-based human-swarm interaction. 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication, pages 898--905. DOI: 10.1109/ROMAN.2012.6343865
  • (2011) Applying Multiclass Bandit algorithms to call-type classification. 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pages 431--436. DOI: 10.1109/ASRU.2011.6163970
