Article

Ising bandits with side information

Authors:

Adam Prügel-Bennett

ECMLPKDD'15: Proceedings of the 2015 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I

Pages 448-463

https://doi.org/10.1007/978-3-319-23528-8_28

Published: 07 September 2015

Abstract

We develop an online learning algorithm for bandits on a graph with side information, where there is an underlying Ising distribution over the vertices at low temperatures. We are motivated by practical settings in which the state of the graph in a social network or a network of computer hosts may change at every trial, intrinsically partitioning the graph and thus requiring the learning algorithm to play the bandit from the current partition. Our algorithm works as a two-stage process. In the first stage, it acts as a graph classifier: using the minimum cut as the regularity measure, it computes the state of the network from the side labels received. Internally, the classifier uses a polynomial-time linear programming relaxation that incorporates the known labels to predict the unknown states. The second stage ensures that the bandits are sampled from the appropriate partition of the graph while retaining the potential to explore the other part. We achieve this by running an adversarial multi-armed bandit over the edges in the current partition while exploring the "cut" edges. We evaluate our approach empirically on synthetic and real-world datasets. We also indicate the potential for an exact linear-time algorithm for computing the max-flow as an alternative to the linear programming relaxation, as well as the promise of mistake and regret bounds that depend on the number of times the "cut" changes.
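The first stage can be illustrated with a minimum-cut graph classifier in the style of Blum and Chawla. The Python sketch below is a minimal illustration under stated assumptions, not the paper's implementation. It assumes an unweighted, undirected graph on vertices 0..n-1 with +1/-1 side labels, and it computes the cut exactly with Edmonds-Karp max-flow (a simple exact method, not the linear-time algorithm the abstract alludes to) instead of the paper's linear programming relaxation. The names `min_cut_labels` and `cut_size` are hypothetical. Each labelled vertex is tied to a super-source (+1) or super-sink (-1) by an infinite-capacity arc, so no minimum cut can separate it from its observed label. Once the maximum flow is found, vertices still reachable from the source in the residual graph are predicted +1 and the rest -1. The cut size is the number of edges whose endpoints disagree, which is, up to constants, the energy of a ferromagnetic Ising model with unit couplings. A minimum-cut labelling is therefore a lowest-energy configuration among labellings that agree with the side information, which is where a low-temperature Ising distribution places most of its mass.

```python
from collections import deque

INF = float("inf")


def min_cut_labels(n, edges, labels):
    """Predict +1/-1 labels for vertices 0..n-1 via an s-t minimum cut.

    Illustrative assumptions: `edges` are unit-weight undirected pairs,
    `labels` maps observed vertices to +1 or -1.
    Returns (predicted labels, size of the minimum cut).
    """
    s, t = n, n + 1                        # super-source (+1 side), super-sink (-1 side)
    cap = [dict() for _ in range(n + 2)]   # residual capacities cap[u][v]

    def add_arc(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)            # make sure the reverse residual arc exists

    for u, v in edges:                     # an undirected edge becomes two unit arcs
        add_arc(u, v, 1)
        add_arc(v, u, 1)
    for v, y in labels.items():            # clamp observed vertices with infinite arcs
        if y > 0:
            add_arc(s, v, INF)
        else:
            add_arc(v, t, INF)

    def bfs_parents():
        """BFS over arcs with positive residual capacity, starting at s."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:                            # Edmonds-Karp: augment along shortest paths
        parent = bfs_parents()
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow += bottleneck

    # The last BFS could not reach t; the vertices it did reach form the +1 side.
    return [1 if v in parent else -1 for v in range(n)], flow


def cut_size(edges, pred):
    """Number of edges whose endpoints disagree (Ising energy up to constants)."""
    return sum(1 for u, v in edges if pred[u] != pred[v])


# Two triangles joined by the bridge (2, 3); vertex 0 observed +1, vertex 5 observed -1.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
pred, flow = min_cut_labels(6, edges, {0: 1, 5: -1})
print(pred, flow, cut_size(edges, pred))  # [1, 1, 1, -1, -1, -1] 1 1
```

When several minimum cuts exist, taking the residual-reachable set returns the one with the smallest +1 side; how the paper breaks such ties is not specified here.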

Information

Published In

ECMLPKDD'15: Proceedings of the 2015 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I

September 2015

707 pages

ISBN:9783319235271

Editors:
Annalisa Appice (University of Bari Aldo Moro, Bari, Italy)
Pedro Pereira Rodrigues (University of Porto, Porto, Portugal)
Vítor Santos Costa (University of Porto - CRACS/INESC TEC, Porto, Portugal)
Carlos Soares (University of Porto - INESC TEC, Porto, Portugal)
João Gama (University of Porto - INESC TEC, Porto, Portugal)

Sponsors

Huawei Technologies Co. Ltd.
Zalando
U.S. Office of Naval Research Global (ONRGlobal)
BNP Paribas
Amazon.com

Publisher

Springer

Gewerbestrasse 11, CH-6330 Cham (ZG), Switzerland
