Online Network Source Optimization with Graph-Kernel MAB

Toni, Laura; Frossard, Pascal

doi:10.1007/978-3-031-43418-1_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14171)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns online the optimal source placement in large-scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which however suffers from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework whose learning rate scales with the dimension of the spectral representation model rather than with that of the network. We then propose Grab-UCB, an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy. We introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency and computational complexity.
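
The idea that the number of learned parameters follows the spectral model rather than the network size can be illustrated with a minimal sketch. This is not the authors' Grab-UCB: it is a plain linear UCB rule applied to polynomial graph-Laplacian features, under assumptions chosen only for illustration. The signal is modelled as y = sum_k alpha_k L^k h for a source vector h, the reward is the sum of y over a hypothetical set of observed nodes (the vector `mask`), the graph is a toy path graph, and each action places a single source. The names `path_laplacian`, `polynomial_features`, `action_features` and `run_linucb`, as well as the values of `lam`, `beta` and `alpha_true`, are all hypothetical.

```python
import numpy as np


def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph (toy graph)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def polynomial_features(L, h, order):
    """Columns [h, L h, ..., L^order h]; shape (N, order + 1)."""
    cols = [np.asarray(h, dtype=float)]
    for _ in range(order):
        cols.append(L @ cols[-1])
    return np.stack(cols, axis=1)


def action_features(L, mask, order):
    """One feature row per candidate single-source placement e_i.

    The expected reward of action i is X[i] @ alpha, so the unknown
    vector alpha has order + 1 entries regardless of the graph size.
    """
    n = L.shape[0]
    eye = np.eye(n)
    return np.stack(
        [polynomial_features(L, eye[i], order).T @ mask for i in range(n)]
    )


def run_linucb(X, alpha_true, rounds, noise_std=0.1, lam=1.0, beta=1.0, seed=0):
    """Linear UCB over the rows of X (assumed rule, not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    V = lam * np.eye(d)            # regularised Gram matrix
    b = np.zeros(d)                # reward-weighted feature sum
    mean_reward = X @ alpha_true   # used only to simulate rewards and regret
    best = mean_reward.max()
    regret = np.zeros(rounds)
    for t in range(rounds):
        V_inv = np.linalg.inv(V)
        alpha_hat = V_inv @ b
        # Confidence width sqrt(x^T V^{-1} x) for every candidate action.
        quad = np.einsum("ij,jk,ik->i", X, V_inv, X)
        width = np.sqrt(np.maximum(quad, 0.0))
        a = int(np.argmax(X @ alpha_hat + beta * width))
        r = mean_reward[a] + noise_std * rng.standard_normal()
        V += np.outer(X[a], X[a])
        b += r * X[a]
        regret[t] = best - mean_reward[a]
    return np.cumsum(regret), np.linalg.solve(V, b)


if __name__ == "__main__":
    N, K = 30, 2
    L = path_laplacian(N)
    mask = np.zeros(N)
    mask[10:15] = 1.0                          # hypothetical observed nodes
    X = action_features(L, mask, K)
    alpha_true = np.array([1.0, -0.3, 0.05])   # hypothetical coefficients
    cum_regret, alpha_hat = run_linucb(X, alpha_true, rounds=200)
    print("cumulative regret:", cum_regret[-1])
    print("estimated alpha:", alpha_hat)
```

In this sketch the estimator only ever works with K + 1 = 3 unknowns, whether the graph has 30 nodes or 30,000. That is the sense in which learning scales with the spectral model rather than the network. Multi-source actions and the dictionary structure discussed in the abstract are outside the scope of this illustration.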

Notes

1. This includes many reward shapes, such as a subsampled or filtered signal, as well as the mean value.
2. It is worth noting that the formalism introduced in this section extends to most problems of learning on network processes, but for the sake of brevity and clarity we discuss only the source optimization problem.
3. A graph filter is defined in the spectral domain of the graph, typically as a power series of the graph Laplacian [40].
4. Available at https://lts2.epfl.ch/gsp/.

References

Acemoglu, D., Ozdaglar, A.: Opinion dynamics and learning in social networks. Dyn. Games Appl. 1(1), 3–49 (2011)
Agrawal, S., Goyal, N.: Analysis of Thompson sampling for the multi-armed bandit problem. In: Conference on Learning Theory. JMLR Workshop and Conference Proceedings, pp. 1–26 (2012)
Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
Bellemare, M.G., et al.: A geometric perspective on optimal representations for reinforcement learning. CoRR abs/1901.11530 (2019)
Camilleri, R., Jamieson, K., Katz-Samuels, J.: High-dimensional experimental design and kernel bandits. In: Meila, M., Zhang, T. (eds.) Proceedings of International Conference on Machine Learning (ICML) (2021)
Caron, S., Kveton, B., Lelarge, M., Bhagat, S.: Leveraging side observations in stochastic bandits. ArXiv abs/1210.4839 (2012)
Cesa-Bianchi, N., Gentile, C., Zappella, G.: A gang of bandits. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 737–745 (2013)
Chowdhury, S.R., Gopalan, A.: On kernelized multi-armed bandits. In: Precup, D., Teh, Y.W. (eds.) Proceedings of International Conference on Machine Learning (ICML) (2017)
Chu, W., Li, L., Reyzin, L., Schapire, R.E.: Contextual bandits with linear payoff functions. In: Proceedings of Artificial Intelligence and Statistics Conference (AISTATS), vol. 15, pp. 208–214 (2011)
Esposito, E., Fusco, F., van der Hoeven, D., Cesa-Bianchi, N.: Learning on the edge: online learning with stochastic feedback graphs. arXiv:2210.04229 (2022)
Gentile, C., Li, S., Zappella, G.: Online clustering of bandits. In: Proceedings of International Conference on Machine Learning (ICML) (2014)
Ghari, P.M., Shen, Y.: Online learning with probabilistic feedback. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2022)
Hanawal, M.K., Saligrama, V.: Cost effective algorithms for spectral bandits. In: Proceedings of IEEE Conference on Communication, Control, and Computing (2015)
He, Y., Wai, H.T.: Detecting central nodes from low-rank excited graph signals via structured factor analysis. arXiv preprint arXiv:2109.13573 (2021)
Hsieh, Y.G., Kasiviswanathan, S.P., Kveton, B., Blöbaum, P.: Thompson sampling with diffusion generative prior (2023)
Hölzle, U.: Our commitment to climate-conscious data center cooling. https://blog.google/outreach-initiatives/sustainability/our-commitment-to-climate-conscious-data-center-cooling/ (2022)
Idé, T., Murugesan, K., Bouneffouf, D., Abe, N.: Targeted advertising on social networks using online variational tensor regression. arXiv:2208.10627 (2022)
Jones, N.: How to stop data centres from gobbling up the world’s electricity. Nature 561(7722), 163–167 (2018)
Kassraie, P., Krause, A., Bogunovic, I.: Graph neural network bandits. In: Conference on Neural Information Processing Systems (NeurIPS) (2022)
Kocák, T., Valko, M., Munos, R., Agrawal, S.: Spectral thompson sampling. In: Proceedings of AAAI Conference on Artificial Intelligence (2014)
Korda, N., Szorenyi, B., Li, S.: Distributed clustering of linear bandits in peer to peer networks. In: Proceedings of International Conference on Machine Learning (ICML) (2016)
Lattimore, T., Szepesvári, C.: Bandit algorithms. arXiv (2018)
Lee, C.W., Luo, H., Zhang, M.: A closer look at small-loss bounds for bandits with graph feedback. In: Proceedings of International Conference on Algorithmic Learning Theory (ALT) (2020)
Li, S., Gentile, C., Karatzoglou, A., Zappella, G.: Data-dependent clustering in exploration-exploitation algorithms. arXiv preprint arXiv:1502.03473 (2015)
Li, S., Gentile, C., Karatzoglou, A., Zappella, G.: Online context-dependent clustering in recommendations based on exploration-exploitation algorithms. ArXiv abs/1608.03544 (2016)
Li, S., Karatzoglou, A., Gentile, C.: Collaborative filtering bandits. In: Proceedings of International ACM Conference on Research and Development in Information Retrieval (2016)
Lykouris, T., Tardos, E., Wali, D.: Feedback graph regret bounds for thompson sampling and ucb. In: Proceedings of International Conference on Algorithmic Learning Theory (ALT) (2020)
Mohaghegh Neyshabouri, M., Gokcesu, K., Gokcesu, H., Ozkan, H., Kozat, S.S.: Asymptotically optimal contextual bandit algorithm using hierarchical structures. IEEE Trans. Neural Netw. Learn. Syst. 30(3), 923–937 (2019)
Movric, K.H., Lewis, F.L.: Cooperative optimal control for multi-agent systems on directed graph topologies. IEEE Trans. Autom. Control 59(3), 769–774 (2014)
Nassif, R., Vlaski, S., Sayed, A.H.: Adaptation and learning over networks under subspace constraints. ArXiv 1905.08750 (2019)
Ortega, A., Frossard, P., Kovačević, J., Moura, J.M.F., Vandergheynst, P.: Graph signal processing: overview, challenges, and applications. Proc. IEEE 106(5), 808–828 (2018)
Perra, N., Rocha, L.E.: Modelling opinion dynamics in the age of algorithmic personalisation. Sci. Rep. 9(1), 1–11 (2019)
Ramakrishna, R., Scaglione, A.: Grid-graph signal processing (grid-gsp): a graph signal processing framework for the power grid. IEEE Trans. Signal Process. 69, 2725–2739 (2021)
Salami, H., Ying, B., Sayed, A.H.: Social learning over weakly connected graphs. IEEE Trans. Signal Inf. Process. Netw. 3(2), 222–238 (2017)
Shuman, D.I., Narang, S.K., Frossard, P., Ortega, A., Vandergheynst, P.: The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30(3), 83–98 (2013)
Slivkins, A.: Contextual bandits with similarity information. J. Mach. Learn. Res. 15(1), 2533–2568 (2014)
Tang, S.: When social advertising meets viral marketing: sequencing social advertisements for influence maximization. In: AAAI (2018)
Thaker, P.K., Malu, M., Rao, N., Dasarathy, G.: Maximizing and satisficing in multi-armed bandits with graph information (2022)
Thanou, D., Dong, X., Kressner, D., Frossard, P.: Learning heat diffusion graphs. IEEE Trans. Signal Inf. Process. Netw. 3(3), 484–499 (2017)
Thanou, D., Shuman, D.I., Frossard, P.: Learning parametric dictionaries for signals on graphs. IEEE Trans. Signal Process. 62(15), 3849–3862 (2014)
Toni, L., Frossard, P.: Online network source optimization with graph-kernel MAB. https://arxiv.org/abs/2307.03641 (2023)
Valko, M., Korda, N., Munos, R., Flaounas, I., Cristianini, N.: Finite-time analysis of kernelised contextual bandits (2013)
Valko, M., Munos, R.: Cheap bandits. In: Proceedings of International Conference on Machine Learning (ICML) (2015)
Valko, M., Munos, R., Kveton, B., Kocak, T.: Spectral bandits for smooth graph functions. In: Proceedings of International Conference on Machine Learning (ICML) (2014)
Wai, H.T., Segarra, S., Ozdaglar, A.E., Scaglione, A., Jadbabaie, A.: Blind community detection from low-rank excitations of a graph filter. IEEE Trans. Signal Process. 68, 436–451 (2019)
Waradpande, V., Kudenko, D., Khosla, M.: Deep reinforcement learning with graph-based state representations. arXiv:2004.13965 (2020)
Yang, K., Dong, X., Toni, L.: Laplacian-regularized graph bandits: algorithms and theoretical analysis. In: Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS) (2020)
Yang, K., Toni, L.: Graph-based recommendation system. In: 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (2018)
Yang, L., Wang, M.: Reinforcement learning in feature space: matrix bandit, kernels, and regret bound. In: III, H.D., Singh, A. (eds.) Proceedings of International Conference on Machine Learning (ICML), pp. 10746–10756 (2020)
Yuan, K., Ying, B., Zhao, X., Sayed, A.H.: Exact Diffusion for Distributed Optimization and Learning – Part I: Algorithm Development. ArXiv abs/1702.05122 (2017)
Zhang, H., Feng, T., Yang, G.H., Liang, H.: Distributed cooperative optimal control for multiagent systems on directed graphs: an inverse optimal approach. IEEE Trans. Cybern. 45(7), 1315–1326 (2015)

Author information

Authors and Affiliations

EEE Department, University College London, London, UK
Laura Toni
LTS4, EPFL, Lausanne, Switzerland
Pascal Frossard

Corresponding author

Correspondence to Laura Toni.

Editor information

Editors and Affiliations

University of Michigan, Ann Arbor, MI, USA
Danai Koutra
University of Vienna, Vienna, Austria
Claudia Plant
Max Planck Institute for Software Systems, Kaiserslautern, Germany
Manuel Gomez Rodriguez
Politecnico di Torino, Turin, Italy
Elena Baralis
CENTAI, Turin, Italy
Francesco Bonchi

Ethics declarations

Ethical Statement

Our work is mostly theoretical in nature, and we do not foresee any direct ethical implications. As with most theoretical and algorithmic work in machine learning, there is always a risk that the work could be diverted from its original objective and substantially modified to build extensions for unethical applications. However, the authors do not envisage any such use at the time of writing.

About this paper

Cite this paper

Toni, L., Frossard, P. (2023). Online Network Source Optimization with Graph-Kernel MAB. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14171. Springer, Cham. https://doi.org/10.1007/978-3-031-43418-1_15

DOI: https://doi.org/10.1007/978-3-031-43418-1_15
Published: 17 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43417-4
Online ISBN: 978-3-031-43418-1
eBook Packages: Computer Science, Computer Science (R0)
