Abstract
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns online the optimal source placement in large-scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which, however, suffers from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework whose learning rate scales with the dimension of the spectral representation model instead of that of the network. We then propose Grab-UCB, an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy. We introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency and computational complexity.
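The abstract claims that the learning rate can scale with the dimension of the spectral representation rather than with the network size. The sketch below illustrates why. It runs a standard LinUCB-style loop, which is a swapped-in textbook technique and not the authors' Grab-UCB; the paper defines its own dictionary and confidence bounds.

The sketch rests on three assumptions:
- Each candidate source placement is summarized by a hypothetical d-dimensional feature vector, for example its coordinates in a graph dictionary.
- The expected reward is linear in an unknown parameter `true_theta`.
- The learner only needs to maintain a d x d matrix, which it updates with the Sherman-Morrison formula.

All names (`lin_ucb`, `features`, `true_theta`, `alpha`, `lam`) are illustrative.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, x):
    return [dot(row, x) for row in M]


def sherman_morrison(A_inv, x):
    """Return (A + x x^T)^{-1} from a symmetric A^{-1} in O(d^2) time."""
    Ax = matvec(A_inv, x)
    denom = 1.0 + dot(x, Ax)
    d = len(x)
    return [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(d)] for i in range(d)]


def lin_ucb(features, true_theta, horizon, alpha=1.0, lam=1.0, noise=0.1, seed=0):
    """Generic LinUCB over actions described by d-dimensional features.

    Assumption (illustrative, not the paper's Grab-UCB): the expected reward
    of action k is dot(true_theta, features[k]); true_theta is used only to
    simulate noisy rewards. Returns (cumulative pseudo-regret, plays per action).
    """
    rng = random.Random(seed)
    d = len(true_theta)
    # Ridge-regression state: A = lam * I + sum phi phi^T (stored as its inverse).
    A_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    means = [dot(true_theta, phi) for phi in features]
    best = max(means)
    counts = [0] * len(features)
    regret = 0.0
    for _ in range(horizon):
        theta_hat = matvec(A_inv, b)  # current ridge estimate of true_theta
        scores = []
        for phi in features:
            # Optimistic index: estimated mean plus a confidence width ||phi||_{A^{-1}}.
            width = math.sqrt(max(0.0, dot(phi, matvec(A_inv, phi))))
            scores.append(dot(theta_hat, phi) + alpha * width)
        k = max(range(len(features)), key=scores.__getitem__)
        phi = features[k]
        reward = means[k] + rng.gauss(0.0, noise)
        A_inv = sherman_morrison(A_inv, phi)
        b = [bi + reward * fi for bi, fi in zip(b, phi)]
        counts[k] += 1
        regret += best - means[k]
    return regret, counts


if __name__ == "__main__":
    feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
    total_regret, plays = lin_ucb(feats, [1.0, 0.2, 0.1], horizon=500)
    print(round(total_regret, 3), plays)
```

The per-round cost here depends only on d and on the number of candidate actions, never on the number of nodes. This is the kind of saving the abstract attributes to a sparse spectral representation.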
Notes
1. This includes many reward shapes, such as subsampled or filtered signals, as well as the mean value.
2. It is worth noting that the formalism introduced in this Section extends to most problems of learning on network processes, but for the sake of brevity and clarity we discuss only the source optimization problem.
3. A graph filter defined in the spectral domain of the graph, typically in the form of a power series of the graph Laplacian [40].
4. Available at https://lts2.epfl.ch/gsp/.
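Notes 1 and 3 can be illustrated together. The following plain-Python sketch does two things:
- It applies a polynomial graph filter h(L) = sum_k c_k L^k, with L = D - W the combinatorial Laplacian, to a graph signal. It uses repeated matrix-vector products rather than forming the powers of L.
- It evaluates two of the reward shapes mentioned in Note 1 on the filtered output: the mean value and a subsampled signal.

The truncated heat-kernel coefficients are just one example of a power series in L. The helper names, the subset of observed nodes and the toy path graph are hypothetical and are not taken from the paper.

```python
import math


def laplacian(adj):
    """Combinatorial Laplacian L = D - W of an undirected graph without self-loops."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def polynomial_filter(adj, coeffs, x):
    """Apply h(L) x = sum_k coeffs[k] * L^k x using only matrix-vector products."""
    L = laplacian(adj)
    out = [0.0] * len(x)
    power = list(x)  # holds L^k x, starting at k = 0
    for c in coeffs:
        out = [o + c * p for o, p in zip(out, power)]
        power = matvec(L, power)
    return out


def heat_kernel_coeffs(tau, order):
    """Truncated Taylor coefficients of exp(-tau * L): (-tau)^k / k!."""
    return [(-tau) ** k / math.factorial(k) for k in range(order + 1)]


def mean_reward(y):
    """Reward shape: mean value of the (filtered) network signal."""
    return sum(y) / len(y)


def subsampled_reward(y, observed):
    """Reward shape: signal read only on a hypothetical subset of observed nodes."""
    return sum(y[i] for i in observed)


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # toy 3-node path graph
    y = polynomial_filter(path, heat_kernel_coeffs(0.5, 1), [1.0, 0.0, 0.0])
    print(y)  # [0.5, 0.5, 0.0]
    print(mean_reward(y), subsampled_reward(y, [0]))
```

A source placed at node 0 diffuses part of its mass to its neighbour. A constant signal passes through scaled only by `coeffs[0]`, because L annihilates constant vectors.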
References
[1] Acemoglu, D., Ozdaglar, A.: Opinion dynamics and learning in social networks. Dyn. Games Appl. 1(1), 3–49 (2011)
[2] Agrawal, S., Goyal, N.: Analysis of Thompson sampling for the multi-armed bandit problem. In: Conference on Learning Theory. JMLR Workshop and Conference Proceedings, pp. 1–26 (2012)
[3] Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
[4] Bellemare, M.G., et al.: A geometric perspective on optimal representations for reinforcement learning. arXiv:1901.11530 (2019)
[5] Camilleri, R., Jamieson, K., Katz-Samuels, J.: High-dimensional experimental design and kernel bandits. In: Meila, M., Zhang, T. (eds.) Proceedings of International Conference on Machine Learning (ICML) (2021)
[6] Caron, S., Kveton, B., Lelarge, M., Bhagat, S.: Leveraging side observations in stochastic bandits. arXiv:1210.4839 (2012)
[7] Cesa-Bianchi, N., Gentile, C., Zappella, G.: A gang of bandits. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 737–745 (2013)
[8] Chowdhury, S.R., Gopalan, A.: On kernelized multi-armed bandits. In: Precup, D., Teh, Y.W. (eds.) Proceedings of International Conference on Machine Learning (ICML) (2017)
[9] Chu, W., Li, L., Reyzin, L., Schapire, R.E.: Contextual bandits with linear payoff functions. In: Proceedings of Artificial Intelligence and Statistics Conference (AISTATS), vol. 15, pp. 208–214 (2011)
[10] Esposito, E., Fusco, F., van der Hoeven, D., Cesa-Bianchi, N.: Learning on the edge: online learning with stochastic feedback graphs. arXiv:2210.04229 (2022)
[11] Gentile, C., Li, S., Zappella, G.: Online clustering of bandits. In: Proceedings of International Conference on Machine Learning (ICML) (2014)
[12] Ghari, P.M., Shen, Y.: Online learning with probabilistic feedback. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2022)
[13] Hanawal, M.K., Saligrama, V.: Cost effective algorithms for spectral bandits. In: Proceedings of IEEE Conference on Communication, Control, and Computing (2015)
[14] He, Y., Wai, H.T.: Detecting central nodes from low-rank excited graph signals via structured factor analysis. arXiv:2109.13573 (2021)
[15] Hsieh, Y.G., Kasiviswanathan, S.P., Kveton, B., Blöbaum, P.: Thompson sampling with diffusion generative prior (2023)
[16] Hölzle, U.: Our commitment to climate-conscious data center cooling. https://blog.google/outreach-initiatives/sustainability/our-commitment-to-climate-conscious-data-center-cooling/ (2022)
[17] Idé, T., Murugesan, K., Bouneffouf, D., Abe, N.: Targeted advertising on social networks using online variational tensor regression. arXiv:2208.10627 (2022)
[18] Jones, N.: How to stop data centres from gobbling up the world’s electricity. Nature 561(7722), 163–167 (2018)
[19] Kassraie, P., Krause, A., Bogunovic, I.: Graph neural network bandits. In: Conference on Neural Information Processing Systems (NeurIPS) (2022)
[20] Kocák, T., Valko, M., Munos, R., Agrawal, S.: Spectral Thompson sampling. In: Proceedings of AAAI Conference on Artificial Intelligence (2014)
[21] Korda, N., Szorenyi, B., Li, S.: Distributed clustering of linear bandits in peer to peer networks. In: Proceedings of International Conference on Machine Learning (ICML) (2016)
[22] Lattimore, T., Szepesvári, C.: Bandit algorithms. arXiv (2018)
[23] Lee, C.W., Luo, H., Zhang, M.: A closer look at small-loss bounds for bandits with graph feedback. In: Proceedings of International Conference on Algorithmic Learning Theory (ALT) (2020)
[24] Li, S., Gentile, C., Karatzoglou, A., Zappella, G.: Data-dependent clustering in exploration-exploitation algorithms. arXiv:1502.03473 (2015)
[25] Li, S., Gentile, C., Karatzoglou, A., Zappella, G.: Online context-dependent clustering in recommendations based on exploration-exploitation algorithms. arXiv:1608.03544 (2016)
[26] Li, S., Karatzoglou, A., Gentile, C.: Collaborative filtering bandits. In: Proceedings of International ACM Conference on Research and Development in Information Retrieval (2016)
[27] Lykouris, T., Tardos, E., Wali, D.: Feedback graph regret bounds for Thompson sampling and UCB. In: Proceedings of International Conference on Algorithmic Learning Theory (ALT) (2020)
[28] Mohaghegh Neyshabouri, M., Gokcesu, K., Gokcesu, H., Ozkan, H., Kozat, S.S.: Asymptotically optimal contextual bandit algorithm using hierarchical structures. IEEE Trans. Neural Netw. Learn. Syst. 30(3), 923–937 (2019)
[29] Movric, K.H., Lewis, F.L.: Cooperative optimal control for multi-agent systems on directed graph topologies. IEEE Trans. Autom. Control 59(3), 769–774 (2014)
[30] Nassif, R., Vlaski, S., Sayed, A.H.: Adaptation and learning over networks under subspace constraints. arXiv:1905.08750 (2019)
[31] Ortega, A., Frossard, P., Kovačević, J., Moura, J.M.F., Vandergheynst, P.: Graph signal processing: overview, challenges, and applications. Proc. IEEE 106(5), 808–828 (2018)
[32] Perra, N., Rocha, L.E.: Modelling opinion dynamics in the age of algorithmic personalisation. Sci. Rep. 9(1), 1–11 (2019)
[33] Ramakrishna, R., Scaglione, A.: Grid-graph signal processing (Grid-GSP): a graph signal processing framework for the power grid. IEEE Trans. Signal Process. 69, 2725–2739 (2021)
[34] Salami, H., Ying, B., Sayed, A.H.: Social learning over weakly connected graphs. IEEE Trans. Signal Inf. Process. Netw. 3(2), 222–238 (2017)
[35] Shuman, D.I., Narang, S.K., Frossard, P., Ortega, A., Vandergheynst, P.: The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30(3), 83–98 (2013)
[36] Slivkins, A.: Contextual bandits with similarity information. J. Mach. Learn. Res. 15(1), 2533–2568 (2014)
[37] Tang, S.: When social advertising meets viral marketing: sequencing social advertisements for influence maximization. In: AAAI (2018)
[38] Thaker, P.K., Malu, M., Rao, N., Dasarathy, G.: Maximizing and satisficing in multi-armed bandits with graph information (2022)
[39] Thanou, D., Dong, X., Kressner, D., Frossard, P.: Learning heat diffusion graphs. IEEE Trans. Signal Inf. Process. Netw. 3(3), 484–499 (2017)
[40] Thanou, D., Shuman, D.I., Frossard, P.: Learning parametric dictionaries for signals on graphs. IEEE Trans. Signal Process. 62(15), 3849–3862 (2014)
[41] Toni, L., Frossard, P.: Online network source optimization with graph-kernel MAB. https://arxiv.org/abs/2307.03641 (2023)
[42] Valko, M., Korda, N., Munos, R., Flaounas, I., Cristianini, N.: Finite-time analysis of kernelised contextual bandits (2013)
[43] Valko, M., Munos, R.: Cheap bandits. In: Proceedings of International Conference on Machine Learning (ICML) (2015)
[44] Valko, M., Munos, R., Kveton, B., Kocak, T.: Spectral bandits for smooth graph functions. In: Proceedings of International Conference on Machine Learning (ICML) (2014)
[45] Wai, H.T., Segarra, S., Ozdaglar, A.E., Scaglione, A., Jadbabaie, A.: Blind community detection from low-rank excitations of a graph filter. IEEE Trans. Signal Process. 68, 436–451 (2019)
[46] Waradpande, V., Kudenko, D., Khosla, M.: Deep reinforcement learning with graph-based state representations. arXiv:2004.13965 (2020)
[47] Yang, K., Dong, X., Toni, L.: Laplacian-regularized graph bandits: algorithms and theoretical analysis. In: Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS) (2020)
[48] Yang, K., Toni, L.: Graph-based recommendation system. In: 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (2018)
[49] Yang, L., Wang, M.: Reinforcement learning in feature space: matrix bandit, kernels, and regret bound. In: III, H.D., Singh, A. (eds.) Proceedings of International Conference on Machine Learning (ICML), pp. 10746–10756 (2020)
[50] Yuan, K., Ying, B., Zhao, X., Sayed, A.H.: Exact diffusion for distributed optimization and learning – Part I: algorithm development. arXiv:1702.05122 (2017)
[51] Zhang, H., Feng, T., Yang, G.H., Liang, H.: Distributed cooperative optimal control for multiagent systems on directed graphs: an inverse optimal approach. IEEE Trans. Cybern. 45(7), 1315–1326 (2015)
Ethics declarations
Ethical Statement
Our work is mostly theoretical in nature, and we do not foresee any direct ethical implications. As with most theoretical and algorithmic work in machine learning, there is always a risk that the work could be diverted from its original objective and substantially modified to design extensions for unethical applications. However, such uses are clearly not envisaged by the authors at the time of writing.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Toni, L., Frossard, P. (2023). Online Network Source Optimization with Graph-Kernel MAB. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14171. Springer, Cham. https://doi.org/10.1007/978-3-031-43418-1_15
DOI: https://doi.org/10.1007/978-3-031-43418-1_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43417-4
Online ISBN: 978-3-031-43418-1
eBook Packages: Computer Science, Computer Science (R0)