abstract

Differentially Private Reinforcement Learning with Linear Function Approximation

Author:

Xingyu ZhouAuthors Info & Claims

SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems

Pages 77 - 78

https://doi.org/10.1145/3489048.3522648

Published: 06 June 2022 Publication History

Abstract

Motivated by the wide adoption of reinforcement learning (RL) in real-world personalized services, where users' sensitive and private information needs to be protected, we study regret minimization in finite-horizon Markov decision processes (MDPs) under the constraints of differential privacy (DP). Compared to existing private RL algorithms that work only on tabular finite-state, finite-actions MDPs, we take the first step towards privacy-preserving learning in MDPs with large state and action spaces. Specifically, we consider MDPs with linear function approximation (in particular linear mixture MDPs) under the notion of joint differential privacy (JDP), where the RL agent is responsible for protecting users' sensitive data. We design two private RL algorithms that are based on value iteration and policy optimization, respectively, and show that they enjoy sub-linear regret performance while guaranteeing privacy protection. Moreover, the regret bounds are independent of the number of states, and scale at most logarithmically with the number of actions, making the algorithms suitable for privacy protection in nowadays large-scale personalized services. Our results are achieved via a general procedure for learning in linear mixture MDPs under changing regularizers, which not only generalizes previous results for non-private learning, but also serves as a building block for general private reinforcement learning.

References

[1]

Debabrota Basu, Christos Dimitrakakis, and Aristide Tossou. Differential privacy for multi-armed bandits: What is it and what is its cost? arXiv preprint arXiv:1905.12298, 2019.

[2]

Qi Cai, Zhuoran Yang, Chi Jin, and Zhaoran Wang. Provably efficient exploration in policy optimization. In International Conference on Machine Learning, pages 1283--1294. PMLR, 2020.

[3]

Sayak Ray Chowdhury and Xingyu Zhou. Differentially private regret minimization in episodic markov decision processes. arXiv preprint arXiv:2112.10599, 2021.

[4]

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International conference on machine learning, pages 1329--1338. PMLR, 2016.

Digital Library

[5]

Cynthia Dwork. Differential privacy: A survey of results. In International conference on theory and applications of models of computation, pages 1--19. Springer, 2008.

Digital Library

[6]

Yonathan Efroni, Lior Shani, Aviv Rosenberg, and Shie Mannor. Optimistic policy optimization with bandit feedback. arXiv preprint arXiv:2002.08243, 2020.

[7]

Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, and Matteo Pirotta. Local differentially private regret minimization in reinforcement learning. arXiv preprint arXiv:2010.07778, 2020.

[8]

Goren Gordon, Samuel Spaulding, Jacqueline Kory Westlund, Jin Joo Lee, Luke Plummer, Marayna Martinez, Madhurima Das, and Cynthia Breazeal. Affective personalization of a social robot tutor for children's second language skills. In Proceedings of the AAAI conference on artificial intelligence, volume 30, 2016.

[9]

Sham M Kakade. A natural policy gradient. Advances in neural information processing systems, 14, 2001.

[10]

Vijay R Konda and John N Tsitsiklis. Actor-critic algorithms. In Advances in neural information processing systems, pages 1008--1014. Citeseer, 2000.

Digital Library

[11]

Boyi Liu, Qi Cai, Zhuoran Yang, and Zhaoran Wang. Neural proximal/trust region policy optimization attains globally optimal policy. arXiv preprint arXiv:1906.10306, 2019.

[12]

Nikita Mishra and Abhradeep Thakurta. (nearly) optimal differentially private stochastic multi-arm bandits. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, pages 592--601, 2015.

Digital Library

[13]

John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889--1897. PMLR, 2015.

Digital Library

[14]

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[15]

Roshan Shariff and Or Sheffet. Differentially private contextual linear bandits. arXiv preprint arXiv:1810.00068, 2018.

[16]

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. nature, 550 (7676): 354--359, 2017.

[17]

Aristide Tossou and Christos Dimitrakakis. Algorithms for differentially private multi-armed bandits. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016.

[18]

Aristide Tossou and Christos Dimitrakakis. Achieving privacy in the adversarial multi-armed bandit. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.

[19]

Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, and Steven Wu. Private reinforcement learning with pac and regret guarantees. In International Conference on Machine Learning, pages 9754--9764. PMLR, 2020.

[20]

Lingxiao Wang, Qi Cai, Zhuoran Yang, and Zhaoran Wang. Neural policy gradient methods: Global optimality and rates of convergence. arXiv preprint arXiv:1909.01150, 2019.

[21]

William Yang Wang, Jiwei Li, and Xiaodong He. Deep reinforcement learning for nlp. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 19--21, 2018.

[22]

Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8 (3--4): 229--256, 1992.

[23]

Xingyu Zhou. Differentially private reinforcement learning with linear function approximation. arXiv preprint arXiv:2201.07052, 2022.

[24]

Xingyu Zhou and Jian Tan. Local differential privacy for bayesian optimization. arXiv preprint arXiv:2010.06709, 2020.

Cited By

Zhao FRen XYang SZhao PZhang RXu X(2023)Federated multi-objective reinforcement learningInformation Sciences: an International Journal10.1016/j.ins.2022.12.083624:C(811-832)Online publication date: 1-May-2023
https://dl.acm.org/doi/10.1016/j.ins.2022.12.083
Lei YYe DShen SSui YZhu TZhou W(2022)New challenges in reinforcement learning: a survey of security and privacyArtificial Intelligence Review10.1007/s10462-022-10348-556:7(7195-7236)Online publication date: 6-Dec-2022
https://dl.acm.org/doi/10.1007/s10462-022-10348-5

Recommendations

Differentially Private Reinforcement Learning with Linear Function Approximation
POMACS

Motivated by the wide adoption of reinforcement learning (RL) in real-world personalized services, where users' sensitive and private information needs to be protected, we study regret minimization in finite-horizon Markov decision processes (MDPs) under ...
Prediction of the Resource Consumption of Distributed Deep Learning Systems
SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems

Predicting resource consumption for the distributed training of deep learning models is of paramount importance, as it can inform a priori users of how long their training would take and enable users to manage the cost of training. Yet, no such ...
Competitive Online Optimization with Multiple Inventories: A Divide-and-Conquer Approach
SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems

We study an online inventory trading problem where a user seeks to maximize the aggregate revenue of trading multiple inventories over a time horizon. The trading constraints and concave revenue functions are revealed sequentially in time, and the user ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems

June 2022

132 pages

ISBN:9781450391412

DOI:10.1145/3489048

General Chairs:
D Manjunath
IIT Bombay, India
,
Jayakrishnan Nair
IIT Bombay, India
,
Program Chairs:
Niklas Carlsson
Linköping University, Sweden
,
Edith Cohen
Google Research, USA
,
Philippe Robert
INRIA, France

ACM SIGMETRICS Performance Evaluation Review Volume 50, Issue 1
SIGMETRICS '22
June 2022
118 pages
ISSN:0163-5999
DOI:10.1145/3547353
Editor:
Zhenhua Liu
Stony Brook University
Issue’s Table of Contents

Copyright © 2022 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2022

Check for updates

Author Tags

Qualifiers

Abstract

Conference

SIGMETRICS/PERFORMANCE '22

Sponsor:

SIGMETRICS

SIGMETRICS/PERFORMANCE '22: ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems

June 6 - 10, 2022

Mumbai, India

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
62
Total Downloads

Downloads (Last 12 months)7
Downloads (Last 6 weeks)0

Reflects downloads up to 03 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Zhao FRen XYang SZhao PZhang RXu X(2023)Federated multi-objective reinforcement learningInformation Sciences: an International Journal10.1016/j.ins.2022.12.083624:C(811-832)Online publication date: 1-May-2023
https://dl.acm.org/doi/10.1016/j.ins.2022.12.083
Lei YYe DShen SSui YZhu TZhou W(2022)New challenges in reinforcement learning: a survey of security and privacyArtificial Intelligence Review10.1007/s10462-022-10348-556:7(7195-7236)Online publication date: 6-Dec-2022
https://dl.acm.org/doi/10.1007/s10462-022-10348-5

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten