
Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems

Published: 28 February 2022

Abstract

We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon $T$ with fixed and known cost matrices $Q, R$, but unknown and non-stationary dynamics $A_t, B_t$. The sequence of dynamics matrices can be arbitrary, but with a total variation $V_T$ that is assumed to be $o(T)$ and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal, controllers is available for all $t$, we present an algorithm that achieves the optimal dynamic regret of $O(V_T^{2/5} T^{3/5})$. With piecewise-constant dynamics, our algorithm achieves the optimal regret of $O(\sqrt{ST})$, where $S$ is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual multi-armed bandit problems. We also argue that non-adaptive forgetting (e.g., restarting, or sliding-window learning with a static window size) may not be regret-optimal for the LQR problem, even when the window size is optimally tuned with knowledge of $V_T$. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is, in spirit, a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and we therefore believe our results should find wider application.
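To make the flavor of the estimate-then-detect loop concrete, here is a minimal, self-contained numpy sketch. It is not the paper's algorithm: the system matrices, the stabilizing gain K, the window sizes, and the detection threshold are all illustrative assumptions, and the paper's bandit-style adaptive detection test is replaced by a naive comparison of OLS fits over a long block versus a short recent window.

```python
# Hedged sketch (not the paper's algorithm): OLS identification of LQR
# dynamics theta_t = [A_t B_t] under a single piecewise-constant switch,
# with a naive residual-gap test standing in for the paper's adaptive
# non-stationarity detection. All constants below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m, T, switch = 2, 1, 400, 200

# Two dynamics regimes: (A1, B1) before the switch, (A2, B2) after.
A1 = np.array([[0.9, 0.1], [0.0, 0.8]]); B1 = np.array([[0.0], [1.0]])
A2 = np.array([[0.6, -0.2], [0.1, 0.7]]); B2 = np.array([[0.5], [1.0]])
K = np.array([[0.1, 0.2]])  # assumed stabilizing feedback u_t = -K x_t

def ols(Z, Y):
    """OLS estimate of theta = [A B] in x_{t+1} = theta z_t + w_t."""
    return np.linalg.lstsq(Z, Y, rcond=None)[0].T

x = np.zeros(n)
Z, Y = [], []          # regressors z_t = (x_t, u_t) and targets x_{t+1}
restart = 0            # start of the current (believed-stationary) block
for t in range(T):
    A, B = (A1, B1) if t < switch else (A2, B2)
    u = -K @ x + 0.1 * rng.standard_normal(m)      # exploration noise
    x_next = A @ x + B @ u + 0.05 * rng.standard_normal(n)
    Z.append(np.concatenate([x, u])); Y.append(x_next)
    x = x_next
    # Naive change test: refit OLS on the whole block and on a short
    # recent window; a large gap between the two estimates suggests
    # the underlying parameter has drifted, so we restart.
    if t - restart > 50 and t % 10 == 0:
        Zb, Yb = np.array(Z[restart:]), np.array(Y[restart:])
        th_all = ols(Zb, Yb)
        th_new = ols(Zb[-25:], Yb[-25:])
        gap = np.linalg.norm(th_all - th_new)
        if gap > 0.3:                               # illustrative threshold
            print(f"t={t}: change detected (gap={gap:.2f}); restarting")
            restart = t
```

With the switch at t = 200, the long-block and recent-window estimates diverge shortly after the change; that divergence is exactly the signal an adaptive restart scheme exploits, in contrast to restarting on a fixed schedule regardless of whether the dynamics actually moved.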


Cited By

  • Rate-Matching the Regret Lower-Bound in the Linear Quadratic Regulator with Unknown Dynamics. 2023 62nd IEEE Conference on Decision and Control (CDC), 536–541. DOI: 10.1109/CDC49753.2023.10384167. Published 13 December 2023.
  • Online Adversarial Stabilization of Unknown Linear Time-Varying Systems. 2023 62nd IEEE Conference on Decision and Control (CDC), 8320–8327. DOI: 10.1109/CDC49753.2023.10383849. Published 13 December 2023.
  • Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems. ACM SIGMETRICS Performance Evaluation Review 50(1), 75–76. DOI: 10.1145/3547353.3522649. Published 7 July 2022.


Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 1
March 2022, 695 pages
EISSN: 2476-1249
DOI: 10.1145/3522731

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 February 2022
Published in POMACS Volume 6, Issue 1

Author Tags

  1. dynamic regret
  2. linear quadratic regulator
  3. non-stationary learning
  4. ordinary least squares estimator

Qualifiers

  • Research-article
