
Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems

Published: 28 February 2022

Abstract

We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon $T$ with fixed and known cost matrices $Q, R$, but unknown and non-stationary dynamics $A_t, B_t$. The sequence of dynamics matrices can be arbitrary, but with a total variation $V_T$ that is assumed to be $o(T)$ and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal, controllers is available for all $t$, we present an algorithm that achieves the optimal dynamic regret of $O(V_T^{2/5} T^{3/5})$. With piecewise-constant dynamics, our algorithm achieves the optimal regret of $O(\sqrt{ST})$, where $S$ is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual multi-armed bandit problems. We also argue that non-adaptive forgetting (e.g., restarting, or sliding-window learning with a static window size) may not be regret-optimal for the LQR problem, even when the window size is optimally tuned with knowledge of $V_T$. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is, in spirit, a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and we therefore believe our results should find wider application.
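To make the flavor of the estimate-then-detect loop concrete, here is a minimal, self-contained numpy sketch. It is not the paper's algorithm: the system matrices, the stabilizing gain K, the window sizes, and the detection threshold are all illustrative assumptions, and the paper's bandit-style adaptive detection test is replaced by a naive comparison of OLS fits over a long block versus a short recent window.

```python
# Hedged sketch (not the paper's algorithm): OLS identification of LQR
# dynamics theta_t = [A_t B_t] under a single piecewise-constant switch,
# with a naive residual-gap test standing in for the paper's adaptive
# non-stationarity detection. All constants below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m, T, switch = 2, 1, 400, 200

# Two dynamics regimes: (A1, B1) before the switch, (A2, B2) after.
A1 = np.array([[0.9, 0.1], [0.0, 0.8]]); B1 = np.array([[0.0], [1.0]])
A2 = np.array([[0.6, -0.2], [0.1, 0.7]]); B2 = np.array([[0.5], [1.0]])
K = np.array([[0.1, 0.2]])  # assumed stabilizing feedback u_t = -K x_t

def ols(Z, Y):
    """OLS estimate of theta = [A B] in x_{t+1} = theta z_t + w_t."""
    return np.linalg.lstsq(Z, Y, rcond=None)[0].T

x = np.zeros(n)
Z, Y = [], []          # regressors z_t = (x_t, u_t) and targets x_{t+1}
restart = 0            # start of the current (believed-stationary) block
for t in range(T):
    A, B = (A1, B1) if t < switch else (A2, B2)
    u = -K @ x + 0.1 * rng.standard_normal(m)      # exploration noise
    x_next = A @ x + B @ u + 0.05 * rng.standard_normal(n)
    Z.append(np.concatenate([x, u])); Y.append(x_next)
    x = x_next
    # Naive change test: refit OLS on the whole block and on a short
    # recent window; a large gap between the two estimates suggests
    # the underlying parameter has drifted, so we restart.
    if t - restart > 50 and t % 10 == 0:
        Zb, Yb = np.array(Z[restart:]), np.array(Y[restart:])
        th_all = ols(Zb, Yb)
        th_new = ols(Zb[-25:], Yb[-25:])
        gap = np.linalg.norm(th_all - th_new)
        if gap > 0.3:                               # illustrative threshold
            print(f"t={t}: change detected (gap={gap:.2f}); restarting")
            restart = t
```

With the switch at t = 200, the long-block and recent-window estimates diverge shortly after the change; that divergence is exactly the signal an adaptive restart scheme exploits, in contrast to restarting on a fixed schedule regardless of whether the dynamics actually moved.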


Cited By

  • Rate-Matching the Regret Lower-Bound in the Linear Quadratic Regulator with Unknown Dynamics. 2023 62nd IEEE Conference on Decision and Control (CDC), 536–541. DOI: 10.1109/CDC49753.2023.10384167. Published 13 December 2023.
  • Online Adversarial Stabilization of Unknown Linear Time-Varying Systems. 2023 62nd IEEE Conference on Decision and Control (CDC), 8320–8327. DOI: 10.1109/CDC49753.2023.10383849. Published 13 December 2023.
  • Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems. ACM SIGMETRICS Performance Evaluation Review 50(1), 75–76. DOI: 10.1145/3547353.3522649. Published 7 July 2022.


Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 1
March 2022, 695 pages
EISSN: 2476-1249
DOI: 10.1145/3522731

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 February 2022
Published in POMACS Volume 6, Issue 1

Author Tags

  1. dynamic regret
  2. linear quadratic regulator
  3. non-stationary learning
  4. ordinary least squares estimator

Qualifiers

  • Research-article
