The Fallacy of Minimizing Cumulative Regret in the Sequential Task Setting

Xu, Ziping; Zhang, Kelly W.; Murphy, Susan A.

arXiv:2403.10946 (stat)

[Submitted on 16 Mar 2024 (v1), last revised 24 Oct 2024 (this version, v2)]

Abstract: Online Reinforcement Learning (RL) is typically framed as the process of minimizing cumulative regret (CR) through interactions with an unknown environment. However, real-world RL applications usually involve a sequence of tasks, and the data collected in the first task is used to warm-start the second task. The performance of the warm-start policy is measured by simple regret (SR). Although minimizing CR and minimizing SR are generally conflicting objectives, previous research has shown that in stationary environments both can be optimized in terms of the task duration $T$.
In real-world applications, however, human-in-the-loop decisions between tasks often result in non-stationarity. For instance, in clinical trials, scientists may adjust target health outcomes between implementations. Our results show that task non-stationarity leads to a more restrictive trade-off between CR and SR. To balance these competing goals, the algorithm must explore excessively, leading to a CR bound worse than the typical optimal rate of $T^{1/2}$. These findings are practically significant: they indicate that increased exploration is necessary in non-stationary environments to accommodate task changes, which affects the design of RL algorithms in healthcare and other fields.
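
For readers unfamiliar with the two metrics, the following minimal sketch uses the standard multi-armed bandit definitions of cumulative and simple regret. That choice is an assumption here and not necessarily the paper's exact formulation. The function names, arm means, and two-task setup are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def cumulative_regret(means: Sequence[float], pulls: Sequence[int]) -> float:
    """Total reward given up during a task: sum of (best mean - pulled arm's mean)."""
    best = max(means)
    return sum(best - means[arm] for arm in pulls)


def simple_regret(means: Sequence[float], recommended: int) -> float:
    """Gap between the best arm and the single arm recommended after the task."""
    return max(means) - means[recommended]


# Hypothetical two-armed example (all numbers are illustrative assumptions).
task1_means = [0.5, 0.6]      # arm 1 is best in the first task
pulls = [0, 1, 1, 1, 1, 1]    # one exploratory pull, then exploitation
warm_start_arm = 1            # arm handed to the second task

cr_task1 = cumulative_regret(task1_means, pulls)
sr_stationary = simple_regret(task1_means, warm_start_arm)

# If the target outcome changes between tasks, the arm means may shift.
task2_means = [0.7, 0.6]      # arm 0 is now best
sr_shifted = simple_regret(task2_means, warm_start_arm)

print(cr_task1, sr_stationary, sr_shifted)
```

In this toy case, a pull sequence with low cumulative regret yields a warm-start arm that is optimal under the first task's means but suboptimal once the means shift, which is the tension the abstract describes.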

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2403.10946 [stat.ML]
	(or arXiv:2403.10946v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2403.10946

Submission history

From: Ziping Xu
[v1] Sat, 16 Mar 2024 15:29:22 UTC (685 KB)
[v2] Thu, 24 Oct 2024 20:04:43 UTC (1,390 KB)
