Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

Mondal, Washim Uddin; Aggarwal, Vaneet

arXiv:2305.02527 (cs)

[Submitted on 4 May 2023 (v1), last revised 28 Aug 2023 (this version, v2)]

Abstract: We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback. The delay and compositeness of rewards mean that rewards generated as a result of taking an action at a given state are fragmented into different components, and they are sequentially realized at delayed time instances. The partial anonymity attribute implies that a learner, for each state, only observes the aggregate of past reward components generated as a result of different actions taken at that state, but realized at the observation instance. We propose an algorithm named $\mathrm{DUCRL2}$ to obtain a near-optimal policy for this setting and show that it achieves a regret bound of $\tilde{\mathcal{O}}\left(DS\sqrt{AT} + d (SA)^3\right)$ where $S$ and $A$ are the sizes of the state and action spaces, respectively, $D$ is the diameter of the MDP, $d$ is a parameter upper bounded by the maximum reward delay, and $T$ denotes the time horizon. This demonstrates the optimality of the bound in the order of $T$, and an additive impact of the delay.
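
The feedback model described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions, not the authors' DUCRL2 algorithm: reward components are taken to be deterministic (the paper treats them as random), and the function name `observe_aggregates`, the `components` table, and the toy trajectory are hypothetical names introduced here for illustration. It shows how each action's reward is split into components realized at later steps, and how the learner sees only the per-state sum of whatever components arrive at a given step, without knowing which past action produced them.

```python
def observe_aggregates(trajectory, components, num_states):
    """Simulate delayed, composite, partially anonymous reward feedback.

    trajectory: list of (state, action) pairs, one per time step.
    components: dict mapping (state, action) to a list of reward
        components; components[(s, a)][k] is realized k steps after
        the action is taken (assumed deterministic in this sketch).
    Returns obs, where obs[t][s] is the aggregate reward the learner
    observes at time t for state s.
    """
    horizon = len(trajectory)
    obs = [[0.0] * num_states for _ in range(horizon)]
    for t, (s, a) in enumerate(trajectory):
        for delay, r in enumerate(components[(s, a)]):
            arrival = t + delay
            if arrival < horizon:
                # Anonymity: the component is added to the total for
                # the generating state s, with no record of the action
                # or the time step that produced it.
                obs[arrival][s] += r
    return obs


# Toy example: two states, two actions (all values are made up).
toy_components = {
    (0, 0): [0.5, 0.25],  # 0.5 immediately, 0.25 one step later
    (0, 1): [1.0],        # single immediate component
    (1, 0): [0.0, 2.0],   # nothing now, 2.0 one step later
}
toy_trajectory = [(0, 0), (0, 1), (1, 0)]
toy_obs = observe_aggregates(toy_trajectory, toy_components, num_states=2)
```

At the second step of this toy run, the learner's observation for state 0 mixes the delayed component of the first action with the immediate component of the second action, which is the ambiguity that partial anonymity introduces.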

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2305.02527 [cs.LG] (or arXiv:2305.02527v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2305.02527

Submission history

From: Washim Mondal
[v1] Thu, 4 May 2023 03:31:30 UTC (153 KB)
[v2] Mon, 28 Aug 2023 15:52:36 UTC (234 KB)
