"Deterministic MDPs with Adversarial Rewards and Bandit Feedback."

Raman Arora, Ofer Dekel, Ambuj Tewari (2012)

Details and statistics

DOI:

access: closed

type: Conference or Workshop Paper

metadata version: 2020-11-23