- research-article, July 2024
Single-trajectory distributionally robust reinforcement learning
ICML'24: Proceedings of the 41st International Conference on Machine Learning, Article No.: 1194, Pages 29644–29666
To mitigate the limitation that the classical reinforcement learning (RL) framework heavily relies on identical training and test environments, Distributionally Robust RL (DRRL) has been proposed to enhance performance across a range of environments, ...
- research-article, July 2024
KEPC-Push: a knowledge-enhanced proactive content push strategy for edge-assisted video feed streaming
USENIX ATC'24: Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, Article No.: 19, Pages 321–338
Video Feed Streaming (e.g., TikTok, Reels) is increasingly popular nowadays. Users will be scheduled to the distribution infrastructure, including content distribution network (CDN) and multi-access edge computing (MEC) nodes, to access the content. Our ...
- research-article, February 2024
Learning diverse risk preferences in population-based self-play
AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, Article No.: 1440, Pages 12910–12918, https://doi.org/10.1609/aaai.v38i11.29188
Among the remarkable successes of Reinforcement Learning (RL), self-play algorithms have played a crucial role in solving competitive games. However, current self-play RL methods commonly optimize the agent to maximize the expected win-rates against its ...
- research-article, December 2023
Cross-domain policy adaptation via value-guided data filtering
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 3210, Pages 73395–73421
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning. For example, a robot learns the policy in a simulator, but when it is deployed in the real world, the dynamics of the ...
- research-article, August 2023
Mean-semivariance policy optimization via risk-averse reinforcement learning (extended abstract)
IJCAI '23: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, Article No.: 784, Pages 6925–6930, https://doi.org/10.24963/ijcai.2023/784
Keeping risk under control is often more crucial than maximizing expected rewards in real-world decision-making situations, such as finance, robotics, autonomous driving, etc. The most natural choice of risk measures is variance, while it penalizes the ...
- research-article, July 2023
What is essential for unseen goal generalization of offline goal-conditioned RL?
ICML'23: Proceedings of the 40th International Conference on Machine Learning, Article No.: 1650, Pages 39543–39571
Offline goal-conditioned RL (GCRL) offers a way to train general-purpose agents from fully offline datasets. In addition to being conservative within the dataset, the generalization ability to achieve unseen goals is another fundamental challenge for ...
- article, December 2022
Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning
Keeping risk under control is often more crucial than maximizing expected reward in real-world decision-making situations, such as finance, robotics, autonomous driving, etc. The most natural choice of risk measures is variance, while it penalizes the ...
- research-article, November 2022
Exploit reward shifting in value-based deep-RL: optimistic curiosity-based exploration and conservative exploitation via linear reward shaping
NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, Article No.: 2734, Pages 37719–37734
In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a linear transformation is equivalent to changing the initialization of the ...
- research-article, November 2022
RORL: robust offline reinforcement learning via conservative smoothing
NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, Article No.: 1732, Pages 23851–23866
Offline reinforcement learning (RL) provides a promising direction to exploit massive amount of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative ...
- research-article, November 2022
Mildly conservative Q-learning for offline reinforcement learning
NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, Article No.: 125, Pages 1711–1724
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset without continually interacting with the environment. The distribution shift between the learned policy and the behavior policy makes it necessary for the value ...
- research-article, June 2022
Learning-Based Joint QoE Optimization for Adaptive Video Streaming Based on Smart Edge
IEEE Transactions on Network and Service Management (ITNSM), Volume 19, Issue 2, Pages 1789–1806, https://doi.org/10.1109/TNSM.2022.3145619
The latest increase in HTTP-based adaptive video streaming over the Internet enables a growing number of clients to compete for a shared bottleneck bandwidth. This competition may affect users’ Quality of Experience (QoE) negatively, especially in ...
- research-article, April 2022
MagNet: Cooperative Edge Caching by Automatic Content Congregating
WWW '22: Proceedings of the ACM Web Conference 2022, Pages 3280–3288, https://doi.org/10.1145/3485447.3512146
Nowadays, the surge of Internet contents and the need for high Quality of Experience (QoE) put the backbone network under unprecedented pressure. The emerging edge caching solutions help ease the pressure by caching contents closer to users. However, ...
- research-article, April 2022
Knowledge-based Temporal Fusion Network for Interpretable Online Video Popularity Prediction
WWW '22: Proceedings of the ACM Web Conference 2022, Pages 2879–2887, https://doi.org/10.1145/3485447.3511934
Predicting the popularity of online videos has many real-world applications, such as recommendation, precise advertising, and edge caching strategies. Despite many efforts have been dedicated to the online video popularity prediction, there still exist ...
- research-article, December 2021
Believe what you see: implicit constraint approach for offline multi-agent reinforcement learning
NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems, Article No.: 788, Pages 10299–10312
Learning from datasets without interaction with environments (Offline Learning) is an essential step to apply Reinforcement Learning (RL) algorithms in real-world scenarios. However, compared with the single-agent counterpart, offline multiagent RL ...
- research-article, May 2021
Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning
AAMAS '21: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, Pages 853–861
Value-based methods of multi-agent reinforcement learning (MARL), especially the value decomposition methods, have been demonstrated on a range of challenging cooperative tasks. However, current methods pay little attention to the interaction between ...
- research-article, June 2019
Steward: smart edge based joint QoE optimization for adaptive video streaming
NOSSDAV '19: Proceedings of the 29th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video, Pages 31–36, https://doi.org/10.1145/3304112.3325603
With the increase of HTTP-based adaptive video streaming over the Internet, multiple clients may compete for a shared bottleneck bandwidth, which brings some damage to the fairness and stability of Quality of Experience (QoE). This paper presents Steward,...