A. Rupam Mahmood
Person information
- affiliation: University of Alberta, Reinforcement Learning & Artificial Intelligence Lab, Edmonton, AB, Canada
- affiliation: Alberta Machine Intelligence Institute (Amii), Edmonton, AB, Canada
- affiliation: Kindred AI, Toronto, ON, Canada
2020 – today
- 2024
- [j7] Shibhansh Dohare, J. Fernando Hernandez-Garcia, Qingfeng Lan, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton: Loss of plasticity in deep continual learning. Nat. 632(8026): 768-774 (2024)
- [c26] Bram Grooten, Tristan Tomilin, Gautham Vasan, Matthew E. Taylor, A. Rupam Mahmood, Meng Fang, Mykola Pechenizkiy, Decebal Constantin Mocanu: MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning. AAMAS 2024: 733-742
- [c25] Mohamed Elsayed, A. Rupam Mahmood: Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning. ICLR 2024
- [c24] Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli: Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo. ICLR 2024
- [c23] Mohamed Elsayed, Homayoon Farrahi, Felix Dangel, A. Rupam Mahmood: Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning. ICML 2024
- [c22] Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A. Ramirez, Christopher K. Harris, A. Rupam Mahmood, Dale Schuurmans: Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation. ICML 2024
- [i35] Mohamed Elsayed, A. Rupam Mahmood: Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning. CoRR abs/2404.00781 (2024)
- [i34] Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A. Ramirez, Christopher K. Harris, A. Rupam Mahmood, Dale Schuurmans: Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation. CoRR abs/2405.21043 (2024)
- [i33] Mohamed Elsayed, Homayoon Farrahi, Felix Dangel, A. Rupam Mahmood: Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning. CoRR abs/2406.03276 (2024)
- [i32] Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu: More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling. CoRR abs/2406.12241 (2024)
- [i31] Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jägersand, A. Rupam Mahmood: Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning. CoRR abs/2407.00324 (2024)
- [i30] Mohamed Elsayed, Qingfeng Lan, Clare Lyle, A. Rupam Mahmood: Weight Clipping for Deep Continual and Reinforcement Learning. CoRR abs/2407.01704 (2024)
- 2023
- [j6] Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood: Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation. Trans. Mach. Learn. Res. 2023 (2023)
- [c21] Fengdi Che, Gautham Vasan, A. Rupam Mahmood: Correcting discount-factor mismatch in on-policy policy gradient methods. ICML 2023: 4218-4240
- [c20] Yan Wang, Gautham Vasan, A. Rupam Mahmood: Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers. ICRA 2023: 9435-9441
- [c19] Homayoon Farrahi, A. Rupam Mahmood: Reducing the Cost of Cycle-Time Tuning for Real-World Policy Optimization. IJCNN 2023: 1-8
- [c18] Amirmohammad Karimi, Jun Jin, Jun Luo, A. Rupam Mahmood, Martin Jägersand, Samuele Tosatto: Dynamic Decision Frequency with Continuous Options. IROS 2023: 7545-7552
- [c17] Jiamin He, Fengdi Che, Yi Wan, A. Rupam Mahmood: Loosely consistent emphatic temporal-difference learning. UAI 2023: 849-859
- [i29] Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu: Learning to Optimize for Reinforcement Learning. CoRR abs/2302.01470 (2023)
- [i28] Mohamed Elsayed, A. Rupam Mahmood: Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning. CoRR abs/2302.03281 (2023)
- [i27] Homayoon Farrahi, A. Rupam Mahmood: Reducing the Cost of Cycle-Time Tuning for Real-World Policy Optimization. CoRR abs/2305.05760 (2023)
- [i26] Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli: Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo. CoRR abs/2305.18246 (2023)
- [i25] Fengdi Che, Gautham Vasan, A. Rupam Mahmood: Correcting discount-factor mismatch in on-policy policy gradient methods. CoRR abs/2306.13284 (2023)
- [i24] Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, Richard S. Sutton, A. Rupam Mahmood: Maintaining Plasticity in Deep Continual Learning. CoRR abs/2306.13812 (2023)
- [i23] Qingfeng Lan, A. Rupam Mahmood: Elephant Neural Networks: Born to Be a Continual Learner. CoRR abs/2310.01365 (2023)
- [i22] Bram Grooten, Tristan Tomilin, Gautham Vasan, Matthew E. Taylor, A. Rupam Mahmood, Meng Fang, Mykola Pechenizkiy, Decebal Constantin Mocanu: MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning. CoRR abs/2312.15339 (2023)
- 2022
- [j5] Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White: Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences. J. Mach. Learn. Res. 23: 253:1-253:79 (2022)
- [c16] Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, Rupam Mahmood: Model-free Policy Learning with Reward Gradients. AISTATS 2022: 4217-4234
- [c15] Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, Rupam Mahmood: An Alternate Policy Gradient Estimator for Softmax Policies. AISTATS 2022: 6630-6689
- [c14] Samuele Tosatto, Andrew Patterson, Martha White, Rupam Mahmood: A Temporal-Difference Approach to Policy Gradient Estimation. ICML 2022: 21609-21632
- [c13] Yufeng Yuan, A. Rupam Mahmood: Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots. ICRA 2022: 5546-5552
- [i21] Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood: A Temporal-Difference Approach to Policy Gradient Estimation. CoRR abs/2202.02396 (2022)
- [i20] Yufeng Yuan, Rupam Mahmood: Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots. CoRR abs/2203.12759 (2022)
- [i19] Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood: Memory-efficient Reinforcement Learning with Knowledge Consolidation. CoRR abs/2205.10868 (2022)
- [i18] Yan Wang, Gautham Vasan, A. Rupam Mahmood: Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers. CoRR abs/2210.02317 (2022)
- [i17] Mohamed Elsayed, A. Rupam Mahmood: HesScale: Scalable Computation of Hessian Diagonals. CoRR abs/2210.11639 (2022)
- [i16] Amirmohammad Karimi, Jun Jin, Jun Luo, A. Rupam Mahmood, Martin Jägersand, Samuele Tosatto: Variable-Decision Frequency Option Critic. CoRR abs/2212.04407 (2022)
- 2021
- [c12] Michael Przystupa, Masood Dehghan, Martin Jägersand, A. Rupam Mahmood: Analyzing Neural Jacobian Methods in Applications of Visual Servoing and Kinematic Control. ICRA 2021: 14276-14283
- [i15] Qingfeng Lan, A. Rupam Mahmood: Model-free Policy Learning with Reward Gradients. CoRR abs/2103.05147 (2021)
- [i14] Michael Przystupa, Masood Dehghan, Martin Jägersand, A. Rupam Mahmood: Analyzing Neural Jacobian Methods in Applications of Visual Servoing and Kinematic Control. CoRR abs/2106.06083 (2021)
- [i13] Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White: Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences. CoRR abs/2107.08285 (2021)
- [i12] Shibhansh Dohare, A. Rupam Mahmood, Richard S. Sutton: Continual Backprop: Stochastic Gradient Descent with Persistent Randomness. CoRR abs/2108.06325 (2021)
- [i11] Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood: An Alternate Policy Gradient Estimator for Softmax Policies. CoRR abs/2112.11622 (2021)
- 2020
- [j4] Oliver Limoyo, Bryan Chan, Filip Maric, Brandon Wagstaff, A. Rupam Mahmood, Jonathan Kelly: Heteroscedastic Uncertainty for Robust Generative Latent Dynamics. IEEE Robotics Autom. Lett. 5(4): 6654-6661 (2020)
- [i10] Oliver Limoyo, Bryan Chan, Filip Maric, Brandon Wagstaff, A. Rupam Mahmood, Jonathan Kelly: Heteroscedastic Uncertainty for Robust Generative Latent Dynamics. CoRR abs/2008.08157 (2020)
2010 – 2019
- 2019
- [c11] Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra: Autoregressive Policies for Continuous Control Deep Reinforcement Learning. IJCAI 2019: 2754-2762
- [i9] Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra: Autoregressive Policies for Continuous Control Deep Reinforcement Learning. CoRR abs/1903.11524 (2019)
- 2018
- [j3] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. J. Mach. Learn. Res. 19: 48:1-48:49 (2018)
- [c10] A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra: Benchmarking Reinforcement Learning Algorithms on Real-World Robots. CoRL 2018: 561-591
- [c9] A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra: Setting up a Reinforcement Learning Task with a Real-World Robot. IROS 2018: 4635-4640
- [i8] A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra: Setting up a Reinforcement Learning Task with a Real-World Robot. CoRR abs/1803.07067 (2018)
- [i7] A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra: Benchmarking Reinforcement Learning Algorithms on Real-World Robots. CoRR abs/1809.07731 (2018)
- 2017
- [c8] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. Canadian AI 2017: 3-14
- [i6] Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton: Multi-step Off-policy Learning Without Importance Sampling Ratios. CoRR abs/1702.03006 (2017)
- [i5] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. CoRR abs/1704.04463 (2017)
- 2016
- [j2] Richard S. Sutton, Ashique Rupam Mahmood, Martha White: An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. J. Mach. Learn. Res. 17: 73:1-73:29 (2016)
- [j1] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton: True Online Temporal-Difference Learning. J. Mach. Learn. Res. 17: 145:1-145:40 (2016)
- 2015
- [c7] Ashique Rupam Mahmood, Richard S. Sutton: Off-policy learning based on weighted importance sampling with linear computational complexity. UAI 2015: 552-561
- [i4] Richard S. Sutton, Ashique Rupam Mahmood, Martha White: An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. CoRR abs/1503.04269 (2015)
- [i3] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton: An Empirical Evaluation of True Online TD(λ). CoRR abs/1507.00353 (2015)
- [i2] Ashique Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton: Emphatic Temporal-Difference Learning. CoRR abs/1507.01569 (2015)
- [i1] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton: True Online Temporal-Difference Learning. CoRR abs/1512.04087 (2015)
- 2014
- [c6] Richard S. Sutton, Ashique Rupam Mahmood, Doina Precup, Hado van Hasselt: A new Q(λ) with interim forward view and Monte Carlo equivalence. ICML 2014: 568-576
- [c5] Ashique Rupam Mahmood, Hado van Hasselt, Richard S. Sutton: Weighted importance sampling for off-policy learning with linear function approximation. NIPS 2014: 3014-3022
- [c4] Hado van Hasselt, Ashique Rupam Mahmood, Richard S. Sutton: Off-policy TD(λ) with a true online equivalence. UAI 2014: 330-339
- 2013
- [c3] Ashique Rupam Mahmood, Richard S. Sutton: Representation Search through Generate and Test. AAAI Workshop: Learning Rich Representations from Low-Level Sensors 2013
- [c2] Ashique Rupam Mahmood, Richard S. Sutton: Position Paper: Representation Search through Generate and Test. SARA 2013
- 2012
- [c1] Ashique Rupam Mahmood, Richard S. Sutton, Thomas Degris, Patrick M. Pilarski: Tuning-free step-size adaptation. ICASSP 2012: 2121-2124