default search action
Bernardo Ávila Pires
Person information
Refine list
refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2024
- [c14]Daniele Calandriello, Zhaohan Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, Bernardo Ávila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot:
Human Alignment of Large Language Models through Online Preference Optimisation. ICML 2024 - [c13]Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot:
Generalized Preference Optimization: A Unified Approach to Offline Alignment. ICML 2024 - [i20]Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot:
Generalized Preference Optimization: A Unified Approach to Offline Alignment. CoRR abs/2402.05749 (2024) - [i19]Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney:
Off-policy Distributional Q(λ): Distributional RL without Importance Sampling. CoRR abs/2402.05766 (2024) - [i18]Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, Bernardo Ávila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot:
Human Alignment of Large Language Models through Online Preference Optimisation. CoRR abs/2403.08635 (2024) - [i17]Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, Will Dabney:
Understanding the performance gap between online and offline alignment algorithms. CoRR abs/2405.08448 (2024) - [i16]Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Ávila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot:
Offline Regularised Reinforcement Learning for Large Language Models Alignment. CoRR abs/2405.19107 (2024) - [i15]Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Ávila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana Borsa, Arthur Guez, Will Dabney:
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning. CoRR abs/2406.02035 (2024) - 2023
- [c12]Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Ávila Pires, Razvan Pascanu, Will Dabney:
Understanding Plasticity in Neural Networks. ICML 2023: 23190-23211 - [c11]Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko:
Understanding Self-Predictive Learning for Reinforcement Learning. ICML 2023: 33632-33656 - [c10]Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko:
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm. ICML 2023: 33657-33673 - [i14]Bernardo Ávila Pires, Feryal M. P. Behbahani, Hubert Soyer, Kyriacos Nikiforou, Thomas Keck, Satinder Singh:
Hierarchical Reinforcement Learning in Complex 3D Environments. CoRR abs/2302.14451 (2023) - [i13]Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Ávila Pires, Razvan Pascanu, Will Dabney:
Understanding plasticity in neural networks. CoRR abs/2303.01486 (2023) - [i12]Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko:
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm. CoRR abs/2305.18501 (2023) - 2022
- [c9]Zhaohan Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Ávila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot:
BYOL-Explore: Exploration by Bootstrapped Prediction. NeurIPS 2022 - [c8]Yunhao Tang, Rémi Munos, Mark Rowland, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare:
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning. NeurIPS 2022 - [i11]Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Ávila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot:
BYOL-Explore: Exploration by Bootstrapped Prediction. CoRR abs/2206.08332 (2022) - [i10]Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare:
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning. CoRR abs/2207.07570 (2022) - [i9]Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko:
Understanding Self-Predictive Learning for Reinforcement Learning. CoRR abs/2212.03319 (2022) - 2021
- [i8]Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Ávila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos:
Geometric Entropic Exploration. CoRR abs/2101.02055 (2021) - [i7]Pol Moreno, Edward Hughes, Kevin R. McKee, Bernardo Ávila Pires, Théophane Weber:
Neural Recursive Belief States in Multi-Agent Reinforcement Learning. CoRR abs/2102.02274 (2021) - 2020
- [c7]Zhaohan Daniel Guo, Bernardo Ávila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar:
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning. ICML 2020: 3875-3886 - [c6]Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Ávila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko:
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning. NeurIPS 2020 - [i6]Zhaohan Daniel Guo, Bernardo Ávila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar:
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning. CoRR abs/2004.14646 (2020) - [i5]Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Ávila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko:
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. CoRR abs/2006.07733 (2020)
2010 – 2019
- 2019
- [i4]Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Jean-Bastien Grill, Florent Altché, Rémi Munos:
World Discovery Models. CoRR abs/1902.07685 (2019) - 2018
- [i3]Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Toby Pohlen, Rémi Munos:
Neural Predictive Belief Representations. CoRR abs/1811.06407 (2018) - 2016
- [c5]Bernardo Ávila Pires:
Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models. COLT 2016: 121-151 - [i2]Bernardo Ávila Pires, Csaba Szepesvári:
Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models. CoRR abs/1602.06346 (2016) - [i1]Bernardo Ávila Pires, Csaba Szepesvári:
Multiclass Classification Calibration Functions. CoRR abs/1609.06385 (2016) - 2015
- [c4]Bernardo Ávila Pires, Csaba Szepesvári:
Pathological Effects of Variance on Classification-Based Policy Iteration. AAAI Workshop: Learning for General Competency in Video Games 2015 - 2014
- [c3]Hengshuai Yao, Csaba Szepesvári, Bernardo Ávila Pires, Xinhua Zhang:
Pseudo-MDPs and factored linear action models. ADPRL 2014: 1-9 - 2013
- [c2]Bernardo Ávila Pires, Csaba Szepesvári, Mohammad Ghavamzadeh:
Cost-sensitive Multiclass Classification Risk Bounds. ICML (3) 2013: 1391-1399 - 2012
- [c1]Bernardo Ávila Pires, Csaba Szepesvári:
Statistical linear estimation with penalized estimators: an application to reinforcement learning. ICML 2012
Coauthor Index
manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from , , and to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from and to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from .
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2024-10-04 20:02 CEST by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint