
Award Abstract # 1815300
RI: Small: Feature Encoding for Reinforcement Learning

NSF Org: IIS
Div Of Information & Intelligent Systems
Recipient: DUKE UNIVERSITY
Initial Amendment Date: July 26, 2018
Latest Amendment Date: September 14, 2018
Award Number: 1815300
Award Instrument: Continuing Grant
Program Manager: Vladimir Pavlovic
vpavlovi@nsf.gov
 (703)292-8318
IIS
 Div Of Information & Intelligent Systems
CSE
 Direct For Computer & Info Scie & Enginr
Start Date: August 1, 2018
End Date: July 31, 2023 (Estimated)
Total Intended Award Amount: $499,968.00
Total Awarded Amount to Date: $499,968.00
Funds Obligated to Date: FY 2018 = $499,968.00
History of Investigator:
  • Ronald Parr (Principal Investigator)
    parr@cs.duke.edu
  • Lawrence Carin (Co-Principal Investigator)
Recipient Sponsored Research Office: Duke University
2200 W MAIN ST
DURHAM
NC  US  27705-4640
(919)684-3030
Sponsor Congressional District: 04
Primary Place of Performance: Duke University
NC  US  27705-4677
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): TP7EK8DZV6N5
Parent UEI:
NSF Program(s): Robust Intelligence
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7495, 7923
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This project focuses on the subfield of machine learning referred to as Reinforcement Learning (RL), in which algorithms or robots learn by trial and error. As with many areas of machine learning, there has been a surge of interest in "deep learning" approaches to reinforcement learning, i.e., "Deep RL." Deep learning uses computational models motivated by structures found in the brains of animals. Deep RL has enjoyed some stunning successes, including a recent advance by which a program learned to play the Asian game of Go better than the best human player. Notably, this level of performance was achieved without any human guidance. Given only the rules of the game, the program learned by playing against itself. Although games are intriguing and attention-grabbing, this feat was merely a technology demonstration. Firms are seeking to deploy Deep RL methods to increase the efficiency of their operations across a range of applications such as data center management and robotics. To realize fully the potential of Deep RL, further research is required to make the training process more predictable, reliable, and efficient. Current techniques require massive amounts of training data and computation, and subtle changes in the configuration of the system can cause huge differences in the quality of the results obtained. Thus, even though RL systems can learn autonomously by trial and error, a large amount of human intuition, experience, and experimentation may be required to lay the groundwork for these systems to succeed. This proposal seeks to develop new techniques and theory to make high-quality deep RL results more widely and easily obtainable. In addition, this proposal will provide opportunities for undergraduates to be involved in research through Duke's Data+ initiative.

The proposed research is partly inspired by past work on feature selection and discovery for reinforcement learning. Much of that work focused primarily on linear value function approximation. Its relevance to deep reinforcement learning is that methods such as Deep Q-learning have a linear final layer. The preceding, nonlinear layers can therefore be interpreted as performing feature discovery for what is ultimately a linear value function approximation process. Sufficient conditions on the features that were specified for successful linear value function approximation in earlier work can now be re-interpreted as an intermediate objective function for the penultimate layer of a deep network. The proposed research aims to achieve the following objectives: 1) develop a theory of feature construction that explains and informs deep reinforcement learning methods, 2) develop improved approaches to value function approximation that are applicable to deep reinforcement learning, 3) develop improved approaches to policy search that are applicable to deep reinforcement learning, 4) develop new algorithms for exploration in reinforcement learning that take advantage of learned feature representations, and 5) perform computational experiments demonstrating the efficacy of the newly developed algorithms on benchmark problems.
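
The linear-final-layer view can be made concrete with a small example. Below is a minimal PyTorch sketch (not the project's implementation; the layer sizes, feature dimension, and names are illustrative assumptions) showing how the nonlinear layers of a Q-network can be read as a feature map phi(s), with the final linear layer performing linear value function approximation over those features:

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """Q-network whose last layer is linear in the penultimate-layer features."""
        def __init__(self, state_dim, num_actions, feature_dim=64):
            super().__init__()
            # Nonlinear layers: learn a feature representation phi(s).
            self.features = nn.Sequential(
                nn.Linear(state_dim, 128), nn.ReLU(),
                nn.Linear(128, feature_dim), nn.ReLU(),
            )
            # Linear final layer: Q(s, a) = w_a . phi(s) + b_a.
            self.q_head = nn.Linear(feature_dim, num_actions)

        def forward(self, state):
            phi = self.features(state)   # learned features
            return self.q_head(phi)      # linear value estimates, one per action

    # Inspect the learned features for a batch of (random, illustrative) states.
    net = QNetwork(state_dim=8, num_actions=4)
    states = torch.randn(32, 8)
    phi = net.features(states)           # candidate features for linear approximation
    q_values = net.q_head(phi)           # Q(s, a) for each action

In this reading, an intermediate objective of the kind described above would be imposed on phi, i.e., on the output of the feature layers, rather than on the final Q-values.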

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Nemecek, Mark and Parr, Ronald. "Policy Caches with Successor Features." Proceedings of Machine Learning Research, v.139, 2021.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project focused on reinforcement learning, in which an intelligent agent interacts with an environment and learns, through experience, to optimize the long-term benefit of acting in that environment. Unlike supervised learning, which treats every decision as an independent choice disconnected from future decisions, reinforcement learning must take into account that decisions have consequences that affect the choices available in the future. A solution to a reinforcement learning problem is, therefore, a policy that describes a potentially long-term strategy for acting in a complex environment.


Reinforcement learning has gained prominence in recent years for its ability, when combined with deep learning (deep reinforcement learning), to solve problems such as the board game Go or video games. In addition to these attention-grabbing examples, people have become increasingly interested in using reinforcement learning for practical problems such as industrial automation, autonomous vehicles, and dynamic medical treatment regimes.


Despite the promising initial successes of deep reinforcement learning, there is a long way to go in understanding when and why reinforcement learning methods succeed, and how solutions to reinforcement learning problems can generalize or transfer to new problems. It is generally agreed that understanding the representation of the problem, either as presented to the learner or as learned in the early layers of a neural network, is central to addressing these questions.


This project came to focus on a technique called successor features, which assumes that the reward or cost for operating in an environment can be expressed as a weighted combination of functions called reward features. Leveraging this assumption, prior work has shown how a policy that is learned for one set of feature weights (a task) can be evaluated and reused for a different set of feature weights (a new task). However, previous work provided only weak guidance on how close to optimal a policy learned for one task would be when used in a new task.
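
The successor-feature identity behind this kind of reuse can be illustrated with a short numpy sketch. This is a generic illustration under stated assumptions (tabular states and actions, successor features supplied as arrays rather than learned), not the project's code: if rewards decompose as r = phi . w, then the same successor features psi evaluate a fixed policy under any choice of task weights w.

    import numpy as np

    # Assume rewards decompose as r(s, a, s') = phi(s, a, s') . w, and that
    #   psi_pi[s, a] = E[ sum_t gamma^t * phi_t | s_0 = s, a_0 = a, policy pi ]
    # has been estimated for a policy pi trained on an old task.
    # The arrays below are random stand-ins, purely for illustration.
    num_states, num_actions, num_features = 5, 3, 4
    rng = np.random.default_rng(0)
    psi_pi = rng.random((num_states, num_actions, num_features))

    w_old = rng.random(num_features)   # reward weights defining the old task
    w_new = rng.random(num_features)   # reward weights defining a new task

    # The same successor features evaluate pi under either task:
    q_old = psi_pi @ w_old             # Q^pi(s, a) under the old reward
    q_new = psi_pi @ w_new             # Q^pi(s, a) if pi were reused on the new task

    # With a cache of policies, one can act greedily with respect to the best
    # cached estimate for the new task (generalized policy improvement):
    psi_cache = [psi_pi, rng.random((num_states, num_actions, num_features))]
    q_cache = np.stack([psi @ w_new for psi in psi_cache])   # shape (k, S, A)
    greedy_actions = q_cache.max(axis=0).argmax(axis=1)      # best action per state

The weak guidance mentioned above concerns how far such reused policies can fall short of optimal on the new task, which is the gap the project's results address.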


A key result of this project was a new technique for understanding how policies learned for old tasks perform when applied to new tasks. This technique provided much clearer guidance on how close to optimal old solutions might be when applied to new tasks, allowing an intelligent agent to make an informed choice about whether to learn a new policy or whether an old policy is good enough to reuse.


The project also considered several extensions of this basic idea. The first extended it to continuous action spaces, building on a technique called deep radial basis value functions. A second extension combined successor features with hierarchical reinforcement learning. This combination is particularly interesting because it gives insight into how solutions can be shared not only across tasks but also across subtasks of a larger, overall task.



Last Modified: 11/29/2023
Modified by: Ronald Parr
