
Faster program adaptation through reward attribution inference

Published: 26 September 2012

Abstract

In the adaptation-based programming (ABP) paradigm, programs may contain variable parts (function calls, parameter values, etc.) that can take a number of different values. Programs also contain reward statements with which a programmer can provide feedback about how well a program is performing with respect to achieving its goals (for example, achieving a high score on some scale). By repeatedly running the program, a machine-learning component, guided by the rewards, gradually adjusts the automatic choices made in the variable program parts so that they converge toward an optimal strategy.
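To make this learning loop concrete, the following self-contained Java sketch models a single adaptive choice point. It is only an illustration under stated assumptions: the Choice class, its epsilon-greedy selection rule, and the buffer-size scenario are hypothetical and do not reproduce the API of any existing ABP library. Each run picks an alternative, a reward statement reports a score, and running averages steer future choices toward the best alternative.

```java
import java.util.Random;

// Hypothetical ABP-style choice point: keeps a running reward estimate per
// alternative and selects epsilon-greedily (explore with probability 0.1).
class Choice {
    private final double[] value;  // average observed reward per alternative
    private final int[] count;     // how often each alternative was rewarded
    private final Random rng = new Random();
    private int last;              // alternative chosen in the current run

    Choice(int alternatives) {
        value = new double[alternatives];
        count = new int[alternatives];
    }

    int choose() {
        last = rng.nextDouble() < 0.1 ? rng.nextInt(value.length) : best();
        return last;
    }

    // Reward statement: fold the feedback into the chosen alternative's average.
    void reward(double r) {
        count[last]++;
        value[last] += (r - value[last]) / count[last];
    }

    int best() {
        int b = 0;
        for (int i = 1; i < value.length; i++)
            if (value[i] > value[b]) b = i;
        return b;
    }
}

public class AbpSketch {
    public static void main(String[] args) {
        int[] sizes = {512, 4096, 65536};
        Choice bufferSize = new Choice(sizes.length);  // variable program part
        for (int run = 0; run < 1000; run++) {         // repeated executions
            int k = bufferSize.choose();
            double score = -Math.abs(sizes[k] - 4096); // toy goal: 4096 is optimal
            bufferSize.reward(score);                  // programmer feedback
        }
        System.out.println("converged buffer size: " + sizes[bufferSize.best()]);
    }
}
```

Running this for a thousand iterations converges on the middle buffer size, mirroring how repeated executions of an adaptive program converge toward an optimal strategy.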
ABP is a method for semi-automatic program generation in which programmer-specified choices and rewards allow standard machine-learning techniques to explore the design space defined by a program template and find an optimal instance of it. ABP effectively provides a DSL that allows non-machine-learning experts to exploit machine learning to generate self-optimizing programs.
Unfortunately, in many cases the placement and structure of choices and rewards can impede finding an optimal solution to a program-generation problem. To address this problem, we have developed a dataflow analysis that computes influence tracks of choices and rewards. An augmented machine-learning technique can exploit this information to ignore misleading rewards and, more generally, to attribute rewards more accurately to the choices that actually influenced them. Moreover, this technique allows us to detect errors in the adaptive program that might arise during program maintenance. Our evaluation shows that the dataflow analysis can lead to improvements in performance.
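The benefit of influence information for credit assignment can be seen in a toy setting. In the following self-contained Java sketch, the influence sets, choice names, and reward sites are all hypothetical; in the paper's approach, a dataflow analysis would derive this information from the program itself. Each reward is credited only to the choice points that can influence it, so a noisy reward at one site never pollutes the value estimates of an unrelated choice.

```java
import java.util.*;

// Toy illustration of influence-based reward attribution (all names hypothetical).
public class InfluenceSketch {
    public static void main(String[] args) {
        Random rng = new Random(42);
        // Influence sets as a dataflow analysis might report them:
        // the reward at site "rewardA" is influenced only by choice "x",
        // the reward at site "rewardB" only by choice "y".
        Map<String, Set<String>> influences = Map.of(
            "rewardA", Set.of("x"),
            "rewardB", Set.of("y"));

        // Per-choice value estimates and visit counts, two alternatives each.
        Map<String, double[]> value = new HashMap<>();
        Map<String, int[]> count = new HashMap<>();
        for (String c : List.of("x", "y")) {
            value.put(c, new double[2]);
            count.put(c, new int[2]);
        }

        for (int run = 0; run < 5000; run++) {
            // Pick each choice uniformly at random (pure exploration).
            Map<String, Integer> picked = new HashMap<>();
            for (String c : List.of("x", "y"))
                picked.put(c, rng.nextInt(2));

            // rewardA depends on x alone; rewardB depends on y alone, plus noise.
            credit("rewardA", picked.get("x") == 1 ? 1.0 : 0.0,
                   influences, picked, value, count);
            credit("rewardB", (picked.get("y") == 0 ? 1.0 : 0.0) + rng.nextGaussian(),
                   influences, picked, value, count);
        }
        System.out.println("x estimates: " + Arrays.toString(value.get("x")));
        System.out.println("y estimates: " + Arrays.toString(value.get("y")));
    }

    // Credit a reward only to the choices in its influence set.
    static void credit(String rewardSite, double r,
                       Map<String, Set<String>> influences,
                       Map<String, Integer> picked,
                       Map<String, double[]> value,
                       Map<String, int[]> count) {
        for (String c : influences.get(rewardSite)) {
            int a = picked.get(c);
            count.get(c)[a]++;
            value.get(c)[a] += (r - value.get(c)[a]) / count.get(c)[a];
        }
    }
}
```

A learner that instead credited every reward to every choice would see the Gaussian noise from rewardB smear across x's estimates, slowing or misdirecting convergence; filtering rewards through influence sets is what lets the augmented learner ignore misleading feedback.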


Cited By

  • (2018) Towards learning-augmented languages. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 959-961. DOI: 10.1145/3236024.3275432. Online publication date: 26-Oct-2018.



Published In

ACM SIGPLAN Notices, Volume 48, Issue 3 (GPCE '12), March 2013, 140 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2480361

GPCE '12: Proceedings of the 11th International Conference on Generative Programming and Component Engineering, September 2012, 148 pages
ISBN: 9781450311298
DOI: 10.1145/2371401

Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 26 September 2012
    Published in SIGPLAN Volume 48, Issue 3


    Author Tags

    1. partial programming
    2. program adaptation
    3. reinforcement learning

    Qualifiers

    • Research-article
