
Faster program adaptation through reward attribution inference

Published: 26 September 2012

Abstract

In the adaptation-based programming (ABP) paradigm, programs may contain variable parts (function calls, parameter values, etc.) that can take a number of different values. Programs also contain reward statements with which a programmer can provide feedback about how well a program is performing with respect to achieving its goals (for example, achieving a high score on some scale). By repeatedly running the program, a machine-learning component, guided by the rewards, gradually adjusts the automatic choices made in the variable program parts so that they converge toward an optimal strategy.
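To make this learning loop concrete, the following self-contained Java sketch models a single adaptive choice point. It is only an illustration under stated assumptions: the Choice class, its epsilon-greedy selection rule, and the buffer-size scenario are hypothetical and do not reproduce the API of any existing ABP library. Each run picks an alternative, a reward statement reports a score, and running averages steer future choices toward the best alternative.

```java
import java.util.Random;

// Hypothetical ABP-style choice point: keeps a running reward estimate per
// alternative and selects epsilon-greedily (explore with probability 0.1).
class Choice {
    private final double[] value;  // average observed reward per alternative
    private final int[] count;     // how often each alternative was rewarded
    private final Random rng = new Random();
    private int last;              // alternative chosen in the current run

    Choice(int alternatives) {
        value = new double[alternatives];
        count = new int[alternatives];
    }

    int choose() {
        last = rng.nextDouble() < 0.1 ? rng.nextInt(value.length) : best();
        return last;
    }

    // Reward statement: fold the feedback into the chosen alternative's average.
    void reward(double r) {
        count[last]++;
        value[last] += (r - value[last]) / count[last];
    }

    int best() {
        int b = 0;
        for (int i = 1; i < value.length; i++)
            if (value[i] > value[b]) b = i;
        return b;
    }
}

public class AbpSketch {
    public static void main(String[] args) {
        int[] sizes = {512, 4096, 65536};
        Choice bufferSize = new Choice(sizes.length);  // variable program part
        for (int run = 0; run < 1000; run++) {         // repeated executions
            int k = bufferSize.choose();
            double score = -Math.abs(sizes[k] - 4096); // toy goal: 4096 is optimal
            bufferSize.reward(score);                  // programmer feedback
        }
        System.out.println("converged buffer size: " + sizes[bufferSize.best()]);
    }
}
```

Running this for a thousand iterations converges on the middle buffer size, mirroring how repeated executions of an adaptive program converge toward an optimal strategy.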
ABP is a method for semi-automatic program generation in which programmer-specified choices and rewards allow standard machine-learning techniques to explore the design space defined by a program template and find an optimal instance of it. ABP effectively provides a DSL that allows non-machine-learning experts to exploit machine learning to generate self-optimizing programs.
Unfortunately, in many cases the placement and structure of choices and rewards can impede finding an optimal solution to a program-generation problem. To address this problem, we have developed a dataflow analysis that computes influence tracks of choices and rewards. An augmented machine-learning technique can exploit this information to ignore misleading rewards and, more generally, to attribute rewards more accurately to the choices that actually influenced them. Moreover, this technique allows us to detect errors in the adaptive program that might arise during program maintenance. Our evaluation shows that the dataflow analysis can lead to improvements in performance.
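The benefit of influence information for credit assignment can be seen in a toy setting. In the following self-contained Java sketch, the influence sets, choice names, and reward sites are all hypothetical; in the paper's approach, a dataflow analysis would derive this information from the program itself. Each reward is credited only to the choice points that can influence it, so a noisy reward at one site never pollutes the value estimates of an unrelated choice.

```java
import java.util.*;

// Toy illustration of influence-based reward attribution (all names hypothetical).
public class InfluenceSketch {
    public static void main(String[] args) {
        Random rng = new Random(42);
        // Influence sets as a dataflow analysis might report them:
        // the reward at site "rewardA" is influenced only by choice "x",
        // the reward at site "rewardB" only by choice "y".
        Map<String, Set<String>> influences = Map.of(
            "rewardA", Set.of("x"),
            "rewardB", Set.of("y"));

        // Per-choice value estimates and visit counts, two alternatives each.
        Map<String, double[]> value = new HashMap<>();
        Map<String, int[]> count = new HashMap<>();
        for (String c : List.of("x", "y")) {
            value.put(c, new double[2]);
            count.put(c, new int[2]);
        }

        for (int run = 0; run < 5000; run++) {
            // Pick each choice uniformly at random (pure exploration).
            Map<String, Integer> picked = new HashMap<>();
            for (String c : List.of("x", "y"))
                picked.put(c, rng.nextInt(2));

            // rewardA depends on x alone; rewardB depends on y alone, plus noise.
            credit("rewardA", picked.get("x") == 1 ? 1.0 : 0.0,
                   influences, picked, value, count);
            credit("rewardB", (picked.get("y") == 0 ? 1.0 : 0.0) + rng.nextGaussian(),
                   influences, picked, value, count);
        }
        System.out.println("x estimates: " + Arrays.toString(value.get("x")));
        System.out.println("y estimates: " + Arrays.toString(value.get("y")));
    }

    // Credit a reward only to the choices in its influence set.
    static void credit(String rewardSite, double r,
                       Map<String, Set<String>> influences,
                       Map<String, Integer> picked,
                       Map<String, double[]> value,
                       Map<String, int[]> count) {
        for (String c : influences.get(rewardSite)) {
            int a = picked.get(c);
            count.get(c)[a]++;
            value.get(c)[a] += (r - value.get(c)[a]) / count.get(c)[a];
        }
    }
}
```

A learner that instead credited every reward to every choice would see the Gaussian noise from rewardB smear across x's estimates, slowing or misdirecting convergence; filtering rewards through influence sets is what lets the augmented learner ignore misleading feedback.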


Cited By

  • (2018) Towards learning-augmented languages. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 959-961. DOI: 10.1145/3236024.3275432. Online publication date: 26-Oct-2018.



Published In

ACM SIGPLAN Notices, Volume 48, Issue 3 (GPCE '12), March 2013, 140 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2480361

GPCE '12: Proceedings of the 11th International Conference on Generative Programming and Component Engineering, September 2012, 148 pages
ISBN: 9781450311298
DOI: 10.1145/2371401

Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 26 September 2012
    Published in SIGPLAN Volume 48, Issue 3


    Author Tags

    1. partial programming
    2. program adaptation
    3. reinforcement learning

    Qualifiers

    • Research-article
