DOI: 10.5555/3091125.3091262
Research article

Reinforcement Learning for Multi-Step Expert Advice

Published: 08 May 2017

Abstract

Complex tasks over heterogeneous data sources, such as finding and linking named entities in text documents or detecting objects in images, often require multiple steps in a processing pipeline. In most cases, numerous exchangeable software components exist for a single step, each an "expert" for data with certain characteristics. Which expert to apply to which observed data instance in which step becomes a challenge that is hard even for humans to decide. In this work, we treat the problem as a Single-Agent System (SAS) in which a centralized agent learns how to best exploit the experts. To this end, we define locality-sensitive relational measures between experts and data points, so-called "meta-dependencies", to assess expert performance, and use them for decision-making via online model-free and batch Reinforcement Learning (RL) approaches, building on techniques from Contextual Bandits (CBs) and Statistical Relational Learning (SRL). The resulting system automatically learns to pick the best pipeline of experts for a given set of data points. We evaluate our approach on Entity Linking over text corpora with heterogeneous characteristics (such as news articles and tweets). Empirically, our system improves both the estimation of expert accuracies and the out-of-the-box performance of the original experts, without manual tuning.
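The core idea of the abstract, choosing which expert to run at each pipeline step based on the data's characteristics, can be illustrated with a minimal contextual epsilon-greedy bandit sketch. This is not the paper's actual algorithm (which uses meta-dependencies and contextual-bandit / batch-RL techniques); it only keeps a running mean reward per (step, context, expert) triple. The expert names ("spotlight", "agdistis") are illustrative stand-ins for interchangeable Entity Linking components:

```python
import random
from collections import defaultdict

class ExpertSelector:
    """Epsilon-greedy selection of one 'expert' per pipeline step,
    keyed on a coarse context feature (e.g. corpus type).
    Illustrative sketch only: tracks a running mean reward per
    (step, context, expert) and mostly picks the current best."""

    def __init__(self, experts_per_step, epsilon=0.1, seed=0):
        self.experts_per_step = experts_per_step   # {step: [expert names]}
        self.epsilon = epsilon                     # exploration probability
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)             # (step, ctx, expert) -> #pulls
        self.values = defaultdict(float)           # (step, ctx, expert) -> mean reward

    def select(self, step, context):
        experts = self.experts_per_step[step]
        if self.rng.random() < self.epsilon:       # explore uniformly
            return self.rng.choice(experts)
        # exploit: expert with highest estimated accuracy in this context
        return max(experts, key=lambda e: self.values[(step, context, e)])

    def update(self, step, context, expert, reward):
        key = (step, context, expert)
        self.counts[key] += 1
        # incremental running mean of observed rewards
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

Run on simulated feedback where one (hypothetical) expert is more accurate on news text and the other on tweets, the selector's value estimates separate by context and it learns to route each corpus type to its stronger expert.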


Cited By

  • Reinforcement learning with multiple experts. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 9549-9559, 2018. DOI: 10.5555/3327546.3327623.

Published In

AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems
May 2017
1914 pages

Sponsors

  • IFAAMAS

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Author Tags

  1. collective learning
  2. decision-making with multi-step expert advice
  3. entity linking
  4. expert processes
  5. reinforcement learning

Acceptance Rates

AAMAS '17 paper acceptance rate: 127 of 457 submissions (28%)
Overall acceptance rate: 1,155 of 5,036 submissions (23%)
