DOI: 10.5555/3091125.3091262
Research article

Reinforcement Learning for Multi-Step Expert Advice

Published: 08 May 2017

Abstract

Complex tasks over heterogeneous data sources, such as finding and linking named entities in text documents or detecting objects in images, often require multiple steps in a processing pipeline. In most cases, numerous exchangeable software components exist for a single step, each an "expert" for data with certain characteristics. Which expert to apply to which observed data instance in which step becomes a challenge that is hard even for humans to decide. In this work, we treat the problem as a Single-Agent System (SAS) in which a centralized agent learns how to best exploit the experts. To this end, we define locality-sensitive relational measures between experts and data points, so-called "meta-dependencies", to assess expert performance, and use them for decision-making via online model-free and batch Reinforcement Learning (RL) approaches, building on techniques from Contextual Bandits (CBs) and Statistical Relational Learning (SRL). The resulting system automatically learns to pick the best pipeline of experts for a given set of data points. We evaluate our approach on Entity Linking over text corpora with heterogeneous characteristics (such as news articles and tweets). Empirically, our system improves both the estimation of expert accuracies and the out-of-the-box performance of the original experts, without manual tuning.
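The core idea of the abstract, choosing which expert to run at each pipeline step based on the data's characteristics, can be illustrated with a minimal contextual epsilon-greedy bandit sketch. This is not the paper's actual algorithm (which uses meta-dependencies and contextual-bandit / batch-RL techniques); it only keeps a running mean reward per (step, context, expert) triple. The expert names ("spotlight", "agdistis") are illustrative stand-ins for interchangeable Entity Linking components:

```python
import random
from collections import defaultdict

class ExpertSelector:
    """Epsilon-greedy selection of one 'expert' per pipeline step,
    keyed on a coarse context feature (e.g. corpus type).
    Illustrative sketch only: tracks a running mean reward per
    (step, context, expert) and mostly picks the current best."""

    def __init__(self, experts_per_step, epsilon=0.1, seed=0):
        self.experts_per_step = experts_per_step   # {step: [expert names]}
        self.epsilon = epsilon                     # exploration probability
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)             # (step, ctx, expert) -> #pulls
        self.values = defaultdict(float)           # (step, ctx, expert) -> mean reward

    def select(self, step, context):
        experts = self.experts_per_step[step]
        if self.rng.random() < self.epsilon:       # explore uniformly
            return self.rng.choice(experts)
        # exploit: expert with highest estimated accuracy in this context
        return max(experts, key=lambda e: self.values[(step, context, e)])

    def update(self, step, context, expert, reward):
        key = (step, context, expert)
        self.counts[key] += 1
        # incremental running mean of observed rewards
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

Run on simulated feedback where one (hypothetical) expert is more accurate on news text and the other on tweets, the selector's value estimates separate by context and it learns to route each corpus type to its stronger expert.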


Cited By

  • Reinforcement learning with multiple experts. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 9549-9559, 2018. DOI: 10.5555/3327546.3327623.

Published In

AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems
May 2017
1914 pages

Sponsors

  • IFAAMAS

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Author Tags

  1. collective learning
  2. decision-making with multi-step expert advice
  3. entity linking
  4. expert processes
  5. reinforcement learning

Acceptance Rates

AAMAS '17 paper acceptance rate: 127 of 457 submissions (28%)
Overall acceptance rate: 1,155 of 5,036 submissions (23%)
