Abstract
We investigate the problem of using function approximation in reinforcement learning where the agent's policy is represented as a classifier mapping states to actions. High classification accuracy is usually assumed to correlate with high policy quality. This is not necessarily the case: increasing classification accuracy can actually decrease the policy's quality. This phenomenon occurs when the learning process begins to focus on classifying less "important" states. In this paper, we introduce a measure of a state's decision-making importance that can be used to improve policy learning. As a result, the focused learning process is shown to converge faster to better policies.
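The idea can be illustrated with a small sketch. The importance measure below (the gap between the best and second-best action values, i.e. the regret incurred by the cheapest misclassification) is one plausible instantiation, not necessarily the paper's exact definition; the Q-values are invented for illustration. A cost-sensitive classifier trained with these weights would largely ignore states where all actions are nearly equivalent and concentrate on states where a wrong label is costly.

```python
import numpy as np

# Hypothetical Q-values for 4 states x 3 actions (illustrative numbers only).
Q = np.array([
    [1.0, 0.9, 0.8],    # low importance: actions nearly equivalent
    [5.0, 1.0, 0.5],    # high importance: a wrong action is costly
    [2.0, 1.9, 1.95],   # very low importance
    [3.0, 0.0, 2.9],    # low importance despite one bad action
])

# Greedy policy labels: the best action in each state becomes the class label.
labels = Q.argmax(axis=1)

# One possible importance measure (an assumption, not the paper's exact
# definition): the gap between the best and second-best action values.
sorted_Q = np.sort(Q, axis=1)
importance = sorted_Q[:, -1] - sorted_Q[:, -2]

# A cost-sensitive learner would weight each training example by its
# importance, so unimportant states barely affect the classification loss.
weights = importance / importance.sum()
```

Plain classification accuracy treats every state equally, so a classifier can "improve" by getting many unimportant states right while mislabeling the few states that matter; importance weighting removes that failure mode.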
Keywords
- Policy Language
- Neural Information Processing System
- Good Policy
- Policy Learning
- Reinforcement Learning Method
© 2004 Springer-Verlag Berlin Heidelberg
Li, L., Bulitko, V., Greiner, R. (2004). Batch Reinforcement Learning with State Importance. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Machine Learning: ECML 2004. ECML 2004. Lecture Notes in Computer Science(), vol 3201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30115-8_53
DOI: https://doi.org/10.1007/978-3-540-30115-8_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23105-9
Online ISBN: 978-3-540-30115-8