Abstract
We formalize a model for supervised learning of action strategies in dynamic stochastic domains and show that PAC-learning results on Occam algorithms hold in this model as well. We then identify a class of rule-based action strategies for which polynomial time learning is possible. The representation of strategies is a generalization of decision lists; strategies include rules with existentially quantified conditions, simple recursive predicates, and small internal state, but are syntactically restricted. We also study the learnability of hierarchically composed strategies where a subroutine already acquired can be used as a basic action in a higher level strategy. We prove some positive results in this setting, but also show that in some cases the hierarchical learning problem is computationally hard.
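Since the strategy representation described above generalizes decision lists, the flavor of the underlying learning procedure can be illustrated by the classic greedy algorithm for plain (propositional) 1-decision lists in the style of Rivest (1987). The sketch below is illustrative only: the feature encoding and function names are assumptions, and it omits the existentially quantified conditions, recursive predicates, and internal state that the paper's strategies add.

```python
# Greedy learning of a 1-decision list (Rivest, 1987) -- a minimal
# illustrative sketch, not the paper's algorithm for action strategies.
# An example is (features, label) where features maps names to booleans.

def learn_decision_list(examples):
    """Return rules [(feature, value, label), ...] consistent with examples."""
    remaining = list(examples)
    rules = []
    feature_names = sorted({f for x, _ in examples for f in x})
    tests = [(f, v) for f in feature_names for v in (True, False)]
    while remaining:
        for feat, val in tests:
            covered = [(x, y) for x, y in remaining if x[feat] == val]
            labels = {y for _, y in covered}
            # Greedy step: pick any test whose covered examples agree on a label.
            if covered and len(labels) == 1:
                rules.append((feat, val, labels.pop()))
                remaining = [(x, y) for x, y in remaining if x[feat] != val]
                break
        else:
            raise ValueError("no consistent 1-decision list exists")
    return rules

def predict(rules, x):
    """Evaluate the list: the first rule whose test fires gives the label."""
    for feat, val, label in rules:
        if x[feat] == val:
            return label
    return False  # default label when no rule fires
```

Because each iteration removes at least one example, the loop runs at most linearly in the sample size, and the hypothesis has at most as many rules as examples; this size bound is what lets the Occam-algorithm results mentioned in the abstract apply.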
Khardon, R. Learning to Take Actions. Machine Learning 35, 57–90 (1999). https://doi.org/10.1023/A:1007571119753