Learning agents for uncertain environments (extended abstract)
release_bnv35r7dzfdzfdhch3qea6amdy
by
Stuart Russell
References
[b0] via grobid |
Astrom, K. J. (1965). Optimal control of Markov decision processes with incomplete state estimation. J. Math. Anal. Applic., 10, 174-205.
| |
[b1] via grobid |
Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Athena Scientific, Belmont, Mass.
| |
[b2] via fuzzy |
Adaptive Probabilistic Networks with Hidden Variables
John Binder, Daphne Koller, Stuart J. Russell, Keiji Kanazawa 1997 Machine Learning doi:10.1023/a:1007421730016 dblp:journals/ml/BinderKRK97 | |
[b3] via fuzzy |
Space-Efficient Inference in Dynamic Probabilistic Networks
John Binder, Kevin P. Murphy, Stuart J. Russell 1997 International Joint Conference on Artificial Intelligence dblp:conf/ijcai/BinderMR97 |
|
[b4] via fuzzy |
Tractable Inference for Complex Stochastic Processes
Xavier Boyen, Daphne Koller 1998 Conference on Uncertainty in Artificial Intelligence dblp:conf/uai/BoyenK98 | |
[b5] via grobid |
Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5(3), 142-150.
| |
[b6] via fuzzy |
A Novel Reinforcement Model of Birdsong Vocalization Learning
Kenji Doya, Terrence J. Sejnowski 1994 Neural Information Processing Systems dblp:conf/nips/DoyaS94 |
|
[b7] via grobid |
Farley, C. T., & Taylor, C. R. (1991). A mechanical trigger for the trot-gallop transition in horses. Science, 253(5017), 306-308.
| |
[b8] via fuzzy |
Learning the Structure of Dynamic Probabilistic Networks
Nir Friedman, Kevin P. Murphy, Stuart J. Russell 1998 Conference on Uncertainty in Artificial Intelligence dblp:conf/uai/FriedmanMR98 | |
[b9] via grobid |
Hoyt, D., & Taylor, C. (1981). Gait and the energetics of locomotion in horses. Nature, 292, 239-240.
| |
[b10] via grobid |
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285.
| |
[b11] via fuzzy |
Stochastic Simulation Algorithms for Dynamic Probabilistic Networks
[report]
Keiji Kanazawa, Daphne Koller, Stuart Russell 2013 pre-print version:v1 number:UAI-P-1995-PG-346-351 arXiv:1302.4965v1 |
|
[b12] via grobid |
Keeney, R. L., & Raiffa, H. (1976). Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley, New York.
| |
[b13] via grobid |
McCallum, A. R. (1993). Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pp. 190-196, Amherst, Massachusetts. Morgan Kaufmann.
| |
[b15] via fuzzy |
Bee foraging in uncertain environments using predictive Hebbian learning
P. Read Montague, Peter Dayan, Christophe Person, Terrence J. Sejnowski 1995 Nature doi:10.1038/377725a0 pmid:7477260 |
|
[b16] via fuzzy |
Approximating Optimal Policies for Partially Observable Stochastic Domains
Ronald Parr, Stuart J. Russell 1995 International Joint Conference on Artificial Intelligence dblp:conf/ijcai/ParrR95 |
|
[b17] via fuzzy |
Reinforcement Learning with Hierarchies of Machines
Ronald Parr, Stuart J. Russell 1997 Neural Information Processing Systems dblp:conf/nips/ParrR97 |
|
[b18] via grobid |
Russell, S. J., & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, New Jersey.
| |
[b19] via grobid |
Rust, J. (1994). Do people behave according to Bellman's principle of optimality? Submitted to Journal of Economic Perspectives.
| |
[b20] via grobid |
Sargent, T. J. (1978). Estimation of dynamic labor demand schedules under rational expectations. Journal of Political Economy, 86(6), 1009-1044.
| |
[b21] via fuzzy |
Escape, Avoidance, and Imitation: A Neural Network Approach
Nestor A. Schmajuk, B. Silvano Zanutto 1997 Adaptive Behavior doi:10.1177/105971239700600103 | |
[b22] via grobid |
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
| |
[b23] via fuzzy |
Operant Conditioning in Skinnerbots
David S. Touretzky, Lisa M. Saksida 1997 Adaptive Behavior doi:10.1177/105971239700500302 |
|