Abstract
Dynamic Programming (DP) is a standard optimization tool for solving Stochastic Optimal Control (SOC) problems over either a finite or an infinite horizon of stages. Under very general assumptions, commonly employed numerical algorithms approximate the cost-to-go functions by means of suitable parametric models built from a set of sampling points in the d-dimensional state space. This paper discusses the problem of sample complexity, i.e., how “fast” the number of points must grow with the input dimension in order to obtain an accurate estimate of the cost-to-go functions in typical DP approaches such as value iteration and policy iteration. It is shown that choosing the sampling points from low-discrepancy sequences, commonly used for efficient numerical integration, makes it possible to achieve, under suitable hypotheses, an almost linear sample complexity, thus helping to mitigate the curse of dimensionality of the approximate DP procedure.
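To make the idea of low-discrepancy sampling concrete, the following is a minimal sketch and not the authors' implementation. It builds a Halton sequence, used here only as one well-known low-discrepancy construction (the abstract does not say which sequence the paper adopts), and uses the points to estimate the integral of a smooth function over the unit cube. The names `radical_inverse`, `halton_points`, `sample_mean`, and `toy_cost` are hypothetical, and `toy_cost` is an arbitrary stand-in for a cost-to-go function.

```python
# Sketch: Halton low-discrepancy points in [0, 1]^d (illustrative assumption,
# not necessarily the sequence used in the paper).

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # one prime base per dimension


def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(n_points, dim):
    """Return the first `n_points` Halton points in the `dim`-dimensional unit cube."""
    if dim > len(PRIMES):
        raise ValueError("this sketch supports at most %d dimensions" % len(PRIMES))
    return [
        [radical_inverse(i, PRIMES[j]) for j in range(dim)]
        for i in range(1, n_points + 1)
    ]


def sample_mean(func, points):
    """Equal-weight quadrature: average of `func` over the sample points."""
    return sum(func(x) for x in points) / len(points)


def toy_cost(x):
    """Hypothetical smooth function; its exact integral over [0, 1]^d is d / 3."""
    return sum(xi * xi for xi in x)


if __name__ == "__main__":
    dim = 3
    points = halton_points(1024, dim)
    estimate = sample_mean(toy_cost, points)
    print("estimate:", estimate, "exact:", dim / 3.0)
```

In an approximate DP setting, the same points would serve as the states at which the cost-to-go function is evaluated before fitting the parametric model; the integration example is only meant to show how evenly the points cover the state space.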
Cite this article
Cervellera, C., Muselli, M. Efficient sampling in approximate dynamic programming algorithms. Comput Optim Appl 38, 417–443 (2007). https://doi.org/10.1007/s10589-007-9054-8
DOI: https://doi.org/10.1007/s10589-007-9054-8