
Showing 1–42 of 42 results for author: Petrik, M

  1. arXiv:2407.06329  [pdf, other]

    cs.LG cs.AI

    Solving Multi-Model MDPs by Coordinate Ascent and Dynamic Programming

    Authors: Xihong Su, Marek Petrik

    Abstract: Multi-model Markov decision process (MMDP) is a promising framework for computing policies that are robust to parameter uncertainty in MDPs. MMDPs aim to find a policy that maximizes the expected return over a distribution of MDP models. Because MMDPs are NP-hard to solve, most methods resort to approximations. In this paper, we derive the policy gradient of MMDPs and propose CADP, which combines…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted at UAI 2023

  2. arXiv:2404.05055  [pdf, other]

    cs.LG cs.AI

    Percentile Criterion Optimization in Offline Reinforcement Learning

    Authors: Elita A. Lobo, Cyrus Cousins, Yair Zick, Marek Petrik

    Abstract: In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion. The percentile criterion is approximately solved by constructing an ambiguity set that contains the true model with high probability and optimizing the policy for the worst model in the set. Since the percentile criterion i…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted at Neurips 2023

  3. arXiv:2404.04714  [pdf, other]

    cs.LG cs.AI cs.CR

    Data Poisoning Attacks on Off-Policy Policy Evaluation Methods

    Authors: Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju

    Abstract: Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive. However, the extent to which such methods can be trusted under adversarial threats to data quality is largely unexplored. In this work, we make the first attempt at investigating the sensitivity of OPE methods to m…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted at UAI 2022

  4. arXiv:2312.03618  [pdf, other]

    math.OC cs.GT

    Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality

    Authors: Julien Grand-Clement, Marek Petrik, Nicolas Vieille

    Abstract: Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential decision-making under parameter uncertainty. RMDPs have been extensively studied when the objective is to maximize the discounted return, but little is known for average optimality (optimizing the long-run average of the rewards obtained over time) and Blackwell optimality (remaining discount optimal for all discou…

    Submitted 7 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  5. arXiv:2306.01237  [pdf, other]

    cs.LG stat.ML

    Bayesian Regret Minimization in Offline Bandits

    Authors: Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh

    Abstract: We study how to make decisions that minimize Bayesian regret in offline linear bandits. Prior work suggests that one must take actions with maximum lower confidence bound (LCB) on their reward. We argue that the reliance on LCB is inherently flawed in this setting and propose a new algorithm that directly minimizes upper bounds on the Bayesian regret using efficient conic optimization solvers. Our…

    Submitted 2 July, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Journal ref: International Conference on Machine Learning, 2024

  6. arXiv:2304.12477  [pdf, other]

    math.OC cs.AI

    On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

    Authors: Jia Lin Hau, Erick Delage, Mohammad Ghavamzadeh, Marek Petrik

    Abstract: Optimizing static risk-averse objectives in Markov decision processes is difficult because they do not admit standard dynamic programming equations common in Reinforcement Learning (RL) algorithms. Dynamic programming decompositions that augment the state space with discrete risk levels have recently gained popularity in the RL community. Prior work has shown that these decompositions are optimal…

    Submitted 23 April, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

    Journal ref: Advances in Neural Information Processing Systems (Neurips), 2023

  7. arXiv:2302.00036  [pdf, other]

    cs.LG

    Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor

    Authors: Julien Grand-Clément, Marek Petrik

    Abstract: We introduce the Blackwell discount factor for Markov Decision Processes (MDPs). Classical objectives for MDPs include discounted, average, and Blackwell optimality. Many existing approaches to computing average-optimal policies solve for discounted optimal policies with a discount factor close to $1$, but they only work under strong or hard-to-verify assumptions such as ergodicity or weakly commu…

    Submitted 31 January, 2023; originally announced February 2023.

    Journal ref: Advances in Neural Information Processing Systems (Neurips), 2023

  8. arXiv:2212.10439  [pdf, other]

    cs.LG

    Policy Gradient in Robust MDPs with Global Convergence Guarantee

    Authors: Qiuhao Wang, Chin Pang Ho, Marek Petrik

    Abstract: Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors. Many successful reinforcement learning algorithms build on variations of policy-gradient methods, but adapting these methods to RMDPs has been challenging. As a result, the applicability of RMDPs to large, practical domains remains limited. This paper proposes a new D…

    Submitted 7 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Journal ref: International Conference on Machine Learning, 2023

  9. arXiv:2209.10187  [pdf, other]

    math.OC cs.LG

    On the convex formulations of robust Markov decision processes

    Authors: Julien Grand-Clément, Marek Petrik

    Abstract: Robust Markov decision processes (MDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the…

    Submitted 13 December, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

  10. arXiv:2209.04067  [pdf, other]

    cs.LG cs.AI

    RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

    Authors: Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh, Reazul Russel

    Abstract: Prior work on safe Reinforcement Learning (RL) has studied risk-aversion to randomness in dynamics (aleatory) and to model uncertainty (epistemic) in isolation. We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework that combines Risk-Averse and Soft-Robust…

    Submitted 14 September, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Journal ref: Artificial Intelligence and Statistics (AISTATS), 2023

  11. arXiv:2205.14202  [pdf, other]

    math.OC cs.LG

    Robust Phi-Divergence MDPs

    Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

    Abstract: In recent years, robust Markov decision processes (MDPs) have emerged as a prominent modeling framework for dynamic decision problems affected by uncertainty. In contrast to classical MDPs, which only account for stochasticity by modeling the dynamics through a stochastic process with a known transition kernel, robust MDPs additionally account for ambiguity by optimizing in view of the most advers…

    Submitted 12 January, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

    Journal ref: Advances in Neural Information Processing Systems (Neurips), 2022

  12. arXiv:2110.03224  [pdf, other]

    cs.LG stat.CO

    Darts: User-Friendly Modern Machine Learning for Time Series

    Authors: Julien Herzen, Francesco Lässig, Samuele Giuliano Piazzetta, Thomas Neuer, Léo Tafti, Guillaume Raille, Tomas Van Pottelbergh, Marek Pasieka, Andrzej Skrodzki, Nicolas Huguenin, Maxime Dumonal, Jan Kościsz, Dennis Bader, Frédérick Gusset, Mounir Benheddi, Camila Williamson, Michal Kosinski, Matej Petrik, Gaël Grosch

    Abstract: We present Darts, a Python machine learning library for time series, with a focus on forecasting. Darts offers a variety of models, from classics such as ARIMA to state-of-the-art deep neural networks. The emphasis of the library is on offering modern machine learning functionalities, such as supporting multidimensional series, meta-learning on multiple series, training on large datasets, incorpor…

    Submitted 19 May, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Darts Github repository: https://github.com/unit8co/darts

    Journal ref: Journal of Machine Learning Research 23 (2022) 1-6

  13. arXiv:2106.06499  [pdf, other]

    cs.LG cs.AI

    Policy Gradient Bayesian Robust Optimization for Imitation Learning

    Authors: Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg

    Abstract: The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizin…

    Submitted 21 June, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: In proceedings of the International Conference on Machine Learning (ICML) 2021

  14. arXiv:2101.01251  [pdf, other]

    cs.LG

    Robust Maximum Entropy Behavior Cloning

    Authors: Mostafa Hussein, Brendan Crowe, Marek Petrik, Momotaz Begum

    Abstract: Imitation learning (IL) algorithms use expert demonstrations to learn a specific task. Most of the existing approaches assume that all expert demonstrations are reliable and trustworthy, but what if there exist some adversarial demonstrations among the given dataset? This may result in poor decision-making performance. We propose a novel general framework to directly generate a policy from demon…

    Submitted 4 January, 2021; originally announced January 2021.

    Comments: NeurIPS 2020 3rd Robot Learning Workshop: Grounding Machine Learning Development in the Real World

  15. arXiv:2011.14495  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Soft-Robust Algorithms for Batch Reinforcement Learning

    Authors: Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik

    Abstract: In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately, such policies are typically overly conservative as the percentile criterion is non-convex, difficult to optimize, and ignores the mean performance. To overcome the…

    Submitted 26 February, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

  16. arXiv:2007.12315  [pdf, other]

    cs.LG stat.ML

    Bayesian Robust Optimization for Imitation Learning

    Authors: Daniel S. Brown, Scott Niekum, Marek Petrik

    Abstract: One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and corresponding optimal policy. Existing safe…

    Submitted 29 February, 2024; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: In proceedings NeurIPS 2020

  17. arXiv:2006.14364  [pdf, other]

    cs.LG stat.ML

    Finite-Sample Analysis of Proximal Gradient TD Algorithms

    Authors: Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik

    Abstract: In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has been not much work on finite-sample analysis for convergent off-policy reinforcement le…

    Submitted 3 July, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: 31st Conference on Uncertainty in Artificial Intelligence (UAI). arXiv admin note: substantial text overlap with arXiv:2006.03976

  18. arXiv:2006.11679  [pdf, other]

    cs.LG math.OC stat.ML

    Entropic Risk Constrained Soft-Robust Policy Optimization

    Authors: Reazul Hasan Russel, Bahram Behzadian, Marek Petrik

    Abstract: Having a perfect model to compute the optimal policy is often infeasible in reinforcement learning. It is important in high-stakes domains to quantify and manage risk induced by model uncertainties. Entropic risk measure is an exponential utility-based convex risk measure that satisfies many reasonable properties. In this paper, we propose an entropic risk constrained policy gradient and actor-cri…

    Submitted 20 June, 2020; originally announced June 2020.

  19. arXiv:2006.09484  [pdf, other]

    cs.LG math.OC stat.ML

    Partial Policy Iteration for L1-Robust Markov Decision Processes

    Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

    Abstract: Robust Markov decision processes (MDPs) allow to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which severely limits their scalability. This paper describ…

    Submitted 16 June, 2020; originally announced June 2020.

  20. arXiv:2006.03976  [pdf, other]

    cs.LG stat.ML

    Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

    Authors: Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik

    Abstract: In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual s…

    Submitted 6 June, 2020; originally announced June 2020.

    Comments: Journal of Artificial Intelligence (JAIR)

  21. arXiv:2004.07199  [pdf, other]

    physics.space-ph astro-ph.IM

    MMS SITL Ground Loop: Automating the burst data selection process

    Authors: Matthew R. Argall, Colin Small, Samantha Piatt, Liam Breen, Marek Petrik, Kim Kokkonen, Julie Barnum, Kristopher Larsen, Frederick D. Wilder, Mitsuo Oka, William R. Paterson, Roy B. Torbert, Robert E. Ergun, Tai Phan, Barbara L. Giles, James L. Burch

    Abstract: Global-scale energy flow throughout Earth's magnetosphere (MSP) is catalyzed by processes that occur at Earth's magnetopause (MP). Magnetic reconnection is one process responsible for solar wind entry into and global convection within the MSP, and the MP location, orientation, and motion have an impact on the dynamics. Statistical studies that focus on these and other MP phenomena and characterist…

    Submitted 20 July, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

    Comments: 21 pages, 8 figures, 3 tables, submitted to Frontiers: Space Science

  22. arXiv:1912.02696  [pdf, other]

    cs.LG cs.AI stat.ML

    Optimizing Norm-Bounded Weighted Ambiguity Sets for Robust MDPs

    Authors: Reazul Hasan Russel, Bahram Behzadian, Marek Petrik

    Abstract: Optimal policies in Markov decision processes (MDPs) are very sensitive to model misspecification. This raises serious concerns about deploying them in high-stake domains. Robust MDPs (RMDP) provide a promising framework to mitigate vulnerabilities by computing policies with worst-case guarantees in reinforcement learning. The solution quality of an RMDP depends on the ambiguity set, which is a qu…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1910.10786

  23. arXiv:1910.10786  [pdf, other]

    cs.LG cs.AI stat.ML

    Optimizing Percentile Criterion Using Robust MDPs

    Authors: Bahram Behzadian, Reazul Hasan Russel, Marek Petrik, Chin Pang Ho

    Abstract: We address the problem of computing reliable policies in reinforcement learning problems with limited data. In particular, we compute policies that achieve good returns with high confidence when deployed. This objective, known as the percentile criterion, can be optimized using Robust MDPs (RMDPs). RMDPs generalize MDPs to allow for uncertain transition probabilities chosen adversarially fr…

    Submitted 25 February, 2021; v1 submitted 23 October, 2019; originally announced October 2019.

  24. arXiv:1904.08528  [pdf, other]

    cs.LG cs.AI stat.ML

    Robust Exploration with Tight Bayesian Plausibility Sets

    Authors: Reazul H. Russel, Tianyi Gu, Marek Petrik

    Abstract: Optimism about the poorly understood states and actions is the main driving force of exploration for many provably-efficient reinforcement learning algorithms. We propose optimism in the face of sensible value functions (OFVF), a novel data-driven Bayesian algorithm for constructing plausibility sets for MDPs to explore robustly, minimizing the worst-case exploration cost. The method computes polici…

    Submitted 17 April, 2019; originally announced April 2019.

  25. arXiv:1902.07605  [pdf, other]

    cs.LG stat.ML

    Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs

    Authors: Marek Petrik, Reazul Hasan Russell

    Abstract: Robust MDPs (RMDPs) can be used to compute policies with provable worst-case guarantees in reinforcement learning. The quality and robustness of an RMDP solution are determined by the ambiguity set---the set of plausible transition probabilities---which is usually constructed as a multi-dimensional confidence region. Existing methods construct ambiguity sets as confidence regions using concentrati…

    Submitted 20 February, 2019; originally announced February 2019.

  26. arXiv:1811.06512  [pdf, other]

    cs.LG stat.ML

    Tight Bayesian Ambiguity Sets for Robust MDPs

    Authors: Reazul Hasan Russel, Marek Petrik

    Abstract: Robustness is important for sequential decision making in a stochastic dynamic environment with uncertain probabilistic parameters. We address the problem of using robust MDPs (RMDPs) to compute policies with provable worst-case guarantees in reinforcement learning. The quality and robustness of an RMDP solution is determined by its ambiguity set. Existing methods construct ambiguity sets that lea…

    Submitted 15 November, 2018; originally announced November 2018.

    Comments: 5 pages. Accepted at Infer to Control Workshop at Neural Information Processing Systems (NIPS) 2018

  27. arXiv:1809.06995  [pdf, other]

    cs.LG stat.ML

    Interpretable Reinforcement Learning with Ensemble Methods

    Authors: Alexander Brown, Marek Petrik

    Abstract: We propose to use boosted regression trees as a way to compute human-interpretable solutions to reinforcement learning problems. Boosting combines several regression trees to improve their accuracy without significantly reducing their inherent interpretability. Prior work has focused independently on reinforcement learning and on interpretable machine learning, but there has been little progress i…

    Submitted 18 September, 2018; originally announced September 2018.

  28. arXiv:1708.00211  [pdf]

    cond-mat.mtrl-sci

    Ab initio based analysis of grain boundary segregation in Al-Mg and Al-Zn binary alloys

    Authors: M. V. Petrik, A. R. Kuznetsov, N. Enikeev, Yu. N. Gornostyrev, R. Z. Valiev

    Abstract: Based on ab-initio simulations, we report on the nature of principally different mechanisms for interaction of Mg and Zn atoms with grain boundaries in Al alloys leading to different morphology of segregation. The Mg atoms segregate in relatively wide GB region with heterogeneous agglomerations due to the deformation mechanism of solute-GB interaction. In contrast, in the case of Zn atoms an elect…

    Submitted 1 August, 2017; originally announced August 2017.

    Comments: 5 pages, 3 figures, APL

  29. arXiv:1706.04687  [pdf, other]

    cs.LG stat.ML

    A Practical Method for Solving Contextual Bandit Problems Using Decision Trees

    Authors: Adam N. Elmachtoub, Ryan McNellis, Sechan Oh, Marek Petrik

    Abstract: Many efficient algorithms with strong theoretical guarantees have been proposed for the contextual multi-armed bandit problem. However, applying these algorithms in practice can be difficult because they require domain expertise to build appropriate features and to tune their parameters. We propose a new method for the contextual bandit problem that is simple, practical, and can be applied with li…

    Submitted 19 October, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI 2017)

  30. arXiv:1704.03926  [pdf, other]

    cs.LG cs.AI stat.ML

    Value Directed Exploration in Multi-Armed Bandits with Structured Priors

    Authors: Bence Cserna, Marek Petrik, Reazul Hasan Russel, Wheeler Ruml

    Abstract: Multi-armed bandits are a quintessential machine learning problem requiring the balancing of exploration and exploitation. While there has been progress in developing algorithms with strong theoretical guarantees, there has been less focus on practical near-optimal finite-time performance. In this paper, we propose an algorithm for Bayesian multi-armed bandits that utilizes value-function-driven o…

    Submitted 17 May, 2017; v1 submitted 12 April, 2017; originally announced April 2017.

  31. arXiv:1607.03842  [pdf, other]

    stat.ML

    Safe Policy Improvement by Minimizing Robust Baseline Regret

    Authors: Marek Petrik, Yinlam Chow, Mohammad Ghavamzadeh

    Abstract: An important problem in sequential decision-making under uncertainty is to use limited data to compute a safe policy, i.e., a policy that is guaranteed to perform at least as well as a given baseline strategy. In this paper, we develop and analyze a new model-based approach to compute a safe policy when we have access to an inaccurate dynamics model of the system with known accuracy guarantees. Ou…

    Submitted 13 July, 2016; originally announced July 2016.

  32. arXiv:1606.05819  [pdf, other]

    stat.ML cs.LG

    Building an Interpretable Recommender via Loss-Preserving Transformation

    Authors: Amit Dhurandhar, Sechan Oh, Marek Petrik

    Abstract: We propose a method for building an interpretable recommender system for personalizing online content and promotions. Historical data available for the system consists of customer features, provided content (promotions), and user responses. Unlike in a standard multi-class classification setting, misclassification costs depend on both recommended actions and customers. Our method transforms such a…

    Submitted 18 June, 2016; originally announced June 2016.

    Comments: Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

  33. arXiv:1510.04905  [pdf, other]

    stat.ML cs.LG

    Robust Partially-Compressed Least-Squares

    Authors: Stephen Becker, Ban Kawas, Marek Petrik, Karthikeyan N. Ramamurthy

    Abstract: Randomized matrix compression techniques, such as the Johnson-Lindenstrauss transform, have emerged as an effective and practical way for solving large-scale problems efficiently. With a focus on computational efficiency, however, forsaking solutions quality and accuracy becomes the trade-off. In this paper, we investigate compressed least-squares problems and propose new models and algorithms tha…

    Submitted 16 October, 2015; originally announced October 2015.

  34. arXiv:1506.04514  [pdf, other]

    math.OC

    Robust Policy Optimization with Baseline Guarantees

    Authors: Yinlam Chow, Marek Petrik, Mohammad Ghavamzadeh

    Abstract: Our goal is to compute a policy that guarantees improved return over a baseline policy even when the available MDP model is inaccurate. The inaccurate model may be constructed, for example, by system identification techniques when the true model is inaccessible. When the modeling error is large, the standard solution to the constructed model has no performance guarantees with respect to the true m…

    Submitted 15 June, 2015; v1 submitted 15 June, 2015; originally announced June 2015.

  35. arXiv:1408.3275  [pdf]

    cond-mat.mtrl-sci

    Role of magnetic degrees of freedom in a scenario of phase transformations in steel

    Authors: I. K. Razumov, D. V. Boukhvalov, M. V. Petrik, V. N. Urtsev, A. V. Shmakov, M. I. Katsnelson, Yu. N. Gornostyrev

    Abstract: The diversity of mesostructures formed in steel at cooling from a high-temperature austenite ("gamma") phase is determined by the interplay of shear reconstructions of crystal lattice and diffusion of carbon. Combining first-principles calculations with large-scale phase-field simulations we demonstrate a decisive role of magnetic degrees of freedom in the formation of energy relief along the Bain…

    Submitted 4 September, 2014; v1 submitted 14 August, 2014; originally announced August 2014.

    Journal ref: Phys. Rev. B 90, 094101 (2014)

  36. A Bilinear Programming Approach for Multiagent Planning

    Authors: Marek Petrik, Shlomo Zilberstein

    Abstract: Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formula…

    Submitted 15 January, 2014; originally announced January 2014.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 35, pages 235-274, 2009

  37. arXiv:1309.6857  [pdf]

    cs.AI

    Solution Methods for Constrained Markov Decision Process with Continuous Probability Modulation

    Authors: Marek Petrik, Dharmashankar Subramanian, Janusz Marecki

    Abstract: We propose solution methods for previously-unsolved constrained MDPs in which actions can continuously modify the transition probabilities within some acceptable sets. While many methods have been proposed to solve regular MDPs with large state sets, there are few practical approaches for solving constrained MDPs with large action sets. In particular, we show that the continuous action sets can be…

    Submitted 26 September, 2013; originally announced September 2013.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-518-526

  38. arXiv:1210.4901  [pdf]

    q-fin.PM cs.AI cs.GT

    An Approximate Solution Method for Large Risk-Averse Markov Decision Processes

    Authors: Marek Petrik, Dharmashankar Subramanian

    Abstract: Stochastic domains often involve risk-averse decision makers. While recent work has focused on how to model risk in Markov decision processes using risk measures, it has not addressed the problem of solving large risk-averse formulations. In this paper, we propose and analyze a new method for solving large risk-averse MDPs with hybrid continuous-discrete state spaces and continuous action spaces.…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-805-814

  39. arXiv:1205.1782  [pdf, other]

    stat.ML cs.LG

    Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds

    Authors: Marek Petrik

    Abstract: Approximate dynamic programming is a popular method for solving large Markov decision processes. This paper describes a new class of approximate dynamic programming (ADP) methods, distributionally robust ADP, that address the curse of dimensionality by minimizing a pessimistic bound on the policy loss. This approach turns ADP into an optimization problem, for which we derive new mathematical progra…

    Submitted 21 May, 2012; v1 submitted 8 May, 2012; originally announced May 2012.

    Comments: In Proceedings of International Conference on Machine Learning, 2012

  40. arXiv:1106.6102  [pdf, other]

    q-fin.RM math.OC

    Tight Approximations of Dynamic Risk Measures

    Authors: Dan A. Iancu, Marek Petrik, Dharmashankar Subramanian

    Abstract: This paper compares two different frameworks recently introduced in the literature for measuring risk in a multi-period setting. The first corresponds to applying a single coherent risk measure to the cumulative future costs, while the second involves applying a composition of one-step coherent risk mappings. We summarize the relative strengths of the two methods, characterize several necessary an…

    Submitted 23 August, 2013; v1 submitted 29 June, 2011; originally announced June 2011.

  41. arXiv:1006.2743  [pdf, other]

    cs.AI

    Global Optimization for Value Function Approximation

    Authors: Marek Petrik, Shlomo Zilberstein

    Abstract: Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of…

    Submitted 14 June, 2010; originally announced June 2010.

  42. arXiv:1005.1860  [pdf, ps, other]

    cs.AI

    Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes

    Authors: Marek Petrik, Gavin Taylor, Ron Parr, Shlomo Zilberstein

    Abstract: Approximate dynamic programming has been used successfully in a large variety of domains, but it relies on a small set of provided approximation features to calculate solutions reliably. Large and rich sets of features can cause existing algorithms to overfit because of a limited number of samples. We address this shortcoming using $L_1$ regularization in approximate linear programming. Because th…

    Submitted 20 May, 2010; v1 submitted 11 May, 2010; originally announced May 2010.

    Comments: Technical report corresponding to the ICML2010 submission of the same name