Why long model-based rollouts are no reason for bad Q-value estimates

Authors: Philipp Wissmann, Daniel Hein, Steffen Udluft, Volker Tresp

Abstract: This paper explores the use of model-based offline reinforcement learning with long model rollouts. While some literature criticizes this approach due to compounding errors, many practitioners have found success in real-world applications. The paper aims to demonstrate that long rollouts do not necessarily result in exponentially growing errors and can actually produce better Q-value estimates than model-free methods. These findings can potentially enhance reinforcement learning techniques.

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted at ESANN 2024
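To make the rollout idea concrete, here is a minimal sketch of estimating a Q-value from a model by a long discounted rollout: take the given action first, then follow a policy. The 1-D toy dynamics, the constant per-step bias in `learned_step`, the controller, and the discount of 0.9 are all hypothetical stand-ins, not the paper's setup or experiments. In this contracting toy system the gap between the true and the model-based estimate stays bounded as the horizon grows, because later model errors are weighted by γ^t.

```python
from typing import Callable

def rollout_q(state: float, action: float,
              step: Callable[[float, float], float],
              reward: Callable[[float, float], float],
              policy: Callable[[float], float],
              gamma: float = 0.9, horizon: int = 200) -> float:
    """Estimate Q(s, a) with one discounted rollout: apply `action`, then follow `policy`."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(horizon):
        total += discount * reward(s, a)
        s = step(s, a)          # (model) prediction of the next state
        a = policy(s)
        discount *= gamma
    return total

# Hypothetical 1-D system: the "learned" model has a small constant per-step bias.
true_step = lambda s, a: 0.8 * s + a
learned_step = lambda s, a: 0.8 * s + a + 0.01
reward = lambda s, a: -s * s          # penalize distance from 0
policy = lambda s: -0.5 * s           # simple proportional controller

for h in (10, 50, 200):
    q_true = rollout_q(1.0, 0.0, true_step, reward, policy, horizon=h)
    q_model = rollout_q(1.0, 0.0, learned_step, reward, policy, horizon=h)
    print(h, abs(q_true - q_model))   # the gap levels off instead of growing
```

With an unstable learned model the picture can differ; the sketch only shows that horizon length alone does not force the error to explode.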

arXiv:2407.10856 [pdf, other]

Physics-Inspired Generative Models in Medical Imaging: A Review

Authors: Dennis Hein, Afshin Bozorgpour, Dorit Merhof, Ge Wang

Abstract: Physics-inspired Generative Models (GMs), in particular Diffusion Models (DMs) and Poisson Flow Models (PFMs), enhance Bayesian methods and promise great utility in medical imaging. This review examines the transformative role of such generative methods. First, a variety of physics-inspired GMs, including Denoising Diffusion Probabilistic Models (DDPMs), Score-based Diffusion Models (SDMs), and Poisson Flow Generative Models (PFGMs and PFGM++), are revisited, with an emphasis on their accuracy, robustness, and acceleration. Then, major applications of physics-inspired GMs in medical imaging are presented, comprising image reconstruction, image generation, and image analysis. Finally, future research directions are brainstormed, including unification of physics-inspired GMs, integration with Vision-Language Models (VLMs), and potential novel applications of GMs. Since the development of generative methods has been rapid, this review will hopefully give peers and learners a timely snapshot of this new family of physics-driven generative models and help capitalize on their enormous potential for medical imaging.

Submitted 23 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2404.10017 [pdf, other]

Model-based Offline Quantum Reinforcement Learning

Authors: Simon Eisenmann, Daniel Hein, Steffen Udluft, Thomas A. Runkler

Abstract: This paper presents the first algorithm for model-based offline quantum reinforcement learning and demonstrates its functionality on the cart-pole benchmark. The model and the policy to be optimized are each implemented as variational quantum circuits. The model is trained by gradient descent to fit a pre-recorded data set. The policy is optimized with a gradient-free optimization scheme using the return estimate given by the model as the fitness function. This model-based approach allows, in principle, full realization on a quantum computer during the optimization phase and gives hope that a quantum advantage can be achieved as soon as sufficiently powerful quantum computers are available.

Submitted 14 April, 2024; originally announced April 2024.
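The model-based offline loop in this abstract (fit a model to a fixed data set, then optimize a policy gradient-free using the model's return estimate as fitness) can be sketched classically. This is a hypothetical, classical stand-in: a least-squares linear model replaces the variational quantum circuits, plain random search replaces the paper's optimizer, and the toy data, cost, and names (`fit_linear_model`, `model_return`, `random_search`) are assumptions for illustration.

```python
import random

def fit_linear_model(data):
    """Least-squares fit of s' = a*s + b*u from (s, u, s') tuples via the 2x2 normal equations."""
    sss = sum(s * s for s, u, _ in data)
    suu = sum(u * u for s, u, _ in data)
    ssu = sum(s * u for s, u, _ in data)
    sys_ = sum(s * y for s, u, y in data)
    syu = sum(u * y for s, u, y in data)
    det = sss * suu - ssu * ssu
    a = (sys_ * suu - syu * ssu) / det
    b = (syu * sss - sys_ * ssu) / det
    return a, b

def model_return(k, model, starts, horizon=30, gamma=0.95):
    """Fitness of the linear policy u = -k*s: mean discounted return on the learned model."""
    a, b = model
    total = 0.0
    for s in starts:
        disc = 1.0
        for _ in range(horizon):
            u = -k * s
            total += disc * -(s * s + 0.1 * u * u)   # quadratic cost as negative reward
            s = a * s + b * u                         # model prediction, not the real system
            disc *= gamma
    return total / len(starts)

def random_search(fitness, low=-2.0, high=2.0, iters=500, seed=0):
    """Gradient-free optimization: keep the best of uniformly sampled candidates."""
    rng = random.Random(seed)
    best_k, best_f = None, float("-inf")
    for _ in range(iters):
        k = rng.uniform(low, high)
        f = fitness(k)
        if f > best_f:
            best_k, best_f = k, f
    return best_k, best_f

# Hypothetical pre-recorded data from a system s' = 1.1*s + 0.5*u under random actions.
rng = random.Random(1)
data = []
for _ in range(200):
    s, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append((s, u, 1.1 * s + 0.5 * u))

model = fit_linear_model(data)
best_k, best_fit = random_search(lambda k: model_return(k, model, starts=[-1.0, 0.5, 1.0]))
print(model, best_k, best_fit)
```

The real system is never queried during policy search; only the fitted model is, which is what makes the scheme offline.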

arXiv:2402.08159 [pdf, other]

Poisson flow consistency models for low-dose CT image denoising

Authors: Dennis Hein, Adam Wang, Ge Wang

Abstract: Diffusion and Poisson flow models have demonstrated remarkable success for a wide range of generative tasks. Nevertheless, their iterative nature results in computationally expensive sampling, and the number of function evaluations (NFE) required can be orders of magnitude larger than for single-step methods. Consistency models are a recent class of deep generative models which enable single-step sampling of high-quality data without the need for adversarial training. In this paper, we introduce a novel image denoising technique which combines the flexibility afforded by Poisson flow generative models (PFGM)++ with the high-quality, single-step sampling of consistency models. The proposed method first learns a trajectory between a noise distribution and the posterior distribution of interest by training PFGM++ in a supervised fashion. These pre-trained PFGM++ are subsequently "distilled" into Poisson flow consistency models (PFCM) via an updated version of consistency distillation. We call this approach posterior sampling Poisson flow consistency models (PS-PFCM). Our results indicate that the added flexibility of tuning the hyperparameter D, the dimensionality of the augmentation variables in PFGM++, allows us to outperform consistency models, a current state-of-the-art diffusion-style model with NFE=1, on clinical low-dose CT images. Notably, PFCM is in itself a novel family of deep generative models, and we provide initial results on the CIFAR-10 dataset.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2312.09754 [pdf, other]

PPFM: Image denoising in photon-counting CT using single-step posterior sampling Poisson flow generative models

Authors: Dennis Hein, Staffan Holmin, Timothy Szczykutowicz, Jonathan S Maltz, Mats Danielsson, Ge Wang, Mats Persson

Abstract: Diffusion and Poisson flow models have shown impressive performance in a wide range of generative tasks, including low-dose CT image denoising. However, one limitation in general, and for clinical applications in particular, is slow sampling. Due to their iterative nature, the number of function evaluations (NFE) required is usually on the order of $10$ to $10^3$, both for conditional and unconditional generation. In this paper, we present posterior sampling Poisson flow generative models (PPFM), a novel image denoising technique for low-dose and photon-counting CT that produces excellent image quality whilst keeping NFE=1. Updating the training and sampling processes of Poisson flow generative models (PFGM)++, we learn a conditional generator which defines a trajectory between the prior noise distribution and the posterior distribution of interest. We additionally hijack and regularize the sampling process to achieve NFE=1. Our results shed light on the benefits of the PFGM++ framework compared to diffusion models. In addition, PPFM is shown to perform favorably compared to current state-of-the-art diffusion-style models with NFE=1, consistency models, as well as popular deep learning and non-deep learning-based image denoising techniques, on clinical low-dose CT images and clinical images from a prototype photon-counting CT system.

Submitted 19 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2309.02562 [pdf]

Recurrence-Free Survival Prediction for Anal Squamous Cell Carcinoma Chemoradiotherapy using Planning CT-based Radiomics Model

Authors: Shanshan Tang, Kai Wang, David Hein, Gloria Lin, Nina N. Sanford, Jing Wang

Abstract: Objectives: Approximately 30% of non-metastatic anal squamous cell carcinoma (ASCC) patients will experience recurrence after chemoradiotherapy (CRT), and currently available clinical variables are poor predictors of treatment response. We aimed to develop a model leveraging information extracted from radiation pretreatment planning CT to predict recurrence-free survival (RFS) in ASCC patients after CRT. Methods: Radiomics features were extracted from planning CT images of 96 ASCC patients. Following pre-feature selection, the optimal feature set was selected via step-forward feature selection with a multivariate Cox proportional hazards model. The RFS prediction was generated from a radiomics-clinical combined model based on the optimal feature set with five repeats of five-fold cross-validation. The risk stratification ability of the proposed model was evaluated with Kaplan-Meier analysis. Results: Shape- and texture-based radiomics features significantly predicted RFS. Compared to a clinical-only model, the radiomics-clinical combined model achieved better performance in the testing cohort, with a higher C-index (0.80 vs 0.73) and AUC (0.84 vs 0.79 for 1-year RFS, 0.84 vs 0.78 for 2-year RFS, and 0.86 vs 0.83 for 3-year RFS), leading to distinct groups at high and low risk of recurrence (p<0.001). Conclusions: A treatment-planning-CT-based radiomics and clinical combined model had improved prognostic performance in predicting RFS for ASCC patients treated with CRT compared to a model using clinical features only.

Submitted 5 September, 2023; originally announced September 2023.
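The C-index reported in this abstract measures how well predicted risk scores rank patients by time to recurrence. A minimal sketch of Harrell's concordance index, assuming right-censored data given as parallel lists of times, event flags (1 = recurrence observed, 0 = censored), and risk scores; the toy values are hypothetical, tied times are simply skipped, and tied risks count as half-concordant, which is a common convention.

```python
from itertools import combinations

def harrell_c_index(times, events, risks):
    """Harrell's C: share of comparable pairs whose risk ordering matches their event ordering.

    A pair is comparable when the subject with the shorter time had an observed event;
    a higher risk score should then belong to that earlier-failing subject.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times skipped for simplicity
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # censored before the other's time: true ordering unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    return concordant / comparable

# Hypothetical data: months to recurrence or censoring, event flag, model risk score.
times = [5, 10, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.7, 0.4, 0.8, 0.1]
print(harrell_c_index(times, events, risks))  # 7 of 8 comparable pairs concordant -> 0.875
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which 0.80 vs 0.73 should be read.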

arXiv:2308.06127 [pdf, other]

doi 10.1109/SSCI52147.2023.10371978

Learning Control Policies for Variable Objectives from Offline Data

Authors: Marc Weber, Phillip Swazinna, Daniel Hein, Steffen Udluft, Volkmar Sterzing

Abstract: Offline reinforcement learning provides a viable approach to obtain advanced control strategies for dynamical systems, in particular when direct interaction with the environment is not available. In this paper, we introduce a conceptual extension for model-based policy search methods, called variable objective policy (VOP). With this approach, policies are trained to generalize efficiently over a variety of objectives, which parameterize the reward function. We demonstrate that by altering the objectives passed as input to the policy, users gain the freedom to adjust its behavior or re-balance optimization targets at runtime, without the need to collect additional observation batches or to re-train.

Submitted 11 August, 2023; originally announced August 2023.

Comments: 8 pages, 7 figures

Journal ref: 2023 IEEE Symposium Series on Computational Intelligence

arXiv:2206.04741 [pdf]

Quantum Policy Iteration via Amplitude Estimation and Grover Search -- Towards Quantum Advantage for Reinforcement Learning

Authors: Simon Wiedemann, Daniel Hein, Steffen Udluft, Christian Mendl

Abstract: We present a full implementation and simulation of a novel quantum reinforcement learning method. Our work is a detailed and formal proof of concept for how quantum algorithms can be used to solve reinforcement learning problems and shows that, given access to error-free, efficient quantum realizations of the agent and environment, quantum methods can yield provable improvements over classical Monte Carlo-based methods in terms of sample complexity. Our approach shows in detail how to combine amplitude estimation and Grover search into a policy evaluation and improvement scheme. We first develop quantum policy evaluation (QPE), which is quadratically more efficient than an analogous classical Monte Carlo estimation and is based on a quantum mechanical realization of a finite Markov decision process (MDP). Building on QPE, we derive a quantum policy iteration that repeatedly improves an initial policy using Grover search until the optimum is reached. Finally, we present an implementation of our algorithm for a two-armed bandit MDP, which we then simulate.

Submitted 10 May, 2023; v1 submitted 9 June, 2022; originally announced June 2022.
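Grover search, used in this abstract for policy improvement, amplifies the amplitude of "marked" items. A small classical state-vector simulation, assuming N items with one marked index and a uniform initial superposition (a textbook setting, not the paper's policy-iteration circuit), checks the standard result that after k iterations the success probability is sin²((2k+1)θ) with sin θ = √(M/N).

```python
import math

def grover_probability(n_items: int, marked: int, iterations: int) -> float:
    """Simulate Grover search on real amplitudes; return P(measuring `marked`)."""
    amp = [1 / math.sqrt(n_items)] * n_items      # uniform superposition
    for _ in range(iterations):
        amp[marked] = -amp[marked]                # oracle: phase-flip the marked item
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]         # diffusion: inversion about the mean
    return amp[marked] ** 2

def grover_formula(n_items: int, iterations: int, n_marked: int = 1) -> float:
    """Closed form sin^2((2k+1) * theta) with sin(theta) = sqrt(M / N)."""
    theta = math.asin(math.sqrt(n_marked / n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2

n = 64
k_opt = math.floor(math.pi / 4 * math.sqrt(n))   # roughly (pi/4) * sqrt(N) iterations
print(k_opt, grover_probability(n, marked=3, iterations=k_opt))
```

About √N oracle calls suffice here, versus on the order of N classical queries; amplitude estimation yields the analogous quadratic saving over Monte Carlo sampling for estimating a mean.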

arXiv:2201.05433 [pdf, ps, other]

Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning

Authors: Phillip Swazinna, Steffen Udluft, Daniel Hein, Thomas Runkler

Abstract: Offline reinforcement learning (RL) algorithms are often designed with environments such as MuJoCo in mind, in which the planning horizon is extremely long and no noise exists. We compare model-free, model-based, as well as hybrid offline RL approaches on various industrial benchmark (IB) datasets to test the algorithms in settings closer to real-world problems, including complex noise and partially observable states. We find that on the IB, hybrid approaches face severe difficulties and that simpler algorithms, such as rollout-based algorithms or model-free algorithms with simpler regularizers, perform best on the datasets.

Submitted 14 January, 2022; originally announced January 2022.

Comments: Submitted to IFAC Conference on Intelligent Control and Automation Sciences (ICONS) 2022

arXiv:2108.13381 [pdf, other]

doi 10.1145/3449726.3463131

Trustworthy AI for Process Automation on a Chylla-Haase Polymerization Reactor

Authors: Daniel Hein, Daniel Labisch

Abstract: In this paper, genetic programming reinforcement learning (GPRL) is utilized to generate human-interpretable control policies for a Chylla-Haase polymerization reactor. Such continuously stirred tank reactors (CSTRs) with jacket cooling are widely used in the chemical industry, in the production of fine chemicals, pigments, polymers, and medical products. Despite appearing rather simple, controlling CSTRs in real-world applications is quite a challenging problem to tackle. GPRL utilizes already existing data from the reactor and fully automatically generates a set of optimized, simple control strategies, so-called policies, from which the domain expert can choose. Note that these policies are white-box models of low complexity, which makes them easy to validate and implement in the target control system, e.g., SIMATIC PCS 7. Despite its low complexity, however, the automatically generated policy yields high performance in terms of reactor temperature control deviation, which we empirically evaluate on the original reactor template.

Submitted 30 August, 2021; originally announced August 2021.

Journal ref: Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '21), 2021

arXiv:2107.05479 [pdf, other]

Behavior Constraining in Weight Space for Offline Reinforcement Learning

Authors: Phillip Swazinna, Steffen Udluft, Daniel Hein, Thomas Runkler

Abstract: In offline reinforcement learning, a policy needs to be learned from a single pre-collected dataset. Typically, policies are thus regularized during training to behave similarly to the data-generating policy, by adding a penalty based on a divergence between the action distributions of the generating and the trained policy. We propose a new algorithm, which instead constrains the policy directly in its weight space, and demonstrate its effectiveness in experiments.

Submitted 12 July, 2021; originally announced July 2021.

Comments: Accepted at ESANN 2021
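To contrast the two regularization styles named in this abstract, here is a minimal sketch assuming a deterministic linear policy and a squared-L2 penalty on the distance between trained and behavior-policy weights. The abstract does not specify the paper's weight-space mechanism, so the L2 form, the quadratic surrogate objective, and `theta_star` (hypothetical return-maximizing weights) are illustrative assumptions only.

```python
def action_penalty(theta, theta_b, states):
    """Action-space style: mean squared difference between two linear policies' actions on data states."""
    diffs = [sum((t - b) * x for t, b, x in zip(theta, theta_b, s)) for s in states]
    return sum(d * d for d in diffs) / len(states)

def weight_penalty(theta, theta_b):
    """Weight-space style (assumed squared-L2 form): needs no states or actions."""
    return sum((t - b) ** 2 for t, b in zip(theta, theta_b))

def constrained_weights(theta_star, theta_b, lam, steps=200):
    """Gradient descent on ||theta - theta_star||^2 + lam * weight_penalty(theta, theta_b).

    The first term is a hypothetical surrogate for the negative return.
    """
    lr = 0.25 / (1.0 + lam)              # step size chosen so each step halves the error
    theta = list(theta_b)                # start from the behavior policy's weights
    for _ in range(steps):
        grad = [2 * (t - ts) + 2 * lam * (t - tb)
                for t, ts, tb in zip(theta, theta_star, theta_b)]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

theta_b, theta_star = [0.0, 1.0], [2.0, -1.0]
for lam in (0.0, 1.0, 4.0):
    print(lam, [round(t, 3) for t in constrained_weights(theta_star, theta_b, lam)])
```

In this toy the minimizer is (θ* + λθ_b)/(1 + λ), so λ interpolates between the unconstrained optimum and the behavior policy's weights.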

arXiv:2007.09964 [pdf, other]

Interpretable Control by Reinforcement Learning

Authors: Daniel Hein, Steffen Limmer, Thomas A. Runkler

Abstract: In this paper, three recently introduced reinforcement learning (RL) methods are used to generate human-interpretable policies for the cart-pole balancing benchmark. The novel RL methods learn human-interpretable policies in the form of compact fuzzy controllers and simple algebraic equations. The representations as well as the achieved control performances are compared with two classical controller design methods and three non-interpretable RL methods. All eight methods utilize the same previously generated data batch and produce their controller offline, without interaction with the real benchmark dynamics. The experiments show that the novel RL methods are able to automatically generate well-performing policies which are at the same time human-interpretable. Furthermore, one of the methods is applied to automatically learn an equation-based policy for a hardware cart-pole demonstrator by using only human-player-generated batch data. The solution generated in the first attempt already represents a successful balancing policy, which demonstrates the method's applicability to real-world problems.

Submitted 20 July, 2020; originally announced July 2020.

arXiv:2001.07295 [pdf, other]

AutoMATES: Automated Model Assembly from Text, Equations, and Software

Authors: Adarsh Pyarelal, Marco A. Valenzuela-Escarcega, Rebecca Sharp, Paul D. Hein, Jon Stephens, Pratik Bhandari, HeuiChan Lim, Saumya Debray, Clayton T. Morrison

Abstract: Models of complicated systems can be represented in different ways: in scientific papers, they are represented using natural language text as well as equations. But to be of real use, they must also be implemented as software, thus making code a third form of representing models. We introduce the AutoMATES project, which aims to build semantically rich unified representations of models from scientific code and publications to facilitate the integration of computational models from different domains and allow for modeling large, complicated systems that span multiple domains and levels of abstraction.

Submitted 20 January, 2020; originally announced January 2020.

Comments: 8 pages, 6 figures, accepted to Modeling the World's Systems 2019

ACM Class: D.3.3; D.3.4; H.1.0; I.2.2; I.2.5; I.2.7; I.6.4; I.6.5

arXiv:1912.06290 [pdf, other]

Meta-Learning Initializations for Image Segmentation

Authors: Sean M. Hendryx, Andrew B. Leach, Paul D. Hein, Clayton T. Morrison

Abstract: We extend first-order model-agnostic meta-learning algorithms (including FOMAML and Reptile) to image segmentation, present a novel neural network architecture built for fast learning which we call EfficientLab, and leverage a formal definition of the test error of meta-learning algorithms to decrease error on out-of-distribution tasks. We show state-of-the-art results on the FSS-1000 dataset by meta-training EfficientLab with FOMAML and using Bayesian optimization to infer the optimal test-time adaptation routine hyperparameters. We also construct a small benchmark dataset, FP-k, for the empirical study of how meta-learning systems perform in both few- and many-shot settings. On the FP-k dataset, we show that meta-learned initializations provide value for canonical few-shot image segmentation, but their performance is quickly matched by conventional transfer learning, with performance being equal beyond 10 labeled examples. Our code, meta-learned model, and the FP-k dataset are available at https://github.com/ml4ai/mliis.

Submitted 7 May, 2020; v1 submitted 12 December, 2019; originally announced December 2019.
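Reptile, one of the first-order meta-learning algorithms named in this abstract, moves a shared initialization toward the weights adapted on each sampled task: θ ← θ + ε(φ − θ). A minimal sketch on hypothetical 1-D linear regression tasks y = slope·x (not image segmentation, and not the paper's EfficientLab setup); the task distribution, learning rates, and step counts are assumptions.

```python
import random

def sgd_adapt(w, task_slope, xs, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on the mean squared error of y = w*x for one task."""
    for _ in range(steps):
        grad = sum(2 * (w * x - task_slope * x) * x for x in xs) / len(xs)
        w -= lr * grad
    return w

def reptile(meta_iters=2000, outer_lr=0.1, seed=0):
    """Outer loop: Reptile update of the shared initialization w."""
    rng = random.Random(seed)
    w = 0.0                                  # initialization being meta-learned
    xs = [-1.0, -0.5, 0.5, 1.0]
    for _ in range(meta_iters):
        slope = rng.uniform(2.0, 4.0)        # hypothetical task family
        phi = sgd_adapt(w, slope, xs)        # task-adapted weights
        w += outer_lr * (phi - w)            # move the initialization toward phi
    return w

print(reptile())  # should end up near the mean task slope, about 3.0
```

Only first-order quantities (adapted weights) are needed, which is what makes Reptile and FOMAML cheaper than second-order MAML.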

arXiv:1812.06199 [pdf, other]

Inter-sentence Relation Extraction for Associating Biological Context with Events in Biomedical Texts

Authors: Enrique Noriega-Atala, Paul D. Hein, Shraddha S. Thumsi, Zechy Wong, Xia Wang, Clayton T. Morrison

Abstract: We present an analysis of the problem of identifying biological context and associating it with biochemical events in biomedical texts. This constitutes a non-trivial, inter-sentential relation extraction task. We focus on biological context as descriptions of the species, tissue type and cell type that are associated with biochemical events. We describe the properties of an annotated corpus of context-event relations and present and evaluate several classifiers for context-event association trained on syntactic, distance and frequency features.

Submitted 14 December, 2018; originally announced December 2018.

arXiv:1804.10960 [pdf, other]

doi 10.1145/3205651.3208277

Generating Interpretable Fuzzy Controllers using Particle Swarm Optimization and Genetic Programming

Authors: Daniel Hein, Steffen Udluft, Thomas A. Runkler

Abstract: Autonomously training interpretable control strategies, called policies, using pre-existing plant trajectory data is of great interest in industrial applications. Fuzzy controllers have been used in industry for decades as interpretable and efficient system controllers. In this study, we introduce a fuzzy genetic programming (GP) approach called fuzzy GP reinforcement learning (FGPRL) that can select the relevant state features, determine the size of the required fuzzy rule set, and automatically adjust all the controller parameters simultaneously. Each GP individual's fitness is computed using model-based batch reinforcement learning (RL), which first trains a model using available system samples and subsequently performs Monte Carlo rollouts to predict each policy candidate's performance. We compare FGPRL to an extended version of a related method called fuzzy particle swarm reinforcement learning (FPSRL), which uses swarm intelligence to tune the fuzzy policy parameters. Experiments using an industrial benchmark show that FGPRL is able to autonomously learn interpretable fuzzy policies with high control performance.

Submitted 29 April, 2018; originally announced April 2018.

Comments: Accepted at Genetic and Evolutionary Computation Conference 2018 (GECCO '18)

arXiv:1712.04170 [pdf, other]

Interpretable Policies for Reinforcement Learning by Genetic Programming

Authors: Daniel Hein, Steffen Udluft, Thomas A. Runkler

Abstract: The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are expected to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straightforward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data.

Submitted 4 April, 2018; v1 submitted 12 December, 2017; originally announced December 2017.

arXiv:1709.09480 [pdf, other]

doi 10.1109/SSCI.2017.8280935

A Benchmark Environment Motivated by Industrial Control Problems

Authors: Daniel Hein, Stefan Depeweg, Michel Tokic, Steffen Udluft, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing

Abstract: In the research area of reinforcement learning (RL), novel and promising methods are frequently developed and introduced to the RL community. However, although many researchers are keen to apply their methods to real-world problems, implementing such methods in real industry environments is often a frustrating and tedious process. Generally, academic research groups have only limited access to real industrial data and applications. For this reason, new methods are usually developed, evaluated and compared by using artificial software benchmarks. On the one hand, these benchmarks are designed to provide interpretable RL training scenarios and detailed insight into the learning process of the method at hand. On the other hand, they usually do not share much similarity with industrial real-world applications. For this reason, we used our industry experience to design a benchmark which bridges the gap between freely available, documented, and motivated artificial benchmarks and properties of real industrial problems. The resulting industrial benchmark (IB) has been made publicly available to the RL community by publishing its Java and Python code, including an OpenAI Gym wrapper, on GitHub. In this paper we motivate and describe in detail the IB's dynamics and identify prototypic experimental settings that capture common situations in real-world industry control problems.

Submitted 24 November, 2022; v1 submitted 27 September, 2017; originally announced September 2017.

Journal ref: 2017 IEEE Symposium Series on Computational Intelligence (SSCI)

arXiv:1705.07262 [pdf, ps, other]

doi 10.1109/IJCNN.2017.7966389

Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

Authors: Daniel Hein, Steffen Udluft, Michel Tokic, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing

Abstract: The Particle Swarm Optimization Policy (PSO-P) has recently been introduced and shown to produce remarkable results when interacting with academic reinforcement learning benchmarks in an off-policy, batch-based setting. To further investigate its properties and its feasibility for real-world applications, this paper investigates PSO-P on the so-called Industrial Benchmark (IB), a novel reinforcement learning (RL) benchmark that aims at being realistic by including a variety of aspects found in industrial applications, like continuous state and action spaces, a high-dimensional, partially observable state space, delayed effects, and complex stochasticity. The experimental results of PSO-P on IB are compared to results of closed-form control policies derived from the model-based Recurrent Control Neural Network (RCNN) and the model-free Neural Fitted Q-Iteration (NFQ). Experiments show that PSO-P is not only of interest for academic benchmarks, but also for real-world industrial applications, since it also yielded the best-performing policy in our IB setting. Compared to other well-established RL techniques, PSO-P produced outstanding results in performance and robustness, requiring only a relatively low amount of effort in finding adequate parameters or making complex design decisions.

Submitted 27 July, 2017; v1 submitted 20 May, 2017; originally announced May 2017.

Journal ref: 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 4214-4221

arXiv:1610.05984 [pdf, other]

doi 10.1016/j.engappai.2017.07.005

Particle Swarm Optimization for Generating Interpretable Fuzzy Reinforcement Learning Policies

Authors: Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft

Abstract: Fuzzy controllers are efficient and interpretable system controllers for continuous state and action spaces. To date, such controllers have been constructed manually or trained automatically, either using expert-generated, problem-specific cost functions or incorporating detailed knowledge about the optimal control strategy. Neither of these requirements for automatic training is met in most real-world reinforcement learning (RL) problems. In such applications, online learning is often prohibited for safety reasons, because it requires exploring the problem's dynamics during policy training. We introduce a fuzzy particle swarm reinforcement learning (FPSRL) approach that can construct fuzzy RL policies solely by training parameters on world models that simulate real system dynamics. These world models are created by an autonomous machine learning technique that uses previously generated transition samples of a real system. To the best of our knowledge, this approach is the first to relate self-organizing fuzzy controllers to model-based batch RL. FPSRL is therefore intended for problems in domains where online learning is prohibited, system dynamics are relatively easy to model from previously generated default-policy transition samples, and a relatively easily interpretable control policy is expected to exist. The efficiency of the proposed approach on problems from such domains is demonstrated using three standard RL benchmarks: mountain car, cart-pole balancing, and cart-pole swing-up. Our experimental results demonstrate high-performing, interpretable fuzzy policies.

Submitted 15 August, 2017; v1 submitted 19 October, 2016; originally announced October 2016.

Journal ref: Engineering Applications of Artificial Intelligence, Volume 65C, October 2017, Pages 87-98

arXiv:1610.03793 [pdf, ps, other]

Introduction to the "Industrial Benchmark"

Authors: Daniel Hein, Alexander Hentschel, Volkmar Sterzing, Michel Tokic, Steffen Udluft

Abstract: A novel reinforcement learning benchmark, called the Industrial Benchmark, is introduced. The Industrial Benchmark aims to be realistic in the sense that it includes a variety of aspects that we found to be vital in industrial applications. It is not designed to approximate any real system, but to pose the same hardness and complexity.

Submitted 28 September, 2017; v1 submitted 12 October, 2016; originally announced October 2016.

Comments: 11 pages

Showing 1–21 of 21 results for author: Hein, D