
    47 research outputs found

    Stochastic Optimal Control with Neural Networks and Application to a Retailer Inventory Problem

    The overwhelming computational requirements of classical dynamic programming algorithms render them inapplicable to most practical stochastic problems. To overcome this problem, a neural-network-based Dynamic Programming (DP) approach is described in this study. The cost function, which is critical in a dynamic programming formulation, is approximated by a neural network trained with a weight-update rule based on Temporal Difference (TD) learning. A Lyapunov-based theory is developed to guarantee an upper error bound between the output of the cost neural network and the true cost. We illustrate this approach through a retailer inventory problem.
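    As a minimal sketch of the idea this abstract describes, the code below approximates a DP cost-to-go function with a small neural network trained by a TD(0)-style weight update on a toy order-up-to inventory loop. The network shape, inventory dynamics, cost coefficients, and step sizes are illustrative assumptions, not the paper's actual design.

```python
# Sketch: neural cost-to-go approximation with a TD(0)-style update.
# All constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer network: J(s) ~ w2 @ tanh(W1 @ x(s) + b1) + b2
W1 = rng.normal(scale=0.1, size=(16, 1))
b1 = np.zeros((16, 1))
w2 = rng.normal(scale=0.1, size=(1, 16))
b2 = 0.0

def features(s):
    return np.array([[s / 100.0]])  # scale inventory level to roughly [0, 1]

def J(s):
    """Approximated cost-to-go of inventory level s."""
    return float(w2 @ np.tanh(W1 @ features(s) + b1) + b2)

def td_update(s, cost, s_next, gamma=0.95, lr=0.01):
    """One TD(0) step: nudge J(s) toward cost + gamma * J(s_next)."""
    global W1, b1, w2, b2
    x = features(s)
    h = np.tanh(W1 @ x + b1)
    delta = cost + gamma * J(s_next) - J(s)  # temporal-difference error
    # Hand-written gradient of J(s) with respect to each parameter.
    dh = w2.T * (1.0 - h ** 2)               # backprop through tanh
    w2 = w2 + lr * delta * h.T
    b2 = b2 + lr * delta
    W1 = W1 + lr * delta * dh @ x.T
    b1 = b1 + lr * delta * dh

# Illustrative retailer loop: order-up-to-50 policy, holding + shortage costs.
s = 50
for t in range(5000):
    demand = int(rng.poisson(8))
    cost = 0.1 * s + 5.0 * max(demand - s, 0)   # holding + lost-sales penalty
    leftover = max(s - demand, 0)
    s_next = 50 if leftover < 20 else leftover  # replenish below the reorder point
    td_update(s, cost, s_next)
    s = s_next
```

    With a trained network in place of the DP cost table, a policy can be extracted by choosing, at each state, the order quantity that minimizes immediate cost plus gamma times the approximated cost-to-go.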

    A Learning Classifier System Based on Complex-Valued Reinforcement Learning for Environments with Incomplete Perception

    In this study, to acquire optimal policies in intractable POMDP environments, we propose the Complex-Valued Classifier System (CVCS), a learning classifier system based on complex-valued reinforcement learning, and its refinement, the Adjustment-Population-size-based CVCS (AP-CVCS). Compared with Complex-Valued Reinforcement Learning (CVRL), a reinforcement learning method applicable to POMDP environments, the proposed approach (1) searches for policies (classifiers) through evolutionary computation and culls unnecessary classifiers, and can therefore explore optimal policies efficiently. Compared with ZCSM (Zeroth-level Classifier System with Memory), a learning classifier system (LCS) applicable to POMDP environments that incorporates an evolutionary memory-based method, the proposed approach (2) uses only the action history, rather than internal memory, in its policies, and can therefore resolve perceptual aliasing in POMDP environments with fewer computational resources. In computational experiments to verify the effectiveness of the proposed methods, we applied them to (a) standard perceptually aliased environments and, as POMDP environments that conventional methods handle poorly, (b) environments with large state spaces and (c) environments with differing perceptual-aliasing characteristics, and obtained the following findings. First, (1) the proposed methods achieve higher learning performance with fewer learning trials than the conventional methods (Q-learning and ZCSM); (2) they can learn even in environments where the conventional methods cannot, because no appropriate values can be set for the parameters needed to handle perceptual aliasing; and (3) the proposed methods can acquire optimal policies even in problems whose initial states are perceptually aliased, where the conventional methods cannot. Regarding AP-CVCS, we confirmed that (1) it acquires optimal policies more stably than CVCS and the conventional methods, although (2) in environments whose initial states are perceptually aliased its stability is only comparable to that of the conventional methods. As further developments toward applicability to real problems, (1) we devised and evaluated a mechanism that adaptively adjusts the parameters needed for perceptual aliasing to the environment, and found that it solved perceptual-aliasing problems without setting the parameters in advance; and (2) we ran experiments in environments with noisy perceptual input, showing that the CVCS framework is more robust to noise than Q-learning, whereas stable learning becomes difficult for AP-CVCS. The University of Electro-Communications, 201
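    For readers unfamiliar with complex-valued reinforcement learning, the sketch below illustrates the underlying idea in its simplest tabular form: action values are complex numbers, a reference phase rotates at each time step, and values reached through different action histories settle at different phases, which lets an agent discriminate perceptually aliased observations without an explicit internal memory. The rotation scheme, update rule, and constants are simplifying assumptions for illustration and do not reproduce the proposed CVCS/AP-CVCS classifier systems.

```python
# Sketch: tabular complex-valued action values, where phase encodes
# action-history context. All constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

n_obs, n_act = 8, 4
Q = np.zeros((n_obs, n_act), dtype=complex)  # complex-valued action values
beta = np.exp(1j * np.pi / 6)                # per-step phase rotation

def select_action(obs, ref, eps=0.1):
    """Greedy on the real part of Q projected onto the reference phase,
    so the same observation can prefer different actions at different
    points in the action history."""
    if rng.random() < eps:
        return int(rng.integers(n_act))
    return int(np.argmax((Q[obs] * np.conj(ref)).real))

def update(obs, act, reward, obs_next, ref, alpha=0.1, gamma=0.9):
    """TD-style update; the bootstrap term is rotated by beta so values
    reached via different histories settle at different phases."""
    a_next = int(np.argmax((Q[obs_next] * np.conj(ref * beta)).real))
    target = reward * ref + gamma * beta * Q[obs_next, a_next]
    Q[obs, act] += alpha * (target - Q[obs, act])

# One illustrative episode: the reference phase advances with time.
ref = 1 + 0j
obs = 0
for t in range(100):
    act = select_action(obs, ref)
    obs_next = int(rng.integers(n_obs))          # stand-in environment step
    reward = float(rng.random() < 0.1)
    update(obs, act, reward, obs_next, ref)
    obs, ref = obs_next, ref * beta              # rotate the reference
```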

    Predicting Pilot Behavior in Medium Scale Scenarios Using Game Theory and Reinforcement Learning

    Effective automation is critical to achieving the capacity and safety goals of the Next Generation Air Traffic System. Unfortunately, creating integration and validation tools for such automation is difficult, as the interactions between automation and its human counterparts are complex and unpredictable. This validation becomes even more difficult as we integrate wide-reaching technologies that affect the behavior of different decision makers in the system, such as pilots, controllers, and airlines. While overt short-term behavior changes can be explicitly modeled with traditional agent modeling systems, subtle behavior changes caused by the integration of new technologies may snowball into larger problems and be very hard to detect. To overcome these obstacles, we show how the integration of new technologies can be validated by learning behavior models based on goals. In this framework, human participants are not modeled explicitly. Instead, their goals are modeled, and their actions are predicted through reinforcement learning. The main advantage of this approach is that modeling is done within the context of the entire system, allowing for accurate modeling of all participants as they interact as a whole. In addition, such an approach allows for efficient trade studies and feasibility testing on a wide range of automation scenarios. The goal of this paper is to test whether such an approach is feasible. To do this, we implement the approach using a simple discrete-state learning system on a scenario where 50 aircraft need to self-navigate using Automatic Dependent Surveillance-Broadcast (ADS-B) information. In this scenario, we show how the approach can be used to predict the ability of pilots to adequately balance aircraft separation and fly efficient paths. We present results with several levels of complexity and airspace congestion.
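    A toy version of the discrete-state learner described above might look like the following: each pilot agent is represented only by a goal, i.e., a reward that trades off separation against path efficiency, and tabular Q-learning predicts the actions that goal induces. The grid actions, weights, and reward shape are illustrative assumptions, not the paper's implementation.

```python
# Sketch: goal-driven pilot model as a reward function plus tabular
# Q-learning. Grid encoding and weights are illustrative assumptions.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # N, S, W, E moves on the grid

def reward(pos, goal, intruders, w_sep=5.0, w_eff=1.0):
    """Goal model: penalize proximity to other aircraft (separation)
    and remaining distance to the destination (path efficiency)."""
    too_close = sum(1 for p in intruders
                    if abs(p[0] - pos[0]) + abs(p[1] - pos[1]) <= 1)
    dist = abs(goal[0] - pos[0]) + abs(goal[1] - pos[1])
    return -w_sep * too_close - w_eff * dist

Q = defaultdict(float)  # tabular values keyed by (state, action_index)

def q_step(state, act, r, next_state, alpha=0.1, gamma=0.95):
    """Standard tabular Q-learning update for one observed transition."""
    best_next = max(Q[(next_state, a)] for a in range(len(ACTIONS)))
    Q[(state, act)] += alpha * (r + gamma * best_next - Q[(state, act)])

def predicted_action(state, eps=0.1):
    """The learned table doubles as the behavior prediction: the pilot
    is predicted to take the highest-valued action most of the time."""
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])
```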

    Predicting pilot behavior in medium-scale scenarios using game theory and reinforcement learning

    A key element in meeting the continuing growth of air traffic is the increased use of automation. Decision support systems, computer-based information acquisition, trajectory planning systems, high-level graphic display systems, and advisory systems are all considered automation components of the next-generation (NextGen) airspace. Given a set of goals represented as reward functions, the actions of the players may be predicted. However, several challenges need to be overcome. First, determining how a player can attempt to maximize their reward function can be a difficult inverse problem. Second, players may not be able to perfectly maximize their reward functions. ADS-B technology can provide pilots with information about other aircraft, such as their position and velocity. However, a pilot has limited ability to use all of this information in his or her decision making. For this scenario, the authors model these pilot limitations by assuming that pilots can observe only a limited section of the grid in front of them.
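    The stated limitation, that pilots observe only a limited section of the grid in front of them, can be captured by an observation function like the hypothetical one sketched below; the window size and occupancy-grid encoding are assumptions for illustration, not the authors' exact model.

```python
# Sketch: a pilot's limited forward observation window over an
# occupancy grid. Window dimensions are illustrative assumptions.
import numpy as np

def observed_window(grid, pos, heading, depth=3, width=3):
    """Return the depth x width patch of `grid` directly in front of
    `pos`, given a unit `heading` vector; cells off the grid read as 0."""
    rows, cols = grid.shape
    patch = np.zeros((depth, width), dtype=grid.dtype)
    # Perpendicular to the heading, for the lateral extent of the window.
    side = (-heading[1], heading[0])
    for d in range(1, depth + 1):
        for w in range(-(width // 2), width // 2 + 1):
            r = pos[0] + d * heading[0] + w * side[0]
            c = pos[1] + d * heading[1] + w * side[1]
            if 0 <= r < rows and 0 <= c < cols:
                patch[d - 1, w + width // 2] = grid[r, c]
    return patch

# Example: a 10x10 occupancy grid with two intruders; a pilot at (5, 5)
# heading north "sees" only the traffic inside the 3x3 window ahead.
grid = np.zeros((10, 10), dtype=int)
grid[3, 5] = grid[8, 1] = 1
print(observed_window(grid, pos=(5, 5), heading=(-1, 0)))
```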