EX-DRL: Hedging Against Heavy Losses with EXtreme Distributional Reinforcement Learning

Authors:

Parvin Malekzadeh,

Konstantinos N Plataniotis

ICAIF '24: Proceedings of the 5th ACM International Conference on AI in Finance

Pages 370 - 378

https://doi.org/10.1145/3677052.3698668

Published: 14 November 2024

Abstract

Recent advancements in Distributional Reinforcement Learning (DRL) for modeling loss distributions have shown promise in developing hedging strategies in derivatives markets. A common approach in DRL involves learning the quantiles of loss distributions at specified levels using Quantile Regression (QR). This method is particularly effective in option hedging due to its direct quantile-based risk assessment, such as Value at Risk (VaR) and Conditional Value at Risk (CVaR). However, these risk measures depend on the accurate estimation of extreme quantiles in the loss distribution’s tail, which can be imprecise in QR-based DRL due to the rarity and extremity of tail data, as highlighted in the literature. To address this issue, we propose EXtreme DRL (EX-DRL), which enhances extreme quantile prediction by modeling the tail of the loss distribution with a Generalized Pareto Distribution (GPD). This method introduces supplementary data to mitigate the scarcity of extreme quantile observations, thereby improving estimation accuracy through QR. Comprehensive experiments on gamma hedging options demonstrate that EX-DRL improves existing QR-based models by providing more precise estimates of extreme quantiles, thereby improving the computation and reliability of risk metrics for complex financial risk management. The implementation is available here.
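The tail-modeling idea in the abstract can be illustrated with a generic peaks-over-threshold calculation: fit a GPD to losses above a high threshold, then read VaR and CVaR off the fitted tail. The abstract does not specify the paper's estimator, so what follows is a minimal sketch under stated assumptions, not the authors' EX-DRL procedure. All function names are hypothetical, the method-of-moments fit is a simple stand-in chosen to keep the example dependency-free, and no DRL or quantile-regression training is involved.

```python
import math
import random
import statistics


def gpd_var(alpha, u, sigma, xi, p_u):
    """Peaks-over-threshold VaR at level alpha (valid when alpha > 1 - p_u).

    u: threshold, sigma: GPD scale, xi: GPD shape,
    p_u: empirical probability that a loss exceeds u.
    """
    ratio = (1.0 - alpha) / p_u  # target tail probability relative to P(X > u)
    if abs(xi) < 1e-9:  # exponential-tail limit of the GPD (xi -> 0)
        return u - sigma * math.log(ratio)
    return u + (sigma / xi) * (ratio ** (-xi) - 1.0)


def gpd_cvar(alpha, u, sigma, xi, p_u):
    """CVaR implied by the same GPD tail; finite only for xi < 1."""
    var = gpd_var(alpha, u, sigma, xi, p_u)
    return (var + sigma - xi * u) / (1.0 - xi)


def fit_gpd_moments(exceedances):
    """Method-of-moments GPD fit (an assumption of this sketch).

    Uses mean = sigma / (1 - xi) and
    variance = sigma^2 / ((1 - xi)^2 (1 - 2 xi)), which requires xi < 1/2.
    """
    m = statistics.fmean(exceedances)
    v = statistics.variance(exceedances)
    xi = 0.5 * (1.0 - m * m / v)
    sigma = 0.5 * m * (1.0 + m * m / v)
    return sigma, xi


def pot_tail_risk(losses, alpha=0.99, threshold_level=0.95):
    """Return (threshold, VaR_alpha, CVaR_alpha) from a GPD fitted to the tail."""
    ordered = sorted(losses)
    k = int(threshold_level * len(ordered))
    u = ordered[k - 1]  # empirical quantile used as the threshold
    exceedances = [x - u for x in ordered[k:] if x > u]
    p_u = len(exceedances) / len(ordered)
    sigma, xi = fit_gpd_moments(exceedances)
    return u, gpd_var(alpha, u, sigma, xi, p_u), gpd_cvar(alpha, u, sigma, xi, p_u)


if __name__ == "__main__":
    # Synthetic exponential losses, used purely for illustration.
    rng = random.Random(0)
    losses = [rng.expovariate(1.0) for _ in range(20000)]
    u, var99, cvar99 = pot_tail_risk(losses)
    print(f"threshold={u:.3f}  VaR99={var99:.3f}  CVaR99={cvar99:.3f}")
```

Because the fitted tail is parametric, quantiles beyond the largest observed loss can still be evaluated, which is the general motivation for using a GPD where empirical tail data are scarce. In practice a maximum-likelihood fit would usually replace the moment estimator used here.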

Index Terms

1. Applied computing
  1. Law, social and behavioral sciences
    1. Economics
2. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Reinforcement learning

Published In

ICAIF '24: Proceedings of the 5th ACM International Conference on AI in Finance

November 2024

878 pages

ISBN:9798400710810

DOI:10.1145/3677052

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICAIF '24: 5th ACM International Conference on AI in Finance

November 14 - 17, 2024

Brooklyn, NY, USA
