QF-TraderNet: Intraday Trading via Deep Reinforcement Learning
Reinforcement Learning (RL) based machine trading attracts a rich profusion of interest. However, in the existing research, RL for the day-trade task suffers from noisy financial movement at short time scales, difficulty in order settlement, and expensive action search in a continuous-value space. This paper introduces an end-to-end RL intraday trading agent, namely QF-TraderNet, based on quantum finance theory (QFT) and deep reinforcement learning. We propose a novel design for the intraday RL trader's action space, inspired by the Quantum Price Levels (QPLs). Our action space design also brings the model a learnable profit-and-loss control strategy. QF-TraderNet comprises two neural networks: 1) a long short-term memory (LSTM) network for the feature learning of financial time series; 2) a policy generator network (PGN) for generating the distribution of actions. The profitability and robustness of QF-TraderNet have been verified on multiple types of financial datasets, including FOREX, metals, crude oil, and financial indices. The experimental results demonstrate that QF-TraderNet outperforms other baselines in terms of cumulative price return and Sharpe ratio, as well as in robustness under accidental market shifts.
Keywords: quantum finance, quantum price level, reinforcement learning, automatic trading, intelligent trading system
FIGURE 1 | The early-stop-loss problem: a short order is settled early (red dashed line: SL) before the price drops into the profitable range, so the strategy loses the potential profit (blue double arrow).
FIGURE 2 | Illustration of AUDUSD's QPLs over three consecutive trading days (23/04/2020–27/04/2020) on a 30-min K-line chart. The blue lines represent negative QPLs based on the ground state (black dashed line); the red lines are positive QPLs. Line color deepens as the QPL level n rises.
terminate the transaction and avoid a further loss if the price moves in a losing direction (e.g., the price drops after a long position is taken). These two hyperparameters are defined as a fixed shift, known as points, relative to the price at which the market is entered. If the price touches either of these two preset levels, the order is closed deterministically. An instance of the early-stop order is shown in Figure 1.
Focusing on the challenges above, we propose a deep reinforcement learning-based end-to-end model named QF-TraderNet. Our model directly generates the trading policy to control profit and loss instead of using fixed TP and SL. QF-TraderNet comprises two neural networks with different functions: 1) a long short-term memory (LSTM) network for extracting the temporal features of financial time series; 2) a policy generator network (PGN) for generating the distribution of actions (policy) in each state. We especially reference the Quantum Price Levels (QPLs), illustrated in Figure 2, to design the action space for the RL agent, thus discretizing the price-value space. Our method is inspired by the quantum finance theory result that QPLs capture the equilibrium states of price movement on a daily basis (Lee, 2020). We utilize a deep reinforcement learning algorithm to update the trainable parameters of QF-TraderNet iteratively to maximize the cumulative price return.
Experiments on various financial datasets, including financial indices, metals, crude oil, and FOREX, and comparisons with previous RL- and DL-based single-product
trading systems have been conducted. Our QF-TraderNet outperforms several state-of-the-art baselines in profitability, evaluated by the cumulative return and the risk-adjusted return (Sharpe ratio), and in robustness when facing market turbulence. Our model also shows adaptability in unseen market environments. The generated policy of QF-TraderNet further provides an explainable profit-and-loss order control strategy.
Our main contributions can be summarized as follows:

• We propose a novel end-to-end day-trade model that directly learns the optimal price level at which to settle, thus solving the early-stop problem through an implicit stop-loss and target-profit setting.
• We are the first to construct the RL agent's action space from the daily quantum price levels, making machine day trading tractable.
• Under the same market information perception, we achieve better profitability and robustness than previous state-of-the-art RL-based models.

2 RELATED WORK

Our work is in line with two sub-tasks: financial feature extraction, and transactions based on deep reinforcement learning. We briefly review past studies.

2.1 Financial Feature Extraction and Representation
Computational approaches for applications in financial modeling have attracted much attention in the past. Peralta and Zareei (2016) utilized a network model to perform portfolio planning and selection. Giudici et al. (2021) used volatility spillover decomposition methods to model the relations between two currencies. Resta et al. (2020) conducted a technical analysis-based approach to identify trading opportunities, with a specific focus on cryptocurrency. Among these approaches, neural networks show promising ability in learning from both structured and unstructured data. Most of the related work in neural financial modeling addresses relationship embedding (Li et al., 2019), forecasting (Wei et al., 2017; Neely et al., 2014), and option pricing (Pagnottoni, 2019). Long short-term memory (LSTM) networks (Wei et al., 2017) and Elman recurrent neural networks (Wang et al., 2016) have been employed successfully in financial time series analysis tasks. Tran et al. (2018) utilized the attention mechanism to refine RNNs. Mohan et al. (2019) leveraged both market and textual information to boost the performance of stock prediction. Some studies also adopted stock embedding to mine affinity indicators (Chen et al., 2019).

2.2 Reinforcement Learning in Trading
Algorithmic trading has been widely studied in its different subareas, including risk control (Pichler et al., 2021), portfolio optimization (Giudici et al., 2020), and trading strategy (Marques and Gomes, 2010; Vella and Ng, 2015; Chen et al., 2021). Nowadays, AI-based trading, especially the reinforcement learning approach, attracts interest in both academia and industry. Moody and Saffell (2001) proposed a direct reinforcement algorithm for trading and performed a comprehensive comparison between Q-learning and the policy gradient. Huang et al. (2016) further proposed a robust trading agent based on deep Q-networks (DQN). Deng et al. (2016) utilized fuzzy logic with a deep learning model to extract financial features from noisy time series, which achieved state-of-the-art performance in single-product trading. Xiong et al. (2018) employed the Deep Deterministic Policy Gradient (DDPG), based on the standard actor-critic framework, to perform stock trading; the experiments demonstrated its profitability over baselines including the min-variance portfolio allocation method and the technical approach based on the Dow Jones Industrial Average (DJIA) index. Wang et al. (2019) employed an RL algorithm to construct winner and loser portfolios and traded with a buy-winner-sell-loser strategy. However, the intraday trading task for reinforced trading agents is still less addressed, mainly because of the complexity of designing the trading space for a frequent trading strategy. Our research primarily targets efficient intraday trading.

3 QF-TRADERNET

Day trading refers to the strategy of taking a position and leaving the market within one trading day. We let our model send an order when the market opens on every trading day. Based on the observed environment, we train QF-TraderNet to learn the optimal QPL at which to settle. Below we introduce the QPL-based action space search and the model architecture separately.

3.1 Quantum Finance Theory Based Action Space Search
Quantum finance theory elaborates on the relationship between the secondary financial market and the classical-quantum mechanics model (Lee, 2020; Meng et al., 2015; Ye and Huang, 2008). QFT proposes an anharmonic oscillator model to embed the interrelationships among financial products. It considers that the dynamics of a financial product are affected by the energy field generated by itself and by other financial products (Lee, 2020). The energy levels generated from the field of the particle regulate the equilibrium states of price movement on a daily basis, and are denoted as the daily quantum price levels (QPLs). QPLs can be viewed as the support or resistance levels of classical financial analysis. Past studies (Lee, 2019) have shown that QPLs can be used as extracted features for financial time series. The procedure of the QPL calculation is given in the following steps.

Step 1: Modeling the Potential Energy of Market Movement via Four Major Market Participants
As in classical quantum mechanics, the Hamiltonian in QFT contains the potential term and the volatility term.
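The remaining steps of this procedure yield, for each trading day, a ground state plus a ladder of positive and negative QPLs (as visualized in Figure 2). For QF-TraderNet, these levels simply define a small discrete action space. Below is a minimal sketch of that mapping, assuming the day's QPLs have already been computed by the QFT procedure; the helper names and example prices are illustrative only and are not taken from the paper.

```python
# Minimal sketch: map discrete action indices onto +/- QPL target levels plus a neutral action.
# Assumes the day's positive and negative QPLs are already computed (floats passed in).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class QPLAction:
    level: int                      # signed QPL index: +n long targets, -n short targets, 0 neutral
    target_price: Optional[float]   # price level at which the order would be settled

def build_action_space(ground_state: float,
                       pos_qpls: List[float],
                       neg_qpls: List[float],
                       n_actions: int = 7) -> List[QPLAction]:
    """Build the QPL-indexed action space with (n_actions - 1) / 2 levels per direction."""
    k = (n_actions - 1) // 2
    actions = [QPLAction(-(k - i), neg_qpls[k - i - 1]) for i in range(k)]   # short side, deepest first
    actions.append(QPLAction(0, None))                                       # neutral: no trade today
    actions += [QPLAction(i + 1, pos_qpls[i]) for i in range(k)]             # long side
    return actions

# Example with made-up AUDUSD-like numbers: ground state 0.6400, three QPLs on each side.
space = build_action_space(0.6400,
                           pos_qpls=[0.6423, 0.6441, 0.6458],
                           neg_qpls=[0.6377, 0.6359, 0.6342],
                           n_actions=7)
for a in space:
    print(a.level, a.target_price)
```

With n_actions = 3 this corresponds to the Lite action space used later in Section 4.2 (one QPL per direction plus a neutral action), and n_actions = 7 to the Ultra one.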
3.2 Deep Feature Learning and Representation by LSTM Networks
LSTM networks show promising performance in sequential feature learning, owing to their structural adaptability (Gers et al., 2000). We introduce LSTM networks to extract the temporal features of the financial series, thus improving the market-status perception of the policy generation network (PGN).
We use the same look-back window as in Wang et al. (2019), with size W, to split the input sequence x from the complete series S = (s_1, s_2, ..., s_t, ..., s_T); i.e., the agent evaluates the market status over a time period of size W. Hence, the input matrix of the LSTM can be written as X = (x_1, x_2, ..., x_t, ..., x_{T-W+1}), where x_t = (s_{t-W+w} | w ∈ [1, W])^T. Each input vector s_t is constituted by: 1) the opening, highest, lowest, and closing prices of the trading day (note that the closing price of day t-1 may differ from the opening price of day t because of market adjustments outside trading hours; hence, we keep all four price variables); 2) the transaction volume; 3) the Moving Average Convergence-Divergence (MACD), a technical indicator for identifying the market status; 4) the Relative Strength Index (RSI), a technical indicator measuring price momentum; 5) the Bollinger Bands (main, upper, and lower), which can be applied to identify the potential price range and consequently observe the market trend (Colby and Meyers, 1988); and 6) the KDJ (stochastic oscillator), used in short-term oriented trading via price velocity techniques (Colby and Meyers, 1988).
Principal component analysis (PCA) (Wold et al., 1987) is utilized to compress the series data S into F~ dimensions and to denoise. Subsequently, L2 normalization is applied to scale the input features to the same magnitude. The preprocessing is calculated as

\tilde{X} = \frac{\mathrm{PCA}_{F \to \tilde{F}}(X)}{\left\lVert \mathrm{PCA}_{F \to \tilde{F}}(X) \right\rVert_{2}}    (10)

where F~ < F, and the deep feature learning model can be described as

h_t = \mathrm{LSTM}_{\xi}(x_t), \quad t \in [0, T - W + 1]    (11)

where ξ denotes the trainable parameters of the LSTM.

3.3 Policy Generator Networks (PGN)
Given the learned feature vector h_t, the PGN directly produces the output policy, i.e., the probability of settling the order at each +QPL and -QPL, according to the action score z_t^i produced by a fully connected network (FFBPN):

z_t^{i} = \mathrm{FFBPN}_{\theta}(h_t; W_{\theta}, b_{\theta})    (12)

where θ denotes the parameters of the FFBPN, with weight matrix W_θ and bias b_θ. Let a_t^i denote the i-th action at time t. The output policy a_t is calculated as

a_t^{\pm} = \frac{\exp(z_t^{i})}{\sum_{i' \in [1, A]} \exp(z_t^{i'})}    (13)

At timestep t, the model takes action a_t by sampling from the policy a_t^{\pm}, which comprises the long (+) and short (-) trading directions. a_t^{\pm} contains A dimensions, indicating the number of candidate actions, each with a price-return reward r_t^i:

r_t^{i} = \begin{cases} \delta\,(QPL_i^{\delta} - p_t^{o}), & QPL_i^{\delta} \in [p_t^{l}, p_t^{h}] \\ \delta\,(p_t^{c} - p_t^{o}), & QPL_i^{\delta} \notin [p_t^{l}, p_t^{h}] \end{cases}    (14)

where δ denotes the trading direction: for actions whose target settlement level is a +QPL, the trade is a long buy (δ = +1); for actions targeting a -QPL, a short sell (δ = -1) is performed; and δ = 0 when the decision is neutral, in which case no trade is made on trading day t.
We train QF-TraderNet with reinforcement learning. The key idea is to maintain a loop of successive steps: 1) the agent π perceives the environment, 2) π takes an action, and 3) π adjusts its behavior to receive more reward, until the agent reaches its learning goal (Sutton and Barto, 2018). Therefore, for each training episode, a trajectory τ = {(h_1, a_1), (h_2, a_2), ..., (h_T, a_T)} can be defined as the sequence of state-action tuples, with the corresponding return sequence¹ r = {r_1, r_2, r_3, ..., r_T}. The probability of choosing each QPL action is determined by QF-TraderNet as

a_t^{i} = \Pr\left(\mathrm{action}_t = QPL^{(i)} \mid \tilde{X};\, \theta, \xi\right)    (15)

\pi = \mathrm{PGN}_{\theta}\left(\mathrm{LSTM}_{\xi}(x_t)\right)    (16)

Let R_τ denote the cumulative price return of trajectory τ, with R_τ = Σ_{t=1}^{T-W+1} r_t. Then, over all possible explored trajectories, the expected reward obtained by the RL agent can be evaluated as (Sutton et al., 2000)

J_{\pi}(\theta, \xi) = \int_{\tau \sim \pi} R_{\tau}\, \Pr(\tau; \theta, \xi)\, d\tau    (17)

where Pr(τ | θ, ξ) is the probability that the QF-TraderNet agent π with parameters θ and ξ generates trajectory τ under Monte-Carlo simulation. The objective is then to maximize the expected reward, θ*, ξ* = argmax_{θ,ξ} J(θ, ξ). We substitute the objective with its negative and use gradient descent to optimize it. To avoid the local minimum problem caused by multiple positive-reward actions, we use the state-dependent threshold method (Sutton and Barto, 2018) to allow the RL agent to perform a more efficient optimization. The detailed gradient calculation is given in the supplementary material.

¹ Here r denotes the reward of the RL agent, rather than the price return r(t) used in the QPL evaluation.
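To make the two-network pipeline of Eqs. 11-14 concrete, here is a minimal PyTorch sketch. It is not the authors' implementation: the layer sizes loosely follow the QFTN-Lite settings reported in Section 4.2, and all class, function, and variable names are illustrative.

```python
# Minimal sketch of the LSTM feature learner + policy generator network (Eqs. 11-13)
# and the QPL-based price-return reward (Eq. 14). Illustrative only.
import torch
import torch.nn as nn

class QFTraderNet(nn.Module):
    def __init__(self, n_features: int = 4, hidden: int = 128, n_actions: int = 3):
        super().__init__()
        # Eq. 11: 2-layer LSTM feature learner over the look-back window of W days
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        # Eq. 12: fully connected policy generator (FFBPN) producing action scores z_t^i
        self.pgn = nn.Sequential(
            nn.Linear(hidden, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_actions),
        )

    def forward(self, x):                     # x: (batch, W, n_features), PCA-compressed, L2-normalised
        h, _ = self.lstm(x)                   # hidden states h_t for each step of the window
        scores = self.pgn(h[:, -1, :])        # action scores from the last hidden state
        return torch.softmax(scores, dim=-1)  # Eq. 13: policy over the +/- QPL actions

def qpl_reward(direction: int, qpl_price: float,
               p_open: float, p_high: float, p_low: float, p_close: float) -> float:
    """Eq. 14: if the chosen QPL lies inside the day's [low, high] range it is treated as
    touched and the order settles there; otherwise the order settles at the close.
    direction is +1 for a long buy, -1 for a short sell, 0 for neutral (no trade)."""
    if direction == 0:
        return 0.0
    if p_low <= qpl_price <= p_high:
        return direction * (qpl_price - p_open)
    return direction * (p_close - p_open)

# Sampling an action from the generated policy (Eq. 15):
model = QFTraderNet()
policy = model(torch.randn(1, 3, 4))          # W = 3 look-back window, F~ = 4 features
action = torch.distributions.Categorical(policy).sample()
```

Training then follows the policy-gradient objective of Eq. 17: the log-probabilities of the sampled actions are weighted by their rewards, and the negative of the objective is minimized with gradient descent, as described above.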
FIGURE 4 | A case study illustrating our profit-and-loss control strategy. The trading policy is uniformly distributed initially. Ideally, our model assigns the largest probability to the +3 QPL action, which earns the maximum profit, as the S-TP. On the short side, the -1 QPL can take the most considerable reward, leading to it being accredited the maximum probability as the S-SL.
3.4 Trading Policy With Learnable Soft Profit and Loss Control
In QF-TraderNet, the LSTM network learns the hidden representation and feeds it into the PGN; the PGN then generates the learned policy that decides the target QPL at which to settle. As the action is sampled from the generated policy, QF-TraderNet adopts a soft profit-and-loss control strategy rather than deterministic TP and SL. The overall QF-TraderNet architecture is summarized in Figure 3.
An equivalent way to interpret our strategy is that our model trades with a long buy if the decision falls on a positive QPL; conversely, a short sell transaction is delivered. Once the trading direction is decided, the target QPL with the maximum probability is considered the soft target price (S-TP), and the soft stop-loss line (S-SL) is the QPL with the highest probability in the opposite trading direction. One exemplification is presented in Figure 4.
Since the S-TP and S-SL control is probability-based, QF-TraderNet is not forced to settle when the price touches the stop-loss line prematurely. Instead, it considers whether there is a better target price for settlement in the entire action space. Therefore, the model is more flexible in its SL and TP control across different states, compared with using a pair of preset "hard" hyperparameters.
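Following this interpretation (and the Figure 4 case study), the soft TP and soft SL can be read directly off the generated policy. The small sketch below reuses the signed QPL indices from the earlier action-space example; the function and variable names are ours, not the paper's.

```python
# Illustrative sketch: derive trading direction, S-TP, and S-SL from the policy vector.
import numpy as np

def soft_tp_sl(policy, levels):
    """policy[i] is the probability of settling at levels[i] (a signed QPL index, 0 = neutral).
    Returns (direction, soft take-profit level, soft stop-loss level)."""
    best = int(np.argmax(policy))
    direction = int(np.sign(levels[best]))      # +1 long, -1 short, 0 neutral
    if direction == 0:
        return 0, None, None                    # no trade is sent on this day
    same_side = [i for i, lv in enumerate(levels) if np.sign(lv) == direction]
    opposite = [i for i, lv in enumerate(levels) if np.sign(lv) == -direction]
    s_tp = levels[max(same_side, key=lambda i: policy[i])]   # most probable QPL in the trade direction
    s_sl = levels[max(opposite, key=lambda i: policy[i])]    # most probable QPL on the opposite side
    return direction, s_tp, s_sl

# Example mirroring the Figure 4 case study: +3 QPL is the most probable long-side
# action (S-TP) and -1 QPL the most probable short-side action (S-SL).
levels = [-3, -2, -1, 0, 1, 2, 3]
policy = np.array([0.05, 0.05, 0.15, 0.05, 0.10, 0.20, 0.40])
print(soft_tp_sl(policy, levels))   # -> (1, 3, -1)
```

Because both levels are only "soft", the agent can revise them on the next observation instead of being settled out mechanically, which is exactly the flexibility described above.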
FIGURE 5 | 1st panel: continuous partition of the training and verification data; 2nd panel: affected by the global economic situation, most datasets showed a downward trend over the testing interval, accompanied by highly irregular oscillations; 3rd panel: cumulative reward curves of the different methods in the testing evaluation.
4 EXPERIMENTS

We conduct an empirical evaluation of QF-TraderNet on various types of financial datasets. In our experiments, eight datasets from four categories are used, including 1) foreign exchange products: Great Britain Pound vs. United States Dollar (GBPUSD), Australian Dollar vs. United States Dollar (AUDUSD), Euro vs. United States Dollar (EURUSD), and United States Dollar vs. Swiss Franc (USDCHF); 2) financial indices: S&P 500 Index (S&P500) and Hang Seng Index (HSI); 3) metal: Silver vs. United States Dollar (XAGUSD); and 4) crude oil: Oil vs. United States Dollar (OILUSe). The evaluation is conducted from the perspectives of earning profits and of robustness when the agent faces unexpected changes of market state. We also investigate the impact of different settings of our proposed QPL-based action space search for the RL trader, and present an ablation study of our model.

4.1 Experiment Settings
All datasets used in the experiments are fetched from the free and open historical data center in MetaTrader 4, a professional trading platform for FOREX, financial indices, and other securities. We download raw time series data of around 2048 trading days and use the front 90% for training and validation; the rest is held out for out-of-sample verification. That is, the continuous series from November 2012 to July 2019 is spliced to construct the sequential training samples, and the remaining part is used for testing and validation. Notably, the evaluation period covers the recent fluctuations in the global financial market caused by the COVID-19 pandemic, which serves as a robustness test of how the trading agent handles unforeseen market fluctuations. The size of the look-back window is set to 3, and the price return and Sharpe ratio metrics are calculated daily. In the backtest, the initial capital is set to the corresponding currency or asset with a value of 10,000, at a transaction cost of 0.3% (Deng et al., 2016). All experiments are conducted on a single NVIDIA GTX Titan X GPU.

4.2 Models Settings
To compare our model with traditional methods, we select forecasting-based trading models and other state-of-the-art reinforcement learning-based trading agents as baselines.

• Market baseline (Huang et al., 2016). This strategy is used to measure the overall performance of the market during the period T by holding the product consistently.
• DDR-RNN. Following the idea of Deep Direct Reinforcement, but applying principal component analysis (PCA) to denoise and compress the data. We also employ an RNN to learn the features, and a two-layer FFBPN is used as the policy generator instead of the logistic regression in the original design. This model can be regarded as the ablation of QF-TraderNet without the QPL action space search.
• FCM, a forecasting model based on an RNN trend predictor, consisting of a 7-layer LSTM with 512 hidden dimensions. It trades with a buy-winner-sell-loser strategy.
• RF. Same design as FCM but predicting the trend via a Random Forest.
• QF-PGN. The policy gradient based RL agent with QPL-based order control. A single FFBPN is utilized as
Models HSI S&P500 Silver Crude oil USDCHF GBPUSD EURUSD AUDUSD
CPR SR CPR SR CPR SR CPR SR CPR SR CPR SR CPR SR CPR SR
Market 555.00 0.01 2,122.27 0.05 −12.66 −0.03 −90.79 −0.04 0.19 0.02 −0.07 −0.01 −0.05 −0.01 −0.04 −0.01
RNN-FCM 1,251.78 0.03 361.94 0.09 −11.67 −0.07 6.76 0.02 0.04 0.02 −0.14 −0.07 −0.24 −0.13 0.05 0.04
RF-FCM 3,846.31 0.09 336.27 0.06 23.60 0.91 112.88 1.13 0.11 0.16 0.29 0.33 0.20 0.53 −0.04 −0.07
DDR-RNN 4,505.00 0.10 345.50 0.03 1.53 0.02 −4.57 −0.02 0.07 0.09 −0.02 −0.02 0.08 0.15 <0.01 −0.08
FDRNN 1,536.00 0.04 731.73 0.07 2.80 0.04 −9.38 −0.03 0.08 0.10 0.05 0.04 −0.08 −0.10 0.05 0.12
QF-PGN 3,244.35 0.07 3,133.76 1.88 1.94 0.05 138.34 2.00 −0.08 −0.11 0.28 0.37 −0.03 −0.05 0.17 0.50
QF-TraderNet Lite 2,779.64 0.17 155.66 0.04 1.56 0.04 82.40 0.54 0.58 1.69 0.61 1.31 0.20 0.65 0.02 0.03
QF-TraderNet Ultra 8,100.51 0.17 4,428.00 1.52 31.24 1.49 164.38 1.44 0.64 1.16 0.92 1.31 0.57 1.11 0.36 0.97
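The table above reports the cumulative price return (CPR) and Sharpe ratio (SR) for each product. Below is a minimal sketch of how these two metrics can be computed from the agent's daily price-return series, assuming the Sharpe ratio definition of Eq. 19 below and that transaction costs have already been deducted from each day's return; the function name and example values are illustrative.

```python
# Minimal sketch of the two evaluation metrics: CPR and SR over the test period.
import numpy as np

def cpr_and_sr(daily_price_returns):
    """daily_price_returns: the agent's realised price return on each test day (Eq. 14)."""
    r = np.asarray(daily_price_returns, dtype=float)
    cpr = r.sum()               # cumulative price return over the whole test period
    sr = r.mean() / r.std()     # Eq. 19, computed on the daily return series
    return cpr, sr

# Toy example with made-up daily returns:
print(cpr_and_sr([0.4, -0.1, 0.3, 0.0, 0.2]))
```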
We implement two versions of QF-TraderNet: 1) QF-TraderNet Lite (QFTN-L): a 2-layer LSTM with a 128-dimensional hidden vector for the feature representation, and a 3-layer policy generator network with 128, 64, and 32 neurons per layer; the size of the action space is 3. 2) QF-TraderNet Ultra (QFTN-U): the same architecture as the Lite version, but with the number of candidate actions enlarged to 7.
Regarding the training settings, the Adaptive Moment Estimation (ADAM) optimizer with 1,500 training epochs and a 0.001 learning rate is used for all iteratively optimized models. For the algorithms requiring PCA, the target dimension F~ is set to 4, such that the compressed matrix embeds 99.5% of the information of the original features.
The Sharpe ratio is calculated by

SR = \frac{\mathrm{Average}(CPR)}{\mathrm{StandardDeviation}(CPR)}    (19)

The result of MARKET denotes that the market is in a downtrend with high volatility over the evaluation interval, owing to the recent global economic fluctuation. The price range in testing is not fully covered by the training data for some datasets (crude oil and AUDUSD), which tests the models in an unseen environment. Under these testing conditions, our QFTN-U trained with CPR achieves higher CPR and SR than the other comparisons, except for the SR on S&P500 and crude oil.
QFTN-L is also comparable to the baselines. This signifies the profitability and robustness of our QF-TraderNet.
Moreover, QFTN-L, QFTN-U, and the QF-PGN models yield significantly higher CPR and SR than the other RL traders without QPL-based actions (DDR-RNN and FDRNN). The ablation study in Table 2 also presents the contribution of each component in detail (SUPERVISED counts the average of RF and FCM), where the QPL actions dramatically contribute to the Sharpe ratio of our full model. This demonstrates the benefit of trading with QPLs for gaining considerable profitability and efficient risk control.
The backtesting results in Table 3 show the good generalization of QFTN-U. It is the only strategy that earns a positive profit on almost all datasets, because the day-trading strategy is less affected by the market trend than other strategies under the long, neutral, and short setting. We also find that the performance of our model on the FOREX datasets is significantly better than on the others. FOREX contains more noise and fluctuations, which indicates the advantage of our models on highly fluctuating products.

TABLE 4 | Decision classification metrics.

              Optimal QPL Prediction          Trading Direction Prediction
              Acc.    P      R      F1        Acc.    P      R      F1
PGN (3x)      0.34    0.25   0.25   0.37      0.34    0.25   0.25   0.37
QFTN-L (3x)   0.56    0.54   0.50   0.50      0.56    0.54   0.50   0.50
QFTN-U (7x)   0.48    —      —      —         0.80    0.78   0.78   0.82

Bold values indicate the best performance in terms of the corresponding metrics.
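Table 4 reports standard classification metrics for the chosen QPL action and the implied trading direction. A small sketch of how such metrics could be computed with scikit-learn is given below, assuming the "optimal" labels are derived in hindsight as the action maximizing Eq. 14 on each test day; macro averaging is our assumption, as the paper does not state the averaging mode, and all names are illustrative.

```python
# Illustrative sketch of the decision-classification metrics in Table 4.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def decision_metrics(chosen, optimal):
    """Acc/P/R/F1 of the chosen QPL actions against hindsight-optimal ones."""
    acc = accuracy_score(optimal, chosen)
    p, r, f1, _ = precision_recall_fscore_support(
        optimal, chosen, average="macro", zero_division=0)
    return acc, p, r, f1

def direction_metrics(chosen, optimal):
    """Same metrics on the implied trading direction (sign of the signed QPL index)."""
    return decision_metrics(np.sign(chosen), np.sign(optimal))

# Toy example with signed QPL indices as labels:
print(decision_metrics([3, -1, 0, 2], [3, -2, 0, 2]))
print(direction_metrics([3, -1, 0, 2], [3, -2, 0, 2]))
```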
REFERENCES

Chen, C., Zhao, L., Bian, J., Xing, C., and Liu, T.-Y. (2019). "Investment Behaviors Can Tell what inside: Exploring Stock Intrinsic Properties for Stock Trend Prediction," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, August 4–8, 2019, 2376–2384.

Chen, J., Luo, C., Pan, L., and Jia, Y. (2021). Trading Strategy of Structured Mutual Fund Based on Deep Learning Network. Expert Syst. Appl. 183, 115390. doi:10.1016/j.eswa.2021.115390

Colby, R. W., and Meyers, T. A. (1988). The Encyclopedia of Technical Market Indicators. Homewood, IL: Dow Jones-Irwin.

Dempster, M. A. H., and Leemans, V. (2006). An Automated Fx Trading System Using Adaptive Reinforcement Learning. Expert Syst. Appl. 30, 543–552. doi:10.1016/j.eswa.2005.10.012

Deng, Y., Bao, F., Kong, Y., Ren, Z., and Dai, Q. (2016). Deep Direct Reinforcement Learning for Financial Signal Representation and Trading. IEEE Trans. Neural Netw. Learn. Syst. 28, 653–664. doi:10.1109/TNNLS.2016.2522401

Gers, F. A., Schmidhuber, J., and Cummins, F. (2000). Learning to Forget: Continual Prediction with LSTM. Neural Comput. 12 (10), 2451–2471. doi:10.1162/089976600300015015

Giudici, P., Pagnottoni, P., and Polinesi, G. (2020). Network Models to Enhance Automated Cryptocurrency Portfolio Management. Front. Artif. Intell. 3, 22. doi:10.3389/frai.2020.00022

Giudici, P., Leach, T., and Pagnottoni, P. (2021). Libra or Librae? Basket Based Stablecoins to Mitigate Foreign Exchange Volatility Spillovers. Finance Res. Lett., 102054. doi:10.1016/j.frl.2021.102054

Huang, D.-j., Zhou, J., Li, B., Hoi, S. C. H., and Zhou, S. (2016). Robust Median Reversion Strategy for Online Portfolio Selection. IEEE Trans. Knowl. Data Eng. 28, 2480–2493. doi:10.1109/tkde.2016.2563433

Lee, R. S. (2019). Chaotic Type-2 Transient-Fuzzy Deep Neuro-Oscillatory Network (CT2TFDNN) for Worldwide Financial Prediction. IEEE Trans. Fuzzy Syst. 28 (4), 731–745. doi:10.1109/tfuzz.2019.2914642

Lee, R. (2020). Quantum Finance: Intelligent Forecast and Trading Systems. Singapore: Springer.

Li, Z., Yang, D., Zhao, L., Bian, J., Qin, T., and Liu, T.-Y. (2019). "Individualized Indicator for All: Stock-wise Technical Indicator Optimization with Stock Embedding," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, August 4–8, 2019, 894–902.

Marques, N. C., and Gomes, C. (2010). "Maximus-ai: Using Elman Neural Networks for Implementing a SLMR Trading Strategy," in International Conference on Knowledge Science, Engineering and Management, Belfast, United Kingdom, September 1–3, 2010 (Springer), 579–584. doi:10.1007/978-3-642-15280-1_55

Meng, X., Zhang, J.-W., Xu, J., and Guo, H. (2015). Quantum Spatial-Periodic Harmonic Model for Daily Price-Limited Stock Markets. Physica A: Stat. Mech. its Appl. 438, 154–160. doi:10.1016/j.physa.2015.06.041

Mohan, S., Mullapudi, S., Sammeta, S., Vijayvergia, P., and Anastasiu, D. C. (2019). "Stock Price Prediction Using News Sentiment Analysis," in 2019 IEEE Fifth International Conference on Big Data Computing Service and Applications (BigDataService), Newark, CA, April 4–9, 2019, 205–208. doi:10.1109/BigDataService.2019.00035

Moody, J. E., and Saffell, M. (1998). "Reinforcement Learning for Trading," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 917–923.

Moody, J., and Saffell, M. (2001). Learning to Trade via Direct Reinforcement. IEEE Trans. Neural Netw. 12, 875–889. doi:10.1109/72.935097

Neely, C. J., Rapach, D. E., Tu, J., and Zhou, G. (2014). Forecasting the Equity Risk Premium: the Role of Technical Indicators. Manage. Sci. 60, 1772–1791. doi:10.1287/mnsc.2013.1838

Pagnottoni, P. (2019). Neural Network Models for Bitcoin Option Pricing. Front. Artif. Intell. 2, 5. doi:10.3389/frai.2019.00005

Peralta, G., and Zareei, A. (2016). A Network Approach to Portfolio Selection. J. Empirical Finance 38, 157–180. doi:10.1016/j.jempfin.2016.06.003

Pichler, A., Poledna, S., and Thurner, S. (2021). Systemic Risk-Efficient Asset Allocations: Minimization of Systemic Risk as a Network Optimization Problem. J. Financial Stab. 52, 100809. doi:10.1016/j.jfs.2020.100809

Resta, M., Pagnottoni, P., and De Giuli, M. E. (2020). Technical Analysis on the Bitcoin Market: Trading Opportunities or Investors' Pitfall? Risks 8, 44. doi:10.3390/risks8020044

Sutton, R. S., and Barto, A. G. (2018). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (2000). "Policy Gradient Methods for Reinforcement Learning with Function Approximation," in Advances in Neural Information Processing Systems, 1057–1063.

Tran, D. T., Iosifidis, A., Kanniainen, J., and Gabbouj, M. (2018). Temporal Attention-Augmented Bilinear Network for Financial Time-Series Data Analysis. IEEE Trans. Neural Netw. Learn. Syst. 30, 1407–1418. doi:10.1109/TNNLS.2018.2869225

Vella, V., and Ng, W. L. (2015). A Dynamic Fuzzy Money Management Approach for Controlling the Intraday Risk-Adjusted Performance of AI Trading Algorithms. Intell. Sys. Acc. Fin. Mgmt. 22, 153–178. doi:10.1002/isaf.1359

Wang, J., Wang, J., Fang, W., and Niu, H. (2016). Financial Time Series Prediction Using Elman Recurrent Random Neural Networks. Comput. Intell. Neurosci. 2016, 14. doi:10.1155/2016/4742515

Wang, J., Zhang, Y., Tang, K., Wu, J., and Xiong, Z. (2019). "Alphastock: A Buying-Winners-and-Selling-Losers Investment Strategy Using Interpretable Deep Reinforcement Attention Networks," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, August 4–8, 2019, 1900–1908.

Wei, B., Yue, J., Rao, Y., and Boris, P. (2017). A Deep Learning Framework for Financial Time Series Using Stacked Autoencoders and Long-Short Term Memory. PLoS One 12, e0180944. doi:10.1371/journal.pone.0180944

Wold, S., Esbensen, K., and Geladi, P. (1987). Principal Component Analysis. Chemometrics Intell. Lab. Syst. 2, 37–52. doi:10.1016/0169-7439(87)80084-9

Xiong, Z., Liu, X.-Y., Zhong, S., Yang, H., and Walid, A. (2018). Practical Deep Reinforcement Learning Approach for Stock Trading. arXiv preprint arXiv:1811.07522.

Ye, C., and Huang, J. P. (2008). Non-classical Oscillator Model for Persistent Fluctuations in Stock Markets. Physica A: Stat. Mech. its Appl. 387, 1255–1263. doi:10.1016/j.physa.2007.10.050

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2021 Qiu, Qiu, Yuan, Chen and Lee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.