QF-TraderNet: Intraday Trading via Deep Reinforcement Learning
Reinforcement Learning (RL) based machine trading attracts a rich profusion of interest. However, in the existing research, RL for the day-trade task suffers from noisy financial movement at short time scales, difficulty in order settlement, and expensive action search in a continuous-value space. This paper introduces an end-to-end RL intraday trading agent, namely QF-TraderNet, based on quantum finance theory (QFT) and deep reinforcement learning. We propose a novel design for the intraday RL trader's action space, inspired by the Quantum Price Levels (QPLs). Our action space design also brings the model a learnable profit-and-loss control strategy. QF-TraderNet comprises two neural networks: 1) a long short-term memory (LSTM) network for the feature learning of financial time series; 2) a policy generator network (PGN) for generating the distribution of actions. The profitability and robustness of QF-TraderNet have been verified on multiple types of financial datasets, including FOREX, metals, crude oil, and financial indices. The experimental results demonstrate that QF-TraderNet outperforms other baselines in terms of cumulative price return and Sharpe ratio, as well as in robustness under accidental market shifts.
Keywords: quantum finance, quantum price level, reinforcement learning, automatic trading, intelligent trading system
FIGURE 1 | The early-stop-loss problem: a short order is settled early (red dashed line: SL) before the price drops into the profitable range, so the strategy loses the potential profit (blue double arrow).
FIGURE 2 | Illustration of AUDUSD's QPLs over three consecutive trading days (23/04/2020–27/04/2020) on a 30-min K-line chart. The blue lines represent negative QPLs based on the ground state (black dashed line); the red lines are positive QPLs. Line color deepens as the QPL level n rises.
terminate the transaction and avoid a further loss if the price moves in a losing direction (e.g., the price drops after a long position is taken). These two hyperparameters are defined as a fixed shift, known as points, relative to the price at which the market is entered. If the price touches either of these two preset levels, the order is closed deterministically. An instance of the early-stop order is shown in Figure 1.
Focusing on the challenges above, we propose a deep reinforcement learning-based end-to-end model named QF-TraderNet. Our model directly generates the trading policy to control profit and loss instead of using fixed TP and SL. QF-TraderNet comprises two neural networks with different functions: 1) a long short-term memory (LSTM) network for extracting the temporal features of financial time series; 2) a policy generator network (PGN) for generating the distribution of actions (policy) in each state. We especially reference the Quantum Price Levels (QPLs), illustrated in Figure 2, to design the action space for the RL agent, thus discretizing the price-value space. Our method is inspired by the quantum finance theory result that QPLs capture the equilibrium states of price movement on a daily basis (Lee, 2020). We utilize a deep reinforcement learning algorithm to update the trainable parameters of QF-TraderNet iteratively to maximize the cumulative price return.
Experiments on various financial datasets, including financial indices, metals, crude oil, and FOREX, and comparisons with previous RL- and DL-based single-product
trading systems have been conducted. Our QF-TraderNet outperforms several state-of-the-art baselines in profitability, evaluated by the cumulative return and the risk-adjusted return (Sharpe ratio), and in robustness when facing market turbulence. Our model also shows adaptability in unseen market environments. The generated policy of QF-TraderNet further provides an explainable profit-and-loss order control strategy.
Our main contributions can be summarized as follows:

• We propose a novel end-to-end day-trade model that directly learns the optimal price level at which to settle, thus solving the early-stop problem through an implicit stop-loss and target-profit setting.
• We are the first to construct the RL agent's action space from the daily quantum price levels, making machine day trading tractable.
• Under the same market information perception, we achieve better profitability and robustness than previous state-of-the-art RL-based models.

2 RELATED WORK

Our work is in line with two sub-tasks: financial feature extraction, and transactions based on deep reinforcement learning. We briefly review past studies.

2.1 Financial Feature Extraction and Representation
Computational approaches for applications in financial modeling have attracted much attention in the past. Peralta and Zareei (2016) utilized a network model to perform portfolio planning and selection. Giudici et al. (2021) used volatility spillover decomposition methods to model the relations between two currencies. Resta et al. (2020) conducted a technical analysis-based approach to identify trading opportunities, with a specific focus on cryptocurrency. Among these approaches, neural networks show promising ability in learning from both structured and unstructured data. Most of the related work in neural financial modeling addresses relationship embedding (Li et al., 2019), forecasting (Wei et al., 2017; Neely et al., 2014), and option pricing (Pagnottoni, 2019). Long short-term memory (LSTM) networks (Wei et al., 2017) and Elman recurrent neural networks (Wang et al., 2016) have been employed successfully in financial time series analysis tasks. Tran et al. (2018) utilized the attention mechanism to refine RNNs. Mohan et al. (2019) leveraged both market and textual information to boost the performance of stock prediction. Some studies also adopted stock embedding to mine affinity indicators (Chen et al., 2019).

2.2 Reinforcement Learning in Trading
Algorithmic trading has been widely studied in its different subareas, including risk control (Pichler et al., 2021), portfolio optimization (Giudici et al., 2020), and trading strategy (Marques and Gomes, 2010; Vella and Ng, 2015; Chen et al., 2021). Nowadays, AI-based trading, especially the reinforcement learning approach, attracts interest in both academia and industry. Moody and Saffell (2001) proposed a direct reinforcement algorithm for trading and performed a comprehensive comparison between Q-learning and the policy gradient. Huang et al. (2016) further proposed a robust trading agent based on deep Q-networks (DQN). Deng et al. (2016) utilized fuzzy logic with a deep learning model to extract financial features from noisy time series, which achieved state-of-the-art performance in single-product trading. Xiong et al. (2018) employed the Deep Deterministic Policy Gradient (DDPG), based on the standard actor-critic framework, to perform stock trading; the experiments demonstrated its profitability over baselines including the min-variance portfolio allocation method and the technical approach based on the Dow Jones Industrial Average (DJIA) index. Wang et al. (2019) employed an RL algorithm to construct winner and loser portfolios and traded with a buy-winner-sell-loser strategy. However, the intraday trading task for reinforced trading agents is still less addressed, mainly because of the complexity of designing the trading space for a frequent trading strategy. Our research primarily targets efficient intraday trading.

3 QF-TRADERNET

Day trading refers to the strategy of taking a position and leaving the market within one trading day. We let our model send an order when the market opens on every trading day. Based on the observed environment, we train QF-TraderNet to learn the optimal QPL at which to settle. Below we introduce the QPL-based action space search and the model architecture separately.

3.1 Quantum Finance Theory Based Action Space Search
Quantum finance theory elaborates on the relationship between the secondary financial market and the classical-quantum mechanics model (Lee, 2020; Meng et al., 2015; Ye and Huang, 2008). QFT proposes an anharmonic oscillator model to embed the interrelationships among financial products. It considers that the dynamics of a financial product are affected by the energy field generated by itself and by other financial products (Lee, 2020). The energy levels generated from the field of the particle regulate the equilibrium states of price movement on a daily basis, and are denoted as the daily quantum price levels (QPLs). QPLs can be viewed as the support or resistance levels of classical financial analysis. Past studies (Lee, 2019) have shown that QPLs can be used as extracted features for financial time series. The procedure of the QPL calculation is given in the following steps.

Step 1: Modeling the Potential Energy of Market Movement via Four Major Market Participants
As in classical quantum mechanics, the Hamiltonian in QFT contains the potential term and the volatility term.
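The remaining steps of this procedure yield, for each trading day, a ground state plus a ladder of positive and negative QPLs (as visualized in Figure 2). For QF-TraderNet, these levels simply define a small discrete action space. Below is a minimal sketch of that mapping, assuming the day's QPLs have already been computed by the QFT procedure; the helper names and example prices are illustrative only and are not taken from the paper.

```python
# Minimal sketch: map discrete action indices onto +/- QPL target levels plus a neutral action.
# Assumes the day's positive and negative QPLs are already computed (floats passed in).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class QPLAction:
    level: int                      # signed QPL index: +n long targets, -n short targets, 0 neutral
    target_price: Optional[float]   # price level at which the order would be settled

def build_action_space(ground_state: float,
                       pos_qpls: List[float],
                       neg_qpls: List[float],
                       n_actions: int = 7) -> List[QPLAction]:
    """Build the QPL-indexed action space with (n_actions - 1) / 2 levels per direction."""
    k = (n_actions - 1) // 2
    actions = [QPLAction(-(k - i), neg_qpls[k - i - 1]) for i in range(k)]   # short side, deepest first
    actions.append(QPLAction(0, None))                                       # neutral: no trade today
    actions += [QPLAction(i + 1, pos_qpls[i]) for i in range(k)]             # long side
    return actions

# Example with made-up AUDUSD-like numbers: ground state 0.6400, three QPLs on each side.
space = build_action_space(0.6400,
                           pos_qpls=[0.6423, 0.6441, 0.6458],
                           neg_qpls=[0.6377, 0.6359, 0.6342],
                           n_actions=7)
for a in space:
    print(a.level, a.target_price)
```

With n_actions = 3 this corresponds to the Lite action space used later in Section 4.2 (one QPL per direction plus a neutral action), and n_actions = 7 to the Ultra one.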
3.2 Deep Feature Learning and Representation by LSTM Networks
LSTM networks show promising performance in sequential feature learning, owing to their structural adaptability (Gers et al., 2000). We introduce LSTM networks to extract the temporal features of the financial series, thus improving the market-status perception of the policy generation network (PGN).
We use the same look-back window as in Wang et al. (2019), with size W, to split the input sequence x from the complete series S = (s_1, s_2, ..., s_t, ..., s_T); i.e., the agent evaluates the market status over a time period of size W. Hence, the input matrix of the LSTM can be written as X = (x_1, x_2, ..., x_t, ..., x_{T-W+1}), where x_t = (s_{t-W+w} | w ∈ [1, W])^T. Each input vector s_t is constituted by: 1) the opening, highest, lowest, and closing prices of the trading day (note that the closing price of day t-1 may differ from the opening price of day t because of market adjustments outside trading hours; hence, we keep all four price variables); 2) the transaction volume; 3) the Moving Average Convergence-Divergence (MACD), a technical indicator for identifying the market status; 4) the Relative Strength Index (RSI), a technical indicator measuring price momentum; 5) the Bollinger Bands (main, upper, and lower), which can be applied to identify the potential price range and consequently observe the market trend (Colby and Meyers, 1988); and 6) the KDJ (stochastic oscillator), used in short-term oriented trading via price velocity techniques (Colby and Meyers, 1988).
Principal component analysis (PCA) (Wold et al., 1987) is utilized to compress the series data S into F~ dimensions and to denoise. Subsequently, L2 normalization is applied to scale the input features to the same magnitude. The preprocessing is calculated as

\tilde{X} = \frac{\mathrm{PCA}_{F \to \tilde{F}}(X)}{\left\lVert \mathrm{PCA}_{F \to \tilde{F}}(X) \right\rVert_{2}}    (10)

where F~ < F, and the deep feature learning model can be described as

h_t = \mathrm{LSTM}_{\xi}(x_t), \quad t \in [0, T - W + 1]    (11)

where ξ denotes the trainable parameters of the LSTM.

3.3 Policy Generator Networks (PGN)
Given the learned feature vector h_t, the PGN directly produces the output policy, i.e., the probability of settling the order at each +QPL and -QPL, according to the action score z_t^i produced by a fully connected network (FFBPN):

z_t^{i} = \mathrm{FFBPN}_{\theta}(h_t; W_{\theta}, b_{\theta})    (12)

where θ denotes the parameters of the FFBPN, with weight matrix W_θ and bias b_θ. Let a_t^i denote the i-th action at time t. The output policy a_t is calculated as

a_t^{\pm} = \frac{\exp(z_t^{i})}{\sum_{i' \in [1, A]} \exp(z_t^{i'})}    (13)

At timestep t, the model takes action a_t by sampling from the policy a_t^{\pm}, which comprises the long (+) and short (-) trading directions. a_t^{\pm} contains A dimensions, indicating the number of candidate actions, each with a price-return reward r_t^i:

r_t^{i} = \begin{cases} \delta\,(QPL_i^{\delta} - p_t^{o}), & QPL_i^{\delta} \in [p_t^{l}, p_t^{h}] \\ \delta\,(p_t^{c} - p_t^{o}), & QPL_i^{\delta} \notin [p_t^{l}, p_t^{h}] \end{cases}    (14)

where δ denotes the trading direction: for actions whose target settlement level is a +QPL, the trade is a long buy (δ = +1); for actions targeting a -QPL, a short sell (δ = -1) is performed; and δ = 0 when the decision is neutral, in which case no trade is made on trading day t.
We train QF-TraderNet with reinforcement learning. The key idea is to maintain a loop of successive steps: 1) the agent π perceives the environment, 2) π takes an action, and 3) π adjusts its behavior to receive more reward, until the agent reaches its learning goal (Sutton and Barto, 2018). Therefore, for each training episode, a trajectory τ = {(h_1, a_1), (h_2, a_2), ..., (h_T, a_T)} can be defined as the sequence of state-action tuples, with the corresponding return sequence¹ r = {r_1, r_2, r_3, ..., r_T}. The probability of choosing each QPL action is determined by QF-TraderNet as

a_t^{i} = \Pr\left(\mathrm{action}_t = QPL^{(i)} \mid \tilde{X};\, \theta, \xi\right)    (15)

\pi = \mathrm{PGN}_{\theta}\left(\mathrm{LSTM}_{\xi}(x_t)\right)    (16)

Let R_τ denote the cumulative price return of trajectory τ, with R_τ = Σ_{t=1}^{T-W+1} r_t. Then, over all possible explored trajectories, the expected reward obtained by the RL agent can be evaluated as (Sutton et al., 2000)

J_{\pi}(\theta, \xi) = \int_{\tau \sim \pi} R_{\tau}\, \Pr(\tau; \theta, \xi)\, d\tau    (17)

where Pr(τ | θ, ξ) is the probability that the QF-TraderNet agent π with parameters θ and ξ generates trajectory τ under Monte-Carlo simulation. The objective is then to maximize the expected reward, θ*, ξ* = argmax_{θ,ξ} J(θ, ξ). We substitute the objective with its negative and use gradient descent to optimize it. To avoid the local minimum problem caused by multiple positive-reward actions, we use the state-dependent threshold method (Sutton and Barto, 2018) to allow the RL agent to perform a more efficient optimization. The detailed gradient calculation is given in the supplementary material.

¹ Here r denotes the reward of the RL agent, rather than the price return r(t) used in the QPL evaluation.
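To make the two-network pipeline of Eqs. 11-14 concrete, here is a minimal PyTorch sketch. It is not the authors' implementation: the layer sizes loosely follow the QFTN-Lite settings reported in Section 4.2, and all class, function, and variable names are illustrative.

```python
# Minimal sketch of the LSTM feature learner + policy generator network (Eqs. 11-13)
# and the QPL-based price-return reward (Eq. 14). Illustrative only.
import torch
import torch.nn as nn

class QFTraderNet(nn.Module):
    def __init__(self, n_features: int = 4, hidden: int = 128, n_actions: int = 3):
        super().__init__()
        # Eq. 11: 2-layer LSTM feature learner over the look-back window of W days
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        # Eq. 12: fully connected policy generator (FFBPN) producing action scores z_t^i
        self.pgn = nn.Sequential(
            nn.Linear(hidden, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_actions),
        )

    def forward(self, x):                     # x: (batch, W, n_features), PCA-compressed, L2-normalised
        h, _ = self.lstm(x)                   # hidden states h_t for each step of the window
        scores = self.pgn(h[:, -1, :])        # action scores from the last hidden state
        return torch.softmax(scores, dim=-1)  # Eq. 13: policy over the +/- QPL actions

def qpl_reward(direction: int, qpl_price: float,
               p_open: float, p_high: float, p_low: float, p_close: float) -> float:
    """Eq. 14: if the chosen QPL lies inside the day's [low, high] range it is treated as
    touched and the order settles there; otherwise the order settles at the close.
    direction is +1 for a long buy, -1 for a short sell, 0 for neutral (no trade)."""
    if direction == 0:
        return 0.0
    if p_low <= qpl_price <= p_high:
        return direction * (qpl_price - p_open)
    return direction * (p_close - p_open)

# Sampling an action from the generated policy (Eq. 15):
model = QFTraderNet()
policy = model(torch.randn(1, 3, 4))          # W = 3 look-back window, F~ = 4 features
action = torch.distributions.Categorical(policy).sample()
```

Training then follows the policy-gradient objective of Eq. 17: the log-probabilities of the sampled actions are weighted by their rewards, and the negative of the objective is minimized with gradient descent, as described above.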
FIGURE 4 | A case study illustrating our profit-and-loss control strategy. The trading policy is uniformly distributed initially. Ideally, our model assigns the largest probability to the +3 QPL action, which earns the maximum profit, as the S-TP. On the short side, the -1 QPL can take the most considerable reward, leading to it being accredited the maximum probability as the S-SL.
3.4 Trading Policy With Learnable Soft Profit and Loss Control
In QF-TraderNet, the LSTM network learns the hidden representation and feeds it into the PGN; the PGN then generates the learned policy that decides the target QPL at which to settle. As the action is sampled from the generated policy, QF-TraderNet adopts a soft profit-and-loss control strategy rather than deterministic TP and SL. The overall QF-TraderNet architecture is summarized in Figure 3.
An equivalent way to interpret our strategy is that our model trades with a long buy if the decision falls on a positive QPL; conversely, a short sell transaction is delivered. Once the trading direction is decided, the target QPL with the maximum probability is considered the soft target price (S-TP), and the soft stop-loss line (S-SL) is the QPL with the highest probability in the opposite trading direction. One exemplification is presented in Figure 4.
Since the S-TP and S-SL control is probability-based, QF-TraderNet is not forced to settle when the price touches the stop-loss line prematurely. Instead, it considers whether there is a better target price for settlement in the entire action space. Therefore, the model is more flexible in its SL and TP control across different states, compared with using a pair of preset "hard" hyperparameters.
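Following this interpretation (and the Figure 4 case study), the soft TP and soft SL can be read directly off the generated policy. The small sketch below reuses the signed QPL indices from the earlier action-space example; the function and variable names are ours, not the paper's.

```python
# Illustrative sketch: derive trading direction, S-TP, and S-SL from the policy vector.
import numpy as np

def soft_tp_sl(policy, levels):
    """policy[i] is the probability of settling at levels[i] (a signed QPL index, 0 = neutral).
    Returns (direction, soft take-profit level, soft stop-loss level)."""
    best = int(np.argmax(policy))
    direction = int(np.sign(levels[best]))      # +1 long, -1 short, 0 neutral
    if direction == 0:
        return 0, None, None                    # no trade is sent on this day
    same_side = [i for i, lv in enumerate(levels) if np.sign(lv) == direction]
    opposite = [i for i, lv in enumerate(levels) if np.sign(lv) == -direction]
    s_tp = levels[max(same_side, key=lambda i: policy[i])]   # most probable QPL in the trade direction
    s_sl = levels[max(opposite, key=lambda i: policy[i])]    # most probable QPL on the opposite side
    return direction, s_tp, s_sl

# Example mirroring the Figure 4 case study: +3 QPL is the most probable long-side
# action (S-TP) and -1 QPL the most probable short-side action (S-SL).
levels = [-3, -2, -1, 0, 1, 2, 3]
policy = np.array([0.05, 0.05, 0.15, 0.05, 0.10, 0.20, 0.40])
print(soft_tp_sl(policy, levels))   # -> (1, 3, -1)
```

Because both levels are only "soft", the agent can revise them on the next observation instead of being settled out mechanically, which is exactly the flexibility described above.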
FIGURE 5 | 1st panel: continuous partition of the training and verification data; 2nd panel: affected by the global economic situation, most datasets showed a downward trend over the testing interval, accompanied by highly irregular oscillations; 3rd panel: cumulative reward curves of the different methods in the testing evaluation.
4 EXPERIMENTS

We conduct an empirical evaluation of QF-TraderNet on various types of financial datasets. In our experiments, eight datasets from four categories are used, including 1) foreign exchange products: Great Britain Pound vs. United States Dollar (GBPUSD), Australian Dollar vs. United States Dollar (AUDUSD), Euro vs. United States Dollar (EURUSD), and United States Dollar vs. Swiss Franc (USDCHF); 2) financial indices: S&P 500 Index (S&P500) and Hang Seng Index (HSI); 3) metal: Silver vs. United States Dollar (XAGUSD); and 4) crude oil: Oil vs. United States Dollar (OILUSe). The evaluation is conducted from the perspectives of earning profits and of robustness when the agent faces unexpected changes of market state. We also investigate the impact of different settings of our proposed QPL-based action space search for the RL trader, and present an ablation study of our model.

4.1 Experiment Settings
All datasets used in the experiments are fetched from the free and open historical data center in MetaTrader 4, a professional trading platform for FOREX, financial indices, and other securities. We download raw time series data of around 2048 trading days and use the front 90% for training and validation; the rest is held out for out-of-sample verification. That is, the continuous series from November 2012 to July 2019 is spliced to construct the sequential training samples, and the remaining part is used for testing and validation. Notably, the evaluation period covers the recent fluctuations in the global financial market caused by the COVID-19 pandemic, which serves as a robustness test of how the trading agent handles unforeseen market fluctuations. The size of the look-back window is set to 3, and the price return and Sharpe ratio metrics are calculated daily. In the backtest, the initial capital is set to the corresponding currency or asset with a value of 10,000, at a transaction cost of 0.3% (Deng et al., 2016). All experiments are conducted on a single NVIDIA GTX Titan X GPU.

4.2 Models Settings
To compare our model with traditional methods, we select forecasting-based trading models and other state-of-the-art reinforcement learning-based trading agents as baselines.

• Market baseline (Huang et al., 2016). This strategy is used to measure the overall performance of the market during the period T by holding the product consistently.
• DDR-RNN. Following the idea of Deep Direct Reinforcement, but applying principal component analysis (PCA) to denoise and compress the data. We also employ an RNN to learn the features, and a two-layer FFBPN is used as the policy generator instead of the logistic regression in the original design. This model can be regarded as the ablation of QF-TraderNet without the QPL action space search.
• FCM, a forecasting model based on an RNN trend predictor, consisting of a 7-layer LSTM with 512 hidden dimensions. It trades with a buy-winner-sell-loser strategy.
• RF. Same design as FCM but predicting the trend via a Random Forest.
• QF-PGN. The policy gradient based RL agent with QPL-based order control. A single FFBPN is utilized as
Models HSI S&P500 Silver Crude oil USDCHF GBPUSD EURUSD AUDUSD
CPR SR CPR SR CPR SR CPR SR CPR SR CPR SR CPR SR CPR SR
Market 555.00 0.01 2,122.27 0.05 −12.66 −0.03 −90.79 −0.04 0.19 0.02 −0.07 −0.01 −0.05 −0.01 −0.04 −0.01
RNN-FCM 1,251.78 0.03 361.94 0.09 −11.67 −0.07 6.76 0.02 0.04 0.02 −0.14 −0.07 −0.24 −0.13 0.05 0.04
RF-FCM 3,846.31 0.09 336.27 0.06 23.60 0.91 112.88 1.13 0.11 0.16 0.29 0.33 0.20 0.53 −0.04 −0.07
DDR-RNN 4,505.00 0.10 345.50 0.03 1.53 0.02 −4.57 −0.02 0.07 0.09 −0.02 −0.02 0.08 0.15 <0.01 −0.08
FDRNN 1,536.00 0.04 731.73 0.07 2.80 0.04 −9.38 −0.03 0.08 0.10 0.05 0.04 −0.08 −0.10 0.05 0.12
QF-PGN 3,244.35 0.07 3,133.76 1.88 1.94 0.05 138.34 2.00 −0.08 −0.11 0.28 0.37 −0.03 −0.05 0.17 0.50
QF-TraderNet Lite 2,779.64 0.17 155.66 0.04 1.56 0.04 82.40 0.54 0.58 1.69 0.61 1.31 0.20 0.65 0.02 0.03
QF-TraderNet Ultra 8,100.51 0.17 4,428.00 1.52 31.24 1.49 164.38 1.44 0.64 1.16 0.92 1.31 0.57 1.11 0.36 0.97
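The table above reports the cumulative price return (CPR) and Sharpe ratio (SR) for each product. Below is a minimal sketch of how these two metrics can be computed from the agent's daily price-return series, assuming the Sharpe ratio definition of Eq. 19 below and that transaction costs have already been deducted from each day's return; the function name and example values are illustrative.

```python
# Minimal sketch of the two evaluation metrics: CPR and SR over the test period.
import numpy as np

def cpr_and_sr(daily_price_returns):
    """daily_price_returns: the agent's realised price return on each test day (Eq. 14)."""
    r = np.asarray(daily_price_returns, dtype=float)
    cpr = r.sum()               # cumulative price return over the whole test period
    sr = r.mean() / r.std()     # Eq. 19, computed on the daily return series
    return cpr, sr

# Toy example with made-up daily returns:
print(cpr_and_sr([0.4, -0.1, 0.3, 0.0, 0.2]))
```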
We implement two versions of QF-TraderNet: 1) QF-TraderNet Lite (QFTN-L): a 2-layer LSTM with a 128-dimensional hidden vector for the feature representation, and a 3-layer policy generator network with 128, 64, and 32 neurons per layer; the size of the action space is 3. 2) QF-TraderNet Ultra (QFTN-U): the same architecture as the Lite version, but with the number of candidate actions enlarged to 7.
Regarding the training settings, the Adaptive Moment Estimation (ADAM) optimizer with 1,500 training epochs and a 0.001 learning rate is used for all iteratively optimized models. For the algorithms requiring PCA, the target dimension F~ is set to 4, such that the compressed matrix embeds 99.5% of the information of the original features.
The Sharpe ratio is calculated by

SR = \frac{\mathrm{Average}(CPR)}{\mathrm{StandardDeviation}(CPR)}    (19)

The result of MARKET denotes that the market is in a downtrend with high volatility over the evaluation interval, owing to the recent global economic fluctuation. The price range in testing is not fully covered by the training data for some datasets (crude oil and AUDUSD), which tests the models in an unseen environment. Under these testing conditions, our QFTN-U trained with CPR achieves higher CPR and SR than the other comparisons, except for the SR on S&P500 and crude oil.
QFTN-L is also comparable to the baselines. This signifies the profitability and robustness of our QF-TraderNet.
Moreover, QFTN-L, QFTN-U, and the QF-PGN models yield significantly higher CPR and SR than the other RL traders without QPL-based actions (DDR-RNN and FDRNN). The ablation study in Table 2 also presents the contribution of each component in detail (SUPERVISED counts the average of RF and FCM), where the QPL actions dramatically contribute to the Sharpe ratio of our full model. This demonstrates the benefit of trading with QPLs for gaining considerable profitability and efficient risk control.
The backtesting results in Table 3 show the good generalization of QFTN-U. It is the only strategy that earns a positive profit on almost all datasets, because the day-trading strategy is less affected by the market trend than other strategies under the long, neutral, and short setting. We also find that the performance of our model on the FOREX datasets is significantly better than on the others. FOREX contains more noise and fluctuations, which indicates the advantage of our models on highly fluctuating products.

TABLE 4 | Decision classification metrics.

              Optimal QPL Prediction          Trading Direction Prediction
              Acc.    P      R      F1        Acc.    P      R      F1
PGN (3x)      0.34    0.25   0.25   0.37      0.34    0.25   0.25   0.37
QFTN-L (3x)   0.56    0.54   0.50   0.50      0.56    0.54   0.50   0.50
QFTN-U (7x)   0.48    —      —      —         0.80    0.78   0.78   0.82

Bold values indicate the best performance in terms of the corresponding metrics.
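Table 4 reports standard classification metrics for the chosen QPL action and the implied trading direction. A small sketch of how such metrics could be computed with scikit-learn is given below, assuming the "optimal" labels are derived in hindsight as the action maximizing Eq. 14 on each test day; macro averaging is our assumption, as the paper does not state the averaging mode, and all names are illustrative.

```python
# Illustrative sketch of the decision-classification metrics in Table 4.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def decision_metrics(chosen, optimal):
    """Acc/P/R/F1 of the chosen QPL actions against hindsight-optimal ones."""
    acc = accuracy_score(optimal, chosen)
    p, r, f1, _ = precision_recall_fscore_support(
        optimal, chosen, average="macro", zero_division=0)
    return acc, p, r, f1

def direction_metrics(chosen, optimal):
    """Same metrics on the implied trading direction (sign of the signed QPL index)."""
    return decision_metrics(np.sign(chosen), np.sign(optimal))

# Toy example with signed QPL indices as labels:
print(decision_metrics([3, -1, 0, 2], [3, -2, 0, 2]))
print(direction_metrics([3, -1, 0, 2], [3, -2, 0, 2]))
```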
REFERENCES

Chen, C., Zhao, L., Bian, J., Xing, C., and Liu, T.-Y. (2019). "Investment Behaviors Can Tell what inside: Exploring Stock Intrinsic Properties for Stock Trend Prediction," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, August 4–8, 2019, 2376–2384.

Chen, J., Luo, C., Pan, L., and Jia, Y. (2021). Trading Strategy of Structured Mutual Fund Based on Deep Learning Network. Expert Syst. Appl. 183, 115390. doi:10.1016/j.eswa.2021.115390

Colby, R. W., and Meyers, T. A. (1988). The Encyclopedia of Technical Market Indicators. Homewood, IL: Dow Jones-Irwin.

Dempster, M. A. H., and Leemans, V. (2006). An Automated Fx Trading System Using Adaptive Reinforcement Learning. Expert Syst. Appl. 30, 543–552. doi:10.1016/j.eswa.2005.10.012

Deng, Y., Bao, F., Kong, Y., Ren, Z., and Dai, Q. (2016). Deep Direct Reinforcement Learning for Financial Signal Representation and Trading. IEEE Trans. Neural Netw. Learn. Syst. 28, 653–664. doi:10.1109/TNNLS.2016.2522401

Gers, F. A., Schmidhuber, J., and Cummins, F. (2000). Learning to Forget: Continual Prediction with LSTM. Neural Comput. 12 (10), 2451–2471. doi:10.1162/089976600300015015

Giudici, P., Pagnottoni, P., and Polinesi, G. (2020). Network Models to Enhance Automated Cryptocurrency Portfolio Management. Front. Artif. Intell. 3, 22. doi:10.3389/frai.2020.00022

Giudici, P., Leach, T., and Pagnottoni, P. (2021). Libra or Librae? Basket Based Stablecoins to Mitigate Foreign Exchange Volatility Spillovers. Finance Res. Lett., 102054. doi:10.1016/j.frl.2021.102054

Huang, D.-j., Zhou, J., Li, B., Hoi, S. C. H., and Zhou, S. (2016). Robust Median Reversion Strategy for Online Portfolio Selection. IEEE Trans. Knowl. Data Eng. 28, 2480–2493. doi:10.1109/tkde.2016.2563433

Lee, R. S. (2019). Chaotic Type-2 Transient-Fuzzy Deep Neuro-Oscillatory Network (CT2TFDNN) for Worldwide Financial Prediction. IEEE Trans. Fuzzy Syst. 28 (4), 731–745. doi:10.1109/tfuzz.2019.2914642

Lee, R. (2020). Quantum Finance: Intelligent Forecast and Trading Systems. Singapore: Springer.

Li, Z., Yang, D., Zhao, L., Bian, J., Qin, T., and Liu, T.-Y. (2019). "Individualized Indicator for All: Stock-wise Technical Indicator Optimization with Stock Embedding," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, August 4–8, 2019, 894–902.

Marques, N. C., and Gomes, C. (2010). "Maximus-ai: Using Elman Neural Networks for Implementing a SLMR Trading Strategy," in International Conference on Knowledge Science, Engineering and Management, Belfast, United Kingdom, September 1–3, 2010 (Springer), 579–584. doi:10.1007/978-3-642-15280-1_55

Meng, X., Zhang, J.-W., Xu, J., and Guo, H. (2015). Quantum Spatial-Periodic Harmonic Model for Daily Price-Limited Stock Markets. Physica A: Stat. Mech. its Appl. 438, 154–160. doi:10.1016/j.physa.2015.06.041

Mohan, S., Mullapudi, S., Sammeta, S., Vijayvergia, P., and Anastasiu, D. C. (2019). "Stock Price Prediction Using News Sentiment Analysis," in 2019 IEEE Fifth International Conference on Big Data Computing Service and Applications (BigDataService), Newark, CA, April 4–9, 2019, 205–208. doi:10.1109/BigDataService.2019.00035

Moody, J. E., and Saffell, M. (1998). "Reinforcement Learning for Trading," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 917–923.

Moody, J., and Saffell, M. (2001). Learning to Trade via Direct Reinforcement. IEEE Trans. Neural Netw. 12, 875–889. doi:10.1109/72.935097

Neely, C. J., Rapach, D. E., Tu, J., and Zhou, G. (2014). Forecasting the Equity Risk Premium: the Role of Technical Indicators. Manage. Sci. 60, 1772–1791. doi:10.1287/mnsc.2013.1838

Pagnottoni, P. (2019). Neural Network Models for Bitcoin Option Pricing. Front. Artif. Intell. 2, 5. doi:10.3389/frai.2019.00005

Peralta, G., and Zareei, A. (2016). A Network Approach to Portfolio Selection. J. Empirical Finance 38, 157–180. doi:10.1016/j.jempfin.2016.06.003

Pichler, A., Poledna, S., and Thurner, S. (2021). Systemic Risk-Efficient Asset Allocations: Minimization of Systemic Risk as a Network Optimization Problem. J. Financial Stab. 52, 100809. doi:10.1016/j.jfs.2020.100809

Resta, M., Pagnottoni, P., and De Giuli, M. E. (2020). Technical Analysis on the Bitcoin Market: Trading Opportunities or Investors' Pitfall? Risks 8, 44. doi:10.3390/risks8020044

Sutton, R. S., and Barto, A. G. (2018). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (2000). "Policy Gradient Methods for Reinforcement Learning with Function Approximation," in Advances in Neural Information Processing Systems, 1057–1063.

Tran, D. T., Iosifidis, A., Kanniainen, J., and Gabbouj, M. (2018). Temporal Attention-Augmented Bilinear Network for Financial Time-Series Data Analysis. IEEE Trans. Neural Netw. Learn. Syst. 30, 1407–1418. doi:10.1109/TNNLS.2018.2869225

Vella, V., and Ng, W. L. (2015). A Dynamic Fuzzy Money Management Approach for Controlling the Intraday Risk-Adjusted Performance of AI Trading Algorithms. Intell. Sys. Acc. Fin. Mgmt. 22, 153–178. doi:10.1002/isaf.1359

Wang, J., Wang, J., Fang, W., and Niu, H. (2016). Financial Time Series Prediction Using Elman Recurrent Random Neural Networks. Comput. Intell. Neurosci. 2016, 14. doi:10.1155/2016/4742515

Wang, J., Zhang, Y., Tang, K., Wu, J., and Xiong, Z. (2019). "Alphastock: A Buying-Winners-and-Selling-Losers Investment Strategy Using Interpretable Deep Reinforcement Attention Networks," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, August 4–8, 2019, 1900–1908.

Wei, B., Yue, J., Rao, Y., and Boris, P. (2017). A Deep Learning Framework for Financial Time Series Using Stacked Autoencoders and Long-Short Term Memory. PLoS One 12, e0180944. doi:10.1371/journal.pone.0180944

Wold, S., Esbensen, K., and Geladi, P. (1987). Principal Component Analysis. Chemometrics Intell. Lab. Syst. 2, 37–52. doi:10.1016/0169-7439(87)80084-9

Xiong, Z., Liu, X.-Y., Zhong, S., Yang, H., and Walid, A. (2018). Practical Deep Reinforcement Learning Approach for Stock Trading. arXiv preprint arXiv:1811.07522.

Ye, C., and Huang, J. P. (2008). Non-classical Oscillator Model for Persistent Fluctuations in Stock Markets. Physica A: Stat. Mech. its Appl. 387, 1255–1263. doi:10.1016/j.physa.2007.10.050

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2021 Qiu, Qiu, Yuan, Chen and Lee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.