A New Fuzzy Reinforcement Learning Method for Effective Chemotherapy

Alsaadi, Fawaz E.; Yasami, Amirreza; Volos, Christos; Bekiros, Stelios; Jahanshahi, Hadi

doi:10.3390/math11020477


by Fawaz E. Alsaadi ¹, Amirreza Yasami ², Christos Volos ³,*, Stelios Bekiros ⁴,⁵,⁶ and Hadi Jahanshahi ⁷

¹ Communication Systems and Networks Research Group, Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia

² Department of Mechanical Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada

³ Laboratory of Nonlinear Systems, Circuits & Complexity (LaNSCom), Department of Physics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

⁴ FEMA, University of Malta, MSD 2080 Msida, Malta

⁵ LSE Health, Department of Health Policy, London School of Economics and Political Science, London WC2A 2AE, UK

⁶ IPAG Business School (IPAG), 184, bd Saint-Germain, 75006 Paris, France

⁷ Department of Mechanical Engineering, University of Manitoba, Winnipeg, MB R3T 5V6, Canada

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(2), 477; https://doi.org/10.3390/math11020477

Submission received: 17 December 2022 / Revised: 9 January 2023 / Accepted: 12 January 2023 / Published: 16 January 2023

(This article belongs to the Special Issue Analysis and Mathematical Modeling of Control Engineering and Path Planning)


Abstract

A key challenge for drug dosing schedules is the ability to learn an optimal control policy even when there is a paucity of accurate information about the systems. Artificial intelligence has great potential for shaping a smart control policy for the dosage of drugs for any treatment. Motivated by this issue, in the present research paper a Caputo–Fabrizio fractional-order model of cancer chemotherapy treatment was elaborated and analyzed. A fixed-point theorem and an iterative method were implemented to prove the existence and uniqueness of the solutions of the proposed model. Afterward, in order to control cancer through chemotherapy treatment, a fuzzy-reinforcement learning-based control method that uses the State-Action-Reward-State-Action (SARSA) algorithm was proposed. Finally, to assess the performance of the proposed control method, simulations were conducted for young and elderly patients and for ten simulated patients with different parameters. Then, the results of the proposed control method were compared with Watkins’s Q-learning control method for cancer chemotherapy drug dosing. The results of the simulations demonstrate the superiority of the proposed control method in terms of mean squared error, mean variance of the error, and mean squared control action; in other words, in terms of the eradication of tumor cells, the preservation of normal cells, and the amount of drug used during chemotherapy treatment.

Keywords:

Caputo–Fabrizio derivative; cancer chemotherapy drug dosing; fuzzy-reinforcement learning; optimal control; SARSA algorithm; artificial intelligence

MSC:

93C40; 93C42; 68T05

1. Introduction

Cancer is one of the most hazardous and fatal diseases throughout the world [1]. This disease is caused by the abnormal division and spreading of cells that destroy the patient’s body and may lead to death. Though many research efforts are devoted to precisely understanding the interaction between the immune system and tumor cells, the treatment of cancer is still one of the most challenging issues in modern medicine [2,3].

Based on the patient’s condition and the type of cancer, there are several treatments for tackling cancer, including radiotherapy, surgery, chemotherapy, immunotherapy, and so forth [4]. Cancer treatment can be challenging, as it can have a range of side effects, including fatigue, nausea, and hair loss. Managing these side effects is an important part of cancer care. Among the various treatments, chemotherapy is one of the most effective for annihilating cancerous cells. For this reason, in this research the chemotherapy method was chosen to treat cancer. Nonetheless, in all chemotherapy treatments, the applied drugs not only destroy the cancerous cells but also affect the healthy cells and kill some of them. Hence, it is crucial to make sure that the patient can tolerate the side effects of the treatment [5]. These side effects limit the dosage of the drugs [6]. Hence, when prescribing drugs, one aim is to decrease the number of cancerous cells as much as possible while keeping the side effects to a minimum [7].

Many factors define the treatment schedule and drug dosing, among them the weight of the patient, the level of white blood cells, the patient’s age, and the stage of the tumor. For this reason, clinicians follow certain established standards to define the therapy type and drug dosage for each patient. However, this approach has some limitations, which have been acknowledged by the scientific and clinical communities [8,9]. Moreover, evaluating the effectiveness of the chemotherapy plan and its feasibility is of significant importance [9]. Clinical trials are a reliable way to evaluate the effectiveness of chemotherapy plans, but they have limitations such as high costs, long trial times, and the difficulty of conducting them. Keeping these limitations in mind, an efficient way of devising chemotherapy treatment plans is desirable.

Mathematical models have been used as valuable tools in understanding the dynamics of cancers [10], and mathematical modeling can play a pivotal role in the understanding of the disease’s dynamics [11,12,13,14,15]. One common approach to modeling cancer is to use ordinary differential equations (ODEs), which describe the time evolution of a system. These models can represent the growth and spread of cancer cells and the response of the immune system to the cancer. Researchers can then use these models to study the long-term and short-term dynamics of cancer and to identify potential therapeutic strategies. Finding an accurate model of cancer dynamics could pave the way for the long-term and short-term prediction of the disease as well as the design of therapies [16]. This field of study aims not only to anticipate the spread of cancerous cells but also to control the disease as effectively as possible [17,18]. A lot of research has been done using mathematical modeling to understand cancer, and various approaches have been used in this field [19,20,21].

Fractional calculus is an excellent tool for describing the hereditary properties and memory of systems, and it has been widely utilized in various fields of study including economics, biology, ecology, and engineering [22,23,24,25,26,27,28]. In this regard, fractional-order models of cancer have started to attract some researchers’ attention [29,30]. Although early research studies on fractional-order calculus were based on the Caputo or Riemann–Liouville fractional-order derivative, it has recently been shown that these approaches possess some disadvantages, such as the singularity at the endpoint of the interval of definition [31,32,33]. To tackle this issue, other definitions of fractional derivatives have recently been introduced [34,35,36,37,38,39]. Caputo and Fabrizio offered a new derivative with a nonsingular kernel [32], namely the Caputo–Fabrizio derivative. The Caputo and Caputo–Fabrizio derivatives are both fractional derivatives, i.e., generalizations of the classical derivative to non-integer order. The Caputo derivative is based on a power law: its memory kernel is a power of the elapsed time and is singular at the endpoint of the interval. The Caputo–Fabrizio derivative, on the other hand, is based on an exponential decay law: its memory kernel decays exponentially with the elapsed time and remains bounded. As a result, the two derivatives may behave differently when applied to different types of functions, and they may be used to model different types of physical phenomena.

The distinction between the power-law kernel of the Caputo derivative and the exponential-decay kernel of the Caputo–Fabrizio derivative is discussed in [32,40]. There are several research studies in the literature that demonstrate the applications of the Caputo–Fabrizio derivative to different systems such as biological processes [41,42,43]. In Ref. [44], the underlying physical meaning of the nonsingular kernel as well as the applications of fractional differentiation operators with the nonsingular kernel were presented.

Over the past several years, many control strategies have been applied to cancer chemotherapy drug dosing.

In [45], targeted chemotherapy was proposed for tumor-immune interaction. As studied in [46,47], in addition to the mathematical interpretation of tumor growth, scheduling appropriate treatment strategies is an important issue, and control design strategies can be utilized to optimize drug dosing mathematically. To this end, many research studies have addressed these subjects. De Pillis and Radunskaya proposed a dynamical model for immunotherapy and chemotherapy schedules in 2001 [48]. In 2003, bang-bang type control was employed with chemotherapy to control tumor growth [47]. In 2007, De Pillis et al. used linear and quadratic control to cure a tumor with the use of chemotherapy [49]. Also, through Pontryagin’s maximum principle, Ghaffari and Naserifar designed optimal therapeutic protocols to schedule immunotherapy [50]. On the other hand, the cost-effectiveness of the treatment strategies has been investigated in [51]. Drug resistance, which is a vital phenomenon in cancerous tumors, has been examined in [52]. In addition, the conditions under which an siRNA treatment eradicates the tumor burden were investigated in [53], while a combination of chemotherapy and anti-angiogenic agents was utilized to cure the disease in [54]. In [55], robust adaptive control with an extended Kalman filter observer was presented to adjust the drug dosage. An optimal control strategy based on a linear time-varying approximation technique was proposed in [56]. A model-free, closed-loop method for chemotherapy based on reinforcement learning was proposed in [57]; to be precise, the authors developed an optimal controller for cancer chemotherapy treatment using the Q-learning algorithm. In [47], the authors modeled the chemotherapy treatment based on optimal control theory, and the objective was to minimize the number of tumor cells while keeping the healthy cells above a fixed level.

The use of state-of-the-art automatic control methods can help to improve the effectiveness and efficiency of various processes and systems, leading to better outcomes and more sustainable solutions [58,59,60,61,62,63,64,65,66,67]. Among the stated control strategies, intelligent controllers have significant advantages [68,69,70,71,72,73]. Artificial intelligence combines a wide variety of new technologies to give systems the ability to make decisions in new and unfamiliar conditions [74,75]. In some applications, due to the high value of their tasks and their remarkable risks, implementing a reliable controller is the main concern. Where the degree of uncertainty is high, classical methods of control may fail [76,77]. Hence, artificial intelligence-based controllers are rational choices for such systems. Moreover, optimal controls based on artificial intelligence are helpful for drug dosage because of their ability to consider various aspects of the biological systems in the optimization function. For example, through intelligent approaches, we are able to design treatments which take into account the number of healthy and infected cells as well as the cost and side effects of the drugs without having any mathematical model. Reinforcement learning is one of the most popular learning techniques. It explores the system’s response to the possible actions and learns the optimal action by evaluating how far the last action pushed the system towards the desired situation [78,79,80].
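To make the learning rule concrete, the following minimal Python sketch contrasts the tabular update used by SARSA with the one used by Watkins’s Q-learning, the two algorithms compared in this paper. It is only an illustration: the function names, the learning rate `lr`, and the discount factor `discount` are assumptions chosen here, and the plain table `Q` stands in for the fuzzy representation of the proposed controller.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Return a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def sarsa_update(Q, s, a, reward, s_next, a_next, lr=0.1, discount=0.9):
    """On-policy update: bootstraps from the action actually chosen in s_next."""
    target = reward + discount * Q[s_next][a_next]
    Q[s][a] += lr * (target - Q[s][a])


def q_learning_update(Q, s, a, reward, s_next, lr=0.1, discount=0.9):
    """Off-policy update: bootstraps from the best-valued action in s_next."""
    target = reward + discount * max(Q[s_next])
    Q[s][a] += lr * (target - Q[s][a])
```

The only difference between the two updates is the bootstrap term: SARSA uses the value of the action that the current policy will actually take next, whereas Q-learning uses the maximum over all next actions.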

As mentioned, fractional calculus is very advantageous for the modeling of real-world processes [81,82]. Motivated by this background, in the present study we investigated a new Caputo–Fabrizio model of cancer; studies of such models are still quite rare. Also, few studies are dedicated to the evaluation of reinforcement learning methods in chemotherapy drug dosing; to the best of our knowledge, no studies have been done on cancer chemotherapy drug dosing using fuzzy-reinforcement learning-based optimal control. These issues motivated the current study. In this paper, a Caputo–Fabrizio fractional-order model of cancer chemotherapy treatment was analyzed. The existence and uniqueness of the solutions of the presented model were proven. Then, a fuzzy-reinforcement learning-based control method was proposed in order to control the cancer chemotherapy treatment, and the effective performance of the proposed method was illustrated.

The main contributions of this research paper are as follows: First, a Caputo–Fabrizio fractional-order model of cancer chemotherapy treatment was elaborated and analyzed, which is a novel model for this system. Afterwards, a fixed-point theorem and an iterative method were implemented to prove the existence and uniqueness of the solutions of the proposed model. Finally, a fuzzy-reinforcement learning-based control method that uses the SARSA algorithm was proposed to control cancer through chemotherapy treatment.

The rest of the paper has been organized as follows: In Section 2 the preliminaries for this work are given. Later, in Section 3, the Caputo–Fabrizio fractional model of cancer chemotherapy is elaborated, and then sensitivity analysis is done for the parameters of the system. The fuzzy-reinforcement-learning based controller is described in Section 4. Section 5 is devoted to the numerical simulations; finally, in Section 6, conclusions are made and discussed.

2. Preliminaries

As has been shown, the kernel of the Caputo fractional derivative is singular at the endpoint of the interval of integration [83]; therefore, this derivative is not always appropriate for describing the memory effect accurately in real systems. A new fractional derivative that has no singularity in its kernel has been proposed [32]. This section summarizes the definitions and properties of the Caputo–Fabrizio fractional derivative that are used in this paper.

Consider $H^{1}(a, b) = \{x \mid x \in L^{2}(a, b) \text{ and } x' \in L^{2}(a, b)\}$, where $L^{2}(a, b)$ is the space of square-integrable functions on the interval $(a, b)$.

Definition 1. If the function $x$ belongs to $H^{1}(t_{0}, t)$ and the derivative order satisfies $\alpha \in (0,1)$, then the Caputo–Fabrizio derivative is defined as [32]:

$${}_{t_{0}}^{CF}D_{t}^{\alpha} x(t) = \frac{A(\alpha)}{1 - \alpha} \int_{t_{0}}^{t} x'(z)\, e^{-\alpha \frac{t - z}{1 - \alpha}}\, dz, \tag{1}$$

where $A(\alpha)$ is a normalization function that satisfies $A(0) = A(1) = 1$. Furthermore, if $x \notin H^{1}(a, b)$, then the Caputo–Fabrizio derivative is defined as:

$${}_{t_{0}}^{CF}D_{t}^{\alpha} x(t) = \frac{\alpha A(\alpha)}{1 - \alpha} \int_{t_{0}}^{t} \big(x(t) - x(z)\big)\, e^{-\alpha \frac{t - z}{1 - \alpha}}\, dz. \tag{2}$$

Remark 1 [32]. By setting $\beta = (1 - \alpha)/\alpha$ with $\beta \in (0, \infty)$, we have $\alpha = 1/(1 + \beta)$ with $\alpha \in (0, 1)$. Equation (1) can then be written in the following form:

$${}_{t_{0}}^{CF}D_{t}^{\alpha} x(t) = \frac{B(\beta)}{\beta} \int_{t_{0}}^{t} x'(z)\, e^{-\frac{t - z}{\beta}}\, dz, \tag{3}$$

where $B(\beta)$ is the normalization function corresponding to $A(\alpha)$, which satisfies $B(0) = B(\infty) = 1$.

Remark 2 [32]. The following equation holds:

$$\lim_{\beta \to 0} \frac{1}{\beta}\, e^{-\frac{t - z}{\beta}} = \delta(z - t), \tag{4}$$

where $\delta(z - t)$ is the Dirac delta function.
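As a brief illustration of why Remark 2 matters (a sketch added here, assuming that $x'$ is continuous at $t$ and that $t > t_{0}$), the exponential kernel carries unit mass in the limit, so the form in Equation (3) reduces to the classical derivative as $\beta \to 0$, that is, as $\alpha = 1/(1+\beta) \to 1$:

```latex
\int_{t_{0}}^{t} \frac{1}{\beta}\, e^{-\frac{t - z}{\beta}}\, dz
  = 1 - e^{-\frac{t - t_{0}}{\beta}} \;\xrightarrow[\beta \to 0]{}\; 1,
\qquad\text{hence}\qquad
\lim_{\beta \to 0} \frac{B(\beta)}{\beta} \int_{t_{0}}^{t} x'(z)\, e^{-\frac{t - z}{\beta}}\, dz
  = B(0)\, x'(t) = x'(t).
```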

A modified definition of the Caputo–Fabrizio derivative has been proposed by Nieto and Losada [31]; it is defined as

$${}_{t_{0}}^{CF}D_{t}^{\alpha} x(t) = \frac{(2 - \alpha) A(\alpha)}{2 (1 - \alpha)} \int_{t_{0}}^{t} x'(z)\, e^{-\alpha \frac{t - z}{1 - \alpha}}\, dz. \tag{5}$$

Definition 2. The Caputo–Fabrizio fractional integral of order $\alpha$ of the function $x(t)$ is defined as:

$${}_{0}^{CF}I_{t}^{\alpha} x(t) = \frac{2 (1 - \alpha)}{(2 - \alpha) A(\alpha)}\, x(t) + \frac{2 \alpha}{(2 - \alpha) A(\alpha)} \int_{0}^{t} x(z)\, dz, \quad t \geq 0. \tag{6}$$

Definition 3 [31]. The fractional Caputo–Fabrizio derivative of order $\alpha$ of the function $x(t)$ is defined as

$${}_{t_{0}}^{CF}D_{t}^{\alpha} x(t) = \frac{1}{1 - \alpha} \int_{t_{0}}^{t} x'(z)\, e^{-\alpha \frac{t - z}{1 - \alpha}}\, dz, \quad t \geq 0. \tag{7}$$

Moreover, the Caputo–Fabrizio fractional integral of order $\alpha$ is given as

$${}_{0}^{CF}I_{t}^{\alpha} x(t) = (1 - \alpha)\, x(t) + \alpha \int_{0}^{t} x(z)\, dz, \quad t \geq 0. \tag{8}$$

In this definition, the normalization function is taken as $A(\alpha) = 2 / (2 - \alpha)$.
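The operators in Equations (7) and (8) can be evaluated numerically with elementary quadrature. The sketch below is a minimal illustration rather than the implementation used in this paper: the function names, the trapezoidal rule, and the number of subintervals `n` are assumptions made here.

```python
import math


def cf_derivative(dx, t, alpha, t0=0.0, n=2000):
    """Equation (7): Caputo-Fabrizio derivative at time t, given x'(z) as dx.

    The integral is approximated with the composite trapezoidal rule.
    """
    h = (t - t0) / n
    rate = alpha / (1.0 - alpha)  # decay rate of the exponential kernel
    total = 0.0
    for k in range(n + 1):
        z = t0 + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * dx(z) * math.exp(-rate * (t - z))
    return total * h / (1.0 - alpha)


def cf_integral(x, t, alpha, n=2000):
    """Equation (8): (1 - alpha) * x(t) + alpha * integral of x over [0, t]."""
    h = t / n
    total = 0.0
    for k in range(n + 1):
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * x(k * h)
    return (1.0 - alpha) * x(t) + alpha * total * h
```

For $x(t) = t$, Equation (7) has the closed form $\frac{1}{\alpha}\big(1 - e^{-\alpha t/(1-\alpha)}\big)$, and applying Equation (8) to that expression returns $t = x(t) - x(0)$, so the two routines can be checked against each other.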

3. Caputo–Fabrizio Fractional Model of Cancer Chemotherapy

In this section, in order to derive the Caputo–Fabrizio fractional model of cancer chemotherapy, the nonlinear four-state model given in [47,84,85,86] is considered. In this model, the first state is $N(t)$, which indicates the number of normal cells; the second state is $T(t)$, which represents the number of tumor cells. The third state is $I(t)$, which represents the number of immune cells. Finally, the last state is the drug concentration $C(t)$. The original integer-order model [47,87] is given in Equation (9):

$$\begin{cases} \dot{N}(t) = r_{2} N(t) [1 - b_{2} N(t)] - c_{4} N(t) T(t) - a_{3} N(t) C(t), \\ \dot{T}(t) = r_{1} T(t) [1 - b_{1} T(t)] - c_{2} T(t) I(t) - c_{3} N(t) T(t) - a_{2} T(t) C(t), \\ \dot{I}(t) = s + \dfrac{\lambda T(t) I(t)}{\gamma + T(t)} - c_{1} T(t) I(t) - d_{1} I(t) - a_{1} I(t) C(t), \\ \dot{C}(t) = - d_{2} C(t) + u(t). \end{cases} \tag{9}$$

The initial conditions for this model are assumed to be as follows:

$$N(0) = N_{0}, \quad T(0) = T_{0}, \quad I(0) = I_{0}, \quad C(0) = C_{0}. \tag{10}$$

In Equation (9), $r_{1}$ and $r_{2}$ are the growth rates of the tumor cells and normal cells, respectively; $b_{1}$ and $b_{2}$ denote the reciprocal carrying capacities of the tumor cells and normal cells, respectively; $d_{1}$ is the death rate of the immune cells; $d_{2}$ stands for the decay rate of the injected drug; and the fractional cell kill rates of the immune cells, tumor cells, and normal cells are represented by $a_{1}$, $a_{2}$, and $a_{3}$, respectively. Also in this model, $\lambda$ and $\gamma$ are positive constants which stand for the immune response rate and the immune threshold rate, respectively. The influx rate of the immune cells is represented by $s$, and $u(t)$ is the drug infusion rate.
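For readers who want to experiment with the integer-order model, Equation (9) can be integrated with a classical fourth-order Runge–Kutta scheme, as in the hedged sketch below. The parameter values in `PLACEHOLDER_PARAMS` are illustrative placeholders and are not the values of Table 1; the function names and the step size are likewise assumptions.

```python
# Illustrative placeholder values only; they are NOT taken from the paper's Table 1.
PLACEHOLDER_PARAMS = {
    "r1": 1.5, "r2": 1.0, "b1": 1.0, "b2": 1.0,
    "c1": 1.0, "c2": 0.5, "c3": 1.0, "c4": 1.0,
    "a1": 0.2, "a2": 0.3, "a3": 0.1,
    "d1": 0.2, "d2": 1.0, "s": 0.33, "lam": 0.01, "gamma": 0.3,
}


def cancer_rhs(state, u, p):
    """Right-hand side of Equation (9) for state = (N, T, I, C) and dose rate u."""
    N, T, I, C = state
    dN = p["r2"] * N * (1 - p["b2"] * N) - p["c4"] * N * T - p["a3"] * N * C
    dT = (p["r1"] * T * (1 - p["b1"] * T) - p["c2"] * T * I
          - p["c3"] * N * T - p["a2"] * T * C)
    dI = (p["s"] + p["lam"] * T * I / (p["gamma"] + T) - p["c1"] * T * I
          - p["d1"] * I - p["a1"] * I * C)
    dC = -p["d2"] * C + u
    return (dN, dT, dI, dC)


def rk4_step(state, u, p, h):
    """One classical Runge-Kutta step of size h with the dose held constant."""
    k1 = cancer_rhs(state, u, p)
    k2 = cancer_rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k1)), u, p)
    k3 = cancer_rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k2)), u, p)
    k4 = cancer_rhs(tuple(x + h * k for x, k in zip(state, k3)), u, p)
    return tuple(x + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(state0, dose, p, t_end, h=0.01):
    """Integrate from t = 0 to t_end; dose(t) gives the infusion rate u(t)."""
    state = tuple(state0)
    for k in range(int(round(t_end / h))):
        state = rk4_step(state, dose(k * h), p, h)
    return state
```

Two simple sanity checks follow directly from the structure of Equation (9): with $T_{0} = 0$ the tumor population stays at zero and $N(t)$ follows a logistic curve, and with $u(t) = 0$ the drug concentration decays as $C_{0} e^{-d_{2} t}$.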

Replacing the first-order derivatives on the left-hand side of Equation (9) with the Caputo–Fabrizio fractional derivative that has been defined in Equation (7) leads to our new Caputo–Fabrizio fractional model of cancer chemotherapy. The new model is written in the following equations:

$$\begin{cases} {}_{0}^{CF}D_{t_{f}}^{\alpha_{1}} N(t) = r_{2} N(t) [1 - b_{2} N(t)] - c_{4} N(t) T(t) - a_{3} N(t) C(t), \\ {}_{0}^{CF}D_{t_{f}}^{\alpha_{2}} T(t) = r_{1} T(t) [1 - b_{1} T(t)] - c_{2} T(t) I(t) - c_{3} N(t) T(t) - a_{2} T(t) C(t), \\ {}_{0}^{CF}D_{t_{f}}^{\alpha_{3}} I(t) = s + \dfrac{\lambda T(t) I(t)}{\gamma + T(t)} - c_{1} T(t) I(t) - d_{1} I(t) - a_{1} I(t) C(t), \\ {}_{0}^{CF}D_{t_{f}}^{\alpha_{4}} C(t) = - d_{2} C(t) + u(t). \end{cases} \tag{11}$$

Moreover, the initial condition for this system is:

$$N(0) = N_{0}, \quad T(0) = T_{0}, \quad I(0) = I_{0}, \quad C(0) = C_{0}. \tag{12}$$

In our new model of cancer chemotherapy, it has been assumed that the fractional order of each state variable can in principle be different, with $0 < \alpha_{i} < 1$ for $i = 1, 2, \dots, 4$.

3.1. Existence and Uniqueness of Solutions of the Cancer Chemotherapy Model

This section is devoted to the investigation of the existence and uniqueness of the solutions of our Caputo–Fabrizio fractional model of cancer chemotherapy in Equation (11) with the initial conditions in Equation (12). To this end, fixed-point theory has been used [88,89].

Applying the Caputo–Fabrizio fractional-order integral operator defined in Equation (8) to both sides of Equation (11), and taking the initial conditions in Equation (12) into consideration, the following equations are obtained:

$$\begin{cases} N(t) - N(0) = {}_{0}^{CF}I_{t_{f}}^{\alpha_{1}} \big[ r_{2} N(t) [1 - b_{2} N(t)] - c_{4} N(t) T(t) - a_{3} N(t) C(t) \big], \\ T(t) - T(0) = {}_{0}^{CF}I_{t_{f}}^{\alpha_{2}} \big[ r_{1} T(t) [1 - b_{1} T(t)] - c_{2} T(t) I(t) - c_{3} N(t) T(t) - a_{2} T(t) C(t) \big], \\ I(t) - I(0) = {}_{0}^{CF}I_{t_{f}}^{\alpha_{3}} \Big[ s + \dfrac{\lambda T(t) I(t)}{\gamma + T(t)} - c_{1} T(t) I(t) - d_{1} I(t) - a_{1} I(t) C(t) \Big], \\ C(t) - C(0) = {}_{0}^{CF}I_{t_{f}}^{\alpha_{4}} \big[ - d_{2} C(t) + u(t) \big]. \end{cases} \tag{13}$$

Then, the following kernels are defined:

$$\begin{cases} K_{1}(t, N(t)) = r_{2} N(t) [1 - b_{2} N(t)] - c_{4} N(t) T(t) - a_{3} N(t) C(t), \\ K_{2}(t, T(t)) = r_{1} T(t) [1 - b_{1} T(t)] - c_{2} T(t) I(t) - c_{3} N(t) T(t) - a_{2} T(t) C(t), \\ K_{3}(t, I(t)) = s + \dfrac{\lambda T(t) I(t)}{\gamma + T(t)} - c_{1} T(t) I(t) - d_{1} I(t) - a_{1} I(t) C(t), \\ K_{4}(t, C(t)) = - d_{2} C(t) + u(t). \end{cases} \tag{14}$$

By taking Equation (14) into consideration and then calculating the right-hand side of Equation (13) using the definition of the Caputo–Fabrizio fractional-order integration in Equation (6), the following equation is obtained:

$$\begin{cases} N(t) - N(0) = \Sigma(\alpha_{1}) K_{1}(t, N(t)) + \sigma(\alpha_{1}) \int_{0}^{t} K_{1}(v, N(v))\, dv, \\ T(t) - T(0) = \Sigma(\alpha_{2}) K_{2}(t, T(t)) + \sigma(\alpha_{2}) \int_{0}^{t} K_{2}(v, T(v))\, dv, \\ I(t) - I(0) = \Sigma(\alpha_{3}) K_{3}(t, I(t)) + \sigma(\alpha_{3}) \int_{0}^{t} K_{3}(v, I(v))\, dv, \\ C(t) - C(0) = \Sigma(\alpha_{4}) K_{4}(t, C(t)) + \sigma(\alpha_{4}) \int_{0}^{t} K_{4}(v, C(v))\, dv, \end{cases} \tag{15}$$

where $\Sigma(\alpha)$ and $\sigma(\alpha)$ are defined as follows:

$$\Sigma(\alpha) = \frac{2 (1 - \alpha)}{(2 - \alpha) A(\alpha)} \quad \text{and} \quad \sigma(\alpha) = \frac{2 \alpha}{(2 - \alpha) A(\alpha)}. \tag{16}$$

Remark 3. The above-mentioned kernels $K_{1}, K_{2}, \dots, K_{4}$ satisfy the Lipschitz conditions and are contraction mappings if the following inequality is satisfied:

$$0 \leq L = \max \{\delta_{1}, \delta_{2}, \delta_{3}, \delta_{4}\} < 1. \tag{17}$$

In the proof of Remark 3, the following assumption has been made:

$$\|N(t)\| \leq \theta_{1}, \quad \|T(t)\| \leq \theta_{2}, \quad \|I(t)\| \leq \theta_{3} \quad \text{and} \quad \|C(t)\| \leq \theta_{4}. \tag{18}$$

Proof. Let $N$ and $N_{1}$ be two different functions; then, by taking the kernel $K_{1}$ into consideration, we have

$$\begin{aligned} \|K_{1}(t, N) - K_{1}(t, N_{1})\| &= \big\| r_{2} (N - N_{1}) [1 - b_{2} (N - N_{1})] - c_{4} (N - N_{1}) T(t) - a_{3} (N - N_{1}) C(t) \big\| \\ &\leq w_{1} \|N - N_{1}\| - w_{2} \|N - N_{1}\| - w_{3} \|N - N_{1}\| \\ &\leq (w_{1} - w_{2} - w_{3}) \|N - N_{1}\| \leq \delta_{1} \|N - N_{1}\|. \end{aligned} \tag{19}$$

For the second kernel we have:

$$\begin{aligned} \|K_{2}(t, T) - K_{2}(t, T_{1})\| &= \big\| r_{1} (T - T_{1}) [1 - b_{1} (T - T_{1})] - c_{2} (T - T_{1}) I(t) - c_{3} (T - T_{1}) N(t) \big\| \\ &\leq a_{1} \|T - T_{1}\| - a_{2} \|T - T_{1}\| - a_{3} \|T - T_{1}\| \\ &\leq (a_{1} - a_{2} - a_{3}) \|T - T_{1}\| \leq \delta_{2} \|T - T_{1}\|. \end{aligned} \tag{20}$$

Additionally, for the third and fourth kernels, $K_{3}$ and $K_{4}$, the following inequalities can be obtained:

$$\begin{aligned} \|K_{3}(t, I) - K_{3}(t, I_{1})\| &= \Big\| \frac{\lambda T(t) (I - I_{1})}{\gamma + T(t)} - c_{1} T(t) (I - I_{1}) - d_{1} (I - I_{1}) - a_{1} (I - I_{1}) C(t) \Big\| \\ &\leq \frac{\lambda \theta_{2}}{\gamma + \theta_{2}} \|I - I_{1}\| - c_{1} \theta_{2} \|I - I_{1}\| - d_{1} \|I - I_{1}\| - a_{1} \theta_{4} \|I - I_{1}\| \\ &\leq (m_{1} - m_{2} - m_{3} - m_{4}) \|I - I_{1}\| \leq \delta_{3} \|I - I_{1}\| \end{aligned} \tag{21}$$

and

$$\|K_{4}(t, C) - K_{4}(t, C_{1})\| = \|- d_{2} (C - C_{1})\| \leq d_{2} \|C - C_{1}\|. \tag{22}$$

Consequently, the Lipschitz conditions are satisfied for all of the kernels defined in Equation (14). Furthermore, because $0 \leq L = \max \{\delta_{1}, \delta_{2}, \delta_{3}, \delta_{4}\} < 1$, the kernels are contractions.

By considering Equations (15) and (16), the following recursive formula can be obtained:

$$\begin{cases} N_{n}(t) = \Sigma(\alpha_{1}) K_{1}(t, N_{n - 1}(t)) + \sigma(\alpha_{1}) \int_{0}^{t} K_{1}(v, N_{n - 1}(v))\, dv, \\ T_{n}(t) = \Sigma(\alpha_{2}) K_{2}(t, T_{n - 1}(t)) + \sigma(\alpha_{2}) \int_{0}^{t} K_{2}(v, T_{n - 1}(v))\, dv, \\ I_{n}(t) = \Sigma(\alpha_{3}) K_{3}(t, I_{n - 1}(t)) + \sigma(\alpha_{3}) \int_{0}^{t} K_{3}(v, I_{n - 1}(v))\, dv, \\ C_{n}(t) = \Sigma(\alpha_{4}) K_{4}(t, C_{n - 1}(t)) + \sigma(\alpha_{4}) \int_{0}^{t} K_{4}(v, C_{n - 1}(v))\, dv. \end{cases} \tag{23}$$

Moreover, the initial components of the above-mentioned recursive formulas are taken as follows:

$$N_{0}(t) = N_{0}, \quad T_{0}(t) = T_{0}, \quad I_{0}(t) = I_{0}, \quad C_{0}(t) = C_{0}. \tag{24}$$

Equation (23) can be written in the following form:

$$N_{n}(t) = \sum_{i = 1}^{n} \mu_{i}(t), \quad T_{n}(t) = \sum_{i = 1}^{n} \xi_{i}(t), \quad I_{n}(t) = \sum_{i = 1}^{n} \omega_{i}(t), \quad C_{n}(t) = \sum_{i = 1}^{n} \kappa_{i}(t), \tag{25}$$

where $\mu_{i}(t)$, $\xi_{i}(t)$, $\omega_{i}(t)$ and $\kappa_{i}(t)$ are defined as follows:

$$\begin{aligned} \mu_{i}(t) &= N_{i}(t) - N_{i - 1}(t), & \xi_{i}(t) &= T_{i}(t) - T_{i - 1}(t), \\ \omega_{i}(t) &= I_{i}(t) - I_{i - 1}(t), & \kappa_{i}(t) &= C_{i}(t) - C_{i - 1}(t). \end{aligned} \tag{26}$$

Now, the following inequality can be derived for the functions defined in Equation (26):

$$\begin{aligned} \|\mu_{n}(t)\| = \|N_{n}(t) - N_{n - 1}(t)\| &\leq \Sigma(\alpha_{1}) \big\|K_{1}(t, N_{n - 1}(t)) - K_{1}(t, N_{n - 2}(t))\big\| \\ &\quad + \sigma(\alpha_{1}) \int_{0}^{t} \big\|K_{1}(v, N_{n - 1}(v)) - K_{1}(v, N_{n - 2}(v))\big\|\, dv. \end{aligned} \tag{27}$$

Equation (19) showed that the kernel $K_{1}$ satisfies the Lipschitz condition. Therefore, for Equation (27) we have:

$$\begin{aligned} \|\mu_{n}(t)\| &\leq \Sigma(\alpha_{1}) \delta_{1} \|N_{n - 1}(t) - N_{n - 2}(t)\| + \sigma(\alpha_{1}) \delta_{1} \int_{0}^{t} \|N_{n - 1}(v) - N_{n - 2}(v)\|\, dv \\ &\leq \Sigma(\alpha_{1}) \delta_{1} \|\mu_{n - 1}(t)\| + \sigma(\alpha_{1}) \delta_{1} \int_{0}^{t} \|\mu_{n - 1}(v)\|\, dv. \end{aligned} \tag{28}$$

The same is true for the other defined functions in Equation (26); for this reason, we can obtain the following inequalities readily:

$$\begin{cases} \|\xi_{n}(t)\| \leq \Sigma(\alpha_{2}) \delta_{2} \|\xi_{n - 1}(t)\| + \sigma(\alpha_{2}) \delta_{2} \int_{0}^{t} \|\xi_{n - 1}(v)\|\, dv, \\ \|\omega_{n}(t)\| \leq \Sigma(\alpha_{3}) \delta_{3} \|\omega_{n - 1}(t)\| + \sigma(\alpha_{3}) \delta_{3} \int_{0}^{t} \|\omega_{n - 1}(v)\|\, dv, \\ \|\kappa_{n}(t)\| \leq \Sigma(\alpha_{4}) \delta_{4} \|\kappa_{n - 1}(t)\| + \sigma(\alpha_{4}) \delta_{4} \int_{0}^{t} \|\kappa_{n - 1}(v)\|\, dv. \end{cases} \tag{29}$$

□

Remark 4. The Caputo–Fabrizio fractional model of cancer chemotherapy (Equation (11)) has a system of solutions if the following inequalities hold at a time $t_{1} > 0$:

$$\Sigma(\alpha_{i}) \delta_{i} + \sigma(\alpha_{i}) \delta_{i} t_{1} < 1 \quad \text{for } i = 1, 2, 3, 4. \tag{30}$$

Proof. In this part, the existence and smoothness of the functions defined in Equation (25) are proven, and the existence of a system of solutions for the model in Equations (11) and (12) is demonstrated.

It has been assumed that $N(t)$, $T(t)$, $I(t)$ and $C(t)$ are bounded, i.e., $\|N(t)\| \leq \theta_{1}$, $\|T(t)\| \leq \theta_{2}$, $\|I(t)\| \leq \theta_{3}$ and $\|C(t)\| \leq \theta_{4}$. Moreover, we have proven that each of the defined kernels satisfies the Lipschitz condition. Consequently, we can obtain the following results using Equations (28) and (29):

$$\begin{cases} \|\mu_{n}(t)\| \leq \|N(0)\| \, [\Sigma(\alpha_{1}) \delta_{1} + \sigma(\alpha_{1}) \delta_{1} t]^{n}, \\ \|\xi_{n}(t)\| \leq \|T(0)\| \, [\Sigma(\alpha_{2}) \delta_{2} + \sigma(\alpha_{2}) \delta_{2} t]^{n}, \\ \|\omega_{n}(t)\| \leq \|I(0)\| \, [\Sigma(\alpha_{3}) \delta_{3} + \sigma(\alpha_{3}) \delta_{3} t]^{n}, \\ \|\kappa_{n}(t)\| \leq \|C(0)\| \, [\Sigma(\alpha_{4}) \delta_{4} + \sigma(\alpha_{4}) \delta_{4} t]^{n}. \end{cases} \tag{31}$$

Next, it is demonstrated that the functions $N_{n}(t)$, $T_{n}(t)$, $I_{n}(t)$ and $C_{n}(t)$ defined in Equation (25) converge to solutions of the model (Equation (11)) with the initial conditions (Equation (12)). To this end, the remainder terms after $n$ iterations are defined as follows:

$$\begin{cases} N(t) - N(0) = N_{n}(t) - X_{n}(t), \\ T(t) - T(0) = T_{n}(t) - Y_{n}(t), \\ I(t) - I(0) = I_{n}(t) - Z_{n}(t), \\ C(t) - C(0) = C_{n}(t) - V_{n}(t). \end{cases} \tag{32}$$

To prove that the functions $N_{n}(t)$, $T_{n}(t)$, $I_{n}(t)$ and $C_{n}(t)$ converge to the solutions of the model (Equation (11)), we must show that the remainder terms converge to zero as $n \to \infty$. Using the Lipschitz condition for the kernels defined in Equation (14) and the triangle inequality, the following result is obtained:

$$\begin{aligned} \|X_{n}(t)\| &\leq \Sigma(\alpha_{1}) \big\|K_{1}(t, N(t)) - K_{1}(t, N_{n - 1}(t))\big\| + \sigma(\alpha_{1}) \int_{0}^{t} \big\|K_{1}(v, N(v)) - K_{1}(v, N_{n - 1}(v))\big\|\, dv \\ &\leq \Sigma(\alpha_{1}) \delta_{1} \|N(t) - N_{n - 1}(t)\| + \sigma(\alpha_{1}) \delta_{1} \|N(t) - N_{n - 1}(t)\|\, t. \end{aligned} \tag{33}$$

Continuing the above process leads to the following inequality:

$$\|X_{n}(t)\| \leq [\Sigma(\alpha_{1}) \delta_{1} + \sigma(\alpha_{1}) \delta_{1} t]^{n + 1} \theta_{1}. \tag{34}$$

Now, by taking Equation (30) into account and taking the limit of both sides of Equation (34) as $n \to \infty$ at time $t = t_{1}$, the following result is obtained:

$$\lim_{n \to \infty} \|X_{n}(t)\| \leq \lim_{n \to \infty} [\Sigma(\alpha_{1}) \delta_{1} + \sigma(\alpha_{1}) \delta_{1} t]^{n + 1} \theta_{1}. \tag{35}$$

It can be seen that the right-hand side of Equation (35) converges to zero; for this reason, it can be concluded that $\|X_{n}(t)\| \to 0$ when $n \to \infty$. Proceeding in the same manner for the other remainder terms defined in Equation (32), the following results are obtained:

$$\lim_{n \to \infty} \|Y_{n}(t)\| = 0, \quad \lim_{n \to \infty} \|Z_{n}(t)\| = 0, \quad \lim_{n \to \infty} \|V_{n}(t)\| = 0. \tag{36}$$

Equations (35) and (36) demonstrate the existence of a system of solutions for the model in Equations (11) and (12). □
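The iterative argument can be illustrated numerically on the simplest kernel, $K_{4} = -d_{2} C + u$. The sketch below is a hedged illustration, not the procedure used in this paper. It assumes $A(\alpha) = 2/(2-\alpha)$, so that $\Sigma(\alpha) = 1 - \alpha$ and $\sigma(\alpha) = \alpha$, a constant dose `u`, a uniform time grid with trapezoidal integration, and the initial value `c0` added to the right-hand side as in Equation (15). All names and numerical values are hypothetical.

```python
def picard_iterates(alpha, d2, u, c0, t1, n_grid=200, n_iter=25):
    """Iterate C_n = c0 + Sigma*K4(C_{n-1}) + sigma*int_0^t K4(C_{n-1}) dv on [0, t1].

    Returns the last iterate on the grid and the sup-norm gaps between
    successive iterates.
    """
    h = t1 / n_grid
    prev = [c0] * (n_grid + 1)
    gaps = []
    for _ in range(n_iter):
        kern = [-d2 * c + u for c in prev]  # K4 evaluated on the grid
        new, integral = [], 0.0
        for k in range(n_grid + 1):
            if k > 0:
                integral += 0.5 * h * (kern[k - 1] + kern[k])  # running trapezoid
            new.append(c0 + (1.0 - alpha) * kern[k] + alpha * integral)
        gaps.append(max(abs(a - b) for a, b in zip(new, prev)))
        prev = new
    return prev, gaps


def contraction_factor(alpha, d2, t1):
    """Left-hand side of Equation (30) for K4, whose Lipschitz constant is d2."""
    return (1.0 - alpha) * d2 + alpha * d2 * t1
```

When `contraction_factor` is below one, each gap between successive iterates is at most that factor times the previous gap, which is the geometric decay expressed by Equation (31); the iterates therefore settle on a fixed point of the discretized map.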

Remark 5. The system of solutions of the model in Equations (11) and (12) is unique if the following inequality is satisfied:

$$\Sigma(\alpha_{i}) \delta_{i} + \sigma(\alpha_{i}) \delta_{i} t < 1 \quad \text{for } i = 1, 2, 3, 4. \tag{37}$$

Proof. To prove that the model described in Equation (11) with the initial conditions in Equation (12) has a unique system of solutions, we assume that $\{N(t), T(t), I(t), C(t)\}$ is a system of solutions of the model and that $\{N_{1}(t), T_{1}(t), I_{1}(t), C_{1}(t)\}$ is another system of solutions. Then, using Equation (15), we have

$$N(t) - N_{1}(t) = \Sigma(\alpha_{1}) \big(K_{1}(t, N(t)) - K_{1}(t, N_{1}(t))\big) + \sigma(\alpha_{1}) \int_{0}^{t} \big(K_{1}(v, N(v)) - K_{1}(v, N_{1}(v))\big)\, dv. \tag{38}$$

Then, we have

$$\|N(t) - N_{1}(t)\| \leq \Sigma(\alpha_{1}) \big\|K_{1}(t, N(t)) - K_{1}(t, N_{1}(t))\big\| + \sigma(\alpha_{1}) \Big\| \int_{0}^{t} \big(K_{1}(v, N(v)) - K_{1}(v, N_{1}(v))\big)\, dv \Big\|. \tag{39}$$

As we proved, the kernels satisfy the Lipschitz conditions; as a result,

$$\|N(t) - N_{1}(t)\| \leq \Sigma(\alpha_{1}) \delta_{1} \|N(t) - N_{1}(t)\| + \sigma(\alpha_{1}) \delta_{1} \|N(t) - N_{1}(t)\|\, t. \tag{40}$$

Subtracting the right-hand side of Equation (40) from both sides of this inequality gives:

$$\big(1 - \Sigma(\alpha_{1}) \delta_{1} - \sigma(\alpha_{1}) \delta_{1} t\big) \|N(t) - N_{1}(t)\| \leq 0. \tag{41}$$

Considering Equation (37) and the fact that the output of the norm function is non-negative, the following result will be obtained:

‖N (t) - N_{1} (t)‖ = 0

(42)

In the same way for

T (t)

,

I (t)

,

C (t)

, the following equations have been obtained:

\begin{matrix} ‖T (t) - T_{1} (t)‖ = 0, \\ ‖I (t) - I_{1} (t)‖ = 0, \\ ‖C (t) - C_{1} (t)‖ = 0 . \end{matrix}

(43)

Equations (42) and (43) imply

N (t) = N_{1} (t), T (t) = T_{1} (t), I (t) = I_{1} (t), C (t) = C_{1} (t)

(44)

Equation (44) shows that the system of solutions for the model in Equation (11) with the initial conditions in Equation (12) is unique, which completes the proof. □

3.2. Sensitivity Analysis

In this section, a sensitivity analysis for the chemotherapy drug-using system given in Equation (11) is conducted. The parameters of the system which have been used for analysis are given in Table 1. The results of the sensitivity analysis are given in Figure 1.

Figure 1 shows how variations in the patient’s parameters affect the cell populations. Notably, the per-unit growth rate of tumor cells (

r_{1})

, immune cell influx rate (

s

), and tumor cell competition term (competition between normal and tumor cells) (

c_{3})

are the three parameters whose variations affect the number of normal cells the most. Four parameters have the greatest impact on the number of tumor cells: three of them are the same as those for normal cells, and the fourth is the reciprocal carrying capacity of normal cells (

b_{2}

). Moreover, the number of immune cells is highly affected by the variation of the per-unit growth rate of tumor cells (

r_{1}

), the reciprocal carrying capacity of normal cells (

b_{2}

), and the tumor cell competition term (competition between normal and tumor cells) (

c_{3}

).

4. Methodology

This section describes the proposed fuzzy-reinforcement-learning-based controller (FRLC), which aims to control the number of tumor cells; in other words, the aim of the proposed controller is to reach the desired number of tumor cells

T (t) = T_{\text{desire}}

from a non-zero initial number of tumor cells

T (0) > 0

.

Expected-SARSA is a variation of SARSA in which the variance of the update rule is reduced relative to that of the SARSA algorithm; it can be regarded as an on-policy version of Q-learning and performs better than SARSA in online applications. On-policy methods have some advantages over off-policy learning algorithms, such as stronger convergence guarantees when combined with function approximation, whereas off-policy methods can diverge in those cases [90,91,92]. On-policy methods also perform well in online applications because the policy being estimated is improved iteratively and the agent behaves according to this same policy.

The advantage of data-driven controllers is that an accurate model of the system is not required, whereas selecting the best action for each rule in a fuzzy inference system demands accurate knowledge of the system. The proposed control method, however, learns the best action for each rule; it therefore combines the advantages of data-driven controllers and fuzzy controllers.

Fuzzy Controller Based on Expected SARSA Learning (FESL)

In this paper, in order to control the number of tumor cells, a fuzzy controller is proposed whose learning process is based on the Expected-SARSA algorithm. The fuzzy controller uses the Expected-SARSA algorithm to find the best map between states and actions. In other words, the Expected-SARSA algorithm is used to obtain the best action for each rule of the fuzzy inference system. Let the rules of the fuzzy inference system be as follows:

R_{i} : if x_{1} is L_{i 1} and \dots and x_{n} is L_{i n} then (a_{i} is a_{i 1}) or (a_{i} is a_{i 2}) or \dots or (a_{i} is a_{i k}),

in which Ri is the i-th rule of the fuzzy inference system, and

s

is the vector of the n-dimensional input state and is defined as

s = x_{1} \times x_{2} \times \dots \times x_{n}

;

L_{i}

is the n-dimensional strictly convex and normal fuzzy set of the i-th rule with a unique center and is defined as

L_{i} = L_{i 1} \times L_{i 2} \times \dots \times L_{i n}

;

a_{i}

is the consequent action for the i-th rule;

a_{i 1}

is the first candidate action for the i-th rule;

a_{i 2}

is the second candidate action; and finally,

a_{i k}

is the k-th candidate action for the i-th rule. The aim of learning is to find the optimal action for each rule of the fuzzy inference system. To this end, an action-value matrix is defined and denoted by

Q

, whose elements are the action values of the candidate actions of each rule. The optimal action for a rule is the candidate action with the highest action value among all candidate actions of that rule.

As mentioned above, the best consequent action for each rule is obtained using reinforcement learning. Figure 2 depicts the structure of reinforcement learning. In this paper, the reinforcement learning algorithm implemented to update the action values is the Expected-SARSA method.

The algorithm used in this paper is outlined in Algorithm 1. The function that is used for calculating the reward of the agent for a transition from state

s_{k}

to state

s_{k + 1}

is as follows:

r_{k + 1} = \begin{cases} \frac{e (k T) - e ((k + 1) T)}{e (k T)}, & e ((k + 1) T) < e (k T) \\ 0, & e ((k + 1) T) \geq e (k T) \end{cases}

(45)

where

e (t), t \geq 0

is defined as:

e (t) = β (T (t) - T_{d} (t)) + (1 - β) (N (t) - N_{d} (t)) .

(46)
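Equations (45) and (46) can be sketched in a few lines of code. This is only an illustration: the function names are hypothetical, the value used for the desired number of normal cells is an assumption, and the guard against a zero previous error is an addition that Equation (45) does not spell out.

```python
def weighted_error(T, N, T_d, N_d, beta):
    """Eq. (46): weighted tracking error of tumor cells (T) and normal cells (N)."""
    return beta * (T - T_d) + (1.0 - beta) * (N - N_d)


def reward(e_k, e_next):
    """Eq. (45): relative error reduction if the error decreased, otherwise 0."""
    if e_k == 0.0:
        # Assumption: avoid division by zero when the previous error is exactly zero.
        return 0.0
    if e_next < e_k:
        return (e_k - e_next) / e_k
    return 0.0


# Example with beta = 1 (tumor cells only) and a hypothetical N_d = 1.0.
e0 = weighted_error(T=0.8, N=0.6, T_d=0.0, N_d=1.0, beta=1.0)
e1 = weighted_error(T=0.6, N=0.6, T_d=0.0, N_d=1.0, beta=1.0)
print(reward(e0, e1))  # approximately 0.25: the error dropped by a quarter
print(reward(e1, e0))  # 0.0: the error grew, so no reward is given
```

Because the reward is a relative reduction, a step that removes a fixed fraction of the remaining error earns the same reward regardless of the current error magnitude.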

The goal of the control agent in reinforcement learning (RL) is to find an optimal policy

π^{*}

; under this policy, the expected discounted reward is maximized. The discounted reward is defined as follows:

R_{t} = r_{t + 1} + δ r_{t + 2} + δ^{2} r_{t + 3} + \dots = \sum_{k = 0}^{\infty} δ^{k} r_{t + k + 1}

(47)

where

δ

is the discount factor, a non-negative constant that satisfies

δ \leq 1

.
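As a small worked example of Equation (47), the sketch below evaluates the discounted reward for a finite reward sequence. The helper name `discounted_return` and the reward values are hypothetical; the infinite sum is truncated to the rewards supplied.

```python
def discounted_return(rewards, delta):
    """Truncated Eq. (47): sum over k of delta**k * r_{t+k+1}."""
    total = 0.0
    # Accumulate from the last reward backwards: R = r + delta * R_next.
    for r in reversed(rewards):
        total = r + delta * total
    return total


# Three unit rewards with delta = 0.9 give 1 + 0.9 + 0.81.
print(discounted_return([1.0, 1.0, 1.0], 0.9))  # approximately 2.71
```

For a constant reward r and a discount factor strictly below one, the infinite sum converges to r / (1 - δ); with δ = 1 the sum of a constant non-zero reward does not converge, which is why values strictly below one are common in practice.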

The output action of the fuzzy inference system is calculated as

a = \sum_{i = 1}^{n} η_{i} a_{i p} / \sum_{i = 1}^{n} η_{i}

(48)

where

n

is the number of fuzzy rules in the fuzzy inference,

η_{i}

is the firing strength of the i-th rule, and

a_{i p}

is the action selected by the 𝜀-greedy method for the i-th rule among all candidate actions. The update rule for FESL is

Q_{t} [i, p] \leftarrow Q_{t - 1} [i, p] + α \frac{η_{i}}{\sum_{i = 1}^{n} η_{i}} γ_{t}

(49)

where

α

denotes the learning rate of the algorithm. In Equation (49),

γ_{t}

is defined as:

γ_{t} = r_{t + 1} + δ [\sum_{k = 1}^{n} \frac{η_{k}}{\sum_{t = 1}^{n} η_{t}} \sum_{j = 1}^{m} π_{s^{'}} [k, j] Q_{t} [k, j]] - Q_{t - 1} [i, p] .

(50)

in which

δ

is a discount factor,

π_{s^{'}} [k, j]

is the probability of selecting the j-th action for the k-th rule that has been obtained using 𝜀-greedy method, and

Q_{t} [k, j]

is an approximation of the value of the j-th action for the k-th rule.

Algorithm 1 Fuzzy Expected SARSA

Initialize the Q-table
loop {for each episode}
    Initialize state s
    repeat {for each step in the episode}
        calculate the firing strength (η_i) of each rule at state s
        choose an action (a_i) for each rule at state s using policy π
        calculate action a at state s
        take action a, observe reward r and next state s′
        calculate the state value of state s′
        update the Q-table
        s ← s′
    until s is a terminal state
end loop
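One pass through the inner loop of Algorithm 1, together with Equations (48) to (50), can be sketched as follows. This is a minimal illustration rather than the authors' implementation: all function names are hypothetical, the firing strengths and the reward are assumed to be supplied by the caller, and the parameter `eps_greedy` is assumed to be the probability of choosing the greedy candidate, with the remaining probability spread uniformly over all candidates.

```python
import random


def epsilon_greedy_probs(q_row, eps_greedy):
    """Selection probabilities pi[k, j] over the candidate actions of one rule."""
    m = len(q_row)
    p_greedy = min(max(eps_greedy, 0.0), 1.0)  # assumption: clip to [0, 1]
    probs = [(1.0 - p_greedy) / m] * m
    best = max(range(m), key=lambda j: q_row[j])
    probs[best] += p_greedy
    return probs


def select_actions(Q, eps_greedy, rng):
    """Draw one candidate index per rule from its epsilon-greedy distribution."""
    return [
        rng.choices(range(len(row)), weights=epsilon_greedy_probs(row, eps_greedy))[0]
        for row in Q
    ]


def fuzzy_action(eta, chosen, candidates):
    """Eq. (48): firing-strength-weighted average of the selected rule actions."""
    return sum(h * candidates[p] for h, p in zip(eta, chosen)) / sum(eta)


def expected_value(Q, eta_next, eps_greedy):
    """Bracketed term of Eq. (50): expected action value of the next state."""
    total = sum(eta_next)
    value = 0.0
    for h, row in zip(eta_next, Q):
        probs = epsilon_greedy_probs(row, eps_greedy)
        value += (h / total) * sum(p * q for p, q in zip(probs, row))
    return value


def update(Q, eta, chosen, r, eta_next, alpha, delta, eps_greedy):
    """Eqs. (49)-(50): update the value of the selected candidate of every rule."""
    v_next = expected_value(Q, eta_next, eps_greedy)
    total = sum(eta)
    for i, p in enumerate(chosen):
        td_error = r + delta * v_next - Q[i][p]
        Q[i][p] += alpha * (eta[i] / total) * td_error


# Toy example: 2 rules, 3 candidate actions, one transition.
candidates = [0.0, 0.5, 1.0]
Q = [[0.0] * len(candidates) for _ in range(2)]
eta, eta_next = [0.75, 0.25], [0.5, 0.5]
chosen = select_actions(Q, eps_greedy=0.4, rng=random.Random(0))
a = fuzzy_action(eta, chosen, candidates)
update(Q, eta, chosen, r=0.5, eta_next=eta_next, alpha=0.2, delta=0.9, eps_greedy=0.4)
print(a, Q)
```

Each rule's update is scaled by its normalized firing strength, so rules that contributed little to the applied action receive a correspondingly small share of the temporal-difference error.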

One of the key features of FRL-based controllers is their ability to handle uncertainty and imprecision in the system. Fuzzy logic allows the controller to make decisions based on approximate rather than precise values, which can be helpful in situations where the system is not well understood or where there is a high degree of variability. In addition to its ability to handle uncertainty, an FRL-based controller can also learn from its past experiences and adapt its behavior to better achieve the desired outcomes. This learning process is known as reinforcement learning, and it allows the controller to optimize its performance over time by adjusting its control inputs based on the consequences of those inputs. Overall, the combination of fuzzy logic and reinforcement learning in an FRL-based controller can make it well-suited for handling unexpected situations and adapting to a range of conditions.

5. Numerical Simulations

In this section, the performance of the FESL method in controlling the closed-loop cancer chemotherapy drug-dosing system, whose input is a chemotherapy drug such as carboplatin, is assessed through simulations carried out in MATLAB. Oncologists consider different factors when they determine the drug dosage for a cancer patient; age and gender are two examples. As shown in [87], the growth rate of normal cells and immune cells depends on age, and this rate is larger for younger patients than for elderly patients. Therefore, for young patients, oncologists prefer to eradicate cancerous cells as soon as possible, without taking the damage to normal cells into consideration, in order to avoid cancer metastasis. In the current study, the value of the fractional derivative was assumed to be as given in Table 1. However, estimating the fractional-derivative parameter in the same way as the other parameters would help to ensure that the model is as accurate and reliable as possible. This can be done using a variety of techniques, such as fitting the model to experimental data or using statistical methods to optimize the model parameters. By accurately estimating all of the model parameters, researchers can be more confident in the results of their study and the conclusions they draw from the data.

For the cases in which the patient is elderly, the patient suffers from other diseases, or the cancer is in a vital organ such as the brain, the degeneration of the normal cells is not desirable, and the normal cells should be kept undamaged. The proposed control method takes these conditions into consideration through the error defined in Equation (46).

For this reason, two scenarios have been considered to evaluate the performance of the proposed control method in the abovementioned cases. In scenario “A”, the patient is young, whereas the simulation of scenario “B” has been conducted for an elderly patient. Finally, the results of the simulations for the proposed control method have been compared with those of Watkins’s Q-learning method for both scenarios.

The simulated patient’s parameters are the same as those given in Table 1 [87]; in [93], it was shown how the parameters given in Table 1 can be obtained. The simulations were run for 500 episodes, where each episode is defined as a set of transitions from the initial state to the terminal state. Twenty fuzzy sets were used for the fuzzy system, with trapezoidal-shape and z-shape membership functions, which are shown in Figure 3. Table 2 lists the type of membership function used for each fuzzy set and its features. Furthermore, in the simulations, we have set

δ = 0.9

, and the learning rate is set to

α = 0.2

and should decrease as the number of iterations and episodes increases so as to guarantee the convergence of the learning algorithm.

In Table 2, zmfl denotes the z-shape membership function for the lower bound; trapmf and zmfh stand for trapezoidal-shape and z-shape membership function for the upper bound respectively. Variables “a”, “b”, “c”, and “d” are illustrated in Figure 3.
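For readers who wish to reproduce the fuzzy sets outside MATLAB, the two membership-function shapes can be sketched as below. The definitions follow the standard trapezoidal and z-shaped forms used by the MATLAB Fuzzy Logic Toolbox functions `trapmf` and `zmf`, which is an assumption about what the paper uses; the parameter values in the example are hypothetical, and the way Table 2 maps "a", "b", "c", and "d" onto zmfl and zmfh is not reproduced here.

```python
def trapmf(x, a, b, c, d):
    """Trapezoid: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def zmf(x, a, b):
    """Z-shape: 1 for x <= a, smooth spline descent on [a, b], 0 for x >= b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    if x <= (a + b) / 2.0:
        return 1.0 - 2.0 * ((x - a) / (b - a)) ** 2
    return 2.0 * ((x - b) / (b - a)) ** 2


# Hypothetical fuzzy sets on a scalar state; for a single input, the firing
# strength of a rule is simply the membership value of its fuzzy set.
x = 0.5
eta = [zmf(x, 0.0, 2.0), trapmf(x, 0.0, 1.0, 2.0, 3.0)]
print(eta)
```

With several inputs, the firing strength of a rule is typically obtained by combining the per-input memberships, for example with a product or a minimum; which operator the paper uses is not stated in this section.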

During simulations, the learning algorithm should explore the candidate actions in order to find the best consequent action for each rule. With this in mind, the 𝜀 value is initially small; then, as the number of time steps and the number of episodes increase, the 𝜀 value in the 𝜀-greedy method increases in order to reach a greedy policy. The 𝜀 value is calculated using the following formula:

ε = 0.3 (0.5 + \frac{I}{M N I}) + 0.25 (1 + 3 \frac{E}{M N E})

(51)

where

I

is the iteration number or the time-step number,

M N I

is the maximum number of iterations,

E

is the episode number, and

M N E

is the maximum number of episodes; in the simulations, the maximum number of iterations is chosen as 500. The candidate actions for each of the rules are considered to be

a_{i} \in {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 9, 10}

. Moreover, the initial values of the state variables of the system model in Equation (11) are considered to be

N (0) = 0.6, T (0) = 0.8, I (0) = 0.3, C (0) = 0.2 .

(52)
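The exploration schedule of Equation (51) can be evaluated directly with the maximum numbers of iterations and episodes stated above (500 each). In the sketch below, the function names are hypothetical, and two assumptions are made: 𝜀 is read as the probability of choosing the greedy action, as the surrounding text suggests, and values above one are clipped to one.

```python
def epsilon(I, E, MNI=500, MNE=500):
    """Eq. (51): schedule as a function of time step I and episode E."""
    return 0.3 * (0.5 + I / MNI) + 0.25 * (1 + 3 * E / MNE)


def greedy_probability(I, E, MNI=500, MNE=500):
    """Assumption: epsilon is the greedy-selection probability, clipped to 1."""
    return min(epsilon(I, E, MNI, MNE), 1.0)


for I, E in [(0, 0), (250, 100), (0, 400), (500, 500)]:
    print(I, E, round(epsilon(I, E), 3), round(greedy_probability(I, E), 3))
```

Under these assumptions the schedule starts at about 0.4 and, by the formula, reaches one at the first step of episode 400, after which the policy is fully greedy.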

5.1. Scenario A

In this scenario, the simulations were first conducted for the patient system without uncertainty. The aim is to reach

T = 0

; to put it differently, the controller must take the system from the initial state to the first fuzzy set. Therefore, for the simulation, we have set

β = 1

, which means in this scenario the error will be

e (t) = T (t) - T_{d} (t)

.

The result of the simulation is given in Figure 4, which shows the number of normal cells in (a), the number of tumor cells in (b), the number of immune cells in (c), the concentration of the chemotherapeutic drug in the blood in (d), and the amount of chemotherapeutic drug administered for scenario “A” in (e). As is evident, the number of tumor cells decreased consistently until it reached zero, which means that the proposed controller was able to obliterate all of the tumor cells. However, because a chemotherapeutic drug was injected into the body of the patient, the numbers of normal cells and immune cells decreased at the beginning of the chemotherapy. Later, the numbers of normal cells and immune cells increased again.

The simulation confirms that the chemotherapy treatment was successful in eliminating all of the tumor cells, but it caused a decrease in the number of normal cells and immune cells at the beginning of treatment. This is a common occurrence with chemotherapy, as the drugs used to kill cancer cells can also damage healthy cells. However, it is also possible for the body to recover and for the number of normal cells and immune cells to increase again after treatment.

It is important to carefully consider the potential benefits and risks of chemotherapy treatment, as it can be an effective way to treat cancer; however, it can also cause significant side effects. It is generally recommended to undergo chemotherapy under the supervision of a medical professional who can help to monitor the patient’s condition and adjust the treatment plan as needed.

To assess the robustness of the proposed control method, after the control agent was trained using the reinforcement learning algorithm, the parameters of the patient were varied during simulation. The parameters of the modified patient system were given a 10% variation from their original values, which are given in Table 1. The result of the simulation is depicted in Figure 5.

Figure 5 has the same sub-figures as Figure 4. As can be seen, the system shows the same behavior as the system without uncertainty. The controller is thus able to eradicate the tumor cells in the presence of uncertainty in the parameters.

The obtained results are remarkable because of the following main achievements:

The effectiveness of the proposed controller in eliminating the tumor cells: The results show that the controller was able to successfully reduce the number of tumor cells to zero, which indicates that the treatment was effective in destroying the cancer cells.
The temporary decrease in normal cells and immune cells: The chemotherapy used in the treatment caused a temporary decrease in the number of normal cells and immune cells in the body. This is a common side effect of chemotherapy, as the drugs used can also harm healthy cells.
The recovery of normal cells and immune cells: Despite the initial decrease in normal cells and immune cells, the numbers of these cells eventually increased over time. This suggests that the body was able to recover and rebuild healthy cells after the chemotherapy treatment.
The importance of monitoring the effects of chemotherapy: The results of the simulation highlight the importance of carefully monitoring the effects of chemotherapy to ensure that it is being administered effectively and safely. This includes monitoring the number of cancer cells, normal cells, and immune cells in the body, as well as the concentration of chemotherapy drugs in the blood.

5.2. Scenario B

This scenario assesses the performance of the proposed control method in controlling the tumor-cell state in elderly patients. In this scenario, the aim is to control both tumor cells and normal cells. Consequently, it has been considered that

β = 0.95

, which means the error in this scenario will be

e (t) = 0.95 (T (t) - T_{d} (t)) + 0.05 (N (t) - N_{d} (t))

; as can be seen, the number of normal cells has been taken into consideration in this scenario. The result of the simulations for this scenario is illustrated in Figure 6.

By comparing Figure 4 and Figure 6, it can be confirmed that when the patient is elderly, the number of normal cells reaches its maximum faster. Moreover, the 2-norm of the amount of drug administered to the patient decreased by 5.5 percent in the elderly case.

It is remarkable that, for an elderly patient, the proposed control method reached the maximum number of normal cells more quickly and used a smaller amount of chemotherapy drug than for a non-elderly patient. This suggests that the control method was able to minimize the negative effects of the chemotherapy on healthy cells while still effectively treating the cancer. This is important because chemotherapy can have significant side effects on the body, and minimizing these effects is especially important for elderly patients, who may be more sensitive to them.

In addition, in this scenario, in order to confirm the robustness of the proposed control method for elderly patients, the simulation was performed for the patient system with 10% uncertainty for each of the system parameters, and the results are given in Figure 7.

It is notable that the control method was able to maintain its effectiveness even when the system parameters were varied by 10%. This indicates that the control method is robust and can be applied effectively to elderly patients with a range of different characteristics. This is important because elderly patients may have a wide range of health conditions and characteristics that can affect their response to chemotherapy.

Figure 7 demonstrates the robustness of the proposed control method against the system’s parameter variation for an elderly patient. The results of the simulations for the elderly patient in both cases show the effectiveness of the proposed control method. Overall, these findings suggest that the proposed control method could be a valuable tool for improving the treatment of elderly patients with chemotherapy. They could help to minimize the negative effects of chemotherapy on healthy cells and provide a more effective treatment for cancer.

5.3. Scenario C

In this scenario, the proposed control method and Watkins’s Q-learning method were applied to 10 different patients. To evaluate the performance of the proposed control method, the simulation results for both methods are reported in Table 3. The patient’s parameters are considered to be as follows:

a_{i} \in (0.1, 0.5]

for

i = 1, 2, 3

,

a_{i} \in [0.3, 1]

,

i = 1, 2, 3, 4

,

d_{1} \in [0.15, 0.3]

,

r_{1} \in [1.2, 1.6]

,

r_{1} \in [0.3, 0.5], γ \in [0.3, 0.5]

and

λ \in [0.01, 0.05]

; other parameters are the same as ones given in Table 1. Furthermore, the parameters must hold the following conditions [47]:

a_{3} \leq a_{1} \leq a_{2}

and

b_{1} \geq b_{2} = 1

.

As can be seen in Table 3, the 2-norm of the input signal for the proposed control method is smaller than that of Watkins’s Q-learning method; that is, the proposed control method used less drug than the conventional controller. Moreover, the proposed control method requires fewer than 500 episodes, which is significantly fewer than the 50,000 episodes required for the convergence of Watkins’s Q-learning algorithm [57]; therefore, the proposed control method is about 100 times faster than Watkins’s Q-learning algorithm in terms of convergence rate.

Table 3 shows that, compared with Watkins’s Q-learning algorithm, the proposed control method decreased the 2-norm of the error by 35 percent for scenario “A” and 24 percent for scenario “B”. In addition, the 2-norm of the variance of the error and the amount of drug usage decreased by 86 percent and 1 percent, respectively, for scenario “A” and by 83 percent and 10 percent for scenario “B”.

It is worth mentioning that the proposed control method can be used by clinicians to find the optimal amount of drug, based on the patient’s state, to eradicate tumor cells. To this end, clinicians must first obtain the patient’s parameters by measuring the patient’s state and fitting the logged data to the mathematical model of the patient given in Equation (11). Then, using the mathematical model, the control agent should be pre-trained. Afterwards, the control agent would be able to offer the optimal dosage of the drug for each day. In addition, the training process of the control agent continues during the chemotherapy process. For this reason, the performance of the control framework would not be affected by variations in the patient’s parameters during chemotherapy.

6. Conclusions

This study investigated a Caputo–Fabrizio fractional-order model of cancer chemotherapy treatment. First, the existence and uniqueness of the solutions of the proposed model were proven through a fixed-point theorem and an iterative method. After that, since applying optimal policies for drug dosage is crucial, we proposed to formulate the control problem of the chemotherapy treatment as an optimization problem and to find optimal actions using a fuzzy reinforcement learning algorithm. The significant features of the designed fuzzy reinforcement learning-based method are its model-free approach as well as its optimal performance. Finally, three scenarios were considered to evaluate the performance of the proposed control technique. The results for the proposed control method demonstrate the effectiveness of the controller in the annihilation of tumor cells for young patients and elderly patients. The control method is able to bring the level of normal cells back to its normal range. In addition, the results indicate the robustness of the proposed control method against uncertainties in the patients’ parameters. Moreover, because the method operates in real time, its performance would not be affected even if the patients’ parameters changed during chemotherapy treatment. Also, the presented comparison with Watkins’s Q-learning method demonstrated the superiority of the proposed method in terms of the annihilation of the tumor cells in simulated patients and the drug usage during chemotherapy treatment. While the current study showed that the fuzzy reinforcement learning algorithm is able to achieve a good level of optimization in this task, deep reinforcement learning has the potential to achieve even better results by taking into account a greater amount of data and making more sophisticated decisions.
However, it is important to note that this is an area of active research, and it is not yet clear how well deep reinforcement learning will perform in this specific application. Further research and development will be needed to determine whether this approach is practical and effective for scheduling cancer chemotherapy drug dosing. Furthermore, it would be helpful to run a statistical test using variance to determine whether there are statistically significant differences between two or more categorical groups.

Author Contributions

Methodology, F.E.A., A.Y., C.V., S.B., and H.J.; Validation, F.E.A., A.Y., C.V., S.B., and H.J.; Investigation, F.E.A., A.Y., C.V., S.B., and H.J.; Writing—original draft, F.E.A., A.Y., C.V., S.B., and H.J.; Supervision, F.E.A., A.Y., C.V., S.B., and H.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research work was funded by Institutional Fund Projects under grant no. (IFPIP: 148-611-1443). The authors gratefully acknowledge the technical and financial support provided by the Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

FESL	fuzzy controller based on Expected-SARSA learning
FRLC	fuzzy-reinforcement-learning-based controller
RL	reinforcement learning
trapmf	trapezoidal-shape membership function
SARSA	State-Action-Reward-State-Action
zmfl	z-shape membership function for lower bound
zmfh	z-shape membership function for upper bound

References

Miller, K.D.; Nogueira, L.; Mariotto, A.B.; Rowland, J.H.; Yabroff, K.R.; Alfano, C.M.; Jemal, A.; Kramer, J.L.; Siegel, R.L. Cancer treatment and survivorship statistics, 2019. CA Cancer J. Clin. 2019, 69, 363–385. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Dos Santos, A.l.F.; De Almeida, D.R.Q.; Terra, L.F.; Baptista, M.c.S.; Labriola, L. Photodynamic therapy in cancer treatment-an update review. J. Cancer Metastasis Treat 2019, 5, 25. [Google Scholar] [CrossRef] [Green Version]
Thallinger, C.; Füreder, T.; Preusser, M.; Heller, G.; Müllauer, L.; Höller, C.; Prosch, H.; Frank, N.; Swierzewski, R.; Berger, W. Review of cancer treatment with immune checkpoint inhibitors. Wien. Klin. Wochenschr. 2018, 130, 85–91. [Google Scholar] [CrossRef] [Green Version]
Yagawa, Y.; Tanigawa, K.; Kobayashi, Y.; Yamamoto, M. Cancer immunity and therapy using hyperthermia with immunotherapy, radiotherapy, chemotherapy, and surgery. J. Cancer Metastasis Treat 2017, 3, 218. [Google Scholar] [CrossRef]
Cui, C.; Yang, J.; Li, X.; Liu, D.; Fu, L.; Wang, X. Functions and mechanisms of circular RNAs in cancer radiotherapy and chemotherapy resistance. Mol. Cancer 2020, 19, 1–16. [Google Scholar] [CrossRef] [Green Version]
Coates, A.; Abraham, S.; Kaye, S.B.; Sowerbutts, T.; Frewin, C.; Fox, R.M.; Tattersall, M.H.N. On the receiving end—Patient perception of the side-effects of cancer chemotherapy. Eur. J. Cancer Clin. Oncol. 1983, 19, 203–208. [Google Scholar] [CrossRef]
Toker, D.; Sommer, F.T.; D’Esposito, M. A simple method for detecting chaos in nature. Commun. Biol. 2020, 3, 11. [Google Scholar] [CrossRef] [Green Version]
Chen, T.; Kirkby, N.F.; Jena, R. Optimal dosing of cancer chemotherapy using model predictive control and moving horizon state/parameter estimation. Comput. Methods Programs Biomed. 2012, 108, 973–983. [Google Scholar] [CrossRef]
Sbeity, H.; Younes, R. Review of optimization methods for cancer chemotherapy treatment planning. J. Comput. Sci. Syst. Biol. 2015, 8, 74. [Google Scholar] [CrossRef]
Michor, F. Mathematical models of cancer stem cells. J. Clin. Oncol. 2008, 26, 2854–2861. [Google Scholar] [CrossRef]
Granata, D.; Lorenzi, L. An Evaluation of Propagation of the HIV-Infected Cells via Optimization Problem. Mathematics 2022, 10, 2021. [Google Scholar] [CrossRef]
Mokhtare, Z.; Vu, M.T.; Mobayen, S.; Rojsiraphisal, T. An adaptive barrier function terminal sliding mode controller for partial seizure disease based on the Pinsky–Rinzel mathematical model. Mathematics 2022, 10, 2940. [Google Scholar] [CrossRef]
Jahanshahi, H.; Munoz-Pacheco, J.M.; Bekiros, S.; Alotaibi, N.D. A fractional-order SIRD model with time-dependent memory indexes for encompassing the multi-fractional characteristics of the COVID-19. Chaos Solitons Fractals 2021, 143, 110632. [Google Scholar] [CrossRef] [PubMed]
Jahanshahi, H. Smooth control of HIV/AIDS infection using a robust adaptive scheme with decoupled sliding mode supervision. Eur. Phys. J. Spec. Top. 2018, 227, 707–718. [Google Scholar] [CrossRef]
Jahanshahi, H.; Shanazari, K.; Mesrizadeh, M.; Soradi-Zeid, S.; Gómez-Aguilar, J.F. Numerical analysis of Galerkin meshless method for parabolic equations of tumor angiogenesis problem. Eur. Phys. J. Plus 2020, 135, 866. [Google Scholar] [CrossRef]
Bachmann, J.; Raue, A.; Schilling, M.; Becker, V.; Timmer, J.; Klingmüller, U. Predictive mathematical models of cancer signalling pathways. J. Intern. Med. 2012, 271, 155–165. [Google Scholar] [CrossRef] [PubMed]
Wilkie, K.P. A review of mathematical models of cancer–immune interactions in the context of tumor dormancy. In Systems Biology of Tumor Dormancy; Springer: Berlin/Heidelberg, Germany, 2013; pp. 201–234. [Google Scholar]
Brady, R.; Enderling, H. Mathematical models of cancer: When to predict novel therapies, and when not to. Bull. Math. Biol. 2019, 81, 3722–3731.
Ira, J.I.; Islam, M.S.; Misra, J.C. Mathematical Modelling of the Dynamics of Tumor Growth and its Optimal Control. Preprints 2020, 2020040391. Available online: https://www.preprints.org/manuscript/202004.0391/v2 (accessed on 24 December 2022).
Eisen, M. Mathematical Models in Cell Biology and Cancer Chemotherapy; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 30.
Schättler, H.; Ledzewicz, U. Optimal Control for Mathematical Models of Cancer Therapies; Springer: Berlin/Heidelberg, Germany, 2015; Volume 42.
Kumar, D.; Singh, J.; Baleanu, D. On the analysis of vibration equation involving a fractional derivative with Mittag-Leffler law. Math. Methods Appl. Sci. 2020, 43, 443–457.
Soradi-Zeid, S.; Jahanshahi, H.; Yousefpour, A.; Bekiros, S. King algorithm: A novel optimization approach based on variable-order fractional calculus with application in chaotic financial systems. Chaos Solitons Fractals 2020, 132, 109569.
Kumar, D.; Singh, J.; Al Qurashi, M.; Baleanu, D. A new fractional SIRS-SI malaria disease model with application of vaccines, antimalarial drugs, and spraying. Adv. Differ. Equ. 2019, 2019, 278.
Srivastava, H.M.; Dubey, V.P.; Kumar, R.; Singh, J.; Kumar, D.; Baleanu, D. An efficient computational approach for a fractional-order biological population model with carrying capacity. Chaos Solitons Fractals 2020, 138, 109880.
Chen, S.-B.; Jahanshahi, H.; Abba, O.A.; Solís-Pérez, J.E.; Bekiros, S.; Gómez-Aguilar, J.F.; Yousefpour, A.; Chu, Y.-M. The effect of market confidence on a financial system from the perspective of fractional calculus: Numerical investigation and circuit realization. Chaos Solitons Fractals 2020, 140, 110223.
Chen, S.-B.; Soradi-Zeid, S.; Jahanshahi, H.; Alcaraz, R.; Gómez-Aguilar, J.F.; Bekiros, S.; Chu, Y.-M. Optimal Control of Time-Delay Fractional Equations via a Joint Application of Radial Basis Functions and Collocation Method. Entropy 2020, 22, 1213.
Singh, J.; Kumar, D.; Baleanu, D. A new analysis of fractional fish farm model associated with Mittag-Leffler-type kernel. Int. J. Biomath. 2020, 13, 2050010.
Morales-Delgado, V.F.; Gómez-Aguilar, J.F.; Saad, K.; Escobar Jiménez, R.F. Application of the Caputo-Fabrizio and Atangana-Baleanu fractional derivatives to mathematical model of cancer chemotherapy effect. Math. Methods Appl. Sci. 2019, 42, 1167–1193.
Dokuyucu, M.A.; Celik, E.; Bulut, H.; Baskonus, H.M. Cancer treatment model with the Caputo-Fabrizio fractional derivative. Eur. Phys. J. Plus 2018, 133, 1–6.
Losada, J.; Nieto, J.J. Properties of a new fractional derivative without singular kernel. Progr. Fract. Differ. Appl. 2015, 1, 87–92.
Caputo, M.; Fabrizio, M. A new definition of fractional derivative without singular kernel. Progr. Fract. Differ. Appl. 2015, 1, 73–85.
Alsaedi, A.; Nieto, J.J.; Venktesh, V. Fractional electrical circuits. Adv. Mech. Eng. 2015, 7, 1687814015618127.
Singh, J.; Kumar, D.; Baleanu, D. New aspects of fractional Biswas–Milovic model with Mittag-Leffler law. Math. Model. Nat. Phenom. 2019, 14, 303.
Kumar, D.; Tchier, F.; Singh, J.; Baleanu, D. An efficient computational technique for fractal vehicular traffic flow. Entropy 2018, 20, 259.
Owolabi, K.M.; Atangana, A. Analysis and application of new fractional Adams–Bashforth scheme with Caputo–Fabrizio derivative. Chaos Solitons Fractals 2017, 105, 111–119.
Kumar, D.; Singh, J.; Al Qurashi, M.; Baleanu, D. Analysis of logistic equation pertaining to a new fractional derivative with non-singular kernel. Adv. Mech. Eng. 2017, 9, 1687814017690069.
Tateishi, A.A.; Ribeiro, H.V.; Lenzi, E.K. The role of fractional time-derivative operators on anomalous diffusion. Front. Phys. 2017, 5, 52.
Atangana, A.; Alkahtani, B. Analysis of the Keller–Segel model with a fractional derivative without singular kernel. Entropy 2015, 17, 4439–4453.
Diethelm, K. The Analysis of Fractional Differential Equations: An Application-Oriented Exposition Using Differential Operators of Caputo Type; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2010.
Baleanu, D.; Jajarmi, A.; Mohammadi, H.; Rezapour, S. A new study on the mathematical modelling of human liver with Caputo–Fabrizio fractional derivative. Chaos Solitons Fractals 2020, 134, 109705.
Bushnaq, S.; Khan, S.A.; Shah, K.; Zaman, G. Mathematical analysis of HIV/AIDS infection model with Caputo-Fabrizio fractional derivative. Cogent Math. Stat. 2018, 5, 1432521.
Ullah, S.; Khan, M.A.; Farooq, M.; Hammouch, Z.; Baleanu, D. A fractional model for the dynamics of tuberculosis infection using Caputo-Fabrizio derivative. Discret. Contin. Dyn. Syst.-S 2020, 13, 975–993.
Hristov, J. Derivatives with non-singular kernels from the Caputo–Fabrizio definition and beyond: Appraising analysis with emphasis on diffusion models. Front. Fract. Calc. 2017, 1, 270–342.
Liu, P.; Liu, X. Dynamics of a tumor-immune model considering targeted chemotherapy. Chaos Solitons Fractals 2017, 98, 7–13.
Moradi, H.; Sharifi, M.; Vossoughi, G. Adaptive robust control of cancer chemotherapy in the presence of parametric uncertainties: A comparison between three hypotheses. Comput. Biol. Med. 2015, 56, 145–157.
De Pillis, L.G.; Radunskaya, A. The dynamics of an optimally controlled tumor model: A case study. Math. Comput. Model. 2003, 37, 1221–1244.
De Pillis, L.G.; Radunskaya, A. A mathematical tumor model with immune resistance and drug therapy: An optimal control approach. Comput. Math. Methods Med. 2001, 3, 79–100.
de Pillis, L.G.; Gu, W.; Fister, K.R.; Head, T.A.; Maples, K.; Murugan, A.; Neal, T.; Yoshida, K. Chemotherapy for tumors: An analysis of the dynamics and a study of quadratic and linear optimal controls. Math. Biosci. 2007, 209, 292–315.
Ghaffari, A.; Naserifar, N. Optimal therapeutic protocols in cancer immunotherapy. Comput. Biol. Med. 2010, 40, 261–270.
Pang, L.; Zhao, Z.; Song, X. Cost-effectiveness analysis of optimal strategy for tumor treatment. Chaos Solitons Fractals 2016, 87, 293–301.
Ledzewicz, U.; Schättler, H. Drug resistance in cancer chemotherapy as an optimal control problem. Discret. Contin. Dyn. Syst.-B 2006, 6, 129.
Arciero, J.C.; Jackson, T.L.; Kirschner, D.E. A mathematical model of tumor-immune evasion and siRNA treatment. Discret. Contin. Dyn. Syst.-B 2004, 4, 39.
Letellier, C.; Sasmal, S.K.; Draghi, C.; Denis, F.; Ghosh, D. A chemotherapy combined with an anti-angiogenic drug applied to a cancer model including angiogenesis. Chaos Solitons Fractals 2017, 99, 297–311.
Rokhforoz, P.; Jamshidi, A.A.; Sarvestani, N.N. Adaptive robust control of cancer chemotherapy with extended Kalman filter observer. Inform. Med. Unlocked 2017, 8, 1–7.
Itik, M.; Salamci, M.U.; Banks, S.P. Optimal control of drug therapy in cancer treatment. Nonlinear Anal. Theory Methods Appl. 2009, 71, e1473–e1486.
Padmanabhan, R.; Meskin, N.; Haddad, W.M. Reinforcement learning-based control of drug dosing for cancer chemotherapy treatment. Math. Biosci. 2017, 293, 11–20.
Yousefpour, A.; Haji Hosseinloo, A.; Reza Hairi Yazdi, M.; Bahrami, A. Disturbance observer–based terminal sliding mode control for effective performance of a nonlinear vibration energy harvester. J. Intell. Mater. Syst. Struct. 2020, 31, 1495–1510.
Yousefpour, A.; Yasami, A.; Beigi, A.; Liu, J. On the development of an intelligent controller for neural networks: A type 2 fuzzy and chatter-free approach for variable-order fractional cases. Eur. Phys. J. Spec. Top. 2022, 231, 2045–2057.
Yousefpour, A.; Jahanshahi, H. Fast disturbance-observer-based robust integral terminal sliding mode control of a hyperchaotic memristor oscillator. Eur. Phys. J. Spec. Top. 2019, 228, 2247–2268.
Yousefpour, A.; Jahanshahi, H.; Bekiros, S.; Muñoz-Pacheco, J.M. Robust adaptive control of fractional-order memristive neural networks. In Mem-Elements for Neuromorphic Circuits with Artificial Intelligence Applications; Elsevier: Amsterdam, The Netherlands, 2021; pp. 501–515.
Yousefpour, A.; Jahanshahi, H.; Gan, D. Fuzzy integral sliding mode technique for synchronization of memristive neural networks. In Mem-Elements for Neuromorphic Circuits with Artificial Intelligence Applications; Elsevier: Amsterdam, The Netherlands, 2021; pp. 485–500.
Jahanshahi, H.; Sajjadi, S.S.; Bekiros, S.; Aly, A.A. On the development of variable-order fractional hyperchaotic economic system with a nonlinear model predictive controller. Chaos Solitons Fractals 2021, 144, 110698.
Jahanshahi, H.; Yousefpour, A.; Wei, Z.; Alcaraz, R.; Bekiros, S. A financial hyperchaotic system with coexisting attractors: Dynamic investigation, entropy analysis, control and synchronization. Chaos Solitons Fractals 2019, 126, 66–77.
Yao, Q.; Jahanshahi, H.; Bekiros, S.; Mihalache, S.F.; Alotaibi, N.D. Gain-Scheduled Sliding-Mode-Type Iterative Learning Control Design for Mechanical Systems. Mathematics 2022, 10, 3005.
Yao, Q.; Jahanshahi, H.; Batrancea, L.M.; Alotaibi, N.D.; Rus, M.-I. Fixed-Time Output-Constrained Synchronization of Unknown Chaotic Financial Systems Using Neural Learning. Mathematics 2022, 10, 3682.
Alsaadi, F.E.; Yasami, A.; Alsubaie, H.; Alotaibi, A.; Jahanshahi, H. Control of a Hydraulic Generator Regulating System Using Chebyshev-Neural-Network-Based Non-Singular Fast Terminal Sliding Mode Method. Mathematics 2023, 11, 168.
Jahanshahi, H.; Yao, Q.; Khan, M.I.; Moroz, I. Unified neural output-constrained control for space manipulator using tan-type barrier Lyapunov function. Adv. Space Res. 2022; in press.
Jahanshahi, H.; Zambrano-Serrano, E.; Bekiros, S.; Wei, Z.; Volos, C.; Castillo, O.; Aly, A.A. On the dynamical investigation and synchronization of variable-order fractional neural networks: The Hopfield-like neural network model. Eur. Phys. J. Spec. Top. 2022, 231, 1757–1769.
Jahanshahi, H.; Yousefpour, A.; Soradi-Zeid, S.; Castillo, O. A review on design and implementation of type-2 fuzzy controllers. Math. Methods Appl. Sci. 2022.
Yao, Q.; Jahanshahi, H.; Bekiros, S.; Mihalache, S.F.; Alotaibi, N.D. Indirect neural-enhanced integral sliding mode control for finite-time fault-tolerant attitude tracking of spacecraft. Mathematics 2022, 10, 2467.
Alsaade, F.W.; Yao, Q.; Bekiros, S.; Al-zahrani, M.S.; Alzahrani, A.S.; Jahanshahi, H. Chaotic attitude synchronization and anti-synchronization of master-slave satellites using a robust fixed-time adaptive controller. Chaos Solitons Fractals 2022, 165, 112883.
Yao, Q.; Jahanshahi, H.; Moroz, I.; Bekiros, S.; Alassafi, M.O. Indirect neural-based finite-time integral sliding mode control for trajectory tracking guidance of Mars entry vehicle. Adv. Space Res. 2022; in press.
Chen, X. Research on application of artificial intelligence model in automobile machinery control system. Int. J. Heavy Veh. Syst. 2020, 27, 83–96.
Das, P.; Chanda, S.; De, A. Artificial Intelligence-Based Economic Control of Micro-grids: A Review of Application of IoT. In Computational Advancement in Communication Circuits and Systems; Springer: Berlin/Heidelberg, Germany, 2020; pp. 145–155.
Hamet, P.; Tremblay, J. Artificial intelligence in medicine. Metabolism 2017, 69, S36–S40.
Lim, C.W.; Zhang, G.; Reddy, J.N. A higher-order nonlocal elasticity and strain gradient theory and its applications in wave propagation. J. Mech. Phys. Solids 2015, 78, 298–313.
Mao, Y.; Wang, J.; Jia, P.; Li, S.; Qiu, Z.; Zhang, L.; Han, Z. A reinforcement learning based dynamic walking control. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Rome, Italy, 10–14 April 2007; pp. 3609–3614.
Qiao, J.; Hou, Z.; Ruan, X. Application of reinforcement learning based on neural network to dynamic obstacle avoidance. In Proceedings of the 2008 International Conference on Information and Automation, Changsha, China, 20–23 June 2008; pp. 784–788.
Wei, C.; Zhang, Z.; Qiao, W.; Qu, L. Reinforcement-learning-based intelligent maximum power point tracking control for wind energy conversion systems. IEEE Trans. Ind. Electron. 2015, 62, 6360–6370.
Sabatier, J.; Farges, C.; Merveillaut, M.; Feneteau, L. On observability and pseudo state estimation of fractional order systems. Eur. J. Control 2012, 18, 260–271.
Wang, S.; He, S.; Yousefpour, A.; Jahanshahi, H.; Repnik, R.; Perc, M. Chaos and complexity in a fractional-order financial system with time delays. Chaos Solitons Fractals 2020, 131, 109521.
Kilbas, A.A.; Srivastava, H.M.; Trujillo, J.J. Theory and Applications of Fractional Differential Equations; Elsevier: Amsterdam, The Netherlands, 2006; Volume 204.
Moore, B.L.; Pyeatt, L.D.; Kulkarni, V.; Panousis, P.; Padrez, K.; Doufas, A.G. Reinforcement learning for closed-loop propofol anesthesia: A study in human volunteers. J. Mach. Learn. Res. 2014, 15, 655–696.
Martín-Guerrero, J.D.; Gomez, F.; Soria-Olivas, E.; Schmidhuber, J.; Climente-Martí, M.; Jiménez-Torres, N.V. A reinforcement learning approach for individualizing erythropoietin dosages in hemodialysis patients. Expert Syst. Appl. 2009, 36, 9737–9742.
Padmanabhan, R.; Meskin, N.; Haddad, W.M. Closed-loop control of anesthesia and mean arterial pressure using reinforcement learning. Biomed. Signal Process. Control 2015, 22, 54–64.
Batmani, Y.; Khaloozadeh, H. Optimal chemotherapy in cancer treatment: State dependent Riccati equation control and extended Kalman filter. Optim. Control Appl. Methods 2013, 34, 562–577.
Hunter, J.K.; Nachtergaele, B. Applied Analysis; World Scientific Publishing Company: Singapore, 2001.
Kreyszig, E. Introductory Functional Analysis with Applications; Wiley: New York, NY, USA, 1978; Volume 1.
Baird, L. Residual algorithms: Reinforcement learning with function approximation. In Machine Learning Proceedings 1995; Elsevier: Amsterdam, The Netherlands, 1995; pp. 30–37.
Gordon, G.J. Stable function approximation in dynamic programming. In Machine Learning Proceedings 1995; Elsevier: Amsterdam, The Netherlands, 1995; pp. 261–268.
Boyan, J.A.; Moore, A.W. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1995; pp. 369–376.
Kuznetsov, V.A.; Makalkin, I.A.; Taylor, M.A.; Perelson, A.S. Nonlinear dynamics of immunogenic tumors: Parameter estimation and global bifurcation analysis. Bull. Math. Biol. 1994, 56, 295–321.

Figure 1. Sensitivity analysis.

Figure 2. Structure of reinforcement learning.

Figure 3. Membership functions used for fuzzy-sets. (a) Trapezoidal-shape membership function. (b) Z-shape membership function for upper bound. (c) Z-shape membership function for lower bound.

Figure 4. Simulation results of the proposed control method for a young patient without uncertainty. (a) Normal cells. (b) Tumor cells. (c) Immune cells. (d) Concentration of cells. (e) Control action.

Figure 5. Simulation results of the proposed control method for a young patient with uncertainty. (a) Normal cells. (b) Tumor cells. (c) Immune cells. (d) Concentration of cells. (e) Control action.

Figure 6. Simulation results of the proposed control method for an elderly patient without uncertainty. (a) Normal cells. (b) Tumor cells. (c) Immune cells. (d) Concentration of cells. (e) Control action.

Figure 7. Simulation results of the proposed control method for an elderly patient with uncertainty. (a) Normal cells. (b) Tumor cells. (c) Immune cells. (d) Concentration of cells. (e) Control action.

Table 1. The simulated patient’s parameters [47].

Parameter	Description (Unit)	Value
$a_{1}$	Fractional immune cell kill rate (mg⁻¹ l day⁻¹)	0.2
$a_{2}$	Fractional tumor cell kill rate (mg⁻¹ l day⁻¹)	0.3
$a_{3}$	Fractional normal cell kill rate (mg⁻¹ l day⁻¹)	0.1
$b_{1}$	Reciprocal carrying capacity of tumor cells (cell⁻¹)	1
$b_{2}$	Reciprocal carrying capacity of normal cells (cell⁻¹)	1
$c_{1}$	Immune cell competition term (competition between immune and tumor cells) (cell⁻¹ day⁻¹)	1
$c_{2}$	Tumor cell competition term (competition between immune and tumor cells) (cell⁻¹ day⁻¹)	0.5
$c_{3}$	Tumor cell competition term (competition between normal and tumor cells) (cell⁻¹ day⁻¹)	1
$c_{4}$	Normal cell competition term (competition between normal and tumor cells) (cell⁻¹ day⁻¹)	1
$d_{1}$	Immune cell death rate (day⁻¹)	0.2
$d_{2}$	Decay rate of injected drug (day⁻¹)	1
$r_{1}$	Per unit growth rate of tumor cells (day⁻¹)	1.5
$r_{2}$	Per unit growth rate of normal cells (day⁻¹)	1
$s$	Immune cell influx rate (cell day⁻¹)	0.33
$γ$	Immune threshold rate (cell)	0.3
$λ$	Immune response rate (day⁻¹)	0.01

Table 2. Fuzzy-sets and their membership functions.

#	Type	[a, b, c, d]	#	Type	[a, b, c, d]
1	zmfl	[0.0055, 0.0068, ~, ~]	11	trapmf	[0.3495, 0.3505, 0.3995, 0.4005]
2	trapmf	[0.0058, 0.0068, 0.0120, 0.0130]	12	trapmf	[0.3995, 0.4005, 0.4495, 0.4505]
3	trapmf	[0.0120, 0.0130, 0.0245, 0.0255]	13	trapmf	[0.4495, 0.4505, 0.4995, 0.5005]
4	trapmf	[0.0245, 0.0255, 0.0395, 0.0405]	14	trapmf	[0.4995, 0.5005, 0.5495, 0.5505]
5	trapmf	[0.0395, 0.0405, 0.0495, 0.0505]	15	trapmf	[0.5495, 0.5505, 0.5995, 0.6005]
6	trapmf	[0.0495, 0.0505, 0.0995, 0.1005]	16	trapmf	[0.5995, 0.6005, 0.6495, 0.6505]
7	trapmf	[0.0995, 0.1005, 0.1995, 0.2005]	17	trapmf	[0.6495, 0.6505, 0.6995, 0.7005]
8	trapmf	[0.1995, 0.2005, 0.2495, 0.2505]	18	trapmf	[0.6995, 0.7005, 0.7995, 0.8005]
9	trapmf	[0.2495, 0.2505, 0.2995, 0.3005]	19	trapmf	[0.7995, 0.8005, 0.8995, 0.9005]
10	trapmf	[0.2995, 0.3005, 0.3495, 0.3505]	20	zmfh	[0.8995, 0.9005, ~, ~]
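
The breakpoints in Table 2 can be evaluated with a few lines of code. The following is a minimal sketch, not the authors' implementation. The function names `trapmf`, `zmf_low`, and `zmf_high` are illustrative. The spline form used for the Z-shape is the common textbook definition and is an assumption, since this section gives only the breakpoints. Reading `zmfl` as saturating at 1 below `a` and `zmfh` as saturating at 1 above `b` is likewise an assumption based on the caption of Figure 3.

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear ramps between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    if x <= c:
        return 1.0                 # plateau
    return (d - x) / (d - c)       # falling edge


def zmf_low(x, a, b):
    """Z-shape for the lower bound (assumed form): 1 for x <= a, 0 for x >= b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    mid = (a + b) / 2.0
    if x <= mid:
        return 1.0 - 2.0 * ((x - a) / (b - a)) ** 2
    return 2.0 * ((x - b) / (b - a)) ** 2


def zmf_high(x, a, b):
    """Mirrored shape for the upper bound (assumed form): 0 for x <= a, 1 for x >= b."""
    return 1.0 - zmf_low(x, a, b)


# Breakpoints of sets 9 and 10 copied from Table 2.
SET_9 = (0.2495, 0.2505, 0.2995, 0.3005)
SET_10 = (0.2995, 0.3005, 0.3495, 0.3505)

if __name__ == "__main__":
    x = 0.3
    print(trapmf(x, *SET_9), trapmf(x, *SET_10))
```

At `x = 0.3`, which lies in the narrow overlap between sets 9 and 10, each set returns a membership of about 0.5, so neighbouring trapezoids hand over smoothly and their memberships sum to about 1.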

Table 3. Result of simulations.

	Young Patients			Old Patients
	$M(\|E\|_{2})$	$M(\mathrm{var}(E))$	$M(\|U\|_{2})$	$M(\|E\|_{2})$	$M(\mathrm{var}(E))$	$M(\|U\|_{2})$
FESL	1.9633	0.0011	130.1827	3.0863	0.0011	123.0201
Q-Learning	3.0206	0.0079	131.7332	4.1083	0.0066	137.0524

where $M(\cdot)$, $\mathrm{var}(\cdot)$, and $\|\cdot\|_{2}$ denote the mean function, variance function, and the norm function, respectively.
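
The three summary quantities reported in Table 3 can be sketched as follows. This is a hedged illustration rather than the authors' code. It assumes that `E` is the per-patient error trajectory and `U` the per-patient control trajectory, that the norm is the Euclidean norm over time samples, that the variance is the population variance, and that the mean is taken across patients. The function names and the toy data are hypothetical.

```python
import math
from statistics import mean, pvariance


def l2_norm(seq):
    """Euclidean norm of one trajectory."""
    return math.sqrt(sum(v * v for v in seq))


def summarize(errors, controls):
    """Average the per-patient norm and variance over all patients.

    errors, controls: lists with one trajectory (list of floats) per patient.
    """
    return {
        "M_norm_E": mean([l2_norm(e) for e in errors]),
        "M_var_E": mean([pvariance(e) for e in errors]),   # population variance (assumption)
        "M_norm_U": mean([l2_norm(u) for u in controls]),
    }


if __name__ == "__main__":
    # Toy trajectories for two hypothetical patients, not data from the paper.
    toy_errors = [[3.0, 4.0], [0.0, 0.0]]
    toy_controls = [[1.0, 0.0], [0.0, 2.0]]
    print(summarize(toy_errors, toy_controls))
```

Under these assumptions, a lower `M_norm_E` and `M_var_E` indicate tighter tracking, and a lower `M_norm_U` indicates less drug used, which is how the FESL and Q-learning rows of Table 3 are compared.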

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
