Algorithms 17 00098 v2
Article
Reinforcement Learning-Based Optimization for Sustainable and
Lean Production within the Context of Industry 4.0
Panagiotis D. Paraschos 1, * , Georgios K. Koulinas 1, * and Dimitrios E. Koulouriotis 2
Abstract: The manufacturing industry often faces challenges related to customer satisfaction, system
degradation, product sustainability, inventory, and operation management. If not addressed, these
challenges can be substantially harmful and costly for the sustainability of manufacturing plants.
Paradigms, e.g., Industry 4.0 and smart manufacturing, provide effective and innovative solutions,
aiming at managing manufacturing operations, and controlling the quality of completed goods of-
fered to the customers. Aiming at that end, this paper endeavors to mitigate the described challenges
in a multi-stage degrading manufacturing/remanufacturing system through the implementation of
an intelligent machine learning-based decision-making mechanism. To carry out decision-making,
reinforcement learning is coupled with lean green manufacturing. The scope of this implementation
is the creation of a smart, lean, and sustainable production environment with minimal
environmental impact. To this end, an effort is made to reduce material consumption and
extend the lifecycle of manufactured products using pull production, predictive maintenance, and
circular economy strategies. To validate this, a well-defined experimental analysis meticulously
investigates the behavior and performance of the proposed mechanism. Results obtained by this
analysis support the presented reinforcement learning/ad hoc control mechanism’s capability
to achieve both high system sustainability and enhanced material reuse.
revolution, Industry 4.0 is a manufacturing concept that strives to transform traditional pro-
duction plants into smart ones, aiming to optimize their productivity and cost-effectiveness
through intelligent manufacturing [5]. It is a transition achieved through the coupling
of integrated manufacturing operations and components, e.g., machines, with intelligent
technologies, including Internet of Things (IoT) [6,7]. These devices communicate with the
system, gathering significant data [8,9]. With the aid of machine learning, they analyze
and utilize the accumulated data to schedule operations, predict failures, manage product
inventory, and conduct quality inspections [10,11]. As a result, corresponding predictive
models are developed and assume an integral role in the decision-making process
conducted within smart production systems [12,13].
Along with Industry 4.0, the lean green manufacturing concept could be employed
as well to improve both sustainability and output product quality within smart produc-
tion systems [14,15]. It is a relatively new paradigm that simultaneously exploits the
strengths of lean and green manufacturing in order to identify waste and assess environ-
mental effects [16]. To understand the strengths of lean and green manufacturing, let us
define each concept separately. Devised by Toyota, lean manufacturing is a manufacturing
paradigm that intends to improve the value of a product. It attempts to reduce mate-
rial waste within manufacturing systems and to support seamless collaboration between
entities, including suppliers and customers, involved in manufacturing operations and
processes [17,18]. The goal of this application is to promptly and cost-effectively manu-
facture high-quality items [19]. One of the limitations of lean manufacturing is the lack of
performance metrics that assess the environmental impact of manufacturing processes [20].
On the other hand, green manufacturing endeavors to improve the plants’ productivity
with circular economy practices, such as recycling and refurbishing [21]. It aims at pro-
ducing items that demonstrate conformance to the quality and environmental standards
by decreasing the negative impact associated with human intervention [22]. However,
the implementation of lean and green practices within production systems is still hindered
by a plethora of barriers, including the lack of proper training, inventory management,
and effective maintenance plans [23]. Industry 4.0 could remove these barriers with its
technologies and finally enable the integration of lean green manufacturing in the actual
industry [24].
With regard to the above, the present paper aims to formulate smart production
environments that are lean, sustainable, and eco-friendly. It presents a scheduling optimiza-
tion framework that devises joint control policies for multi-stage manufacturing systems.
For this effort, it exploits a reinforcement learning (RL) decision-making technique. RL,
as suggested by [25], is a machine learning approach that attempts to learn an action in an
effort to increase the obtained reward. Compared to methods, e.g., Dynamic Programming,
RL attempts to solve decision-making problems without requiring an explicit model of its
environment [26,27].
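The model-free character mentioned above can be illustrated with a minimal control loop. The sketch below is purely illustrative (the environment, function names, and step count are assumptions, not the mechanism presented later): unlike Dynamic Programming, the agent never enumerates transition probabilities and only learns from sampled experience.

```python
# Minimal model-free control loop: the agent improves its value estimates
# purely from sampled (state, action, reward, next state) transitions,
# never querying an explicit model of the environment the way Dynamic
# Programming would. All names here are illustrative.
def run_episode(env_step, choose_action, learn, state, steps=100):
    total_reward = 0.0
    for _ in range(steps):
        action = choose_action(state)
        next_state, reward = env_step(state, action)  # sample the environment
        learn(state, action, reward, next_state)      # update from experience
        state = next_state
        total_reward += reward
    return total_reward
```

The environment is consulted only through `env_step`, so the same loop works for any system whose transitions can be simulated or observed.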
The RL-based decision-making process is complemented with lean green practices.
These practices combine two types of manufacturing; that is, lean manufacturing and
green manufacturing. Considering the lean part, the present paper adopts pull produc-
tion and total productive maintenance, which are two well-known lean manufacturing
techniques [28]. In this respect, pull production is enabled through ad hoc policies, such
as Base Stock [29,30], while total productive maintenance is integrated through predictive
maintenance policies, e.g., condition-based maintenance, that seek to minimize mainte-
nance costs and improve the availability of processing machines [31]. The implemented
ad hoc policies are frequently employed in the real-world industry and the academic
literature [32,33]. On the other hand, the green manufacturing part is supported with
circular economy policies, namely refurbishing and remanufacturing. These policies aim to
minimize the environmental impact of products by extending their lifecycle [34]. In this
respect, low-quality or returned material is processed again in order to create new and
usable products. This leads to a substantial reduction in material consumption and raw
material dependence.
Algorithms 2024, 17, 98 3 of 22
2. Literature Review
2.1. Sustainable Production
Sustainable production is mainly characterized by the generation of products using
minimal materials and natural resources [35]. It involves a number of activities, e.g., reman-
ufacturing, aiming to extend product cycle and reduce the negative effects of manufacturing
processes on the environment, such as material waste and pollution [36,37]. In this respect,
Industry 4.0 has introduced new technologies that enable corresponding practices in produc-
tion plants, contributing positively to the value creation in the economic [38], social [39], and
environmental [40] aspects of sustainability. Given these implications, the state-of-the-art
literature has implemented the sustainable production concept to improve the sustainability
of manufacturing systems, focusing mainly on their energy efficiency [41]. Specifically,
these applications utilize detailed probabilistic models, optimized by machine learning
algorithms, to make forecasts for energy costs, improving the decision-making carried out
in industries and thus paving the way for a sustainable future [42,43]. For example, focus-
ing on the lot-sizing and scheduling optimization problem, Roshani et al. [44] endeavored
to minimize operational costs and the energy consumption of the examined single-stage
production system. In this effort, they generated manufacturing and remanufacturing plans
with a mixed-integer programming modeling approach and metaheuristic algorithms.
Another noteworthy aspect of sustainable production in Industry 4.0 is the integration
of lean manufacturing, which aims to improve the final quality of products and satisfy
customers [38]. In the pertinent literature, its applications range from applying lean VSM [45]
to supply chain management [46]. For instance, Ferreira et al. [45] developed and simulated
a VSM model that implements lean manufacturing environments in the context of Industry
4.0. For this effort, it incorporates a variety of decision-making agents, each specializing in
discrete aspects of manufacturing environments, such as the coordination of operations or
resource management. Following a more sophisticated approach, Soltani et al. [47] presented a lean
manufacturing approach that assesses the manufacturing sustainability of plants with VSM
and suggests solutions for minimizing generated waste through multi-criteria decision
analysis methods, e.g., TOPSIS.
In addition to lean manufacturing, green manufacturing practices emerge as an effec-
tive manufacturing solution that aims to build a truly circular manufacturing environment
with near-zero environmental impact [22]. An example of such an implementation is
presented in [48]. In this publication, the authors considered inventory control models
that authorize manufacturing and remanufacturing activities, endeavoring to generate
innovative green products, i.e., products that are generated using used and returned items.
Focusing on car manufacturing, Liu and De Giovanni [49] strove to find an optimal trade-off
between economic and environmental effects considering a supply chain. They modeled
the performance of the supply chain to evaluate the impact of green processes on
production costs.
Moreover, the combined concept of lean and green manufacturing is still studied and
evaluated by the literature, with the aim of creating a ubiquitous framework that could
be easily implemented in manufacturing plants. For example, Kurdve and Bellgran [50]
discussed how the lean green manufacturing combined with circular economy practices can
be incorporated in the plants’ shop-floor and presented a framework that could realize that
concept. Tripathi et al. [51] presented a process optimization framework integrating lean
and green concepts. Furthermore, they provided guidelines for implementing the framework
in Industry 4.0-enabled shop-floor systems in order to devise corresponding production
plans. Duarte and Cruz-Machado [52] presented a framework conceptualizing a lean green
supply chain within Industry 4.0 to investigate how these concepts are intertwined with
each other and the relationship between them.
Table 1 summarizes the contribution of the cited papers in the pertinent literature.
According to this table, a large portion of studies focused on the derivation of policies
that merely authorize a single type of activity on the basis of a specific sustainability aspect,
such as energy consumption. To this end, they applied complex mathematical models,
e.g., mixed-integer models, that capture a snapshot of the investigated problem. Further-
more, the literature review has revealed that reinforcement learning is scarcely applied
within the context of sustainable production. To this end, the present paper presents a
machine learning-based decision-making mechanism integrated in a multi-stage manufac-
turing system. The proposed decision-making methodology integrates an RL method and
lean green practices. Implementing these techniques, the proposed mechanism strives to
evaluate the synergy between the Industry 4.0 and lean green manufacturing concepts
within a smart manufacturing system. The aim of such synergy is to reduce the impact of
the considered systems upon the environment and improve sustainability with circular
economy and lean thinking practices. To do so, the mechanism endeavors to jointly op-
timize processes related to manufacturing, remanufacturing, maintenance, raw-material
procurement, and refurbishing.
manufacturing facility processes the WIP products completed in the first stage. As a result,
the final items are completed and stored in the second storage facility.
In terms of operability, frequent failures persistently degrade the completed products
and the system. In this context, the system condition can be divided into o phases; with
each failure, the examined system is moved from o into o + 1. To preempt any further
degradation, the system is being maintained systematically. It is considered that the
maintenance cost is considerably increased at late deterioration stages. In case of complete
inoperability, repair activities restore the operability of the system by restoring its good
condition (o = 0).
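The degradation dynamics described above can be sketched as a simple phase-transition rule; the phase count below and the function's shape are assumptions for illustration, not the paper's model.

```python
# Sketch of the degradation dynamics: each failure moves the system one
# phase deeper, and a repair restores the good condition (o = 0).
# O_MAX (the deepest phase) is an illustrative assumption.
O_MAX = 5

def next_phase(phase: int, failed: bool, repaired: bool) -> int:
    if repaired:
        return 0                      # repair restores the good condition
    if failed and phase < O_MAX:
        return phase + 1              # each failure deepens degradation
    return phase                      # otherwise the condition persists
```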
Due to the persistent deterioration, the system produces both standard products and
defective items. To minimize material waste, the faulty goods can be remanufactured
by the system. The system can then sell standard and remanufactured products to the
customers. However, if the customers are dissatisfied with their purchased products,
they can return them to the system; these returned items can be refurbished and
subsequently sold to customers.
Figure 1. The presented RL/ad hoc production and maintenance control mechanism.
In the expression above, δ(t) ∈ {0, 1, 2, . . . , o} refers to the degradation of the manufacturing
facility, σ(t) ∈ {downtime, working, idle, maintained} denotes the status of the
manufacturing facilities, and πa(t), πϵ(t) ∈ {0, 1, 2, . . . , Smax} represent the inventories of
standard and faulty goods, respectively. Furthermore, the mechanism does not perform
decision-making if any machine ceases operation, or if S1 and S2 reach their maximum capacity.
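As a rough illustration, the state vector and the decision-gating rule above could be encoded as follows; the class, field names, and capacity values are assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass

# Illustrative encoding of the state vector described in the text.
O_MAX = 5    # deepest degradation phase o (assumed value)
S_MAX = 20   # maximum inventory capacity Smax (assumed value)

STATUSES = ("downtime", "working", "idle", "maintained")

@dataclass(frozen=True)
class SystemState:
    degradation: int    # delta(t) in {0, ..., o}
    status: str         # sigma(t), one of STATUSES
    standard_inv: int   # pi_a(t) in {0, ..., Smax}
    faulty_inv: int     # pi_e(t) in {0, ..., Smax}

    def decision_allowed(self) -> bool:
        # No decision-making when a machine is down or a storage
        # facility is at maximum capacity, mirroring the stated rule.
        return (self.status != "downtime"
                and self.standard_inv < S_MAX
                and self.faulty_inv < S_MAX)
```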
4.2. Actions
Through its two agents, the mechanism devises policies that initiate production and
maintenance operations using parametric policies, procure raw materials, and authorize
refurbishing and remanufacturing activities for returned and faulty items, respectively. The set
of the actions can be formulated as:
the manufacturing system produces items with respect to defined thresholds associated with
the product inventory. In this regard, the thresholds are s and S, denoting the minimum
and maximum allowable inventory, respectively. When the product inventory is less than
s, the production process is initiated in the facilities. If the inventory reaches S, then the
facilities stop producing items.
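The (s, S) threshold behavior described above can be sketched as a small hysteresis rule; the function below is an illustrative sketch, not the authors' implementation, and assumes the current production mode persists between the two thresholds.

```python
def s_S_policy(inventory: int, s: int, S: int, producing: bool) -> bool:
    """Return True if the facility should be producing.

    Production starts when inventory drops below s and stops once it
    reaches S; between the thresholds the current mode persists
    (hysteresis), matching the description in the text.
    """
    if inventory < s:
        return True
    if inventory >= S:
        return False
    return producing
```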
Figure 3. An illustration of a BS system.
Lastly, the Extended KANBAN (EK) [64] is formed on the basis of BS and KANBAN
policies. The functionality of this policy is presented in Figure 4, and can be described as
follows. After the placement of new orders, A0 and A1 receive information regarding the
customer demand. To begin the production process in MF1 and MF2 , a product must be
removed from the facilities and its KANBAN card must be sent to the respective K0 and K1 .
Figure 4. A multi-stage EK system.
In (5), Xα and X β denote the costs associated with producing and storing products.
Xγ refers to the cost of procuring raw materials. Xδo is the cost of maintenance incurred at
ϵp, and Xϵ represents the cost of repair activities. Note that Xδ0 < Xδ1 < Xδ2 < . . . < Xδo.
Xζ and Xη are the costs of remanufacturing and refurbishing, respectively, and Fα is the
fee for returned goods. In regard to system profits, Kα refers to the standard product revenue, and Kβ
denotes the remanufactured product revenue. Lastly, Kγ is the revenue associated with
sold refurbished products.
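A minimal sketch of the profit expression in (5), built from the cost and revenue terms defined above; the per-unit values, the counter names, and the exact bookkeeping are illustrative assumptions, not the paper's data.

```python
def epoch_profit(q: dict, phase: int) -> float:
    """Revenues minus costs for one decision epoch, mirroring the terms
    of (5). All per-unit values below are placeholder assumptions."""
    K_alpha, K_beta, K_gamma = 10.0, 7.0, 5.0  # revenues: standard, reman., refurb.
    X_alpha, X_beta, X_gamma = 3.0, 0.5, 2.0   # production, storage, raw materials
    X_delta = (1.0, 2.0, 4.0, 8.0)             # maintenance cost grows with phase o
    X_eps, X_zeta, X_eta = 6.0, 1.5, 1.0       # repair, remanufacturing, refurbishing
    F_alpha = 0.8                              # fee for returned goods
    revenue = (K_alpha * q["sold_standard"]
               + K_beta * q["sold_remanufactured"]
               + K_gamma * q["sold_refurbished"])
    cost = (X_alpha * q["produced"] + X_beta * q["stored"]
            + X_gamma * q["procured"] + X_delta[phase] * q["maintained"]
            + X_eps * q["repaired"] + X_zeta * q["remanufactured"]
            + X_eta * q["refurbished"] + F_alpha * q["returned"])
    return revenue - cost
```

Indexing the maintenance cost by `phase` reflects the ordering Xδ0 < Xδ1 < . . . < Xδo stated above.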
At every step, the mechanism attempts to increase Aexp using (5). Clearly, this expres-
sion is the objective function of the agents. In other words, the proposed mechanism seeks
to formulate a policy p that maximizes the defined goal. In this regard, the agents integrate
a Markov Decision Process. During this process, the next environment state and the expected
reward can be estimated on the basis of the present state and the selected action using (6).
According to Sutton and Barto [25], both expressions represent the dynamics of the inter-
acting environment. Therefore, the dynamics of the studied manufacturing environment
are estimated as follows (for the sake of brevity, the time variables are omitted):
where θ and ψ are the current state and action at ϵp, θ′ and ψ′ denote the subsequent state
and action in the next epoch ϵp + 1, A and Ā correspond to the reward and average reward
obtained by the agents, and β is a hyper-parameter that assumes real values. The state-
action value corresponds to the action taken in a specific system state. Every value is stored
in a table, called “q-table”, which is updated at every ϵ p with (7).
The actions are selected by exploring/exploiting the state-action space. To this end,
the e-greedy strategy is used. This strategy can opt for a greedy action (i.e., the action with
the optimum value in the q-table) with probability 1 − e, or a random one with probability
e. When the greedy action is selected, the corresponding reward is calculated as follows:
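The e-greedy selection described above could look like this in code; the q-table layout (a dict keyed by state-action pairs) is an assumption for illustration.

```python
import random

def e_greedy(q_table, state, actions, e=0.1, rng=random):
    """With probability 1 - e, pick the greedy action (the best q-table
    value for this state); with probability e, explore with a random one."""
    if rng.random() < e:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))
```

Setting e = 0 recovers a purely greedy policy, while larger e values increase exploration of the state-action space.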
| # | µπ | µp | λao (per phase) | λmo (per phase) | µe | µr | µρ |
|---|------|------|--------------------------------------|---------------------------------------------|-------|------|------|
| 1 | 2.15 | 1.6 | (1.25, 1.78, 2.53, 3.84, 4.29, 5.23) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 34.15 | 6.22 | 7.17 |
| 2 | 2.15 | 1.6 | (1.25, 1.78, 2.53, 3.84, 4.29, 5.23) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 34.15 | 6.22 | 7.17 |
| 3 | 2.15 | 2.48 | (5.44, 5.97, 6.63, 7.38, 7.49, 8.02) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 34.15 | 5.73 | 6.58 |
| 4 | 2.15 | 2.48 | (5.44, 5.97, 6.63, 7.38, 7.49, 8.02) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 34.15 | 5.73 | 6.58 |
| 5 | 2.15 | 2.48 | (5.44, 5.97, 6.63, 7.38, 7.49, 8.02) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 34.15 | 5.73 | 6.58 |
| 6 | 4.8 | 3.81 | (5.44, 5.97, 6.63, 7.38, 7.49, 8.02) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 23.79 | 4.74 | 5.21 |
| 7 | 4.8 | 3.81 | (5.44, 5.97, 6.63, 7.38, 7.49, 8.02) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 23.79 | 4.74 | 5.21 |
| 8 | 4.8 | 3.81 | (5.44, 5.97, 6.63, 7.38, 7.49, 8.02) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 23.79 | 4.74 | 5.21 |
| 9 | 4.8 | 3.81 | (5.44, 5.97, 6.63, 7.38, 7.49, 8.02) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 23.79 | 4.74 | 5.21 |
| 10 | 7.35 | 4.63 | (5.44, 5.97, 6.63, 7.38, 7.49, 8.02) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 23.79 | 3.47 | 4.35 |
| 11 | 7.35 | 5.32 | (8.28, 8.81, 9.08, 10.68, 11.68, 11.68) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 13.43 | 2.53 | 3.12 |
| 12 | 7.35 | 5.32 | (8.28, 8.81, 9.08, 10.68, 11.68, 11.68) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 13.43 | 2.53 | 3.12 |
| 13 | 7.35 | 5.32 | (8.28, 8.81, 9.08, 10.68, 11.68, 11.68) | (30.27, 29.74, 28.3, 27.68, 27.23, 26.29) | 13.43 | 2.53 | 3.12 |
to scarcely authorize maintenance and repair in the system. That is, their corresponding
rates received moderate values.
Furthermore, since the experiments simulated the behavior of the examined stochastic
system, preliminary experiments were conducted to determine how many times the experi-
ments would be replicated in order to obtain satisfactory results. During these experiments,
the performance of the studied manufacturing system under R-Smart was assessed to determine
the appropriate number of replications. For simplicity reasons, the preliminary experiments
were based on the 1st scenario listed in Table 2, and the completed item
inventory was kept constant at 6 million, since it does not considerably affect the perfor-
mance of the simulated system. In this respect, the simulation of the two-stage manufacturing
system under the 1st experimental scenario was replicated 2–20 times. The obtained
results demonstrated that the system was efficient in the experiments where the
replication parameter assumed values equal to or above 15. Therefore, it was reasonable to
replicate every experiment scenario 15 times, stockpiling 6 million goods in the second
storage facility.
The values assumed by parameters of the experimental setup are listed in Table 3.
Considering the table’s notation, Xα^coef and Xβ^coef are the product manufacturing and storing
cost coefficients; Xγ^coef, Xδo^coef, and Xϵ^coef are the relative cost coefficients related to raw
materials, maintenance, and repair; Xζ^coef and Xη^coef denote the coefficients of the remanufacturing
and refurbishing costs; and Kα^coef, Kβ^coef, and Kγ^coef represent the coefficients of profits procured
after selling standard, remanufactured, and refurbished goods. Finally, Fα^coef is the fee
coefficient relevant to returned items.
Table 3 columns: Facility | Smax | Xα^coef | Xβ^coef | Xγ^coef | Xδo^coef | Xϵ^coef | Xζ^coef | Xη^coef | Fα^coef | Kα^coef | Kβ^coef | Kγ^coef | γ | β | e
Finally, the experimental analysis presented in later sections compares different versions
of the presented mechanism. The implemented mechanism versions are: (a) RL(EK-CBM),
(b) RL(EK-CM), (c) RL(EK-PM), (d) RL(BS-PM), (e) RL(BS-CM), (f) RL(BS-CBM), (g) RL((s, S)-
CBM), (h) RL((s, S)-PM), (i) RL((s, S)-CM), (j) RL(KANBAN-CBM), (k) RL(KANBAN-PM),
(l) RL(KANBAN-CM), (m) RL(CBM), (n) RL(PM), (o) RL(CM), (p) RL(R-Learning). Table 4
summarizes the ad hoc production and maintenance policies that are integrated in the
RL-based mechanisms, excluding RL(R-Learning). Using these policies, the RL agent initiates
production and maintenance in the system according to (2) and (3).
RL(CBM), RL(PM), RL(CM), and RL(R-Learning) were adapted from the approaches
presented in [26,60,66]. Specifically, as presented in [26,66], RL(R-Learning) integrates a
single R-Learning-based agent for the joint control optimization of single-stage production
systems. Similarly, RL(CBM), RL(PM), and RL(CM) employ one agent for deriving policies.
This agent follows the RL-based decision-making approach [60], which involved ad hoc
production policies instead of maintenance ones. Given the above, the behavior of RL(CBM),
RL(PM), RL(CM), and RL(R-Learning) was accordingly adjusted for the purposes of the
present study.
Note, in the following figures, that each illustrated quantity is associated with its
respective unit. Analytically, the average inventory is measured in parts; the average
event rates (e.g., µ p ) are measured in time units; the obtained revenues and costs are
associated with monetary units; the simulation scenarios are dimensionless. For readability
reasons, these units, except for “parts”, are abbreviated, and the following notation is
utilized: “t.u.” denotes time units, “m.u.” represents monetary units, and “d.u.” refers to
dimensionless units.
improving the system’s condition. However, they are significantly costly when
conducted in later degradation stages.
RL(PM), RL(CM), RL(CBM), and RL(R-Learning) are the worst-performing ones. Given
that, the analysis of this section attempts to explain their performance by assessing the
number of maintenance, remanufacturing, and refurbishing operations they authorized
within the system.
Figure 7 shows the maintenance costs attained by different versions of the proposed
scheduling mechanism. In this figure, RL(R-Learning) procured increased costs compared
to its counterparts. This illustrates that the decision-making agent of this version decided
to recurrently maintain the system to prevent degradation. Due to this repetition, essential
processes, such as remanufacturing and manufacturing, were likely delayed, resulting in
decreased productivity of the system, as verified in Figure 5. Moreover,
the maintenance-related costs of the versions integrating only ad hoc maintenance policies
are lower than the ones of RL(EK-CBM) and RL(BS-CBM). It is likely that the two-stage
system under the latter mechanisms was maintained in late degradation stages. In
these stages, the system tends to incur increased maintenance-related costs,
as mentioned in Section 4. Despite that, RL(EK-CBM) and RL(BS-CBM) managed to be
sufficiently profitable as suggested by Figures 5 and 6.
Figure 8a,b illustrate how variations in the proposed mechanism perform in terms of
remanufacturing and refurbishing-related costs. Based on these figures, an interesting note
should be made with respect to the attained costs: the costs relevant to refurbishing
activities are relatively higher than the ones procured after conducting remanufacturing
operations. This indication can be attributed to the significant quantity of items received
from customers, considering the number of refurbished products illustrated in Figure 7.
Clearly, the system revenue stream is dependent on the refurbishing of returned items.
Furthermore, similar to Figure 7, the costliest version of the mechanism is RL(R-Learning),
procuring the highest remanufacturing and refurbishing costs compared to its counterparts.
This indication suggests that the described variation’s agent decided to frequently
generate remanufactured and refurbished items to decrease the amounts of faulty and
returned products, respectively.
In terms of refurbishing activities, Figure 8a suggests that RL(BS-CBM) incurs lower
costs compared to the other variations. By merely integrating the BS policy, the mechanism
succeeds in generating stock of products by transmitting explicit information regarding
demand to the facilities. In this context, the BS-integrated mechanism outperforms the EK-based
one, despite the latter being the most profitable variant throughout the simulations. In terms of conducted
remanufacturing operations, though, Figure 8b illustrates the cost-effectiveness of both
RL(BS-CBM) and RL(EK-CBM). This can be likely attributed to the decreased storage of
faulty products. In that case, the majority of the manufactured items are of standard quality,
as the system is well-maintained under the CBM policy. Thus, remanufacturing activities
are considered to be redundant.
Figure 8. The variance of (a) average refurbishing cost, and (b) average remanufacturing cost.
remanufactured items are limited. Nevertheless, the filling of facilities with refurbished
items is a clear indication that the two-agent mechanism turns the loss associated with re-
turned products into an effective and sustainable revenue stream for the examined system,
since the refurbished items are later sold to customers.
Figure 11. Analysis of revenues with comparison of (a) average production rate vs average failure
rate, and (b) average remanufacturing rate vs average refurbishing rate.
Figure 12. The comparison of average manufacturing rate against (a) remanufacturing and (b) refur-
bishing rates in terms of system revenues.
Figure 13. The impact of (a) β and (b) γ on the performance of the system.
6. Discussion
Based on the results presented above, one can suggest the proposed mechanism’s
integration in real-world manufacturing plants. To better understand its applicability, let
us examine the functionality of the presented reinforcement learning-based optimization
framework in a real-world industry context. In this regard, the mechanism could gather
data on the state of manufacturing machines involved in plants. These data could range
from the quality of output inventory to the suppliers providing raw materials. Correspond-
ing to the gathered data, the mechanism could make adjustments to operations, processes,
output, and returned products, aiming at improving the revenue stream of the plant.
Specifically, it was indicated in the previous section that the mechanism prefers reusing
manufactured material over producing new items. Under these terms, the generation of
low-cost products is supported by remanufacturing and refurbishing low-quality and used
goods contained in inventory.
Furthermore, the mechanism could improve the efficiency and productivity of the implemented
manufacturing machines by generating make-to-stock items, given the demand
awareness demonstrated by production policies, e.g., BS. Along with this, the mechanism
could sustain the near-perfect condition of machines and output products through ad hoc
maintenance policies, such as CBM. Summarizing the mentioned merits, the proposed
reinforcement learning-based optimization framework supports smart lean-green man-
ufacturing and enhances the decision-making performed in highly complex plants by
intelligently automating the operation scheduling. Finally, despite being evaluated in a
two-stage system, the proposed approach could be handily employed for a plethora of
manufacturing systems, e.g., assembly systems, that regularly process items over multiple
stages. In this context, the reinforcement learning/parametric control approach
could be marginally modified by the decision-makers. This modification is feasible,
as the approach does not require an explicit model of manufacturing environments
and involves configurable parameters that provide effective control over manufacturing
and maintenance operations. Therefore, the presented mechanism could be considered
as a robust and easily-configurable framework that could be employed in a breadth of
real-world manufacturing applications, such as manufacturing scheduling.
Despite its strengths, the application of the proposed approach in real-world produc-
tion systems might involve several challenges. This paragraph examines two challenges
that might hinder the real-world implementation of the presented mechanism. Plausible
solutions to these challenges are presented as well. The first challenge is the learning
process of the agents. The agents would require a large amount of data to achieve optimal
behavior within real-world production systems, given their complexity. This process could
7. Conclusions
In this paper, a two-agent scheduling framework was presented and implemented in
a two-stage production system prone to degradation. The aim of this implementation was
to formulate a smart, lean, and sustainable environment with minimal environmental
impact. To this end, the proposed scheduling framework combines RL-based decision-
making with lean green manufacturing. Considering the latter, pull production, total
productive maintenance and circular economy policies are considered. The experimental
evaluation of the proposed mechanism demonstrated that the production system, controlled
by the policies devised by the mechanism, succeeds at being profitable and sustainable.
Implementing pull control and predictive maintenance policies, the mechanism promptly
produces enough stock of standard products, without any disruption in manufacturing
flow. As such, the storage costs and material consumption are dramatically reduced, as the
manufacturing activities are authorized only when an order is placed. Furthermore, already
produced material, in the form of faulty and returned items, is effectively reused as well,
extending the lifecycle of products.
In terms of performance, the results mentioned above are comparable to the ones
presented in several publications [60,66]. That is, the framework can be considered as
effective as the ones presented in the cited literature. However, the functionality of the latter is
focused and limited on the manufacturing problem and system studied in their respective
publications. In that case, the applicability of the presented framework can be considered
more robust and flexible. Analytically, the proposed mechanism can be applied to a plethora
of manufacturing environments and research problems with minimal modifications, due
to the nature of implemented amalgamation of reinforcement learning and parametric
control policies. Furthermore, due to the employment of machine learning and lean green
manufacturing strategies, it is clear that the presented approach facilitates the creation of
smart lean and sustainable production environments in contrast to other methodologies,
e.g., VSM [47]. It can dynamically assess the manufacturing environment and tackle
emerging issues, e.g., manufacturing waste, in real time with joint policies related to operations,
e.g., raw material procurement. In this regard, the proposed mechanism can be considered
as a holistic solution to lean and sustainable production.
To conclude, directions for future research are outlined as follows. The mechanism
could be employed in complex job-shop systems in an effort to tackle task scheduling
problems. In that case, it would be imperative to construct a thorough model of the system
and apply to it machine learning algorithms, such as deep learning, since task scheduling
problems are rather multifaceted. Moreover, the proposed mechanism involved ad hoc
policies for manufacturing and maintenance operations. To expand the mechanism's
functionality, one could include policies relevant to raw material procurement,
remanufacturing, and refurbishing activities. Such an implementation would offer
additional insights into activities other than manufacturing and maintenance.
Author Contributions: Conceptualization, P.D.P., G.K.K. and D.E.K.; Methodology, P.D.P., G.K.K. and
D.E.K.; Software, P.D.P.; Validation, P.D.P., G.K.K. and D.E.K.; Investigation, P.D.P., G.K.K. and D.E.K.;
Formal Analysis, P.D.P.; Data Curation, P.D.P. and G.K.K.; Writing—Original Draft Preparation, P.D.P.;
Writing—Review and Editing, P.D.P., G.K.K. and D.E.K.; Visualization, P.D.P. and G.K.K.; Supervision,
D.E.K. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Data are contained within the article.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Hu, S.J. Evolving Paradigms of Manufacturing: From Mass Production to Mass Customization and Personalization. Procedia
CIRP 2013, 7, 3–8. [CrossRef]
2. Wen, H.; Wen, C.; Lee, C.C. Impact of digitalization and environmental regulation on total factor productivity. Inf. Econ. Policy
2022, 61, 101007. [CrossRef]
3. Silva, A.; Rosano, M.; Stocker, L.; Gorissen, L. From waste to sustainable materials management: Three case studies of the
transition journey. Waste Manag. 2017, 61, 547–557. [CrossRef]
4. Ciliberto, C.; Szopik-Depczyńska, K.; Tarczyńska-Łuniewska, M.; Ruggieri, A.; Ioppolo, G. Enabling the Circular Economy
transition: A sustainable lean manufacturing recipe for Industry 4.0. Bus. Strateg. Environ. 2021, 30, 3255–3272. [CrossRef]
5. Dalzochio, J.; Kunst, R.; Pignaton, E.; Binotto, A.; Sanyal, S.; Favilla, J.; Barbosa, J. Machine learning and reasoning for predictive
maintenance in Industry 4.0: Current status and challenges. Comput. Ind. 2020, 123, 103298. [CrossRef]
6. Dalenogare, L.S.; Benitez, G.B.; Ayala, N.F.; Frank, A.G. The expected contribution of Industry 4.0 technologies for industrial
performance. Int. J. Prod. Econ. 2018, 204, 383–394. [CrossRef]
7. Chen, B.; Wan, J.; Shu, L.; Li, P.; Mukherjee, M.; Yin, B. Smart Factory of Industry 4.0: Key Technologies, Application Case, and
Challenges. IEEE Access 2018, 6, 6505–6519. [CrossRef]
8. Frank, A.G.; Dalenogare, L.S.; Ayala, N.F. Industry 4.0 technologies: Implementation patterns in manufacturing companies. Int. J.
Prod. Econ. 2019, 210, 15–26. [CrossRef]
9. Barrios, P.; Danjou, C.; Eynard, B. Literature review and methodological framework for integration of IoT and PLM in
manufacturing industry. Comput. Ind. 2022, 140, 103688. [CrossRef]
10. Rossit, D.A.; Tohmé, F.; Frutos, M. A data-driven scheduling approach to smart manufacturing. J. Ind. Inf. Integr. 2019, 15, 69–79.
[CrossRef]
11. Demertzi, V.; Demertzis, S.; Demertzis, K. An Overview of Privacy Dimensions on the Industrial Internet of Things (IIoT).
Algorithms 2023, 16, 378. [CrossRef]
12. Chabanet, S.; Bril El-Haouzi, H.; Thomas, P. Coupling digital simulation and machine learning metamodel through an active
learning approach in Industry 4.0 context. Comput. Ind. 2021, 133, 103529. [CrossRef]
13. Jimeno-Morenilla, A.; Azariadis, P.; Molina-Carmona, R.; Kyratzi, S.; Moulianitis, V. Technology enablers for the implementation
of Industry 4.0 to traditional manufacturing sectors: A review. Comput. Ind. 2021, 125, 103390. [CrossRef]
14. Erro-Garcés, A. Industry 4.0: Defining the research agenda. Benchmarking Int. J. 2021, 28, 1858–1882. [CrossRef]
15. Queiroz, G.A.; Alves Junior, P.N.; Costa Melo, I. Digitalization as an Enabler to SMEs Implementing Lean-Green? A Systematic
Review through the Topic Modelling Approach. Sustainability 2022, 14, 14089. [CrossRef]
16. Yadav, V.; Gahlot, P.; Rathi, R.; Yadav, G.; Kumar, A.; Kaswan, M.S. Integral measures and framework for green lean six sigma
implementation in manufacturing environment. Int. J. Sustain. Eng. 2021, 14, 1319–1331. [CrossRef]
17. Sundar, R.; Balaji, A.; Kumar, R.S. A Review on Lean Manufacturing Implementation Techniques. Procedia Eng. 2014, 97, 1875–1885.
[CrossRef]
18. Mostafa, S.; Dumrak, J.; Soltan, H. A framework for lean manufacturing implementation. Prod. Manuf. Res. 2013, 1, 44–64. [CrossRef]
19. Gupta, S.; Jain, S.K. A literature review of lean manufacturing. Int. J. Manag. Sci. Eng. Manag. 2013, 8, 241–249. [CrossRef]
20. Banawi, A.; Bilec, M.M. A framework to improve construction processes: Integrating Lean, Green and Six Sigma. Int. J. Constr.
Manag. 2014, 14, 45–55. [CrossRef]
21. Rathi, R.; Kaswan, M.S.; Garza-Reyes, J.A.; Antony, J.; Cross, J. Green Lean Six Sigma for improving manufacturing sustainability:
Framework development and validation. J. Clean. Prod. 2022, 345, 131130. [CrossRef]
22. Touriki, F.E.; Benkhati, I.; Kamble, S.S.; Belhadi, A.; El fezazi, S. An integrated smart, green, resilient, and lean manufacturing
framework: A literature review and future research directions. J. Clean. Prod. 2021, 319, 128691. [CrossRef]
23. Singh, R.K.; Kumar Mangla, S.; Bhatia, M.S.; Luthra, S. Integration of green and lean practices for sustainable business
management. Bus. Strateg. Environ. 2022, 31, 353–370. [CrossRef]
24. Leong, W.D.; Lam, H.L.; Ng, W.P.Q.; Lim, C.H.; Tan, C.P.; Ponnambalam, S.G. Lean and Green Manufacturing—A Review on its
Applications and Impacts. Process Integr. Optim. Sustain. 2019, 3, 5–23. [CrossRef]
25. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018.
26. Paraschos, P.D.; Koulinas, G.K.; Koulouriotis, D.E. Reinforcement learning for combined production-maintenance and quality
control of a manufacturing system with deterioration failures. J. Manuf. Syst. 2020, 56, 470–483. [CrossRef]
27. Paraschos, P.D.; Koulinas, G.K.; Koulouriotis, D.E. A reinforcement learning/ad-hoc planning and scheduling mechanism for
flexible and sustainable manufacturing systems. Flex. Serv. Manuf. J. 2023. [CrossRef]
28. Pagliosa, M.; Tortorella, G.; Ferreira, J.C.E. Industry 4.0 and Lean Manufacturing. J. Manuf. Technol. Manag. 2019, 32, 543–569.
[CrossRef]
29. Koulinas, G.; Paraschos, P.; Koulouriotis, D. A machine learning-based framework for data mining and optimization of a
production system. Procedia Manuf. 2021, 55, 431–438. [CrossRef]
30. Paraschos, P.D.; Koulinas, G.K.; Koulouriotis, D.E. Parametric and reinforcement learning control for degrading multi-stage
systems. Procedia Manuf. 2021, 55, 401–408. [CrossRef]
31. Samadhiya, A.; Agrawal, R.; Garza-Reyes, J.A. Integrating Industry 4.0 and Total Productive Maintenance for global sustainability.
TQM J. 2022, 36, 24–50. [CrossRef]
32. Xanthopoulos, A.S.; Koulouriotis, D.E. Multi-objective optimization of production control mechanisms for multi-stage serial
manufacturing-inventory systems. Int. J. Adv. Manuf. Technol. 2014, 74, 1507–1519. [CrossRef]
33. Koulinas, G.; Paraschos, P.; Koulouriotis, D. A Decision Trees-based knowledge mining approach for controlling a complex
production system. Procedia Manuf. 2020, 51, 1439–1445. [CrossRef]
34. Dahmani, N.; Benhida, K.; Belhadi, A.; Kamble, S.; Elfezazi, S.; Jauhar, S.K. Smart circular product design strategies towards
eco-effective production systems: A lean eco-design industry 4.0 framework. J. Clean. Prod. 2021, 320, 128847. [CrossRef]
35. Amjad, M.S.; Rafique, M.Z.; Khan, M.A. Leveraging Optimized and Cleaner Production through Industry 4.0. Sustain. Prod.
Consum. 2021, 26, 859–871. [CrossRef]
36. Jayal, A.; Badurdeen, F.; Dillon, O.; Jawahir, I. Sustainable manufacturing: Modeling and optimization challenges at the product,
process and system levels. CIRP J. Manuf. Sci. Technol. 2010, 2, 144–152. [CrossRef]
37. Aruanno, B. EcoPrintAnalyzer: Assessing Sustainability in Material Extrusion Additive Manufacturing for Informed Decision-
Making. Sustainability 2024, 16, 615. [CrossRef]
38. Kamble, S.S.; Gunasekaran, A.; Gawankar, S.A. Sustainable Industry 4.0 framework: A systematic literature review identifying
the current trends and future perspectives. Process Saf. Environ. Prot. 2018, 117, 408–425. [CrossRef]
39. Bai, C.; Kusi-Sarpong, S.; Badri Ahmadi, H.; Sarkis, J. Social sustainable supplier evaluation and selection: A group decision-
support approach. Int. J. Prod. Res. 2019, 57, 7046–7067. [CrossRef]
40. Green, K.W.; Zelbst, P.J.; Meacham, J.; Bhadauria, V.S. Green supply chain management practices: Impact on performance. Supply
Chain Manag. 2012, 17, 290–305. [CrossRef]
41. Moldavska, A.; Welo, T. The concept of sustainable manufacturing and its definitions: A content-analysis based literature review.
J. Clean. Prod. 2017, 166, 744–755. [CrossRef]
42. Jamwal, A.; Agrawal, R.; Sharma, M. Deep learning for manufacturing sustainability: Models, applications in Industry 4.0 and
implications. Int. J. Inf. Manag. Data Insights 2022, 2, 100107. [CrossRef]
43. Ahmad, T.; Madonski, R.; Zhang, D.; Huang, C.; Mujeeb, A. Data-driven probabilistic machine learning in sustainable smart
energy/smart energy systems: Key developments, challenges, and future research opportunities in the context of smart grid
paradigm. Renew. Sustain. Energy Rev. 2022, 160, 112128. [CrossRef]
44. Roshani, A.; Paolucci, M.; Giglio, D.; Demartini, M.; Tonelli, F.; Dulebenets, M.A. The capacitated lot-sizing and energy efficient
single machine scheduling problem with sequence dependent setup times and costs in a closed-loop supply chain network. Ann.
Oper. Res. 2023, 321, 469–505. [CrossRef]
45. Ferreira, W.d.P.; Armellini, F.; de Santa-Eulalia, L.A.; Thomasset-Laperrière, V. Extending the lean value stream mapping to the
context of Industry 4.0: An agent-based technology approach. J. Manuf. Syst. 2022, 63, 1–14. [CrossRef]
46. de Oliveira-Dias, D.; Maqueira-Marin, J.M.; Moyano-Fuentes, J.; Carvalho, H. Implications of using Industry 4.0 base technologies
for lean and agile supply chains and performance. Int. J. Prod. Econ. 2023, 262, 108916. [CrossRef]
47. Soltani, M.; Aouag, H.; Anass, C.; Mouss, M.D. Development of an advanced application process of Lean Manufacturing
approach based on a new integrated MCDM method under Pythagorean fuzzy environment. J. Clean. Prod. 2023, 386, 135731.
[CrossRef]
48. Sarkar, B.; Ullah, M.; Sarkar, M. Environmental and economic sustainability through innovative green products by remanufactur-
ing. J. Clean. Prod. 2022, 332, 129813. [CrossRef]
49. Liu, B.; De Giovanni, P. Green process innovation through Industry 4.0 technologies and supply chain coordination. Ann. Oper.
Res. 2019, 1–36. [CrossRef]
50. Kurdve, M.; Bellgran, M. Green lean operationalisation of the circular economy concept on production shop floor level. J. Clean.
Prod. 2021, 278, 123223. [CrossRef]
51. Tripathi, V.; Chattopadhyaya, S.; Mukhopadhyay, A.K.; Sharma, S.; Singh, J.; Pimenov, D.Y.; Giasin, K. An Innovative Agile
Model of Smart Lean–Green Approach for Sustainability Enhancement in Industry 4.0. J. Open Innov. Technol. Mark. Complex.
2021, 7, 215. [CrossRef]
52. Duarte, S.; Cruz-Machado, V. An investigation of lean and green supply chain in the Industry 4.0. In Proceedings of the
2017 International Symposium on Industrial Engineering and Operations Management (IEOM), Bristol, UK, 24–25 July 2017;
pp. 255–265.
53. Li, C.; Zheng, P.; Li, S.; Pang, Y.; Lee, C.K. AR-assisted digital twin-enabled robot collaborative manufacturing system with
human-in-the-loop. Robot. Comput. Integr. Manuf. 2022, 76, 102321. [CrossRef]
54. Shakya, M.; Ng, H.Y.; Ong, D.J.; Lee, B.S. Reinforcement Learning Approach for Multi-period Inventory with Stochastic Demand.
In AIAI 2022: Artificial Intelligence Applications and Innovations; IFIP Advances in Information and Communication Technology
Book Series; Springer: Cham, Switzerland, 2022; Volume 646, pp. 282–291. [CrossRef]
55. Matrenin, P.V. Improvement of Ant Colony Algorithm Performance for the Job-Shop Scheduling Problem Using Evolutionary
Adaptation and Software Realization Heuristics. Algorithms 2023, 16, 15. [CrossRef]
56. Kayhan, B.M.; Yildiz, G. Reinforcement learning applications to machine scheduling problems: A comprehensive literature review.
J. Intell. Manuf. 2023, 34, 905–929. [CrossRef]
57. Yan, Q.; Wang, H.; Wu, F. Digital twin-enabled dynamic scheduling with preventive maintenance using a double-layer Q-learning
algorithm. Comput. Oper. Res. 2022, 144, 105823. [CrossRef]
58. Li, R.; Gong, W.; Lu, C. A reinforcement learning based RMOEA/D for bi-objective fuzzy flexible job shop scheduling. Expert
Syst. Appl. 2022, 203, 117380. [CrossRef]
59. Gu, W.; Li, Y.; Tang, D.; Wang, X.; Yuan, M. Using real-time manufacturing data to schedule a smart factory via reinforcement
learning. Comput. Ind. Eng. 2022, 171, 108406. [CrossRef]
60. Paraschos, P.D.; Xanthopoulos, A.S.; Koulinas, G.K.; Koulouriotis, D.E. Machine learning integrated design and operation
management for resilient circular manufacturing systems. Comput. Ind. Eng. 2022, 167, 107971. [CrossRef]
61. Geraghty, J.; Heavey, C. An investigation of the influence of coefficient of variation in the demand distribution on the performance
of several lean production control strategies. Int. J. Manuf. Technol. Manag. 2010, 20, 94–119. [CrossRef]
62. Axsäter, S. Inventory Control, 3rd ed.; International Series in Operations Research & Management Science; Springer International
Publishing: Cham, Switzerland, 2015; Volume 225. [CrossRef]
63. Duri, C.; Frein, Y.; Di Mascolo, M. Comparison among three pull control policies: Kanban, base stock, and generalized kanban.
Ann. Oper. Res. 2000, 93, 41–69. [CrossRef]
64. Dallery, Y.; Liberopoulos, G. Extended kanban control system: Combining kanban and base stock. IIE Trans. 2000, 32, 369–386.
[CrossRef]
65. Yang, S.; Gao, Y.; An, B.; Wang, H.; Chen, X. Efficient Average Reward Reinforcement Learning Using Constant Shifting Values.
Proc. AAAI Conf. Artif. Intell. 2016, 30, 2258–2264. [CrossRef]
66. Xanthopoulos, A.; Chnitidis, G.; Koulouriotis, D. Reinforcement learning-based adaptive production control of pull manufacturing
systems. J. Ind. Prod. Eng. 2019, 36, 313–323. [CrossRef]
67. Garza-Reyes, J.A. Lean and green—A systematic review of the state of the art literature. J. Clean. Prod. 2015, 102, 18–29. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.