AI Based Service Management For 6G Green Communications
arXiv:2101.01588v2 [cs.NI] 11 Jan 2021
Bomin Mao, Fengxiao Tang, Yuichi Kawamoto, and Nei Kato are with the Graduate School of Information Sciences, Tohoku University, Sendai, Japan. Emails: {bomin.mao, fengxiao.tang, youpsan, and kato}
Abstract—Green communications have always been a target for the information industry to alleviate energy overhead and reduce fossil fuel usage. In the current 5G and future 6G era, there is no doubt that the volume of network infrastructure and the number of connected terminals will keep increasing exponentially, which results in surging energy cost. It is becoming increasingly important and urgent to drive the development of green communications. However, 6G will inevitably have increasingly stringent and diversified requirements for Quality of Service (QoS), security, flexibility, and even intelligence, all of which challenge the improvement of energy efficiency. Moreover, the dynamic energy harvesting process, which will be adopted widely in 6G, further complicates power control and network management. To address these challenges and reduce human intervention, Artificial Intelligence (AI) has been widely recognized and acknowledged as the only solution. Academia and industry have conducted extensive research to alleviate energy demand, improve energy efficiency, and manage energy harvesting in various communication scenarios. In this paper, we present the main considerations for green communications and survey the related research on AI-based green communications. We focus on how AI techniques are adopted to manage the network and improve energy harvesting toward the green era. We analyze how state-of-the-art Machine Learning (ML) and Deep Learning (DL) techniques can cooperate with conventional AI methods and mathematical models to reduce the algorithm complexity and optimize the accuracy rate to accelerate the applications in 6G. Finally, we discuss the existing problems and envision the challenges for these emerging techniques in 6G.
Index Terms—6G, green communications, Artificial Intelligence (AI), energy harvesting.
I. INTRODUCTION
RECENTLY, 5G has been launched to provide users with high-throughput services in some countries, while researchers worldwide have started to conceive 6G [1]–[3]. It has been reported that 5G Base Stations (BSs) and mobile devices consume much more energy than 4G [4]. For example, a typical 5G BS with multiple bands has a power consumption of more than 11,000 W, while a 4G BS costs less than 7,000 W. The dramatically increased power consumption mainly comes from two parts: the growing Power Amplification (PA) in the massive Multiple Input Multiple Output (MIMO) antenna and the processing of booming data. Even though the energy consumption per unit of data has dropped drastically, the exponentially increasing energy required to provide seamless 5G services cannot be neglected since the number of required 5G BSs is at least 4 times that of 4G to cover the same sized area. Data show that Information and Communication Technology (ICT) accounts for more than total electricity consumption as shown in Fig. 1a, and it will keep an estimated annual growth rate between 6% and 9% [5], [6].
Then, what will be the situation for 6G in terms of energy consumption? As we know, 6G is expected to extend the utilized frequency bands to Terahertz (THz) for a 1,000-fold throughput improvement over 5G [1]. Since the upper bound of the transmission range is shortened from 100 m for millimeter Wave (mmWave) to 10 m for the THz spectrum, future THz-enabled BSs are envisioned to be deployed inside houses to provide indoor communications [7], which means significant growth in the number of required BSs. Moreover, besides communication for mobile terminals and various sensing devices, computation and content provision services will be gradually transferred from local devices to clouds and edge servers through real-time communications [8], [9], which is one of the main constituents of ICT energy consumption as shown in Fig. 1b. Another critical paradigm is the utilization of Artificial Intelligence (AI) techniques to provide context-aware information transmissions and personalized services, as well as to realize automatic network management [1], [10], [11]. The growing ICT infrastructure, exploding data, and increasingly complex network management will result in surging energy consumption, which poses a great challenge for network operators [12], [13]. Data analysis shows that the ICT sector may cost more than 20% of the total electricity [5] as in Fig. 1a.
To alleviate the growing energy burden toward 6G, academia and industry have conducted extensive research. The available solutions to address the huge energy consumption mainly come from two parts: energy-efficient network design [14], [15] and energy harvesting [16], [17]. Specifically, energy harvesting units, such as solar panels, wind turbines, and vibration harvesters, are widely adopted to convert various kinds of energy to electricity for the communication devices as shown in Fig. 1c. Among these energy harvesting techniques, Radio Frequency (RF) harvesting is an important technique which enables not only simultaneous information and energy transmission, but also the utilization of the interference signal. Similar to RF harvesting, the Intelligent Reflecting Surface (IRS) is expected to be widely deployed to reflect the wasted signal to the receivers to increase the Signal to Interference plus Noise Ratio (SINR) [18]–[20]. Other platforms, including satellites and Unmanned Aerial Vehicles (UAVs), are deployed to provide seamless coverage. For more efficient energy/power management, AI techniques including conventional heuristic algorithms, the popular Machine Learning (ML), and state-of-the-art Deep Learning (DL) methods have been adopted to simplify the traditional mathematical iteration process and predict future network changes as shown in Fig. 2. Since future network services have diverse requirements instead of only high throughput, traditional mathematical models aiming at improving the bit-per-Joule may not be applicable to future complex scenarios. To realize automatic network management toward the green era, AI is the most promising solution. What we need to do is analyze the various network resources and consider more joint optimizations as shown in Fig. 2. Accordingly, AI techniques are more widely adopted to optimize power control and resource allocation in many works [21]–[24]. In this research, we conduct a survey on AI-related service management for 6G green communications. In the following paragraphs, we introduce the motivations, scope, and contributions of this paper.
Fig. 1: Tendency of energy consumption for ICT and the promising energy harvesting techniques. (a) Energy consumption of ICT and its share (axes: years 2010 to 2030, energy consumption in TWh, share in %). (b) Energy consumption of different parts for ICT (labels: consumer devices, data centers, infrastructure). (c) Various energy harvesting sources for ICT (labels: solar, RF signals, geothermal, wind, tide, vibration).
A. Motivation
1) Energy-related Issues for Different Network Services: Similar to 5G, which has defined three kinds of services including eMBB (enhanced Mobile Broadband), uRLLC (ultra-Reliable and Low-Latency Communications), and mMTC (massive Machine Type Communications), some researchers have also considered service definitions in 6G [1]. Among these different service definitions, we expand our introductions from three typical communication scenarios: Cellular Network Communications (CNC), Machine Type Communications (MTC), and Computation Oriented Communications (COC).
• CNC: Since the majority of energy consumption for cellular networks comes from the BSs, the related research on green CNC mainly focuses on the deployment and configurations of BSs. To optimize the energy efficiency of CNCs, the deployment and work states of the BSs should be carefully analyzed and scheduled. Moreover, for the working BSs, power control and resource allocation are critical to improving the system throughput with minimum energy consumption. Furthermore, energy harvesting technology can also be considered to alleviate the grid electricity demand of BSs.
• MTC: Most MTC devices are battery-constrained and difficult to charge, so their energy demand can be alleviated at the access layer and the network layer. The research mainly concentrates on the optimization of network access, routing, and relay. As energy harvesting has been widely regarded as an important technique for future Internet of Things (IoT) networks, how to manage the networks considering energy dynamics is challenging and meaningful.
• COC: Computation and storage services will be an important part of 6G, which is also energy-intensive as shown in Fig. 1b. For the computation parts, the research to reduce energy consumption mainly analyzes the offloading decision and computation resource allocation since each server has a limited capacity. Moreover, the uneven distribution of computation demand requires the optimization of server deployment for the balance of latency and energy consumption. For Content Delivery Networks (CDNs), the content caching and delivery policies directly affect energy consumption.
2) Limitations of Conventional Methods: Alleviating energy demand and improving energy efficiency is usually very complex since it concerns not only power control but also many other factors, such as transmission scheduling, resource allocation, network design, and user association. Thus, the formulated problem considering multiple related factors is non-convex or NP-hard [22], [48], [49]. The conventional mathematical approach is to iteratively search for the global optimum, or to divide the problem into two or more sub-problems and search for a sub-optimal point [50], [51]. However, due to the increasing number of factors that must be considered, the solution space is extremely large, resulting in slow convergence or extreme difficulty in finding the global optimum. Moreover, since 6G network services have more diversified requirements for throughput, latency, and reliability than 5G, common mathematical optimization methods focusing on the maximization or minimization of a single metric are not enough. Furthermore, the nonlinear and unclear relationships among the multiple parameters that must be considered make the mathematical models difficult to construct. Additionally, node mobility and service changes lead to increasing network dynamics, which may result in frequent failures of conventional methods.
3) Advantages of AI Methods: Compared with conventional methods, AI techniques including the traditional heuristic algorithms, ML, and the currently popular DL approaches have significant advantages. AI techniques aim to solve problems in a naturally intelligent manner [52]. Thus, they can try to explore the complex relationships among different network parameters through trial and error [53]. In recent years, ML/DL methods have been widely used to learn the power control and resource allocation policy [21], [49], [54], [55], which greatly alleviates the difficulty in manually studying the complex relationships and constructing the mathematical
Fig. 4: Development of AI Techniques (increasing accuracy with growing complexity). Heuristic Algorithms: Particle Swarm Optimization, Genetic Algorithm, Ant Colony Optimization, Simulated Annealing. Machine Learning: supervised learning, unsupervised learning, reinforcement learning, semi-supervised learning. Deep Learning: supervised/semi-supervised, unsupervised learning, deep reinforcement learning, federated learning, transfer learning, imitation learning.
forcement Learning (RL) will be introduced in the next subsection.
Regression Analysis: This method is mainly utilized to analyze the relationship between two or among multiple parameters. The most common application is to map from the input parameters to the output results with a labeled dataset, and a cost function is usually defined to evaluate the accuracy rate. According to whether the output is linear or binary, regression analysis can be divided into linear regression and logistic regression. Regression analysis plays an important role in green communications. For instance, linear regression can be utilized to predict future traffic changes, which is further adopted to determine the energy-efficient transmission schemes, resource allocation, and computation offloading [15].
Support Vector Machine: SVM is adopted to analyze data for classification and regression analysis in a supervised learning manner [85]. An SVM utilizes a set of orthogonal vectors to define a hyperplane or a set of hyperplanes to separate the training data points. The best hyperplane is the one that has the largest distance to the nearest training data in any class. The SVM can be adopted for high-dimensional problems and is suitable for small datasets. In green communication management, SVM has been applied to solve problems like user association [67] and computation offloading [87].
K-means Clustering: This method aims to partition multiple observations into several clusters in which each observation belongs to the cluster with the nearest center [86]. As an unsupervised learning method, this technique repeatedly assigns the nodes to different clusters and updates the cluster centers. To evaluate the assignments, a cost function based on the distance between the nodes and the cluster center is defined. K-means clustering is efficient for clustering the users and associating them with suitable BSs to save energy [88]–[90]. It can also be applied to the optimization of cloudlet placement [91].
B. Development of Deep Learning Models
Since the common ML/DL models and three training manners shown in Fig. 4 have been introduced in many works [82], [92], we just give some discussion about the development of ML/DL models which have been utilized to improve energy efficiency.
Most of the current ML/DL models are developed from Artificial Neural Networks (ANNs), which can also be termed Neural Networks (NNs). An ANN is constructed from layers of interconnected units named "artificial neurons", which are meant to model the neurons in a biological brain [93]. Each artificial neuron can process the received signals with some non-linear functions and then transmit the result to neurons in the next layer through the weighted edges. Thus, the final output of each ANN depends on not only the input signals, but also the utilized non-linear functions and edge weights. In recent decades, ML/DL models have developed fast on the basis of ANNs, which can be summarized into four aspects. First, the most obvious development is the increased number of layers, which results in deep architectures instead of the traditional shallow ones. Thanks to the breakthrough in the training algorithm [94] as well as hardware developments, current DL models can have very complex architectures while keeping an extremely high accuracy rate, which enables them to be adopted in very complicated scenarios and to outperform humans in some applications, such as board games [64]. Second, connection manners have become more complex. Besides the full connections among neurons in adjacent layers used by most ANNs, partial connections have also been utilized in some modern ANNs, such as Convolutional Neural Networks (CNNs) [95], which enables flexible processing of inputs where features are not distributed everywhere. Part of the output can also be fed back into the learning model, as in Recurrent Neural Networks (RNNs) [96], to generate time-consecutive variables. Third, researchers have developed models that concurrently utilize multiple ANNs to cooperatively complete one task, such as the Generative Adversarial Network (GAN) [97] and the Actor-Critic (AC) method [98]. The two ANNs can have the same or different structures while playing different roles. Fourth, techniques such as different activation functions, data processing methods, and attention mechanisms significantly improve the accuracy rate of current ML/DL structures.
C. Future Perspective AI Learning Methods
Besides the development in the ML/DL structures, the learning methods also critically affect the accuracy rate and computation performance. Future networks will consist of more complex scenarios and dynamics, which drives us to consider more advanced AI learning methods. In this part, besides the traditional supervised learning and unsupervised learning, we focus on three AI learning methods which will definitely attract more attention as shown in Fig. 4.
1) Deep Reinforcement Learning: RL learns dynamically through trial and error to maximize the outcome. In an RL model, the essential components are the environment, a defined agent, the state space, the action space, and the reward [99]. In the studied environment, the agent chooses an action according to the current state, and then gets rewarded for a correct action or penalized for an incorrect one. In the training process, the agent either follows the existing experience or explores a new action with a certain probability in order to maximize the reward. In the traditional RL model, a table is usually utilized to store the Q value, which is the expected accumulated reward for different actions at each state. The training process is to fill in the table, which can guide future action selection. However, with the studied problem
adopted to map from the state to the corresponding action, which is the main concept of Deep Reinforcement Learning (DRL) [99]. Another advantage is that this method enables an agent to generalize the value of states it has never seen before or for which it has only partial information. Due to these advantages, DRL has attracted growing attention for improving energy efficiency by optimizing BS management [100], resource allocation [101], [102], power control [21], [103], and computation offloading [23], [24], [104].
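As a minimal sketch of the tabular RL scheme described above (an agent that either follows its existing experience or explores with a certain probability, and a Q table holding the expected accumulated reward for each state-action pair), the following Python example fills in a Q table for a toy BS sleep/active decision. The two-level traffic environment, the reward values, and every name in it (`step`, `train_q_table`, `greedy_policy`, `ALPHA`, `GAMMA`, `EPSILON`) are illustrative assumptions and are not taken from any of the surveyed works.

```python
import random

# Hypothetical toy environment: the state is the traffic level seen by one BS
# (0 = low, 1 = high) and the action is its work state (0 = sleep, 1 = active).
STATES = (0, 1)
ACTIONS = (0, 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning rate, discount, exploration rate


def step(state, action, rng):
    """Return (reward, next_state) for the assumed toy dynamics."""
    if state == 0:
        reward = 1.0 if action == 0 else -0.5  # sleeping under low traffic saves energy
    else:
        reward = 1.0 if action == 1 else -2.0  # sleeping under high traffic hurts QoS
    next_state = rng.choice(STATES)            # traffic level changes at random
    return reward, next_state


def train_q_table(steps=5000, seed=0):
    """Fill in the Q table with epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        # Explore a random action with probability EPSILON,
        # otherwise follow the experience stored in the table.
        if rng.random() < EPSILON:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward, next_state = step(state, action, rng)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update: move toward reward plus discounted best next value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action for every state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    print(greedy_policy(train_q_table()))
```

In DRL the dictionary `q` would be replaced by a neural network that maps a state to action values, which is what allows generalization to states that were never visited.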
2) Transfer Learning: Transfer learning is a machine learning method which aims to apply the knowledge system constructed while solving one problem to a different but related problem [105]. Different from traditional ML models which learn from scratch, a new application in a related problem only needs to fine-tune the new model based on the existing knowledge system or train part of it. Thus, transfer learning can significantly reduce the computation consumption and required training data, resulting in extended and accelerated applications. As the network changes frequently due to mobility and transmission environment changes, transfer learning is widely considered for addressing similar scenarios [106]–[109]. On the other hand, the application range of the existing knowledge system as well as the balance between training and performance in the target scenario are hot topics and require more attention in existing research [108].
3) Federated Learning: Federated learning is a decentralized method which utilizes distributed servers or devices to train and test AI models with local data [110], [111]. Thus, the edge servers or devices can keep the training data locally and just need to upload the obtained parameters to the central controller. What the central controller needs to do is collect and integrate the parameters of the AI models. The edge devices can then download the AI models to make predictions or conduct periodical updates. Since personal privacy has aroused increasing concern recently, the federated learning technique will attract growing attention in 6G. Moreover, the cooperative training and running manner of federated learning can efficiently utilize idle computation resources and reduce the consumption in the central controller. Furthermore, uploading parameters instead of training data results in reduced communication overhead [24], [87], [112].
D. Summary
From the above introduction, we can find that AI techniques have various application scenarios and should be chosen according to the specific problem. With the development of computation hardware, DL techniques have attracted growing attention for solving more complex problems. However, this does not mean that traditional AI techniques such as heuristic algorithms and shallow ML models are no longer suitable. Since many traditional AI methods have much lower computation complexity than DL, they are suitable for some resource-limited scenarios. In the remainder of this paper, we give more detailed explanations of how these methods realize green communications in different scenarios. It should be noted that some important AI techniques are not introduced in this section, but they still have promising perspectives, such as imitation learning [113] and quantum machine learning [114].
Fig. 5: One-step and two-step AI-based BS switching strategy. (a) Two-step strategy. (b) One-step strategy. Both panels take the traffic trace, user mobility, CSI, and similar inputs and end in a BS switching decision.
III. CELLULAR NETWORK COMMUNICATIONS
Energy consumption of cellular networks comes from the radio access part and the core part [31]. Some practical measurements of the energy consumption of cellular networks have been reported in [31], [115]. The data illustrate that the BSs account for more than half of the total energy consumption, of which 50% to 80% is used by the power amplifier and feeder. With the utilized frequency band extended to sub-THz and THz in the 6G era, the coverage of a single BS further shrinks [1], [116]. The increasing number of BSs required to realize seamless coverage is then expected to consume more energy. Therefore, green communication research for cellular networks mainly focuses on BSs. In this section, we first introduce the power consumption and energy efficiency modeling of cellular networks and then explain the related AI-based approaches to realize green communications from different perspectives.
A. Power Consumption and Energy Efficiency of Cellular Networks
According to the above introduction, we mainly focus on the Radio Access Network (RAN) part consisting of BSs and access terminals. We introduce the power consumption modeling of BSs and the metric "bit-per-Joule" to measure energy efficiency for both the BSs and access terminals.
1) Power Consumption Modeling of BSs: The power consumption of a BS consists of four parts: power supply, signal processing, air conditioning, and the power amplifier [31]. Since part of the power consumption is constant for BSs at sleep and idle states while the other part depends on the workload, the energy consumption of a BS can usually be summarized as [117]:
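The split between a constant part and a workload-dependent part can be sketched in a few lines of Python. This is only an illustrative linear form, not necessarily the expression given in [117]: the linear dependence on load, the function names `bs_power_w` and `bits_per_joule`, and all default wattages are assumptions chosen for the example. The second function computes the bit-per-Joule metric as throughput divided by power.

```python
def bs_power_w(load, p_static_w=130.0, p_dynamic_w=350.0, p_sleep_w=75.0, asleep=False):
    """Assumed linear BS power model in watts; all default values are hypothetical.

    load: fraction of the maximum workload, within [0, 1].
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be within [0, 1]")
    if asleep:
        return p_sleep_w                    # constant consumption in the sleep state
    return p_static_w + load * p_dynamic_w  # constant part plus workload-dependent part


def bits_per_joule(throughput_bps, power_w):
    """Energy efficiency: delivered bits per consumed Joule (bit/s divided by W)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return throughput_bps / power_w
```

With these assumed numbers, an idle BS (`load=0.0`) draws 130 W, a fully loaded one draws 480 W, and a fully loaded BS delivering 480 Mbit/s reaches 1e6 bit per Joule.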
Besides the BS deployment planning, the coverage design is the BSs with low usage may be the easiest solution. The main
an important factor to affect the required number of BSs and concern to switch off some BSs is the potential deterioration
network performance. Assuming the deployment is done with- of QoS. To alleviate the concern, the accuracy rate of traffic
out detailed cell planning, Ho. et al [134] utilize the GA [77] prediction affects network performance in terms of energy
method to adjust the femtocell coverage in order to optimize saving and QoS. Gao et al. [137] compare multiple ML
the three network metrics: coverage holes, coverage leakage, models including Auto-regressive Integrated Moving Average
and load balance. In this paper, the authors consider three (ARIMA) [138], prophet, random forest, LSTM, and ensemble
metrics including coverage holes, coverage leakage, and load learning in terms of accuracy rate, speed, and complexity.
to define the fitness function for the evaluation of considered Then, these models are utilized in traffic prediction. The
solutions during the evolution process. To overcome unknown prediction results are further utilized to calculate energy ef-
network dynamics and user mobility, the online learning ficiency. Thus, some BSs can be switched off if the Key
method based on periodical updates with real-time network Performance Index (KPI) is below the predefined threshold.
measurements is adopted. In their proposal, the hierarchical Similarly, Donevski et al. [139] utilize two kinds of NNs,
Markov Models (hMMs) [135] are used to capture the behavior including the dense NN and RNN to predict the future traffic
and generate the load trace of each femtocell with a high of Small Base Stations (SBSs) according to the previous trace.
accuracy rate. Then, the results can be used to calculate the Then, a threshold is defined to decide whether the SBS could
fitness. And the evolution process is illustrated to provide the be switched off or kept on. Another unified strategy is given by
continuous performance improvement. directly utilizing the traffic trace to predict the BS switching
Similar to [78], [134], Moysen et al. [79] also combine scheme as shown in Fig. 5. It should be noted that the threshold
the GA and ML in the design of cellular networks. In their in this proposal is adjustable to achieve a balance between the
research, the SVM [85] is trained offline as a QoS regressor coverage loss and efficiency loss. Simulation results illustrate
with the collected data including the Reference Signal Re- that energy consumption can be reduced by 63%, while more
ceived Power (RSRP) and Reference Signal Received Quality than 99.9% of requests can be satisfied.
(RSRQ) coming from the serving and neighboring eNBs. Different from the above scenarios which only consider two
Then, in the online phase, the GA algorithm is utilized to work states, Pervaiz et al. [140] analyze the switching policy
generate the feasible solutions consisting of the configuration for the multi-sleep-level-enabled BSs in a two-tier cellular
parameters of eNBs. And then the UE measurements for each network. The machine learning technique is utilized to decide
feasible solution is utilized as the input of SVM, of which the best sleep level of SBSs, while the users keep connections
the predicted QoS result is adopted to calculate the fitness with the Macro Base Stations (MBSs). Specifically, the SVM
function. With the goal of minimizing the PRB per transmitted regression model is considered to predict the vacation period
Mb, the improved BS configuration set can be found through and operation time of the SBSs according to historical network
the iterations of GA. The case study illustrates the proposed traffic profile. Then, the prediction results are analyzed along
model can enable the operator to find the appropriate deploy- with energy consumption and latency to decide which sleep
ment layout and minimize the required resources. level the SBS should be switched to. It should be noted that the
From the above research, it can be found that the deploy- SVM utilized in this paper can be replaced by other regression
ment policy is usually found by iterative algorithms, such models.
as the GA, while the supervised learning-based training is The above research works utilize the historical traffic profile
adopted to predict the multiple network parameters as the input to efficiently train the ML models in a supervised manner.
of GA or evaluate the fitness function as shown in Fig. 6. Researchers have also proposed the approaches to combine
The combinations of the heterogeneous algorithms and ML as the RL and transfer learning to increase the flexibility and
shown in Fig. 6 can cooperatively improve the performance accelerate the convergence. Authors of [106], [107] consider
of the proposed model. Since the DL has shown improved the RL agent to select the BS work modes for system power
accuracy rate and more advanced policy searching ability, it minimization according to the traffic patterns. Moreover,
is highly expected to witness the application of the prevalent transfer learning [105] is exploited to use the past learning
DL techniques in the BS deployment design. experience in current scenarios, which can accelerate the
2) Work State Management: As the network traffic is learning process. However, these two research works [106],
dynamically changing due to user mobility, the multi-tier BSs [107] neglect the QoS even though the authors consider the
can be scheduled to switch on and off to reduce energy user association policy after switching off some BSs. To solve
consumption [136]. If the work state of the BS is changed, the this problem, in [141], the cost function of the RL model is
user association information should be adjusted accordingly to defined as an adjustable combination of energy consumption
ensure a qualified connection. Therefore, the work state of BSs and service delay instead of only energy consumption [106],
should be scheduled carefully to minimize energy consumption [107]. Consequently, their proposal can not only reduce energy
as well as meet the QoS requirement. consumption, but also guarantee the diversified QoS require-
Since the users’ daily movements contribute to the similar ments. Additionally, the transfer learning technique is utilized
changing tendency of the traffic patterns, the correlation be- to accelerate the convergence of the considered AC model [98].
tween the current traffic data and historical experience can be Another similar research [142] also combines the RL and
utilized to design the BS switch on/off policy [106], [107]. To transfer learning to design the BS switching policy. In this
predict the future traffic with a historical profile and switch off proposal, the learned knowledge for spectrum assignment is
transferred to the process of user association.

Deep Q-learning (DQL) technique has also been applied to design the BS switching policy based on
the network traffic in [143]. Different from the research in [106], [107], which directly utilizes
the traffic pattern, the authors in [143] consider a traffic modeling module to iteratively fit an
Interrupted Poisson Process [144] and predict the next traffic belief state. Since the traffic model
is learned in an online fashion, it can capture the complex dynamics of real-world traffic, which
allows the adopted DQL model to output more accurate actions. The adopted Deep Q-network (DQN)
decides the sleeping policy according to the output belief state of the traffic modeling module,
and the reward function is defined as the sum of the operation cost and the service reward. To
enhance the original DQN model, a replay memory storing a certain amount of past experiences is
utilized in the training step as a bootstrapped estimation of true distributions, and the stable
parameters are stored by a separate network to avoid training oscillations and divergence. The
authors also apply adaptive reward scaling to match the network outputs. Even though the research
neglects the mutual effects among BSs, the proposed model is suitable for BSs with different traffic
patterns. The experiment with a network simulator and dataset illustrates the advantages of the
proposed model over other ML algorithms.

In the above research, switching off some BSs in low usage reduces energy consumption on the one
hand, but on the other hand sacrifices some network performance due to the resulting coverage hole.
Therefore, the proposed AI approaches usually define a weighted sum of energy consumption and QoS as
the reward or cost function to reach a balance [140], [141]. To address the QoS sacrifice
physically, Panahi et al. [145] consider the heterogeneous scenario where the Device-to-Device (D2D)
technique is utilized to relay the messages toward working BSs. To decide the work state for each
MBS and Femtocell Base Station (FBS), the authors propose the Fuzzy Q-learning (FQL) algorithm,
which combines Q-learning (QL) and the Fuzzy Inference System (FIS) [146], [147]. In the model, the
FIS is utilized to map the relationship between the inputs, namely the energy efficiency and the
service success probability, and the switching policy. In the QL model, the reward is defined as the
weighted D2D link success probability, while a threshold of cellular link success probability is
adopted to decide whether the reward is positive or negative. With the reward function, the 𝜖-greedy
algorithm explores and exploits the potential switch on/off policies until convergence. Even though
every MBS/FBS decides its own switching scheme, the control functionality, including the
initialization and termination of the optimization process, is deployed in a central entity. After
each state transition, MBSs and FBSs receive the overall shared reward determined by the central
entity and use it to update the Q value to avoid locally selfish optimization.

Lee et al. consider the joint cell activation and user association for load balancing and energy
saving in their work [148]. The authors adopt the QL method. Specifically, each BS is treated as an
agent, while the state and action are the current activation variable and mode, respectively. Once
each BS chooses an action, a user association scheme can be found by relaxing the load balancing
problem to a convex problem. Then, the Q-value based on the heterogeneous network (HetNet) power
consumption can be calculated to evaluate the pair of BS activation and user association scheme. By
iterating the process until the threshold is reached, the best scheme which jointly optimizes the
load balancing and energy efficiency can be obtained. Results illustrate the significant improvement
of the network performance and energy efficiency.

3) User Association and Load Balancing: Switching the idle BSs to sleep or off mode may overload
the nearby working BSs, which further leads to QoS deterioration. To strike the balance between
energy efficiency and QoS, AI-based user association schemes have been studied.

Zhang et al. adopt the QL technique to decide the user offloading policy to reduce energy
consumption as well as improve network throughput [149]. In this paper, the authors consider that
part of the connected users of each SBS can be offloaded to a neighbor SBS or MBS in multi-tier
Ultra Dense Networks (UDNs). In this way, the idle SBS can be turned to sleep or off mode, while the
overloaded SBS can be relieved to ensure the provided services. The proposed QL model aims to solve
the problem of how much workload of each SBS can be offloaded to other BSs. The state space includes
the load of the studied cell and neighbor cells as well as the proportion of users who could be
offloaded. To guarantee energy saving performance and network throughput concurrently, the reward
function considers the EE, throughput, and the load difference among the cells. The authors also
utilize the mean normalization method to eliminate the sample difference of the considered factors
when defining the reward function.

The authors of [117] combine game theory and the RL technique to solve the user association and
Orthogonal Frequency Division Multiple Access (OFDMA) tile assignment. Specifically, each user is
treated as a player that chooses the heterogeneous NodeB (hgNB) considering the potential profit and
the effects on other players. Since the combinatorial problem can result in a huge number of
potential solutions, the authors propose two RL approaches to intelligently guide the search: the
regret learning-based algorithm and the fictitious play-based algorithm. In the former, the Q value
is defined according to the regret, which is interpreted as the difference between the actual payoff
the agent realizes and the potential payoff if another hgNB is chosen. In the latter, the agent
reinforces a strategy considering the payoff calculated on the empirical frequency distribution of
the opponents.

Wang et al. [54] utilize ML techniques to predict the potential traffic burst and then conduct the
traffic-aware vehicle association. In their proposal, the supervised learning model is adopted to
analyze the statistical correlation between past and present traffic, and online learning is adopted
with the goal of minimizing regret instead of loss. In the proposed architecture, every AP performs
independent traffic prediction, while the central coordinator conducts the global traffic balance.
Since the vehicles are traveling across the APs, the traffic changes in adjacent cells are
correlated. Thus, the traffic prediction of
each AP is based on the historical data rates and association information of neighboring APs. Once
the central coordinator obtains the traffic forecast results, it can proactively update the BS
configurations to change the user association information. Thus, some BSs can make preparations for
the coming traffic burst, while other BSs can be switched to off mode.

C. Power Control and Resource Allocation

According to Equation 3, to improve the system energy efficiency, the transmit power control and
resource allocation, which affect the interference, are critical. Since ultra massive Multiple-Input
Multiple-Output (MIMO), Non-Orthogonal Multiple Access (NOMA), and beamforming technologies will be
important techniques in 6G [1], we will introduce the power control for these parts as well as the
general power control issue.

1) General Power Control: The transmit power of BSs affects the received SINR at the targeted
receivers as well as the interference for users in neighboring cells. Thus, the optimization of
energy consumption is also jointly considered with interference mitigation through the transmit
power control. In [21], [55], Zhang et al. utilize the RL technique to optimize the transmit power
for alleviating the interference in neighboring cells according to the received SINR and user
density. In their proposal, for each transmit power level, every target BS is assumed to obtain a
defined utility according to the received SINR at the target users, energy consumption, and
interference to non-served users. Then, the Q-value can be defined according to the utility to
measure the overall performance of the transmit power level. With the Q-function, the target BSs
apply the 𝜖-greedy policy to determine the optimal transmit power level. The results illustrate the
reduced energy consumption and interference as well as the improvement of network throughput. In
[21], the authors further propose a CNN-based DRL model to map from the network states, including
the received SINR, user density in the target cell, and estimated channel conditions in neighboring
cells, to the transmit power level. The results show that the DRL-based method can further improve
the network performance in terms of energy consumption, throughput, and interference. Another
important advantage is that the DRL method converges much faster than the RL-based strategy.

Dong et al. [108] utilize the Fully-connected NN (FNN) and cascaded NN to optimize the transmit
power and channel allocation, aiming at minimizing the network energy consumption considering the
various service requirements. In this paper, the arrival rates of services and packets are
considered as input. For the FNN, the transmit power and channel allocation are adopted as the
output. Since the transmit power is a continuous parameter while channel allocation has discrete
values, the quantization error in the output layer means that the optimal solution cannot be
guaranteed even though the DL structure is trained in a supervised manner with the labeled data
generated by a global optimization method. To solve this problem, the authors consider the cascaded
FNN structure where the first FNN predicts the channel allocation and the second handles the power
control of each user. The authors also analyze the non-stationary channel conditions and different
service types, and then adopt the transfer learning technique to fine-tune only the last few layers
of the structures through the backpropagation process as shown in Fig. 7. For the non-stationary
wireless channels, the first FNN in the cascaded structure only needs to fine-tune the last few
layers with a small number of data samples as shown in Fig. 7a. On the other hand, because the
channel distribution, which is the input, changes, all layers of the second FNN need to be
fine-tuned. Moreover, the authors mention that fine-tuning the last few layers can also be applied
when the service type changes. For instance, the parameters of the last few layers of the cascaded
FNN used for delay-tolerant services can be fine-tuned to fit the delay-sensitive or URLLC services
as shown in Fig. 7a. Furthermore, if multiple types of services exist, the authors propose a
structure as shown in Fig. 7b, where a few layers are cascaded at the end of the FNN for each
service. In this way, only the parameters of the newly-added layers need to be fine-tuned with a few
training samples.

Matthiesen et al. [49] utilize the ANN to determine the transmit power according to the channel
states. The research goal of their proposal is to optimize the weighted sum energy efficiency, which
is a non-convex problem. To solve this problem, they first propose an improved Branch-and-Bound (BB)
based algorithm to obtain the global optimum solution. Then, the results obtained with this method
are utilized to train the ANNs in a supervised manner. Since the training is conducted offline, the
ANN can be trained with a large dataset generated by the proposed BB-based algorithm to achieve
global optimal performance. The online calculation of the transmit power based on the ANN is shown
to be robust against mismatches between the training set and real dataset conditions.

Liu et al. [150] study the power allocation in a distributed antenna system and utilize the KNN
model to optimize the spectrum efficiency and energy efficiency. In this paper, the single-cell
distributed antenna system with multiple Remote Access Units (RAUs) is considered and the transmit
power of the RAUs should be optimized. However, the research purpose is not further improvement over
traditional methods. Instead, they target the high computation overhead of existing methods and
utilize the KNN to map the relationship between the user location and power allocation under the
assumption of available Channel State Information (CSI) and orthogonal channel resources. Thus, they
utilize the traditional method to obtain some data samples for training the KNN models. In the
running phase, the Euclidean distances between users in the testing and training groups are
calculated, and the power of the nearest neighbor in the training samples is copied to the user in
the test group. The final performance analysis shows the KNN can achieve near-optimal performance.

The power control for multi-layer HetNet is more complex, and the global optimum is difficult to
reach. Zhang and Liang [103] propose a multi-agent-shared-critic DRL method conducted in the core
network. Specifically, in the core network, an actor and target actor DNN are trained for every
BS, while a shared DNN pair acts as the critic and target
Fig. 7: The transfer learning techniques for dynamic channel conditions and multiple service types. (a) The transfer learning model for non-stationary wireless channels. (b) The transfer learning model for multiple types of services.
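The idea behind Fig. 7a, fine-tuning only the last layers while the earlier layers stay frozen, can be illustrated with a minimal sketch. It is not the model of [108]: the network size, the random "pretrained" weights, the synthetic target-scenario data, and all names (hidden, fine_tune_last_layer, target_data) are hypothetical and chosen only to keep the example self-contained.

```python
import math
import random

def hidden(x, W1, b1):
    """Frozen feature extractor: a single tanh layer."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

def predict(x, W1, b1, w2, b2):
    h = hidden(x, W1, b1)
    return sum(w * hi for w, hi in zip(w2, h)) + b2

def mse(data, W1, b1, w2, b2):
    return sum((predict(x, W1, b1, w2, b2) - y) ** 2 for x, y in data) / len(data)

def fine_tune_last_layer(data, W1, b1, w2, b2, lr=0.05, epochs=200):
    """Full-batch gradient descent on the output layer (w2, b2) only.
    W1 and b1 are never updated, so the hidden features are computed once."""
    w2, b2 = list(w2), b2
    feats = [(hidden(x, W1, b1), y) for x, y in data]
    n = len(feats)
    for _ in range(epochs):
        grad_w = [0.0] * len(w2)
        grad_b = 0.0
        for h, y in feats:
            err = sum(w * hi for w, hi in zip(w2, h)) + b2 - y
            for j, hj in enumerate(h):
                grad_w[j] += 2.0 * err * hj / n
            grad_b += 2.0 * err / n
        w2 = [w - lr * g for w, g in zip(w2, grad_w)]
        b2 -= lr * grad_b
    return w2, b2

# Assumption: random weights stand in for a network pretrained on a source scenario.
rng = random.Random(0)
N_IN, N_HID = 2, 4
W1 = [[rng.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HID)]
b1 = [rng.uniform(-1, 1) for _ in range(N_HID)]
w2 = [rng.uniform(-1, 1) for _ in range(N_HID)]
b2 = 0.0

# Assumption: a small synthetic dataset plays the role of the new target scenario.
target_data = []
for _ in range(20):
    x = [rng.uniform(-1, 1) for _ in range(N_IN)]
    target_data.append((x, math.sin(x[0]) + 0.5 * x[1]))

loss_before = mse(target_data, W1, b1, w2, b2)
w2_new, b2_new = fine_tune_last_layer(target_data, W1, b1, w2, b2)
loss_after = mse(target_data, W1, b1, w2_new, b2_new)
```

Because only the output layer is trained, the problem is a linear least-squares fit on fixed features, which is why a handful of samples and a small step size suffice in this toy setting.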
critic. The actor DNNs are trained with redundant experience, then share the weight parameters with
the corresponding local DNNs. The local DNNs can calculate the transmit power with the real-time
local data. To avoid becoming trapped in a local optimum, the core network utilizes the global
experience to train the critic DNNs. Li et al. [151] combine graph theory and the RL technique. In
this research, the conflict graph constructed according to the SINR received by the users is
utilized to dynamically cluster the cells in order to optimize the channel allocation. To optimize
the power control in cell clustering, the RL technique is utilized where the SBS acts as the agent.
The state space consists of the interference set and RSS, while the reward is defined according to
the throughput and interference.

With the extension of utilized frequency bands to THz, the propagation loss and penetration loss
will become increasingly serious. To solve this problem while maintaining satisfactory coverage, the
radius of future THz-enabled BSs will be limited to 10 meters. Thus, the power control to mitigate
the interference in an indoor network will attract increasing attention. The authors in [152]
propose QL-based distributed and hybrid power control strategies to optimize the network performance
in terms of throughput, energy efficiency, and user experience satisfaction. For the BSs without
mutual communications, each BS acts as the agent to determine the power for each Resource Block (RB)
in a selfish manner. On the other hand, if a central controller is provided, it runs the QL model to
decide the transmit power for each BS. In these two methods, the state is the received SINR level
and current transmit power level, while the action is the power level that can be assigned to each
RB. The reward functions are defined according to the throughput.

2) Beamforming: Adaptive beamforming is an important technology to adjust the directionality of the
antenna array to enable highly directional transmissions in densely populated areas. Through the
adaptive beamforming technique, the network performance of the hotspot can be significantly
improved, which further results in increased energy efficiency. However, the hotspot areas are not
fixed due to the dramatically changing user distribution caused by lifestyle and habits. In [125],
Liu et al. utilize the LSTM to extract the spatial and temporal features of UE distributions from
the history dataset and detect future hotspots. Based on the location information of predicted
hotspots, hybrid beamforming, which combines the digital and analog beamforming techniques at the
MBS, can be adjusted to minimize the total power consumption. Specifically, in the analog
beamforming design of massive MIMO systems, the phase shifter can be adjusted to maximize the large
array gain. For hybrid beamforming, the optimal power allocation and beamforming directions can be
found by converting the original problem into a convex one. The final results also illustrate the
reduced energy consumption.

Du et al. jointly optimize the cell sleeping control and beamforming operation by DNN models in
[153]. The authors first model the power minimization problem through joint cell sleeping and
coordinated beamforming. The formulated power minimization problem is constrained by the required
SINR and maximum power threshold. To alleviate the computation overhead of the numerical method for
large-scale scenarios, the authors consider DNN models to map the relationship between the channel
coefficients and beamforming vectors. The numerical method can be adopted to generate the training
data, which are further utilized to train the constructed DNN models. To illustrate the performance,
the no sleep control and equivalent association strategies are compared. The final results show the
DNN-based method can achieve obvious advantages in terms of power saving and satisfaction of QoS
demands.

The authors of [154] consider the manifold learning [155] and K-means method [88] to cluster the
multi-cell users into several regions and reduce the complexity of the considered massive MIMO
operation. In the two-tier massive MIMO system, the interference mitigation and MIMO hybrid
precoding process are challenging due to the large channel dimensionality and high complexity caused
by the large antenna count. To alleviate the computation overhead, the authors first utilize the
maximum-minimum distance-based K-means method to cluster users into different groups. Then, with
manifold learning, the nonlinear high-dimensional channel coefficients can be transformed into
linear combinations of neighborhood channel coefficients, resulting in significant dimension
reduction of the channel matrix while keeping the original geometric properties of the underlying
channel manifold. Furthermore, the two-tier beamformers are mainly characterized by the distribution
of low-dimensional manifolds and split into an outer beamformer and an inner beamformer, which are
utilized to minimize the inter-cell interference and multi-user intra-cell interference. The final
results illustrate the improved SINR and reduced computation complexity.

Beamforming is also jointly optimized with some other network factors to improve energy efficiency,
such as the relay operations. Zou et al. [156], [157] adopt the DRL technique to improve the
multi-antenna Hybrid AP (HAP) beamforming strategies and RF-powered relay operations. In their
considered scenario, the individual relay can forward or backscatter the signal to improve the
received SINR. Moreover, the relay needs to harvest part of the received power to keep working
continuously. Then, a hierarchical Deep Deterministic Policy Gradient (H-DDPG) model is proposed to
select the relay mode and optimize the parameters, including the beamforming vector, power splitting
ratio, and reflection coefficient, in order to maximize the SINR. Specifically, the considered model
separates the studied problem into two sub-problems. The DQN model is utilized in the outer loop to
select the relay mode. Once the relay mode is selected, the channel conditions can be used by the AC
networks [98] of the inner-loop Deep Deterministic Policy Gradient (DDPG) to generate the actions,
which represent the values of the beamforming and relay operation parameters. To accelerate the
convergence of the conventional DDPG model, which is slowed by the random initialization of the
double Q-networks, an optimization model is developed to approximate the original problem, which can
estimate a lower bound of the target value. The simulations show the improvement of the final reward
value and convergence speed compared with the model-free DDPG method. Moreover, the H-DDPG-based
framework can significantly improve throughput.

Since UAVs are usually adopted as flying BSs, AI techniques have also been applied to UAV-enabled
cellular networks. Li et al. [158] combine the ML and Mean Field Game (MFG) techniques to jointly
optimize the beamforming and beam-steering to maximize the system sum rate. In the considered
scenario, the optimization of hybrid beamforming lies in the optimization of the hybrid analog
beamformer. The Cross-Entropy (CE) function is adopted to evaluate the obtained system sum-rate
corresponding to each randomly generated hybrid analog beamformer. In the beam-steering optimization
process, the MFG framework is adopted, where the beams act as the agents and information
interactions are converted into the interactions with the mass. Considering that the conventional
numerical methods require a large dimension of action and state spaces, the RL technique is adopted
to solve the MFG. Specifically, the state is defined as the combination of the index offsets of the
antenna elevation and azimuth angles, while the actions represent the beam selectable path,
elevation Angle of Departure (AoD), and azimuth AoD. The reward function is defined according to the
obtained system rate. Through the QL process, the optimal action can be chosen.

3) MIMO: In distributed massive MIMO systems, the pilot sequences transmitted by users are usually
adopted to estimate the CSI. However, the pilot contamination caused by reusing the same orthogonal
pilot sequences affects the channel estimation accuracy. To alleviate the pilot contamination, the
power allocated to each pilot sequence is important. Xu et al. design an unsupervised learning
method to predict the power allocation scheme according to the large-scale channel fading
coefficients [159]. In their research, the authors consider the Minimum Mean-Square Error (MMSE)
channel estimator and formulate the problem as the sum MSE minimization. Then, a DNN is exploited
with the channel fading coefficients and power allocation as the input and output, respectively.
With the loss function defined by the sum MSE of channel estimation, the training process enables
the DNN to map the nonlinear relationship from channel fading coefficients to the optimal pilot
power allocation. Similarly, the authors of [14] consider the same input and output for the designed
Deep Convolutional Neural Network (DCNN). The authors focus on the maximum sum rate problem in
limited-fronthaul cell-free massive MIMO, and a heuristic sub-optimal approach is proposed to obtain
some data samples, which are used to train the DCNN model. Another similar work [160] utilizes the
ANN to map from the users’ positions or shadowing coefficients to the power allocation vector. All
of these research works have verified the advantages of DL techniques over traditional mathematical
models in terms of the power allocation in massive MIMO systems.

Intelligent power control has also been considered to suppress the attack motivation for more
secure communications of the MIMO transmitter in [161]. In the considered scenario, the malicious
attacker can choose different attack modes, including jamming, eavesdropping, and spoofing,
according to the potential reward. The authors combine game theory and RL to control the power of
the MIMO transmitter for the suppression of the attack motivation considering the required EE.
Specifically, a game model is formulated between the MIMO transmitter and the malicious attacker,
and the RL technique is adopted to derive the optimal power control and transmission probability to
reach the Nash Equilibrium (NE) in favor of the MIMO transmitter. The final results illustrate the
improvement of transmission secrecy performance and energy efficiency.
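Several of the schemes surveyed in this subsection select a discrete transmit power level with Q-learning and an 𝜖-greedy policy. The following is a minimal sketch of that pattern under toy assumptions, not a reproduction of any cited work: the power levels, channel gains, energy price, the reward (spectral efficiency minus a linear power cost), and the deterministic cycling of channel states are all hypothetical and chosen only for reproducibility.

```python
import math
import random

POWER_LEVELS = [0.1, 0.5, 1.0, 2.0]  # candidate transmit powers (hypothetical units)
CHANNEL_GAINS = [0.5, 2.0, 8.0]      # toy channel states, normalized by noise power
ENERGY_PRICE = 1.0                   # weight of the power cost in the reward (assumption)

def reward(state, action):
    """Spectral efficiency minus a linear energy cost."""
    p = POWER_LEVELS[action]
    return math.log2(1.0 + CHANNEL_GAINS[state] * p) - ENERGY_PRICE * p

def train(steps=30000, alpha=0.2, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    n_states, n_actions = len(CHANNEL_GAINS), len(POWER_LEVELS)
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        r = reward(state, action)
        # Toy assumption: the channel state cycles deterministically.
        next_state = (state + 1) % n_states
        # Standard Q-learning update toward the bootstrapped target.
        td_target = r + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q

def greedy_policy(q):
    """Index of the best power level for each channel state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]

q_table = train()
policy = greedy_policy(q_table)
```

Here greedy_policy(q_table) returns one power-level index per channel state. Because the state transition in this sketch does not depend on the action, the learned policy should coincide with maximizing the immediate reward in each state; in the surveyed works the state (for example SINR level, user density, or attacker behavior) and the reward are defined per scenario.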
The authors of [162] and [48] utilize the CE-based algorithm to solve the hybrid precoding problem
in mmWave massive MIMO systems. Specifically, the CE-based algorithm is adopted to update the
probability distribution of the analog beamformer in the iteration process, and then the "elite"
analog beamformer which results in the minimum total transmit power can be found. Moreover, the
authors of [48] adaptively weight different elites according to their objective values, which can
further improve the performance of CE-based algorithms. The simulations in the two papers verify
that the CE-based hybrid precoding scheme can improve the energy efficiency of mmWave massive MIMO
systems with low complexity.

Different from the above research focusing on the intelligent power control in the massive MIMO
system, the authors of [163] propose a DL-based user-aware antenna allocation strategy. In their
research, the LSTM model trained with the real dataset is adopted to predict the variations of
future associated users for the massive MIMO-enabled BSs, which is similar to the applications of DL
in traffic forecast [15], [137], [139]. Based on the prediction results, the optimum number of BS
antennas is allocated to maximize the EE.

4) NOMA: The NOMA technique introduces an extra power domain to enable multiple users to be
multiplexed on the same channel resource [101], which can improve the network capacity and resource
efficiency. Thus, the resources including the power and channels are usually considered as the key
metrics to be optimized for network performance improvement. In [101], the authors first utilize the
DRL technique to conduct the channel assignment, alleviating the computation overhead that
conventional methods incur due to the huge solution space. In the proposal, each BS acts as the
agent, while the NOMA system is regarded as the environment. The attention-based NN is adopted to
model the channel assignment policy, with the encoder computing the embedding of the state space and
the decoder outputting the probability distribution over all states. Once a channel assignment
solution is obtained, the corresponding power allocation can be calculated. Then, the derived system
performance is further utilized to define the reward function. The training process enables the
proposed NN to find the optimal channel assignment according to the system states with low
complexity. The authors of [164] also utilize the DL technique to alleviate the computation overhead
of conventional methods. However, their proposal trains the DNN in a supervised manner, where the
downlink channel gains and the corresponding power allocation scheme are the input and output,
respectively.

Zhang et al. [22] also consider the DL-based radio resource management to improve EE in NOMA
networks. Besides the subchannel and power allocation considered in [101], the authors of [22] also
analyze the user association since they consider two-tier networks including MBSs and SBSs. The
authors optimize these three factors separately with three methods. Specifically, the
semi-supervised learning-based NNs and supervised learning-based DNN are adopted to optimize
subchannel assignment and power allocation, respectively, while the Lagrange dual decomposition
method is used to solve the user association problem. In the optimization of subchannel assignment,
the numerical iterative method, the two-side matching algorithm, is utilized to generate some
labeled data samples, which are combined with the unlabeled data to train the NN in a
semi-supervised learning manner. The authors consider the co-training semi-supervised learning model
[165], where two NNs are trained with the data from different views to produce the optimal learner.
The input and output of the NNs are the channel gains and allocation strategies, respectively. Since
the classification with unlabeled data still depends on the labeled data, the authors select the
highly confident labeled data with the most consistency. To optimize the power allocation, the DNN
is trained with the labels generated by an iterative gradient algorithm.

User clustering is an important factor to improve energy efficiency for NOMA-enabled multi-tier
cellular networks. Zhang et al. [89], [90] adopt K-means clustering to cluster the users in THz
MIMO-NOMA systems. In their research, the users are separated into different clusters of SBSs and
MBS in the coverage. Since the THz transmission is challenged by the severe path spreading loss and
molecular absorption loss, a suitable clustering scheme can improve the channel quality and suppress
the interference, resulting in higher SINR and transmission throughput. Then, the authors propose an
enhanced K-means strategy to cluster the users. To overcome the fluctuation with different initial
clustering centers in the conventional K-means method, the authors calculate the channel correlation
parameters of different cluster heads and choose the one that maximizes the metric. The MSE analysis
clearly verifies the improved convergence compared with the conventional K-means method.

D. Energy Harvesting-enabled Base Station

Motivated by the concern for climate change and inspired by the development of energy harvesting,
renewable energy resources have been considered to alleviate the requirement for the power grid. On
the other hand, the dynamics of renewable energy resources complicate the management and operation
of cellular networks. AI techniques have been widely studied to track the dynamic harvesting source
and optimize network operations.

To optimize cellular network performance with renewable energy-enabled BSs, the most direct method
is to predict the harvesting power. For the scenario where the BS is powered by a photovoltaic (PV)
panel, battery, and power grid, the authors of [15] adopt the Block Linear Regression (BLR) [166],
ANN [167], and LSTM [168] to forecast the traffic, while the linear regression model is utilized to
predict the dynamic harvesting power. To measure the performance of these ML models, the metrics
including the Average Mean Absolute Relative Error (AMARE) and Average Mean Error (AME) are
analyzed. Then, the prediction results can be utilized to switch off some micro BSs in low usage to
save energy.

Miozzo et al. [169] propose a distributed RL-based SBS switching strategy to balance the network
drop rate and energy consumption for two-tier cellular networks where the SBS and MBS are powered by
the electricity grid and renewable solar energy. The state space includes the instantaneous energy
harvested, battery level, and traffic load, while the reward is
defined according to the system drop rate and battery level. terms of energy saving and system outage, which is more
However, this method has the limitation to reach the system suitable for the highly-dense scenarios.
optimization since each SBS acts as the agent and decides Wei et al. [123] utilize the policy gradient-based AC
the working state according to its local state. To alleviate networks [178] to solve the user scheduling and resource
this problem, the authors further propose a layered learning allocation problem for the optimization of EE in a two-tier
optimization framework in [126]. In the lower layer, each HetNet where the SBSs are powered by solar and wind energy.
SBS still follows the original manner to decide the switching Since the wireless fading channels and stochastically harvested
scheme in a distributed intelligent manner. The only difference renewable energy have the Markovian property [102], the
is that a heuristic function is defined and united with the optimization of user scheduling and resource allocation can be
regular Q-value to select the optimal policy. Moreover, the formulated as an MDP, which lays the foundation for using
heuristic value is decided in the upper layer in a centralized DRL method. In their proposal, the state space consists of
manner. Specifically, the MBS utilizes a multi-layer NN to the SINR of each user and battery energy level of each SBS,
forecast its traffic load and judge whether the system is which are both continuous variables. The action space includes
under-dimensioned or over-dimensioned. Based on the load the number of allocated users and subchannels as well as the
estimation, the heuristic value is derived. transmission power. The reward function is defined as the
Li et al. [170] utilize the DRL method to manage the work EE with only the grid energy consumption considered. And
states of the harvesting-enabled SBS in a centralized manner. through online training, the final numerical analysis illustrates
In their proposal, the central controller acts as the agent to the improved EE.
decide the action which is a vector consisting of binary units From the above introductions, it can be found that AI
representing the switching decision for each SBS. And the techniques are efficient to address the dynamics of energy
state space includes the harvested energy, battery levels, traffic harvesting process. And similar to the BSs which are only
loads, throughput, and delay of all SBSs. Since the research powered by electricity grid, AI models can be utilized to op-
aims to balance the EE and QoS, the reward function is defined timize the switching scheme, user association, power control,
as the weighted sum of the two metrics. Using the DNNs to and resource allocation.
approximate the Q-value, the final simulation results clearly
illustrate the advantages of DQL against the traditional QL
E. Summary
in terms of energy efficiency and delay. On the other hand,
this method has a shortcoming that the size of action space In the above research, AI techniques can be utilized to
exponentially increases with the number of SBSs, which leads optimize different network parameters in order to reduce en-
to abundant explorations during the training process. To solve ergy consumption or improve the EE. The supervised learning
this problem, Li et al. [171] consider the DDPG model. In this technique can be utilized to regress the complex unknown
model, the AC algorithm [172] is adopted where an actor NN relationships among the network parameters. For example, AI
and a critic NN are adopted to select an action and evaluate models can be trained with the data generated by conventional
the selected action, respectively. The final results verify the methods to map the relationship between channel conditions
improved energy efficiency over DQN and QL methods. and power allocation [14], [153]. Thus, AI-based algorithms
Since the renewable energy-enabled BSs are usually can avoid the massive iterations and alleviate the computation
equipped with batteries to store the harvested energy, to overhead of conventional methods. Moreover, the RL and DRL
optimize battery management can also contribute to the EE. techniques can efficiently address the problem of the huge size
The authors of [173] propose the FQL-based power manage- of solution space [171], [179]. Furthermore, the combinations
ment which combines the QL and FIS [174] to minimize the of ML/DL models with heuristic algorithms or game theory
electricity expenditures and enhance the battery life span. The can further enhance efficiency [79], [117], [134], [161], [180]
authors also construct the power consumption model related
to the real-time traffic as well as the battery aging model, IV. M ACHINE T YPE C OMMUNICATIONS
which is meaningful to design a more detailed energy-efficient Besides the cellular networks, MTC techniques provide
BS management policy in the future. Piovesan et al. [175] users with more choices and flexibility. And the development
analyze the constrained capacity of SBS battery and consider of IoT will result in a great surging number of MTC de-
energy sharing in the design of the SBS switching scheme. The vices [181]. In this section, we first give the power consump-
authors utilize and compare imitation learning [176], QL, and tion model of MTC and introduce the related AI strategies to
DQL methods. The considered state includes the battery level reduce energy consumption and improve efficiency.
and harvested energy, while the reward functions in two RL
models are defined according to the grid energy consumption.
In the imitation learning model, the ANN is supervised trained A. Power Consumption Modeling
with the labeled data generated by a mathematical model [177] The actual energy consumption of MTC depends on the
to map the relationship between the system state and switch definite scenario including the transmission policy, devices,
action. For the two RL models, the difference is that the Q- information size, and so on. In this part, we give a general
value is stored in a table for QL, while DQL utilizes an ANN power consumption model for the single-hop MTC scenario,
approximator to estimate the Q-value. The final comparison by which the multi-hop power consumption model can be
illustrates the DQL model achieves the best performance in derived.
The power of a machine node is mainly consumed for two purposes, transmitting and receiving packets, which can be summarized in the following equation. Details can be found in [182].

$P_m(d) = P_{t0} + P_{r0} + P_a(d)$ (4)

where $P_m$ denotes the total power consumption of an MTC node. $P_{t0}$ and $P_{r0}$ are the power consumed by the circuit for transmitting and receiving, respectively, and are usually regarded as constants. $P_a(d)$ denotes the power consumption of the Power Amplifier (PA), where $d$ is the transmission distance. From the equation, it can be found that the total power consumption depends on the PA. However, the value of $P_a(d)$ is affected by many factors including the specific hardware implementation, DC bias condition, load characteristics, operating frequency, and the required PA output power $P_{mt}$. In a specific scenario with given MTC devices, we usually only study the required minimum PA output power while the other factors are constant. The relationship between the two quantities can be written as below:

$P_{mt}(d) = \eta P_a(d)$ (5)

where $\eta$ denotes the drain efficiency of the PA. Specifically, the value of the required minimum PA output power $P_{mt}$ can be calculated according to the given SINR threshold at the receiver side and the path loss model between the transmitter and receiver. Then, the power consumption for a single-hop MTC model can be calculated. By adding the power consumption for each hop, the multi-hop power consumption can be obtained. Since the definition of energy efficiency in MTC is similar to that in cellular networks, we can use Equation 3 accordingly.

It can be found that the power consumed by the MTC node mainly supports the circuit and the PA. Since most of the MTC nodes do not need to stay in the working state, the idle nodes can be turned into the sleep state to reduce the circuit energy consumption. For the working nodes, reducing the required transmit power $P_{mt}$ and minimizing the transmission time are the main factors considered in green communications. For the former, the transmit power depends on the path loss and the required SINR at the receiver. The practical solutions to reduce the transmit power include the optimization of network deployment, access technologies, and resource allocation. To reduce the transmission time, we need to optimize the transmission protocols, such as routing and relay. Similar to the renewable energy-enabled BSs, energy harvesting and sharing are also important techniques toward green MTC. The following paragraphs introduce the related research one by one.

B. Energy-Efficient Network Access

Various access technologies have been developed for different MTC scenarios, such as cellular communications, IEEE 802.15.4, WiFi, Narrow-Band IoT (NB-IoT), backscatter communications, and so on. Satellites and Unmanned Aerial Vehicles (UAVs) have also been emerging as platforms to provide Internet access for devices. In this part, we discuss how AI is utilized to improve the energy efficiency of these access technologies.

1) Terrestrial Access Configurations: Even though cellular communications can provide stable and high-throughput connections, the high power consumption to keep connections as well as the expense of the cellular infrastructure challenge their wide application in MTC. Moreover, since different MTC services have heterogeneous QoS requirements and are distributed in various areas including sparsely populated areas and hazardous environments, developing corresponding access techniques is important to reduce energy consumption or extend the lifetime. Some AI research works aimed at improving the energy efficiency of these access technologies are introduced in the following paragraphs. Table II gives more examples of adopting AI to optimize the access layer for green communications.

Li et al. adopt the RL technique to optimize the duty cycle control for each router node in IEEE 802.15.4-based M2M communications [183], [184]. The authors consider the QL method to design the superframe order for minimizing the weighted sum of energy consumption and delay. In the considered RL model, the agent interacts with the environment and chooses the suitable superframe order according to the queue length. The final simulation results verify the improved energy efficiency. Xu et al. also utilize the model-free RL method to improve the throughput and EE of IEEE 802.15.4-enabled Industrial IoT (IIoT) networks [185]. In their research, the QL is adopted to adjust the sampling rate of the control subsystem and the backoff exponent, which is difficult to address with traditional stochastic modeling approaches. For the IEEE 802.15.4-based MTC scenarios, Zarwar et al. [186] give a comprehensive survey on RL-enabled adaptive duty cycling strategies, which can be consulted for further details.

Alenezi et al. focus on LoRa communication technology and utilize clustering to group the nodes in order to reduce the collision rate [187]. To address the high probability of packet collision caused by random access and simultaneous transmissions, the authors first utilize the K-means clustering to separate the IoT nodes into several groups and then schedule their transmissions according to dynamic priority. The final simulations illustrate the significant reduction of the collision rate, which further results in decreased transmission delay and energy consumption. Azari and Cavdar [188] also utilize AI to optimize the performance of LoRa. The authors consider the multi-agent Multi-Armed Bandit (MAB) to choose the best transmit power level, spreading factor, and subchannel to maximize the reward, which is defined as a weighted sum of communication reliability and EE. The analysis illustrates the lightweight complexity of the proposed algorithms and verifies the performance improvement in terms of energy efficiency and transmission success probability.

Guo and Xiang [202] utilize the distributed multi-agent RL technique to pick the power ramping factor and preamble for each UE in NB-IoT networks. In their research, an adaptive learning rate-based QL algorithm is proposed for the non-stationary environment, with the reward defined according to the UE's energy consumption. Moreover, the learning rate is adjusted after comparing the current expected reward with the expectations. The authors of [203] also utilize the QL techniques to optimize the configurations in the random access process.
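The bandit-style parameter selection described above for LoRa can be illustrated with a minimal single-agent sketch. It is not the algorithm of [188]: the epsilon-greedy rule, the three candidate configurations in `ARMS`, and the toy success probabilities and energy costs are all assumptions made for illustration. The reward follows the text in being a weighted sum of a reliability term and an energy-efficiency proxy.

```python
import random

def epsilon_greedy_bandit(arms, pull, rounds=2000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy bandit; return (mean reward per arm, pull counts)."""
    rng = random.Random(seed)
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for _ in range(rounds):
        if rng.random() < epsilon:
            i = rng.randrange(len(arms))                      # explore
        else:
            i = max(range(len(arms)), key=means.__getitem__)  # exploit
        reward = pull(arms[i], rng)
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]           # incremental mean
    return means, counts

# Hypothetical arms: (transmit power in dBm, spreading factor, subchannel).
ARMS = [(2, 7, 0), (8, 9, 1), (14, 12, 2)]
# Toy environment (assumed values): delivery probability and normalized energy cost.
SUCCESS_PROB = {ARMS[0]: 0.55, ARMS[1]: 0.90, ARMS[2]: 0.97}
ENERGY_COST = {ARMS[0]: 0.10, ARMS[1]: 0.35, ARMS[2]: 1.00}

def toy_pull(arm, rng, weight=0.5):
    """Reward = weight * reliability + (1 - weight) * crude EE proxy."""
    delivered = 1.0 if rng.random() < SUCCESS_PROB[arm] else 0.0
    ee_proxy = delivered * (1.0 - ENERGY_COST[arm])
    return weight * delivered + (1.0 - weight) * ee_proxy

means, counts = epsilon_greedy_bandit(ARMS, toy_pull)
most_pulled = max(range(len(ARMS)), key=counts.__getitem__)
```

Under these assumed numbers the middle configuration has the highest expected reward, so the agent ends up pulling it most often; the lowest-power arm loses on reliability and the highest-power arm loses on energy cost.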
TABLE II. Some Related Research Works Using AI to Optimize Network Access for Green MTCs
The proposal in [203] focuses on the optimization of three parameters: the number of random access channel periods, the repetition value, and the number of preambles in each access period. In the single-cell scenario, the tabular QL, linear approximation-based QL, and DQN methods are adopted by the eNB to predict the number of preambles in order to maximize the number of served IoT devices. In the multi-cell scenario, the huge size of the action space composed of the three parameters is a great challenge. The authors consider an action aggregation approach that converts the selection of a specific value into the choice of increase or decrease. Then, the three QL methods are compared with a proposed cooperative multi-agent DQL approach.

Lien et al. study the intelligent radio access in vehicular networks to strike a balance among energy efficiency, latency, and reliability [204]. The authors concentrate on the fronthaul radio resource starvation and propose an RL-based MAB algorithm to avoid the backhaul transmission in the core networks. In the considered scenario, each vehicle can simultaneously access multiple BSs to request the contents using the feedbackless transmission schemes, which further means different communication reliability, energy consumption, and latency. To strike the tradeoff among energy efficiency, latency, and reliability, the authors first formulate the Lyapunov function [205] to derive the optimum number of BSs to meet the content request of each vehicle. Then, to decide whether to use the feedback-based or feedbackless transmission scheme, the authors construct the MAB model and utilize the $\epsilon$-greedy RL algorithm to solve this problem. Specifically, the research goal of this step is to minimize the long-term expected cost, which is defined as the weighted sum of request drop events, transmission latency, and energy consumption.

2) Access through Satellites: Satellites can provide seamless coverage for IoT devices, especially for rural and remote areas. However, the large path loss challenges the system EE and lifetime. The authors of [206] study DRL-based channel allocation to improve the system EE as well as guarantee the QoS for LEO satellite IoTs. The authors formulate the channel resource allocation as an MDP and further utilize the DRL technique to solve it. In their proposal, the agent is assumed to choose an action to assign the channels according to the state, which is defined as the user task size and location. The authors also construct the users' requests into an image as the
QL method to select the next node and define the reward function according to the residual energy and depth information for a balance of End-to-End (E2E) delay and energy consumption [221]. The utilization of QL enables the long-term reward to be taken into account, which finally reaches the global optimization. By sorting the neighbors according to the calculated Q-value, the node with higher priority can be selected to forward packets, while the other neighbors with smaller Q-values are suppressed for energy saving. Hu and Fei also adopt the QL to solve the routing in UWSNs [222], while the research goal is to make the residual energy of sensor nodes more evenly distributed for the maximum network lifetime. In the RL proposal, the authors consider not only the residual energy but also the energy distribution in a group of sensor nodes to define the cost function, which is further utilized to calculate the reward and Q-value for different actions indicating various next nodes. The authors also illustrate that the proposed method can converge for dynamic scenarios. The final performance results indicate that the lifetime can be extended by up to 20%.

In [223], the authors adopt a supervised learning-based MLP algorithm to improve the routing performance and energy efficiency for IoT low power networks. Different from the other works [212], [221], [222], which utilize AI models to predict the next node directly, [223] aims at optimizing the transmission range of each node to improve the routing performance and minimize energy consumption. In this paper, the authors first construct an IoT network to collect the labeled data including node positions and corresponding transmission ranges. Then, the MLP is trained with the labeled data to map the relationship from the node position to the optimal transmission range. One of the advantages of this proposal is that it addresses the high dynamics of IoT networks. The final simulations illustrate the extension of the network lifetime.

Mostafaei studies the multi-constrained routing problem in WSNs and proposes a distributed learning approach [217], where each node is regarded as a learning automaton. After the initial phase, in which each learning automaton senses the neighbor nodes to construct the action space, it transmits a packet by a randomly selected action. Once the packet reaches the sink node, the environment feeds back a reinforcement signal, which can be a penalty or a reward, to evaluate the selected action. Then, the transmission probability of each action for every node can be updated.

2) Relay and D2D: Compared with routing, relay and D2D techniques provide more flexibility. AI can be adopted to decide whether to relay or not and help to select the optimal relay node according to the energy condition. Mastronarde et al. utilize the MDP to formulate the relay decision for each UE in cellular networks [228]. To maximize the long-term utility, the authors propose a supervised learning-based model to help each UE learn the optimal cooperation policy online. Specifically, the UE estimates three parameters, namely the outbound relay demand rate, inbound relay demand rate, and relay recruitment efficiency, in an online manner. Then, the estimated values can be utilized to calculate the transition probability and utility functions. To address the problem of frequent recomputing, the authors first compute a collection of cooperation policies offline. Then, in the online phase, the estimated parameter values can be adopted to calculate the energy cost, which finally helps to choose the optimal policy.

He et al. study the relay selection problem in air-to-ground VANETs (A2G VANETs) and adopt the QL to choose the relay node in order to balance the network performance and energy consumption [229]. In this paper, the flying UAVs and the ground vehicles transmit messages to each other by multi-hop relaying. The relay selection affects the packet delivery ratio, latency, signal overhead, and energy consumption, and is therefore formulated as a multi-objective optimization problem. The authors construct the Q-value table whose states and actions indicate the network states and relay selections, respectively. By attempting different relay selections, the Q-values for different choices can be finally calculated. The extensive performance analysis illustrates the improvement in terms of packet delivery ratio, latency, hop counts, and signal overhead, which means increased energy efficiency.

Wang et al. also utilize QL to optimize the power allocation and D2D relay selection for maximizing the energy efficiency [218]. As the relay selection policy affects the energy efficiency of all D2D pairs, the authors construct a finite MDP and adopt QL to choose which neighbor node is selected. In the QL model, the state space is defined by the four cases of whether the energy efficiency of the first-hop and second-hop D2D links is below or above a given lower bound. Each D2D pair acts as the agent to select a neighbor node in its region with the target of maximizing the reward defined according to its energy efficiency. Through the iteration process in the QL algorithm, the Q-value table of each D2D pair can be updated and the optimal relay with the maximum Q-value is chosen. The final simulation clearly illustrates the improvement of energy efficiency.

Hashima et al. [230] utilize the stochastic MAB [231] to model the neighbor discovery and selection problem in mmWave D2D networks. The considered MAB model aims to maximize the long-term reward, which is defined as the average throughput of the devices subject to the residual energy constraint of nearby devices. To solve this MAB problem, a family of upper confidence bound algorithms plus Thompson sampling is utilized, incorporating the residual energy constraints. The final results illustrate the improved average energy efficiency and extended network lifetime. The authors of [232] also focus on the relay selection in D2D mmWave networks to increase the connection reliability. However, they utilize the DL model to predict the best relay device according to the distance between the device and the BS or other devices, node mobility, signal strength, and residual energy. Specifically, the proposed relay selection algorithm consists of two phases. In the online phase, the random training values are generated with the best relay labels to train the considered DNN model. Then, the second phase is to utilize the trained DNN to predict the best relay.
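The tabular QL relay selection discussed above, for example in [218], can be sketched as follows. This is a minimal illustration under assumed values, not the implementation of any cited work: the candidate relays in `RELAY_EE`, the bound `EE_LOWER_BOUND`, the noise range, and the hyperparameters are hypothetical. The state encodes the four below/above cases for the two hops, the action picks a relay, and the reward is the sum of the per-hop energy efficiencies.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500, horizon=20,
               alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)                    # explore
            else:
                a = max(range(n_actions), key=q[s].__getitem__)  # exploit
            s_next, r = step(s, a, rng)
            # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q

# Hypothetical per-relay (first-hop EE, second-hop EE), normalized to [0, 1].
RELAY_EE = [(0.2, 0.3), (0.8, 0.7), (0.5, 0.3)]
EE_LOWER_BOUND = 0.4  # assumed threshold separating "below" from "above"

def toy_step(state, relay, rng):
    """Toy dynamics: the outcome depends only on the chosen relay, not on state."""
    hop1 = RELAY_EE[relay][0] + rng.uniform(-0.02, 0.02)
    hop2 = RELAY_EE[relay][1] + rng.uniform(-0.02, 0.02)
    # State index 0..3 encodes (hop1 above bound, hop2 above bound).
    next_state = 2 * int(hop1 >= EE_LOWER_BOUND) + int(hop2 >= EE_LOWER_BOUND)
    return next_state, hop1 + hop2

q_table = q_learning(n_states=4, n_actions=len(RELAY_EE), step=toy_step)
best_relay = [max(range(len(RELAY_EE)), key=row.__getitem__) for row in q_table]
```

In this toy setting the second relay has the highest energy efficiency on both hops, so it becomes the greedy choice in the frequently visited state where both hops are above the bound. States that are rarely visited may retain poorly learned values, which mirrors the exploration cost noted in the text.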
TABLE III. Some Related Research Works Using AI to Optimize Transmissions for Green MTCs
D. Energy Harvesting and Sharing

Similar to the cellular networks, MTC terminals can also be charged by the ambient energy in a wireless manner [206], [233]. To drive the MTC toward the green 6G era, two common energy harvesting techniques are expected to be widely applied: renewable energy harvesting and RF harvesting. The former considers renewable green energy sources such as solar, wind, tide, and so on to reduce the utilization of fossil fuel. The latter is to efficiently harvest the dissipated energy, which accounts for the majority of the energy in RF signals but cannot otherwise be used [234]. On the other hand, the dynamics of the harvesting power further complicate the network performance improvement or energy efficiency optimization, which is the reason for the application of AI techniques. In the following paragraphs, we introduce the related AI-based research considering the two EH techniques.

1) Renewable Energy Harvesting: Chu et al. utilize the RL technique to design the multiaccess control policy and predict the future battery state [17]. In their research, the authors consider the uplink communication scenario where multiple energy harvesting-enabled UEs access the BS with the limited channel resource. The authors first assume that the user battery and channel states are available to the BS, and then utilize the DQN-based LSTM to design the UE uplink access scheme. In this model, the system state includes the channel conditions and UE battery levels. The reward is defined as the long-term discounted system sum rate. The consideration of multiple time slots drives the authors to adopt the LSTM model, which can make sequential decisions. The constructed LSTM model assists the BS in selecting the UEs at each time slot in order to maximize the system sum rate. In the second proposal, the authors utilize the RL-based LSTM to predict the battery level. In this RL model, the considered state space includes the access scheduling history, the previous UE battery predictions, and the practical UE battery information. Since the purpose is to maximize the prediction accuracy, the reward is defined according to the long-term prediction loss. Finally, the authors combine the predictions of access control and battery information and design a two-layer LSTM DQN network. The first layer predicts the battery level, which is adopted as part of the state space in the access control prediction. Extensive simulations illustrate the improvement of the system sum rate, further resulting in improved energy efficiency.

Similar to the considered scenario in [17], the same authors apply the DRL techniques to optimize the joint control of power and access [235]. Generally, the proposal consists of two stages. In the first stage, the LSTM model is utilized to predict the battery states, which is similar to that in [17]. In the control stage, the authors utilize the AC algorithm and DQN to decide the access and power scheme. The state space consists of the channel power gain, predicted UE battery level, history information of the power control policy, and the selected UE's true battery level, while the action represents the transmit power, which has a continuous value. The reward is defined according to the achieved transmission rate, thus the algorithm aims to improve the system throughput. The proposed LSTM model is verified to predict the battery state with high accuracy, and the new approach improves the average sum rate compared with conventional algorithms as well as DQN-based models.

From the above introduction, we can find that using the AI method to predict the harvesting-enabled battery state is an efficient way to adjust the network configurations for performance optimization. The authors of [16], [233] utilize the non-linear regression method to find the relationship between future harvesting power and the historical records. Then, with the estimated harvesting power, the IoT node can adjust the security configurations to provide qualified service as well as reduce the outage probability. In [233], the authors also study the THz-enabled 6G IoT scenario and show the achieved network throughput improvement and extended working time.

2) Radio Frequency Harvesting: Abuzainab et al. focus on the problem of adversarial learning in a communication
network where the devices are served and powered by the Hybrid Access Point (HAP) [236]. In the considered scenario, the HAP needs to estimate the transmission power of the devices and determine the suitable energy signal to reduce the packet drop rate of the devices. As the adversary may alter the HAP's estimate, the authors propose a robust unsupervised Bayesian learning method. In the proposed model, the HAP is assumed to have full CSI, which is utilized to calculate the transmission power according to the received signal power. In the nonparametric Bayesian learning model, the Dirichlet distribution is used to calculate the posterior distribution of the probability vector of the device transmission power. Then, the HAP can find the optimal transmission power to maximize the utility while not depleting the device's battery. Compared with the conventional Bayesian learning method, the proposed approach can achieve improved performance in terms of packet drop rate without jeopardizing energy consumption. The proposed learning scheme also exhibits improved energy efficiency compared with a fixed power transmission policy.

Kwan et al. study the RF harvesting from intended and unintended sources and propose a machine learning-based wake-up scheduling policy for on-body sensors [237]. Since the unpredictable nature and low amount of energy harvested from the RF signals of unintended sources make it difficult to decide the wake-up time, the authors consider two machine learning techniques, linear regression and ANN, to predict the wake-up time. In the linear regression-based forecaster, the authors consider the current capacitor charge level and average energy harvesting rate to address the dynamics caused by user mobility and changing channel conditions. The proposed ANN predicts the next wake-up time considering the last successful wake-up time and energy level. The final simulation results illustrate that the two models both achieve a high accuracy rate.

Similar to [237], the authors of [238] also focus on the optimization of the active time of IoT nodes which are powered by RF harvesting energy. In this paper, besides information collection and energy provision, the HAP is also responsible for setting the sampling time of the IoT devices. The challenge of this problem is that the HAP cannot have exact knowledge of the harvested energy for each IoT device due to the imprecise knowledge of CSI. To address this issue, the authors combine stochastic programming and RL techniques. First, stochastic programming is used to maximize the minimum sampling time among all devices. To tackle the limitation of an unknown and dynamic probability distribution, the RL technique is adopted, where the assumed agent decides the sampling and charging time according to the states corresponding to the device battery levels. The reward function is measured by the maximum-minimum active time of devices. Moreover, the authors model the large-state or continuous space using linear function approximation. The final results illustrate that the RL approach can achieve as high as 93% of the minimum sampling time computed by stochastic programming.

E. Summary

In this section, we analyze the AI-based research toward green MTCs. Compared with conventional methods, the advantage of AI is that it can address the uncertainty and alleviate the failure ratio during the access and transmission process [185], [203], [212], [214], [222]. For the energy harvesting process, AI enables more knowledge about future available power and battery status, which enables necessary configurations towards improved energy efficiency [17], [235].

V. COMPUTING ORIENTED COMMUNICATIONS

In the 6G era, the computation services are expected to play a more important role in people's work and life. With the great leap in transmission rate and communication capacity, an increasing number of applications will be offloaded to the cloud or edge server for nearly real-time results instead of being executed locally. Moreover, storing the contents on the cloud and edge servers can provide users with more efficient and flexible service. Additionally, the widespread application of AI techniques also drives the development of computing oriented communications to accelerate network management. In this section, we discuss the power consumption model and introduce the existing AI-based research aiming to improve energy efficiency and save energy consumption in the COC scenarios.

A. Power Consumption Modeling

The power consumed by a server depends on the Central Processing Unit (CPU) or Graphics Processing Unit (GPU) utilization, which usually keeps changing. Generally, the energy consumption of a server is approximately linearly dependent on the CPU and GPU usage. If we let $P_{idle}$ and $P_{max}$ denote the power consumed by a server working at the idle state and the full state, respectively, the following equations can model energy consumption when the utilization rate is denoted as $u$ [71], [72]:

$P(t) = P_{idle} + (P_{max} - P_{idle}) \times u(t)$ (6)

$E = \int P(t)\,dt$ (7)

Thus, for a cluster of servers, the total energy consumption can be calculated by summing the energy cost of all servers. From Equation 6, it can be found that to save energy consumption, we can reduce the utilization rate of each server. However, it has been found that a server in the idle state consumes more than 60% of the peak load electricity [239], [240], which makes the problem more complicated. For a given workload, utilizing only one or several servers at the full state and turning off the other servers may result in low energy consumption, but on the other hand contributes to high delay. Therefore, how to allocate the computation resources to balance energy consumption and service quality is an important research direction [149], [241], [242].

B. Energy-Efficient Cloud and Edge Computing

According to Equation 6, reducing the CPU/GPU usage can alleviate the energy cost. In this part, we discuss three common issues in alleviating the computation resource usage: offloading decision, resource allocation, and server placement.
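Equations 6 and 7 can be evaluated numerically with a short sketch. The function names, the 120 W and 200 W power figures, and the rectangle-sum approximation of the integral are assumptions for illustration; the idle power is set to 60% of the peak to match the proportion quoted in the text.

```python
def server_power(u, p_idle, p_max):
    """Equation 6: instantaneous power (W) at utilization u in [0, 1]."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return p_idle + (p_max - p_idle) * u

def server_energy(utilization_trace, dt, p_idle, p_max):
    """Equation 7 approximated by a rectangle sum over samples spaced dt seconds."""
    return sum(server_power(u, p_idle, p_max) * dt for u in utilization_trace)

def cluster_energy(traces, dt, p_idle, p_max):
    """Total energy (J) of a cluster: the sum over its powered-on servers."""
    return sum(server_energy(trace, dt, p_idle, p_max) for trace in traces)

# Assumed figures: 120 W idle, 200 W peak, one hour sampled once per minute.
P_IDLE, P_MAX, DT = 120.0, 200.0, 60.0
spread = cluster_energy([[0.5] * 60, [0.5] * 60], DT, P_IDLE, P_MAX)
consolidated = cluster_energy([[1.0] * 60], DT, P_IDLE, P_MAX)  # second server off
```

With these assumed numbers, spreading the same workload over two half-loaded servers costs 1,152,000 J, while consolidating it on one fully loaded server and switching the other off costs 720,000 J. The gap comes entirely from the idle power term, which is why the text notes that consolidation saves energy at the possible expense of delay.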
latency, an imitation learning-based algorithm is proposed, which can alleviate the extreme complexity of the conventional branch-and-bound algorithm. Specifically, an expert is trained with a few samples to obtain the optimal scheduling policy in an offline manner. Then, the agent is trained to follow the expert's demonstration online. Results illustrate that imitation learning can significantly accelerate the execution of the branch-and-bound process.

2) Computation Resource Allocation: The computation platform usually needs to execute multiple tasks. How to allocate the computation resource, especially the CPU/GPU cycles, is an attractive topic [87], [109], [113], [247]. On the other hand, energy consumption is also an important metric that needs to be considered. How to balance energy consumption and computation performance can be addressed by AI techniques [113], [248], [249]. The following paragraphs focus on several research works.

Similar to [113], the authors of [250] also consider AI techniques to balance energy consumption and latency for the scenarios utilizing the capacity-limited edge servers and cloud server. However, edge servers are driven by hybrid power including solar, wind, and diesel generators, while computation-efficient cloud servers are grid-tied. The authors model the joint workload offloading and edge server provision as an MDP and utilize the RL technique to solve it. The authors define the total system cost with the delay, diesel generator cost, and battery consumption, while the policy denotes the computing power demand in each time slot. To find the optimal policy, a novel post-decision state-based online learning algorithm is proposed to exploit the state transitions of the considered energy harvesting-enabled MEC system. Compared with the standard QL method, the proposed approach converges much faster. Extensive simulations confirm that the MEC system performance can be significantly improved.

Pradhan et al. [109] study the computation offloading of IoT devices in the massive MIMO Cloud-RAN (C-RAN) deployed in an indoor environment. In this paper, the purpose of optimizing the computation offloading is to minimize

through training with limited samples, avoiding the complex training from scratch, which reduces the execution time by two orders of magnitude. The final performance analysis shows that the transfer learning-based DNN can provide a close approximation of the optimal resource allocation.

Wang et al. [87] study the cellular networks where MEC-enabled High-Altitude Balloons (HABs) conduct the users' computation tasks with limited capacity and energy. Since the data sizes of the computation tasks vary, the user association policy should be optimized to meet the requirement as well as minimize energy consumption. To alleviate the limitations of traditional Lagrangian dual decomposition [251] and game theory [67] in dynamic scenarios, the authors utilize the SVM-based federated learning algorithm to map the relationship from users' association and historical requested task size to the future association. Specifically, similar to the process in Fig. 9, each HAB first trains an SVM model with the locally obtained data to construct the relationship between user association and computation task size. Then, the HABs share their trained SVM models, which enables further integration and local improvement. Thus, each HAB can build an SVM model to quantify the relationship between all user association and historical computation task information. The simulation results illustrate that energy consumption can be reduced with a better prediction of the optimal user association.

Ma et al. [247] utilize the PSO algorithm to jointly optimize the selection of access networks and edge cloud to minimize the latency and total energy consumption. In the considered scenario, each user can be served by multiple edge cloud-enabled access networks. Since the latency and energy consumption are both caused by task offloading and execution, the formulated problem to minimize the two metrics is NP-hard. In the adopted PSO model, the fitness function is defined as the weighted sum of latency and energy consumption. Note that the values of latency and consumed energy are scaled to between 0 and 1 to avoid the influence of different dimensions. The final performance analysis illustrates the significant improvement in terms of latency and energy consumption.
the total transmit power of IoT devices. In the considered 3) Edge Server and Virtual Machine Placement: The place-
scenario, the transmission latency of the uplink signals is ment optimization including the edge servers and Virtual
concerned with the transmit power and the CPU cycle alloca- Machines (VMs) affect the resource utilization of the whole
tion. Therefore, to minimize the total transmit power of IoT network. Since power consumption at the idle state constitutes
devices under the latency threshold, we need to consider not the major part of total energy waste [66], to minimize the
only the signal processing factor, but also the computation active servers as well as meet the service requirements can
resource allocation, which is a non-convex problem due to improve energy efficiency. And AI techniques including the
the coupling relationship among these factors and their value heuristic algorithms and machine learning methods have been
constraints. To solve this problem, the authors consider the studied to optimize the deployment of edge servers and VMs.
supervised learning method and adopt the DNN model to Li and Wang [71] study the edge server placement and
decide the transmit power, CPU cycle assignment vector, and devise a PSO-based approach to minimize energy consump-
the number of quantized bit. The authors also propose an tion. In this paper, the authors consider that multiple edge
Alternating Optimization (AO) based mathematical model to servers are located at different base stations. And the delay
obtain some near-optimal solutions to train the DNN model for the base stations to access the edge servers should be not
offline. Simulation results illustrate the fast convergence of above a threshold. In this paper, the minimization of energy
the DNN training process. More importantly, to tackle the consumption depends on the locations and assignments of the
same problem in dynamic IoT networks, the authors utilize edge servers. To solve this discrete problem, the authors also
the transfer learning [105] technique, which means that part of redefine the parameters and operators of the PSO method.
the trained DNN’s parameters are utilized in the newly-formed To evaluate the performance, a real dataset from Shanghai
DNN for the changed scenario. Then, the DNN can be updated Telecom is utilized in the experiment, with which the PSO-
based approach shows an improvement of more than 10% energy saving.
Liu et al. [66] study the VM placement in cloud servers and adopt the ACO algorithm to minimize the number of active servers and balance the resource utilization, resulting in improved energy efficiency. In their approach, a bipartite graph is constructed to describe the VM placement problem. The pheromone is distributed not only between the VMs and servers, but also among the VMs assigned to the same server. The artificial ants conduct the VM assignment based on global search information. To speed up the convergence and improve the solution, a local search including the ordering exchange and migration operations is conducted. The improved ACO algorithm is efficient for large-scale problems. The experimental results show that the number of active servers can be minimized with balanced usage of resources including the CPU and memory, which results in improved energy efficiency.
Shen et al. [91] focus on the cloudlet placement to improve energy efficiency in the mobile scenario, and the K-means clustering [88] method is adopted to search the location center. In this paper, energy consumption is assumed to be directly related to the number of deployed cloudlets. Thus, minimizing the number of deployed cloudlets can optimize energy efficiency. To tackle this problem, the authors first utilize the K-means clustering method to find the central locations of the mobile devices. The following steps are to delete some locations that do not meet the density requirements and generate the moving trajectory of the cloudlets. Performance analysis illustrates the increased number of covered devices of each cloudlet, which results in reduced energy consumption.
Zhang et al. [80] study the container placement to optimize energy consumption of virtual machines and propose an improved GA. In this paper, the container is utilized to compute some applications and energy consumption is assumed to be nonlinearly related to resource utilization. Since the container placement is regarded as a combinatorial optimization problem, the heuristic algorithms, such as GA [81], are well suited. However, the conventional GA sometimes incorrectly eliminates new individuals in the mutation operation when resource utilization is high, which causes performance degradation. To solve this problem, the authors propose two kinds of exchange mutation operations and define a control parameter with the number of search iterations. This method can help the search iteration to jump out of the local optimum. The final simulations illustrate the significantly improved power saving performance in small, medium, and large scales of scenarios with uniform and non-uniform VM distributions.
Wang et al. [72] study virtual machine placement in heterogeneous virtualized data centers and utilize the PSO method [252] to minimize energy consumption. In this paper, the authors first establish the energy consumption model of a heterogeneous virtualized data center. Since the traditional PSO method can only be utilized for continuous optimization problems, the authors redefine the particle position and velocity with two 𝑛-bit vectors, and then redefine the subtraction, addition, and multiplication operators to fit the energy-aware virtual machine placement optimization, which is a discrete problem. Then, the authors consider the energy-aware local fitness and devise a two-dimensional encoding scheme to accelerate the convergence and reduce the search time. Results illustrate that the proposed method outperforms the other approaches and can reduce energy consumption by 13%-23%. A similar research work based on PSO is given in [73]. The authors utilize the decimal coding method to apply PSO in a discrete problem, and energy consumption is minimized considering the service requirement constraints. The authors also analyze the complexity of the proposal, which is related to the numbers of migrated virtual machines, particles, and iterations.
C. Green Content Caching and Delivery
Besides offloading the contents to the edge/cloud servers, storing the contents is also an important service for future CDN. Energy consumption of this part mainly comes from the caching and delivery process. In the following paragraphs, we discuss the related research on how AI is adopted to improve energy efficiency of content caching and delivery.
1) Caching Policy Design: For future multi-tier or hierarchical networks, the contents are usually cached in different parts to improve storage efficiency. The content caching policy needs to be optimized due to the variable storage size of heterogeneous devices and different energy consumption for content retrieval. Li et al. [253] utilize DRL to optimize the content caching policy for multi-tier cache-enabled UDNs. The authors analyze the different energy consumption of content retrieval from the Small Access Points (SAPs), MBS, and core networks, then construct the energy-efficient model. To optimize energy efficiency, the standard DRL method using the regular multi-tier DNN is adopted, where energy efficiency and different content combinations serve as the reward and state, respectively. To accelerate the convergence of the proposed intelligent content caching method, the authors utilize the latest findings including the prioritized experience replay [254], dueling architecture, and deep RNN. Extensive simulations illustrate that the proposed intelligent content caching algorithms can significantly improve energy efficiency for both the stationary and dynamic popularity distributions. The work in [255] analyzes the impacts of the channel conditions on content caching, and RL-based content caching is proposed to alleviate energy consumption.
Shi et al. [256] adopt the DQN model to optimize the content caching in three-layered vehicular networks, where an airship distributes the contents to UAVs for satisfying the terrestrial services. In the considered scenario, the airship needs to schedule the UAV caching the required contents to provide the service if the requested content is not in the local UAV, which means more energy consumption. To minimize energy consumption, the DQN model is proposed and the defined reward considers the probabilities of local UAV requests and other UAV scheduling. To improve training performance, the experience replay mechanism is considered. The proposed DQN model is verified to overcome the large number of states in the training process.
Tang et al. [257] consider the scenario where the users can retrieve the contents locally, or from the neighbor devices,
SBS, and MBS, with increasing energy consumption. On the other hand, the user’s device, SBS, and MBS have increasing caching capacity. Specifically, the QL algorithm is applied to every user to select the cached contents with the goal of minimizing the cost, which is inversely proportional to the popularity of cached files. For the caching policy of each SBS, the DQN is adopted to select the contents in order to minimize the total energy consumption. In the proposal, the cost function is similar to the reward in DRL, while the optimization goal becomes minimizing the value of cost. For this proposal, the complexity of QL is relatively low since every user’s device has very limited capacity, which means the state space is small. On the other hand, the DQN has a relatively high complexity since the number of cache combinations is large, leading to a huge state space.
The content caching policy design deeply depends on the users’ preferences; thus, the centralized control-based optimization methods may cause concern for privacy. For the data-driven AI algorithms including ML and DL techniques, the training and running process, which requires the users’ data, poses great challenges. To address this problem, federated learning has been widely studied to keep the data in the local area to protect privacy [116], [258], [259]. In [116], the UE conducts the calculations of the shallow layers to generate some general features of the content requests. Similar to the process in Fig. 9, the heterogeneous BSs including the flying UAVs aggregate the parameters of the shallow layers to conduct the further training and running process to decide the content caching policy. Different from the cooperative training of the deep learning models, Yu et al. [258] consider that each user downloads the Stacked Autoencoder from the server and trains it with the local dataset generated from the personal usage. Then, the updated parameters and extracted features are uploaded to the server, where the hybrid filtering technique is adopted to decide the content caching policy. To further ensure data security, blockchain techniques can be adopted in the data transmission process [259]. However, these research works aim to improve the caching performance, instead of the minimization of energy consumption.
2) Delivery: Besides content caching, how to deliver the contents is also an important factor affecting energy consumption. In this part, we discuss the related AI-based research on content delivery optimization.
Lei et al. [260] study the content caching and delivery in cellular networks, and a supervised DNN-based approach is adopted to optimize the user clustering to minimize the transmit power of the BSs. In each cell, the content delivery should satisfy the stringent delay requirement; thus, the user scheduling algorithm should have low computation time to enable real-time operations. To realize this goal, the DNN is trained to map from the users’ channel coefficients and requested data amount to the clustering scheduling policy. The authors utilize datasets of variable size generated with conventional iterative algorithms to train the proposed DNN. The performance shows that the large-sized dataset can result in 90% approximation to the optimum with limited time consumption.
Al-Hilo et al. [261] utilize the DRL technique to optimize the trajectory of UAV in order to improve content delivery for the UAV-assisted intelligent transportation system. In this paper, the moving vehicles are assumed to cache part of the contents due to the limited capacity and need to retrieve the other contents from the BS, which is time-consuming and unstable. To improve the content delivery performance, the cache-enabled UAVs are assumed to hover over the vehicles to meet some content requests. As the trajectory control affects the performance of content delivery as well as the power consumption of UAVs, the Proximal Policy Optimization algorithm is adopted to decide the flying velocity according to the network states including the current position, vehicle information, and cached contents. The final results also show the improvement of energy efficiency.
The above works focus on content delivery in the access networks, while the data forwarding in the core networks is also an important factor affecting energy consumption. Li [75] utilizes the ACO algorithm [262] to optimize the data forwarding scheme to reduce content retrieval hops, which results in less energy consumed by the routers and links. In this paper, the CDNs are first divided into multiple domains. The data packets and the hello message packets are assumed to be two types of ants. For each path, the pheromone is defined and calculated as the normalized sum of path load, delay, and bandwidth. Then, through the generated interest ants in the initial state, the node can construct the paths and update the corresponding pheromone values. Then, during the data packet transmission stage, the pheromone is further updated according to the real-time performance.
3) Joint Optimization: Since the caching and delivery policies both affect energy consumption, joint optimization is another direction toward green communications. Li et al. [263] adopt the DRL method to minimize the latency and energy cost of content caching and delivery in RAN. In this paper, the authors define the reward function considering the latency and energy cost of the content caching and delivery between the users and SBS, MBS, and cloud servers. Then, the AC model and DDPG algorithm [264] are adopted, where two identical DNNs are utilized to generate the deterministic action and evaluate the chosen strategy. Here, the action is defined with the content file placement, SBS-user association, and subchannel assignment. The simulation results illustrate the improved rewards, which means the performance improvement in terms of transmission delay and energy consumption.
Similarly, Li et al. [242] also utilize the DL technique to jointly optimize the content delivery latency and system energy consumption. However, as cache-enabled D2D networks are adopted to alleviate the overhead of requesting the contents from the cellular BS in this paper, the device mobility, content popularity, and link establishment decisions need to be considered. To address the complexity caused by the dynamics including changing channel conditions and variable content popularity, the authors consider a three-step proposal, all steps of which utilize DL models. First, the RNN models including the conceptor-based Echo State Networks (ESN) [265] or LSTM are utilized to predict user mobility according to the limited previous records. Then, the predicted D2D user location information, together with other attributes
including gender, age, occupation, time, and so on, are utilized as the input of ESN or LSTM to predict the probability of each user to request every content at the next time slot. Then, the content request distribution can be utilized to assist the content placement. For example, the content will be assigned to the user if the request probability is above 70%. In the third step, the joint value and policy-based AC algorithm [172] is utilized for each user to choose a neighbor to establish the communication link for content delivery according to the observed environment, which is defined as the transmit power, channel gain, and distance. In this algorithm, the reward function is denoted by the sum of weighted content delivery delay and power consumption. The simulation results illustrate that with different weight combinations of delay and power consumption, variable power saving performance can be obtained, which means that the proposed strategy is reasonable and flexible. Similar research is given in [266], which also utilizes the ESN model [265] to predict the user mobility and content request distribution. Since the requested content is dependent on the users, the authors consider the context of users including the gender, occupation, age, and device type to predict the probability of content requests. To make the results practical, the authors collect historical content transmission and user mobility records to train the considered models.
D. Summary
According to the introduced research, we can find that AI techniques can significantly improve energy efficiency of the content caching process. In the content placement step, AI techniques are important and efficient to predict the content popularity and users’ information including the preference and location, which can result in improved local Cache Hit Ratio (CHR) and reduce the content retrieval from cloud servers. For the content delivery part, the optimization is to improve the resource allocation, transmission scheduling, routing, and other communication functions to save energy. Different from the energy-efficient proposals in cellular networks as we mentioned above, the strategies in content delivery networks should consider the content placement, latency requirements, and even the caching capacity.
VI. OPEN RESEARCH ISSUES
Even though there are a huge number of research works on AI-based green communication services, we still need to pay more attention to transforming our endeavors into practical applications in the 6G era. Moreover, the utilization of AI techniques in current networks is still confronted with many challenges in terms of computation complexity, hardware compatibility, data security, and so on. The following paragraphs give some promising directions, which we believe will give some ideas to the researchers.
A. Green BS Management for 6G HetNet
As we mentioned in Sec. III, the BSs take the majority of total energy consumption. In the 6G era, the number of BSs is meant to be multiple times that of 5G. And these BSs are constructed in a hierarchical manner and have various sizes of coverage. Moreover, as the UAVs and HABs will also act as the BSs [87], [116], [249], [261], the heterogeneous hardware architectures and the mobility further complicate the green management. The following paragraphs introduce the potential AI-based research considering the potential three functions of 6G BSs.
As the end terminals can be served by different BSs including the MBSs, SBSs, and Tiny Base Stations (TBSs) in the multi-tier 6G HetNet, the user association policy should be optimized in order to turn off the redundant BSs for energy saving. Moreover, since the BSs are usually deployed with multiple frequency bands, the resource allocation including the channels and power is critical for the network energy efficiency. However, the mobility of end devices and UAV or satellite-enabled BSs results in changing traffic demand and dynamic channel conditions, while the resource heterogeneity further complicates these problems. To address these issues, AI techniques can provide efficient assistance. For example, AI models can be adopted to predict the traffic demands, mobility patterns, and channel conditions, which enables the network reconfigurations in advance.
Besides offering communication services, future BSs will play multiple roles, such as the computation/storage providers and energy source. As some BSs have a certain amount of computation and storage resources, the computation offloading and content caching policies can be optimized by AI models. For example, the computation offloading or content caching is usually modeled as a non-convex problem, which is further solved by the RL or DRL techniques. As we mentioned in Sec. I, compared with the traditional method which divides the non-convex problem into two sub-problems and solves them one by one, the RL or DRL can find the global optimal solution and avoid the complex iteration process during the algorithm execution period.
B. Energy-Efficient Space-Air-Ground Integrated Networks
SAGIN has been regarded as one of the key technologies for 6G [1], [267]. SAGIN can provide seamless coverage and flexible information transmissions, especially for massive MTCs. Since the satellites, HABs, and many UAVs are driven by renewable energy, energy-efficient network orchestration is critically important for SAGIN. However, the diversified transmission environments, heterogeneous hardware platforms, and dynamic energy resources pose great challenges. To address the complexity and uncertainty, AI can provide many efficient models. For example, using the RL technique to optimize the resource allocation policy including the transmitting power [268] and channels [206] has been evaluated to improve the network energy efficiency. Moreover, the CSI dynamics and network mobility make energy-efficient packet transmissions more difficult. As AI has been demonstrated to efficiently map the complex relationship between existing network traces and future transmission policy for terrestrial networks [206], [216], we believe the research can be extended to the SAGIN scenario.
Even though AI has been studied to optimize the SAGIN performance [189], [269], current research mainly focuses
on the single layer, such as the LEOs and UAVs. From the systematic perspective, the network management toward green communications should consider every part of SAGIN. For example, the UAV deployment and trajectory should be optimized considering the beam control of satellites to realize energy-efficient coverage [66], [210], [211]. As AI has been illustrated to be competent to handle the complex multiple-variable-related problems [196], [197], using AI techniques to analyze performance from the perspective of the whole SAGIN system will be a promising direction. However, the difficulty is how to characterize the concerned factors into the AI model [57], [92]. The execution of the AI model is another challenge due to the extreme computation overhead. Moreover, AI is also important to optimize RF energy harvesting in cellular networks, which will be discussed in Sec. VI-D.
C. AI-based Energy-Efficient Transmissions
Packet transmission is energy-consuming as it costs energy of transmitters, forwarders, and receivers. Besides power control and resource allocation methods to reduce energy consumption, many other choices have been provided including the routing policy design, relay, backscatter communication, and IRS-aided transmissions. There is no doubt that multiple communication manners will be provided for the end devices to transmit the packets successfully. For instance, the mobile users can choose the cellular network to send an email, which can also be done through IEEE 802.11-based WiFi or through D2D in a multi-hop manner. How to cooperatively utilize and schedule the different communication methods and resources in a multi-agent multi-task environment will heavily affect the system energy consumption and network performance. Most AI-based research focuses on the single communication scenario, while very limited works study the hybrid scenario [193], [196]. In the future, we can pay more attention to AI for improving energy-efficient transmission in the scenario where multiple communication manners are available.
D. AI-Enhanced Energy Harvesting and Sharing
Energy harvesting has been widely recognized as an important part of green communications. To drive the development of green communications, various energy harvesting techniques will be utilized, which can be grouped into different groups according to whether the energy is controllable and predictable [270]. AI techniques can be adopted in the scenarios using the uncontrollable but predictable energy group and the partially controllable energy group, where the former consists of solar, wind, tide, and other renewable sources, while the latter includes RF energy. For the uncontrollable but predictable energy harvesting techniques, some AI models can be utilized to map the relationship between the future harvesting power and related factors [271], [272]. The predicted results can be adopted to reconfigure the network in advance. Another method is to directly utilize AI models to map from the harvesting-related factors to network management policy. These methods enable network operators to gain more knowledge of energy harvesting and improve the utilization efficiency. For the partially controllable RF energy harvesting technique, AI can be used to optimize the BS power control and transmission scheduling [150], [152]. For the UAV-enabled BSs, AI can be adopted to optimize the trajectory to reduce energy consumption and improve the harvesting efficiency [161], [273]. Current research mainly focuses on the maximization of minimum harvesting energy due to the disordered transmission and unplanned power control [237], [238]. AI can enable the RF harvesting process to be energy-aware, which can greatly reduce the wasted energy, especially for the signals from omnidirectional antennas.
The RF harvesting technique also enables energy sharing among devices, which can be considered to avoid the outage of some network parts as well as reduce energy waste when batteries of some devices are nearly full and cannot save incoming energy anymore [129]. The Simultaneous Wireless Information and Power Transfer (SWIPT) technique has been widely studied, especially in MTC scenarios [274]. Even though it may cause some performance loss to harvest energy from part of the received signals, AI can be utilized to decide the ratio between RF harvesting and information transmission to reach a balance [275]. Currently, ambient backscattering is a promising technique especially for the low power machines, and AI can be considered to optimize the energy harvesting and information forwarding process [23], [156], [157].
E. Security for AI-enabled Networks
The adversaries and unauthenticated users threaten the information privacy as well as cause transmission failures, leading to deteriorated energy efficiency. To protect the normal information transmission from the attacks, AI can be considered as it has been verified to detect the network threats [276]. Moreover, using AI to control the transmit power and allocate the resource is also efficient to address the network jammers [208]. For the future AI-driven 6G, a new type of network threat may be the malicious data generated by the adversaries, which misleads AI models to reach a wrong decision. Besides the decreased throughput or increased latency, the potential results may be the widespread outage of end terminals or extremely low harvesting efficiency. How to develop robust AI models to ensure green communications will be an important topic.
Most AI techniques including the DL and ML rely on data in the training and running phases. Since the data may be concerned with personal privacy or business information, the development and execution of AI algorithms should consider the data security issues. More importantly, the standards and regulations should be built to guide the collection and usage of data [59].
F. Lightweight AI Model and Hardware Design
To develop AI-based green communications, energy consumption of AI algorithms should be analyzed. However, most of the current research just focuses on the network performance improvement compared with conventional algorithms and neglects the consumed energy for the training and running of AI models [277], [278]. This may cause the high complexity of the proposed AI models, which may be
