
Split Federated Learning Empowered Vehicular Edge Intelligence: Adaptive Parallel Design and Future Directions

Xianke Qiang, Zheng Chang, Chaoxiong Ye, Timo Hämäläinen, Geyong Min

X. Qiang and Z. Chang are with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China. Z. Chang, C. Ye and T. Hämäläinen are with the Faculty of Information Technology, University of Jyväskylä, P. O. Box 35, FIN-40014 Jyväskylä, Finland. G. Min is with the Department of Computer Science, University of Exeter, Exeter, EX4 4QF, U.K.

Abstract

To realize ubiquitous intelligence in future vehicular networks, artificial intelligence (AI) is critical since it can mine knowledge from vehicular data to improve the quality of many AI-driven vehicular services. By combining AI techniques with vehicular networks, Vehicular Edge Intelligence (VEI) can utilize the computing, storage, and communication resources of vehicles to train AI models. Nevertheless, when executing model training, the traditional centralized learning paradigm requires vehicles to upload their raw data to a central server, which results in significant communication overhead and the risk of privacy leakage. In this article, we first overview the system architectures, performance metrics and challenges of VEI design. Then we propose to utilize a distributed machine learning scheme, namely split federated learning (SFL), to boost the development of VEI. We present a novel adaptive and parallel SFL scheme and conduct a corresponding analysis of its performance. Future research directions are highlighted to shed light on the efficient design of SFL.

Index Terms:
split learning, federated learning, vehicular network, edge intelligence

I Introduction

With the ever-increasing emphasis on privacy and the widespread deployment of edge computing in vehicular networks, federated learning (FL) has emerged as a promising distributed learning framework for implementing vehicular edge intelligence (VEI). FL enables vehicles to train local models with private data and then upload them to a Road Side Unit (RSU) for aggregation. Despite the potential of FL in VEI, numerous challenges remain [1]. One significant problem is the high heterogeneity among the clients involved in training [2]. Another primary concern of FL is how to protect user privacy, since sensitive information can still be revealed from model parameters or gradients by a third-party entity or the RSU [3]. Furthermore, with the development of AI, we have entered the era of large models, which are progressively growing in size and complexity [4]. Training complete, large models on resource-constrained vehicles poses a significant challenge.

Meanwhile, Split Learning (SL) is another underlying technology for achieving VEI, in which the whole AI model (e.g., a CNN) is partitioned at the cut layer into several sub-models (e.g., a few layers of the entire CNN) that are distributed to different entities (e.g., the vehicle-side model at the vehicles and the RSU-side model at the RSU) [5, 6]. By offloading computation-intensive portions to the RSU and preserving privacy-sensitive portions locally, SL can significantly reduce the computation overhead of model training on resource-constrained devices, and has great potential to empower future intelligent transportation systems (ITS). However, applying the conventional sequential SL directly to VEI may induce extra communication overhead and time delays, which calls for a careful redesign of the conventional scheme.

It is worth noting that a novel framework called Split Federated Learning (SFL) combines the ideas of SL and FL to parallelize the training process [7]. For SFL in a vehicular network, each vehicle downloads the vehicle-side model and executes forward propagation to upload the smashed data to the RSU. Then the RSU performs forward and backward propagation with the received smashed data and broadcasts the gradients of the smashed data. After that, the updated vehicle-side models are uploaded to the RSU for aggregation. SFL not only reduces communication overhead and latency compared with SL, but also reduces the vehicle computing load, which makes it more suitable for VEI systems. Firstly, SFL enables vehicles to participate in training without compromising data privacy, thus reducing the risk of data leakage. This enhances vehicle engagement and makes more widely distributed vehicle data available for training. Secondly, by offloading part of the model to the RSU, SFL alleviates the computational bottleneck of vehicles. Thirdly, such a parallel design greatly enhances the scalability of SFL systems compared to SL/FL, allowing the system to accommodate more vehicles within the communication range of the RSU, especially in high-speed mobility scenarios. These advantages shed light on emerging applications such as cooperative autonomous driving, intelligent traffic navigation, traffic signal operation, and electric vehicle charging management.

Figure 1: The workflow of centralized learning, federated learning, split learning and split federated learning.

However, SFL still faces new challenges when applied to VEI due to the mobility and constrained resources of vehicular networks [8]. Firstly, continuously moving vehicles may drive out of the RSU's communication range during the training process, leading to interruptions in training. Thus, how to select as many vehicles as possible that can successfully transmit data to participate in training becomes a critical issue. Secondly, there are significant differences in the computing capabilities of different vehicles. The choice of partition layer affects system latency, energy consumption, and even privacy. For different vehicles, selecting appropriate cut layers to minimize latency and energy consumption while maximizing privacy becomes another important challenge. Last but not least, compared to FL, SFL offloads part of the model to the RSU, reducing computational overhead on the vehicle side but increasing communication overhead, essentially trading communication time for computational time. Since multiple vehicles participate in training in a single round, balancing computational latency and communication latency to minimize overall system latency is also an important consideration.

In this work, we are motivated to study the SFL-empowered VEI system in light of the growing attention toward ITS and the booming development of VEI. To the best of our knowledge, this work represents an early attempt to provide a comprehensive overview of SFL-empowered vehicular networks. The rest of this paper is organized as follows. We present the background information, including the system architecture, performance metrics and challenges for VEI, with a special focus on the distributed implementation. Then we introduce a novel parallel and adaptive structure of SFL as an enabling technology for VEI, and provide a case study to evaluate it in a real communication environment. Finally, open research directions are discussed.

Figure 2: VEI system architecture.

II Vehicular Edge Intelligence: Architecture, Performance Metrics and Challenges

In this section, we first introduce the system architecture and then the performance metrics of VEI systems from the aspects of training and testing, time and energy, and privacy and security. We then analyze the challenges facing the distributed implementation of VEI.

II-A System Architecture

VEI utilizes the computing and communication resources of vehicles, combined with AI technologies. VEI relies on the effective utilization of extensive data gathered from numerous vehicles for model training. We can divide the implementations of VEI into four groups, namely centralized machine learning (CL), distributed collaborative FL, SL, and SFL, to achieve data processing and decision-making close to the data source. The details of these four schemes are shown in Fig. 1.

CL aggregates training data at centralized locations, such as cloud data centers. However, the transmission of vast vehicular data to these centers not only strains network bandwidth but also exacerbates latency issues. Additionally, vehicular data often encompasses sensitive information, including personal data related to user information (e.g., license plate numbers, facial features, and vehicle details). Thus, the imperative to retain data on local devices emerges to safeguard user privacy.

Distributed collaborative learning emerges as a promising technical strategy under exploration by the research community to tackle these challenges of CL. Generally, distributed collaborative learning entails the joint training of a global model through collaboration, without direct access to the decentralized raw data. This approach holds significant appeal for applications seeking to leverage the wealth of data generated within a distributed IoT environment. Notably, FL and SL stand out as two representative and emerging methods within the realm of distributed collaborative learning.

FL allows multiple data owners to work together to train a shared AI model without revealing their individual data. In FL, each vehicle independently trains a local AI model using its own data, which is then aggregated with other models by a central server to form a global model, thereby preserving data privacy by keeping data on local vehicles. It’s a promising solution for dealing with data challenges in IoV. However, FL requires each vehicle to have enough resources for training AI models, which might be difficult for resource-limited vehicles, especially for complex models like deep neural networks.

SL is another distributed collaborative learning approach, in which the whole model is partitioned to be collaboratively trained at the vehicles and the RSU. Split learning operates in three main steps. Initially, the vehicle downloads the vehicle-side model and performs forward propagation to transmit the processed data to the RSU. Subsequently, the RSU conducts forward and backward propagation of the RSU-side model, then updates the RSU-side model and sends the gradient associated with the cut layer back to the vehicle for the vehicle-side model update. Next, the updated vehicle-side model is transferred to the next vehicle, and the above process repeats until all vehicles have been trained. SL allows vehicles to offload part of the model training task to an RSU, thus making it possible to leverage flexible management of computing resources to support model training. Therefore, SL may greatly facilitate the resource aspect of ubiquitous intelligence in IoV. However, the sequential vehicle-RSU collaboration in SL limits its capability of involving the IoV big data dispersed across a large number of vehicles in model training.

Combining the advantages of SL and FL, SFL not only allows vehicles to offload part of the model training task to an RSU, but also supports parallel training. SFL mainly consists of three steps. Firstly, each vehicle downloads the vehicle-side model and executes forward propagation to upload the smashed data to the RSU. Secondly, the RSU performs forward and backward propagation with the received smashed data, and then broadcasts the gradients of the smashed data. Finally, the updated vehicle-side models are uploaded to the RSU for aggregation.
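The three SFL steps can be illustrated with a toy example. The following is a minimal sketch under strong assumptions, not the authors' implementation: both sub-models are single scalar weights, each vehicle holds one hypothetical sample, the loss is the squared error, the RSU updates a single shared RSU-side weight as it processes each vehicle, and the vehicle-side models are aggregated by a plain average. All names (`Vehicle`, `sfl_round`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    x: float  # one private input sample (hypothetical toy data)
    y: float  # its label


def sfl_round(w_vehicle, w_rsu, vehicles, lr=0.1):
    """Run one simplified SFL round.

    Returns (new_w_vehicle, new_w_rsu, mean_loss).
    """
    local_models = []
    losses = []
    for v in vehicles:
        # Step 1: vehicle-side forward pass produces the smashed data.
        smashed = w_vehicle * v.x
        # Step 2: RSU-side forward and backward pass on the smashed data.
        pred = w_rsu * smashed
        err = pred - v.y
        losses.append(0.5 * err * err)
        grad_smashed = err * w_rsu    # gradient returned to the vehicle
        w_rsu -= lr * err * smashed   # RSU-side model update
        # Step 3: the vehicle finishes backpropagation on its own copy.
        local_models.append(w_vehicle - lr * grad_smashed * v.x)
    # Aggregation: the RSU averages the uploaded vehicle-side models.
    new_w_vehicle = sum(local_models) / len(local_models)
    return new_w_vehicle, w_rsu, sum(losses) / len(losses)


if __name__ == "__main__":
    fleet = [Vehicle(1.0, 2.0), Vehicle(2.0, 4.0)]
    w_v, w_s = 0.5, 0.5
    for _ in range(50):
        w_v, w_s, loss = sfl_round(w_v, w_s, fleet, lr=0.05)
    print(f"vehicle-side weight={w_v:.3f}, RSU-side weight={w_s:.3f}, loss={loss:.6f}")
```

Note that the raw samples `x` and `y` never leave the `Vehicle` objects in the vehicle-side steps; only the smashed data and its gradient cross the vehicle-RSU boundary. In a real deployment the labels would also have to be available wherever the loss is computed.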

II-B VEI Performance Metrics

II-B1 Training & Testing

For AI models, model performance is the primary measure of their effectiveness. Training accuracy is a critical indicator of the model’s learning capacity, reflecting its performance on the training data. On the other hand, testing accuracy serves as a crucial metric for assessing the model’s generalization capability, measuring its performance on unseen data. A model with strong generalization abilities can effectively handle various data distributions and scenarios, not just performing well on training data. Therefore, for AI models within the vehicular network, besides considering challenges such as computational power and wireless resource allocation, it is essential to focus on model performance to ensure their learning and generalization capabilities in practical applications, especially in the case of non-IID distribution of data.

II-B2 Time & Energy

In vehicular networks, tasks are often highly time-sensitive, which is compounded by the high-speed mobility of vehicles. Given that different vehicles stay within the communication range of RSUs for a short and uncertain period, the overall system's model training time becomes a crucial metric. Additionally, energy consumption serves as a significant evaluation criterion. Vehicles serve primarily as transportation tools. While utilizing onboard data for training can enhance the passenger experience, it is crucial to avoid excessive energy consumption during model training so that it does not compromise the vehicle's primary transport function. Therefore, energy consumption should also be taken into consideration.

II-B3 Privacy & Security

With growing concerns regarding data privacy, especially in the domain of connected vehicles, data leakage not only compromises property security but also poses more severe risks to personal safety and traffic safety. Privacy regulations, such as the EU’s General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA), impose limitations on the collection and immediate utilization of users’ sensing or perception data for model training and inference [9]. The local gradients uploaded by clients could potentially be exploited by attackers to infer the membership of individual data samples[10]. In SL, the data owner and the label owner also have privacy issues, although they only share the intermediate data, i.e., the smashed data and the cut layer gradients. The smashed data (i.e., the extracted features) shared from the data owner may be used by the attacker to reconstruct its training data [11]. To alleviate the burden of privacy protection on the network side, the model training framework necessitates retaining raw data within local vehicles or using some privacy technology such as differential privacy technology [12].

Figure 3: The workflow of adaptive split federated learning.

II-C Challenges

In this subsection, we focus on the challenges faced by distributed machine learning in implementing VEI from different layers.

II-C1 Data Layer

The intelligence of VEI is derived from the significant volume of vehicle data engaged in its training process. From the data layer perspective, the vehicular network primarily encounters two challenges in achieving VEI. Firstly, a large number of devices hold massive amounts of data, which means more energy is required for training, yet not all data is useful, and much of it is redundant. Secondly, in the Internet of Vehicles (IoV), the data involved in training exhibits high heterogeneity, which means it may originate from different types of vehicles, various geographical locations, and diverse driving conditions. This heterogeneity poses significant challenges to model training because models need to effectively capture and adapt to these variations.

II-C2 Computation Layer

Computational capability is crucial for training more complex AI models to provide intelligent services, particularly as we enter the era of large models. However, for time-sensitive tasks, training AI models on vehicles with restricted computing power can prove excessively time-consuming. Moreover, the heterogeneity of computing platforms presents a significant challenge, stemming from the diverse types and computational capabilities of vehicles, thereby inducing instability in system operations. In addition, large AI models bring their own challenges. Traditional FL requires training the entire model on each device, which may not be realistic for vehicles with limited resources. In fact, running large models on vehicles is not economical because it consumes a lot of energy. The main task of vehicles is transportation, and the electricity required for transportation should not be sacrificed for model training.

II-C3 Communication Layer

In distributed learning, high-performance wireless networks play a crucial role in accelerating the implementation of VEI, as intermediate parameters of model training need to be transmitted through wireless connections. For instance, in FL, models are uploaded in parallel to the RSU for aggregation; in SL, features and their corresponding gradients must be communicated between vehicles and the RSU via wireless networks. However, owing to the inherent instability of wireless networks, the extensive participation of vehicles in training, and the varying distances between vehicles and the RSU, not all vehicles can access sufficient communication bandwidth. Moreover, due to the high-speed movement of vehicles, some may exit the communication range of the RSU during training, preventing the training from completing.

II-C4 System Layer

Different designs of training architectures have multifaceted impacts on the implementation of VEI. Choosing an appropriate training architecture not only affects the system's performance and efficiency but also involves aspects such as privacy, security, scalability, and flexibility. A centralized training architecture may lead to increased burdens in data transmission and computation, resulting in higher system latency and energy consumption, as well as posing risks of data leakage. Conversely, a distributed training architecture offers better privacy protection and security, as well as improved scalability and flexibility. Therefore, selecting the right training architecture is crucial, requiring comprehensive consideration of multiple factors to ensure the efficient and secure operation of the VEI system. In traditional SL, the RSU with the RSU-side model must sequentially serve vehicles with unique local datasets. Due to this sequential training process, the overall latency of each training epoch increases linearly with the number of vehicles. Such heightened latency could impede the scalability of split learning, especially for large-scale ITS deployments.
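The linear growth of sequential SL latency can be made concrete with a back-of-the-envelope sketch. It assumes, purely for illustration, that each vehicle needs a fixed time per epoch for its computation and communication, that sequential SL serves vehicles one after another, and that an idealized parallel scheme is limited only by the slowest vehicle. RSU-side contention and aggregation time are ignored, and the function names and timings are hypothetical.

```python
def sequential_round_latency(per_vehicle_seconds):
    """Latency when vehicles are served one after another (sequential SL)."""
    if not per_vehicle_seconds:
        raise ValueError("at least one vehicle is required")
    return sum(per_vehicle_seconds)


def parallel_round_latency(per_vehicle_seconds):
    """Idealized latency when all vehicles train at once: the slowest one dominates."""
    if not per_vehicle_seconds:
        raise ValueError("at least one vehicle is required")
    return max(per_vehicle_seconds)


if __name__ == "__main__":
    # Hypothetical: every vehicle needs 2 seconds per epoch.
    for n in (1, 2, 4, 8):
        times = [2.0] * n
        print(n, sequential_round_latency(times), parallel_round_latency(times))
```

Under these assumptions the sequential latency scales with the number of vehicles while the parallel latency stays flat, which is the scalability gap described above.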

III Parallel and Adaptive Split Federated Learning

Bearing in mind the aforementioned challenges, such as non-IID data distribution, shortage of computing resources, tight communication resources, and low system scalability, this section introduces an Adaptive Split Federated Learning (ASFL) scheme. The system can dynamically adjust the cut layer for every vehicle based on environmental conditions. Compared to traditional SL and FL, this scheme reduces communication and computational overhead, which makes distributed learning more adaptable to the mobility characteristics of the vehicular network.

III-A Adaptive Split Federated Learning Scheme

We consider a general vehicular network that includes one RSU and a set of $N$ vehicles. The data set of vehicle $n$ is denoted as $\mathcal{D}_{n}$, and $\left|\mathcal{D}_{n}\right|$ is the number of training data samples of vehicle $n$.

The objective is to collaboratively train a global AI model that minimizes the global loss function based on the global dataset collected from all vehicles.

$$\min_{\boldsymbol{\omega}} L(\boldsymbol{\omega})=\frac{1}{\sum_{n=1}^{N}\left|\mathcal{D}_{n}\right|}\sum_{n=1}^{N}\left|\mathcal{D}_{n}\right| L_{n}(\boldsymbol{\omega}),$$

where $L_{n}(\boldsymbol{\omega})$ denotes the local loss function of vehicle $n$.
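As a small numerical illustration of the global objective, the sketch below computes the dataset-size-weighted average of the local losses. The function name `global_loss` and the example numbers are hypothetical.

```python
def global_loss(local_losses, dataset_sizes):
    """Weighted average of local losses, with weights |D_n| / sum_n |D_n|."""
    if len(local_losses) != len(dataset_sizes):
        raise ValueError("one loss and one dataset size per vehicle are required")
    total = sum(dataset_sizes)
    if total <= 0:
        raise ValueError("the total number of samples must be positive")
    weighted = sum(size * loss for size, loss in zip(dataset_sizes, local_losses))
    return weighted / total


if __name__ == "__main__":
    # Two hypothetical vehicles: 100 samples with loss 1.0, 300 samples with loss 3.0.
    print(global_loss([1.0, 3.0], [100, 300]))
```

A vehicle with more samples therefore contributes proportionally more to the objective than one with few samples.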

The full model of vehicle $n$ in the $t$-th round, $\boldsymbol{\omega}_{t}^{n,\epsilon}$, comprises two non-overlapping sub-models separated at the $\epsilon$-th cut layer: the vehicle-side model $\boldsymbol{\omega}^{V,\epsilon}_{t}$ and the RSU-side model $\boldsymbol{\omega}^{S,\epsilon}_{t}$. It can be denoted by $\boldsymbol{\omega}_{t}^{n,\epsilon}=\{\boldsymbol{\omega}^{V,\epsilon}_{t};\boldsymbol{\omega}^{S,\epsilon}_{t}\}$, and the global model update principle is as follows:

$$\boldsymbol{\omega}_{t+1}=\boldsymbol{\omega}_{t}-\sum_{n=1}^{N}\frac{1}{N}\left(\boldsymbol{\omega}_{t+1}^{n,\epsilon}-\boldsymbol{\omega}_{t}\right).$$
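The partition of the full model into a vehicle-side and an RSU-side sub-model can be pictured as slicing an ordered list of layers. The sketch below assumes that a cut layer index of k places the first k layers on the vehicle and the rest on the RSU; the layer names are placeholders and do not correspond to the actual ResNet18 split points.

```python
def split_model(layers, cut_layer):
    """Split an ordered list of layers into (vehicle_side, rsu_side).

    Assumption: the first `cut_layer` layers stay on the vehicle and the
    remaining layers run on the RSU, so the two parts never overlap.
    """
    if not 1 <= cut_layer < len(layers):
        raise ValueError("cut_layer must leave at least one layer on each side")
    return layers[:cut_layer], layers[cut_layer:]


if __name__ == "__main__":
    # Hypothetical ten-layer model; names are illustrative only.
    model = [f"layer{i}" for i in range(1, 11)]
    vehicle_side, rsu_side = split_model(model, cut_layer=4)
    print(vehicle_side)
    print(rsu_side)
```

Concatenating the two parts recovers the full model, which mirrors the non-overlapping decomposition used in the notation above.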

III-B Workflow of ASFL

The workflow of ASFL is shown in Fig. 3. Firstly, the RSU chooses the cut layers and distributes the corresponding vehicle-side models to the vehicles according to the cut layer selection strategy. Secondly, the vehicles train the vehicle-side models they received on their local datasets in parallel, and then send the corresponding smashed data to the RSU. Thirdly, the RSU is assumed to have sufficient resources and powerful computing capability, such that it sequentially performs forward propagation of the RSU-side model with the received smashed data to calculate the respective loss functions, and then broadcasts the gradients of the smashed data. Finally, the vehicles update their vehicle-side models and upload them to the RSU for aggregation and for updating the whole global model.

III-C Cut Layer Selection Strategy

Vehicles move continuously on the road, and the wireless transmission channel is unstable. Moreover, different sensors and hardware among vehicles result in significant differences in data and computation capacities. To reduce convergence time and enhance learning accuracy, we propose a cut layer selection strategy based on the data transmission rate.

We assume that all vehicles remain within the RSU communication range for the same amount of time. Therefore, we only consider the transmission rates of different vehicles when choosing the cut layers. As shown in Fig. 5(a), the communication load of SL is significant, as it requires transmitting not only part of the vehicle models but also the intermediate messages for model training. So when the vehicle's transmission rate is higher, we can choose a smaller split layer to reduce the communication load and achieve a better communication latency. We use ResNet18 as the whole training model; it has a total of 9 split points, as shown in Fig. 4. The cut layer selection strategy is as follows:

$$c_{n}=\begin{cases}2, & 0<r_{n}^{t}\leq\bar{R_{1}}\\ 4, & \bar{R_{1}}<r_{n}^{t}\leq\bar{R_{2}}\\ 6, & \bar{R_{2}}<r_{n}^{t}\leq\bar{R_{3}}\\ 8, & \bar{R_{3}}<r_{n}^{t}\leq\bar{R_{4}}\end{cases}$$

where $r_{n}^{t}$ is the wireless transmission rate of vehicle $n$ in the $t$-th round, and $\bar{R_{1}}\leq\bar{R_{2}}\leq\bar{R_{3}}\leq\bar{R_{4}}$ are the rate thresholds.
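The piecewise rule can be written as a threshold lookup. The sketch below follows the rate bands of the rule as printed; the threshold values in the usage example are placeholders, since the text does not give numerical thresholds, and rejecting rates outside the covered range is an assumption. The table of cut layers is passed in as data so that a different assignment of cut layers to rate bands can be substituted.

```python
from bisect import bisect_left


def select_cut_layer(rate, thresholds, cut_layers=(2, 4, 6, 8)):
    """Return the cut layer for a vehicle with the given transmission rate.

    thresholds: ascending upper bounds (R1, R2, R3, R4) of the rate bands.
    Band k covers thresholds[k-1] < rate <= thresholds[k], with a lower
    bound of 0 for the first band.
    """
    if len(thresholds) != len(cut_layers):
        raise ValueError("one cut layer per rate band is required")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    if not 0 < rate <= thresholds[-1]:
        # Assumption: rates outside (0, R4] are not covered by the rule.
        raise ValueError("rate is outside the range covered by the thresholds")
    return cut_layers[bisect_left(thresholds, rate)]


if __name__ == "__main__":
    # Hypothetical thresholds in Mbit/s; not taken from the paper.
    bands = (5.0, 10.0, 20.0, 40.0)
    for r in (3.0, 5.0, 7.5, 15.0, 40.0):
        print(r, select_cut_layer(r, bands))
```

`bisect_left` returns the index of the first threshold that is greater than or equal to the rate, which matches the half-open bands of the rule (upper bound inclusive, lower bound exclusive).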

III-D Performance Evaluation

III-D1 Experiment Setting

In our experiment, we utilized the NVIDIA GeForce RTX 3060 GPU as the RSU, while the 3060 CPU served as the vehicles. In total, there are four vehicles, and we use sockets to communicate between the vehicles and the server.

The learning rate is 0.0001, the batch size is 16, and the number of local epochs is 5. We consider three baselines: FL, SL and SFL. The suffix in SFL2, SFL4, SFL6 and SFL8 denotes the index of the cut layer. We use CIFAR10 [13] as our simulation dataset. Notably, the data distribution at the vehicles is non-IID, which is common in practical systems. To capture the heterogeneity among mobile vehicles, we impose a constraint where each vehicle retains only six out of the ten possible labels, with sample sizes varying according to a power law as described in [14]. The official implementation of ASFL is available at [15].
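A non-IID split of this kind can be sketched as follows. This is only an illustration of the idea and does not reproduce the procedure of [14]: each vehicle draws six of the ten class labels at random, and sample counts decay with the vehicle index as a power law whose exponent is an arbitrary assumption. The function names, seed, and total sample count are hypothetical.

```python
import random


def assign_labels(num_vehicles, num_classes=10, labels_per_vehicle=6, seed=0):
    """Give each vehicle a random subset of class labels."""
    rng = random.Random(seed)
    return [
        sorted(rng.sample(range(num_classes), labels_per_vehicle))
        for _ in range(num_vehicles)
    ]


def power_law_sizes(num_vehicles, total_samples, exponent=1.5):
    """Split a sample budget so that vehicle k gets a share proportional to k^(-exponent).

    The exponent is an assumption made for illustration.
    """
    weights = [(rank + 1) ** -exponent for rank in range(num_vehicles)]
    norm = sum(weights)
    return [int(total_samples * w / norm) for w in weights]


if __name__ == "__main__":
    for labels, size in zip(assign_labels(4), power_law_sizes(4, 10000)):
        print(f"labels={labels}, samples={size}")
```

Each vehicle would then sample its share of training images only from the classes it was assigned.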

Figure 4: ResNet18 Model Structure.

Figure 5: Performance of case study. (a) Communication load. (b) Communication delay. (c) Test accuracy under IID distribution. (d) Test accuracy under non-IID distribution.

III-D2 Performance Analysis

Fig. 5(a) shows that the communication load of the different schemes, with one local epoch and one round, decreases as the cut layer index increases. As we can see, the load of SL and SFL is much higher than that of FL, because the intermediate values calculated by the model need to be transmitted over the network.

Fig. 5(b) shows the overall training time of the different system designs. The serial SL computes and communicates with the four vehicles in sequence, which consumes a significant amount of additional time. The proposed ASFL takes less time than FL and SL, which shows that ASFL performs well. Although the communication loads of ASFL and SL are much higher than that of FL, the experiment shows that the training time of ASFL is slightly less than that of FL. This indicates that ASFL trades increased communication load for reduced computational load and, ultimately, reduced global training time, which supports the rationality of the ASFL architecture.

Fig. 5(c) shows the testing performance under IID data distribution. As we can see, the SFL schemes perform better than SL and FL. Surprisingly, we found a correlation between the model performance of the SFL scheme and the choice of cut layer, with the performance improving as later cut layers are chosen. Fig. 5(d) shows the testing performance under non-IID distribution, in which every vehicle holds data from only six of the ten classes. SL performs better than FL. Notably, our proposed ASFL scheme outperforms the other alternatives.

IV Open Research Directions

SFL, as a distributed learning framework, has attracted significant attention, yet the research remains in its early stages. Especially within the domain of vehicular networks, quite a few research directions await further investigation.

IV-A Data Generation and Selection

The advantage of AI lies in using a large amount of local device data for training. However, in SFL, while the system can access massive data from numerous devices, these data are commonly non-IID, resulting in limitations in the learning and generalization ability of the system model. With the development of Artificial Intelligence Generated Content (AIGC) technology, we consider using it to assist in generating data to mitigate the impact of non-IID data distribution on the model, thereby improving the performance of the system. The use of data generation for training raises three key issues. Firstly, how to evaluate or measure the effectiveness of data. Secondly, considering the existence of invalid or redundant data in large-scale training datasets, how to select data to reduce communication and computational burden and avoid resource waste. Thirdly, how to balance data generation and model training performance in mobile scenarios of vehicle networks.

IV-B Cut Layer Selection

In SFL, the global model is divided into non-overlapping vehicle-side (user-side) models and RSU-side models at the cut layer. The later the cut layer is chosen, the smaller the size of the smashed data. As vehicle speed increases, the channel stability between vehicles and the RSU weakens, which favors selecting later cut layers to reduce the communication load. However, as vehicle speed increases, the time spent by vehicles within the RSU communication range also decreases, which favors selecting earlier cut layers to reduce the vehicle-side computation time. Thus, designing a cut layer selection strategy that balances vehicle communication and computational resources, as well as time and energy overhead, to achieve minimal overall training time is a significant consideration.

Splitting strategies need to balance not only computation and communication but also privacy protection and cost. The later the cut layer, the greater the computational load on the vehicle, the smaller the communication load, and the better the privacy of the smashed data, since the output becomes more blurred. It is therefore necessary to consider how to balance computation, communication overhead, and privacy for vehicles with different speeds and capabilities.
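The trade-off described above can be illustrated with a simple exhaustive search over candidate cut layers. This is a minimal sketch under stated assumptions, not the selection strategy of the proposed scheme: per-layer workloads, smashed data sizes, processing speeds, and the link rate are hypothetical inputs, the downlink gradient is assumed to be as large as the uplink smashed data, and the dwell time is approximated as coverage length divided by speed.

```python
def dwell_time(coverage_m, speed_mps):
    """Time a vehicle stays in RSU coverage, assuming a straight road."""
    return coverage_m / speed_mps


def select_cut_layer(layer_flops, smashed_bits, vehicle_flops_per_s,
                     rsu_flops_per_s, link_bps, dwell_time_s):
    """Pick the cut layer with the smallest per-round time.

    layer_flops[i]  : workload of layer i + 1
    smashed_bits[i] : output size of layer i + 1 if the model is cut there
    Returns (cut, time), where `cut` is the number of vehicle-side layers,
    or (None, inf) if no cut fits within the dwell time.
    """
    total_flops = sum(layer_flops)
    best_cut, best_time = None, float("inf")
    # Keep at least one layer on each side of the cut.
    for cut in range(1, len(layer_flops)):
        vehicle_flops = sum(layer_flops[:cut])
        t_vehicle = vehicle_flops / vehicle_flops_per_s
        t_rsu = (total_flops - vehicle_flops) / rsu_flops_per_s
        # Smashed data goes up, its gradient comes down (assumed same size).
        t_comm = 2 * smashed_bits[cut - 1] / link_bps
        t_round = t_vehicle + t_comm + t_rsu
        if t_round <= dwell_time_s and t_round < best_time:
            best_cut, best_time = cut, t_round
    return best_cut, best_time
```

With a slow link the search tends toward a later cut, and with a short dwell time it tends toward an earlier one, which mirrors the two opposing effects of vehicle speed. A privacy term could be added to the objective in the same way, but how to quantify it is left open here.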

IV-C Split Inference

As Transformer, AIGC, and large language model (LLM) technologies advance, a growing number of intelligent vehicular networking services rely on them. However, directly deploying large-scale models for training on vehicles is impractical. First, vehicles lack sufficient computing resources for such tasks. Second, running large models on vehicles consumes excessive energy, which affects vehicle payload capacity. Applying the idea behind split learning (SL) to partition Transformer-based models between the vehicle and the server, an approach termed split inference, has therefore become a noteworthy research direction. It can be applied in AIGC- and LLM-assisted vehicular networks to reduce the demand on vehicle-side resources while maintaining system performance and efficiency.

It is important to distinguish split inference from split learning. In SL, the outputs of the cut layer (the smashed data) are sent to the server during forward propagation, and only the gradients with respect to the smashed data are sent back to the vehicle during backpropagation. Split inference, in contrast, only sends the cut layer outputs to the server and requires no backpropagation.
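The difference in message flow can be made concrete with a toy example. The sketch below assumes a deliberately trivial model, a single scalar weight on each side and a squared-error loss, so that the gradients can be written by hand; the function names are hypothetical and real systems would use a deep learning framework instead.

```python
def vehicle_forward(w_vehicle, x):
    """Vehicle-side model: produces the smashed data."""
    return w_vehicle * x


def server_forward(w_server, smashed):
    """Server-side model: produces the prediction."""
    return w_server * smashed


def split_inference(w_vehicle, w_server, x):
    """Forward pass only: one uplink message, nothing is sent back."""
    smashed = vehicle_forward(w_vehicle, x)        # uplink: smashed data
    return server_forward(w_server, smashed)


def split_training_step(w_vehicle, w_server, x, y, lr=0.1):
    """One SL step for loss = 0.5 * (pred - y) ** 2."""
    smashed = vehicle_forward(w_vehicle, x)        # uplink: smashed data
    pred = server_forward(w_server, smashed)
    err = pred - y                                 # d(loss) / d(pred)
    grad_w_server = err * smashed                  # stays on the server
    grad_smashed = err * w_server                  # downlink: gradient only
    grad_w_vehicle = grad_smashed * x              # computed on the vehicle
    return w_vehicle - lr * grad_w_vehicle, w_server - lr * grad_w_server
```

In the training step, the only quantity that crosses the link in the downlink direction is `grad_smashed`; in split inference this message does not exist at all, which halves the number of transmissions per sample.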

IV-D Wireless Resource Allocation

Distributed learning frameworks rely on wireless networks to transmit large amounts of intermediate training data, so SFL imposes a communication burden during training. Because a large number of vehicles may participate in each training epoch, communication resources must be allocated carefully to accelerate SFL. Multi-objective optimization over delay, energy consumption, convergence time, and learning accuracy can be explored to promote rational resource allocation. Designing incentive mechanisms is another way to do so. When formulating incentive schemes, costs such as CPU frequency, spectrum, and energy must be considered, and the interference a transmission causes to other users can also be treated as a cost factor. Approaches rooted in game theory, contract theory, and auction theory can prove useful for ASFL incentive design in vehicular networks.
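As a simple instance of delay-oriented allocation, consider splitting a fixed uplink bandwidth among vehicles so that the slowest upload finishes as early as possible. The sketch below assumes that each vehicle has a fixed spectral efficiency (in bit/s/Hz) that does not change with its bandwidth share, so its rate is the product of the two; under this assumption, giving each vehicle a share proportional to its data size divided by its spectral efficiency makes all uploads finish at the same time, which minimizes the maximum upload time. Energy, accuracy, and incentives are not modeled, and all names are hypothetical.

```python
def allocate_bandwidth(smashed_bits, spectral_eff, total_bandwidth_hz):
    """Min-max upload time allocation under a linear rate model.

    smashed_bits[v] : bits vehicle v must upload in this round
    spectral_eff[v] : assumed spectral efficiency of v in bit/s/Hz
    Returns (allocation in Hz per vehicle, common upload time in seconds).
    """
    # Bandwidth-seconds each vehicle needs to finish its upload.
    demand = {v: smashed_bits[v] / spectral_eff[v] for v in smashed_bits}
    total_demand = sum(demand.values())
    allocation = {
        v: total_bandwidth_hz * d / total_demand for v, d in demand.items()
    }
    upload_time = total_demand / total_bandwidth_hz
    return allocation, upload_time
```

For example, a vehicle with twice the data and half the spectral efficiency of another receives four times the bandwidth. A multi-objective formulation would replace this closed form with a weighted or constrained optimization problem.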

IV-E Parallel Design

In a vehicular network, the number of vehicles within the RSU communication range varies from one time slot to the next. In addition to designing optimization or game-theoretic algorithms to minimize system overhead or maximize model performance, it is also necessary to consider the scalability of the system. When many vehicles join the network, a sound and highly efficient parallel design helps prevent system latency from growing linearly or exponentially with the number of vehicles, ensuring strong scalability.
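The scalability argument can be illustrated with a rough latency model. The sketch below is a simplification under stated assumptions: each vehicle has a known time for its local computation plus upload, the RSU needs a fixed time per vehicle for its part of the model, and in the parallel case the RSU processes vehicles in waves of `server_workers` after the slowest vehicle has finished. These modeling choices and names are hypothetical and ignore pipelining and aggregation cost.

```python
import math


def sequential_round_time(vehicle_times, server_time_per_vehicle):
    """Vehicles are served one after another, as in vanilla split learning."""
    return sum(t + server_time_per_vehicle for t in vehicle_times)


def parallel_round_time(vehicle_times, server_time_per_vehicle, server_workers):
    """Vehicles work concurrently; the RSU serves them in waves."""
    if not vehicle_times:
        return 0.0
    waves = math.ceil(len(vehicle_times) / server_workers)
    return max(vehicle_times) + waves * server_time_per_vehicle
```

In this model the sequential round time grows with the sum of all vehicle times, whereas the parallel round time grows only with the slowest vehicle and the number of RSU waves, which is the behavior a scalable design aims for.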

V Conclusion

This article presented a comprehensive review of and introduction to SFL for VEI. The inherently serial and distributed nature of SFL poses challenges when it is applied to large-scale vehicular networks. To address these challenges, we proposed an adaptive SFL (ASFL) scheme to empower the development of VEI, coupled with a mobility-adaptive cut layer selection strategy. Through a case study, we demonstrated the advantages of the proposed ASFL. Furthermore, we provided insights into the design of SFL and suggested potential future research directions.

References

  • [1] X. Zhang, J. Liu, T. Hu, Z. Chang, Y. Zhang, and G. Min, “Federated learning-assisted vehicular edge computing: Architecture and research directions,” IEEE Vehicular Technology Magazine, 2023.
  • [2] C. Yang, M. Xu, Q. Wang, Z. Chen, K. Huang, Y. Ma, K. Bian, G. Huang, Y. Liu, X. Jin et al., “Flash: Heterogeneity-aware federated learning at scale,” IEEE Transactions on Mobile Computing, 2022.
  • [3] J. Shen, N. Cheng, X. Wang, F. Lyu, W. Xu, Z. Liu, K. Aldubaikhy, and X. Shen, “RingSFL: An adaptive split federated learning towards taming client heterogeneity,” May 2023.
  • [4] L. Ma, N. Cheng, C. Zhou, X. Wang, N. Lu, N. Zhang, K. Aldubaikhy, and A. Alqasir, “Dynamic neural network-based resource management for mobile edge computing in 6G networks,” IEEE Transactions on Cognitive Communications and Networking, pp. 1–1, 2023.
  • [5] A. Singh, P. Vepakomma, O. Gupta, and R. Raskar, “Detailed comparison of communication efficiency of split learning and federated learning,” arXiv preprint arXiv:1909.09145, 2019.
  • [6] X. Liu, Y. Deng, and T. Mahmoodi, “Wireless distributed learning: a new hybrid split and federated learning approach,” IEEE Transactions on Wireless Communications, vol. 22, no. 4, pp. 2650–2665, 2022.
  • [7] C. Thapa, P. C. M. Arachchige, S. Camtepe, and L. Sun, “Splitfed: When federated learning meets split learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 8, 2022, pp. 8485–8493.
  • [8] M. Wu, R. Yang, X. Huang, Y. Wu, J. Kang, and S. Xie, “Joint optimization of model partition and resource allocation for split federated learning over vehicular edge networks,” IEEE Transactions on Vehicular Technology, pp. 1–6, 2024.
  • [9] X. Lyu, S. Liu, J. Liu, and C. Ren, “Scalable aggregated split learning for data-driven edge intelligence on internet-of-things,” IEEE Internet of Things Magazine, vol. 6, no. 4, pp. 124–129, 2023.
  • [10] M. Song, Z. Wang, Z. Zhang, Y. Song, Q. Wang, J. Ren, and H. Qi, “Analyzing user-level privacy attack against federated learning,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 10, pp. 2430–2444, 2020.
  • [11] Z. He, T. Zhang, and R. B. Lee, “Attacking and protecting data privacy in edge–cloud collaborative inference systems,” IEEE Internet of Things Journal, vol. 8, no. 12, pp. 9706–9716, 2020.
  • [12] M. Wu, G. Cheng, D. Ye, J. Kang, R. Yu, Y. Wu, and M. Pan, “Federated split learning with data and label privacy preservation in vehicular networks,” IEEE Transactions on Vehicular Technology, vol. 73, no. 1, pp. 1223–1238, 2024.
  • [13] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
  • [14] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” Proceedings of Machine learning and systems, vol. 2, pp. 429–450, 2020.
  • [15] X. Qiang, “AdaptiveSplitFederatedLearning,” https://github.com/XiankeQiang/AdaptiveSplitFederatedLearning, accessed: 2024-06.