Open access

Access Characteristic-Guided Remote Swapping Across Mobile Devices

Published: 20 November 2024

Abstract

Memory swapping ensures smooth application switching for mobile systems by caching applications in the background. To extend the benefits of memory swapping, remote swapping across mobile devices has been widely studied; it caches applications on nearby remote devices through remote paging. However, due to massive remote I/Os and unguaranteed swap throughput, current remote swapping delivers an unsatisfactory user experience, especially under variable network conditions. This paper first studies the access characteristics of applications and clarifies the impact of various kinds of network traffic on remote swapping. Motivated by these findings, an efficient access characteristic-guided remote swapping framework (ACR-Swap+) is proposed to optimize remote swapping across mobile devices with resilient remote paging. ACR-Swap+ first performs selective remote paging based on the swap-in frequency of different processes and then prefetches data across devices based on the process running states. Finally, it conducts hierarchical remote paging to avoid the impact of network traffic on remote swapping. Evaluations on Google Pixel 6 show that ACR-Swap+ reduces application switching latency by 21.6% and exhibits negligible performance fluctuation under various network traffic compared to the state of the art.

1 Introduction

Modern mobile systems adopt an application cache policy to cache as many applications as possible in main memory [24], which allows applications to restart quickly and improves user experience. However, due to the restricted memory of mobile devices, the system frequently suffers memory pressure and numerous applications are directly killed [14]. To expand physical memory, many studies [20, 27, 44, 47] move inactive memory pages to local storage (i.e., local swapping) on mobile devices. However, since flash-based local storage devices are subject to read/write interference [21, 30, 35] and limited lifetime [37, 46], local swapping is disabled by default on mainstream mobile systems [15]. On the other hand, with the development of mobile devices and wireless networks (e.g., 5G, Wi-Fi Direct), using the memory resources of remote devices (i.e., remote swapping) is becoming a promising choice. This paper proposes a new remote swapping design for mobile devices.
Remote swapping, which evicts memory pages to remote devices through the wireless network, has been widely studied [3, 9, 28, 29]. For instance, CloudSwap [9] performed remote swapping from local mobile devices to cloud servers to enable memory-oblivious application caching. MobileSwap [28] exploited the unbalanced utilization of memory resources across mobile devices to achieve comparable-to-local remote swapping performance on existing network infrastructure. However, traditional remote swapping relies on massive remote I/Os, which occupy excessive network resources and affect the interaction of foreground applications [39]. Existing solutions have not addressed this well. This side effect on user experience is amplified under memory pressure, when swapping operations between devices become more frequent. In contrast, this paper aims to optimize remote swapping across mobile devices with awareness of the access characteristics of applications, while considering the impact of network fluctuations on the system. The design is motivated by four interesting observations and a practical challenge. The observations are: (1) most system-service processes are swapped in more frequently than application-specific processes; (2) the swap-in of system services is concentrated on several specific processes; (3) application-specific processes perform many Swap-ins when switching from the background to the foreground, while the demand is much lower when running in the background; (4) unlike applications, system-service processes are mainly swapped in when running in the non-background state. The challenge is that remote swapping is a network-sensitive task (it relies on remote paging), so it is vulnerable to performance loss under undesired network condition changes.
Motivated by the above observations and challenge, we propose ACR-Swap\(^+\), an efficient access characteristic-guided remote swapping framework that minimizes the disadvantages of massive remote I/Os to optimize user experience. ACR-Swap\(^+\) consists of three schemes. First, a system service-aware page sifting (SPS) scheme prevents pages of critical system-service processes from being swapped to remote devices, based on the swap-in frequency of different processes. Second, a running state-oriented prefetching (RSP) scheme resizes the prefetching window based on the process running states to reduce the swap-in latency of remote devices. Third, since remote swapping is a network-sensitive (high swap throughput) task, it is easily interfered with by undesired network conditions; in particular, when other cross-device tasks compete with remote swapping for network bandwidth, remote swapping suffers great performance loss. To improve the stability of our design, a network traffic-aware remote paging (NaRP) scheme monitors the network condition and performs hierarchical remote swapping. ACR-Swap\(^+\) is implemented on real mobile devices. Experimental results show that ACR-Swap\(^+\) achieves encouraging application switching performance and application caching capability. In summary, this paper makes the following contributions:
The access characteristics of applications and a practical challenge of current remote swapping are deeply analyzed to guide the design of remote swapping.
SPS prevents frequently swapped-in processes from being swapped out to remote devices. RSP performs adaptive prefetching based on the process running states. NaRP performs hierarchical remote paging based on the network traffic.
ACR-Swap\(^+\) is deployed on real mobile devices. Experimental results show that the application switching latency is reduced by 21.6% compared to the state of the art.

2 Background and Related Work

2.1 Memory Swapping in Mobile Systems

In mobile systems, memory swapping occurs when the available memory reaches the watermark threshold. During memory swapping, inactive pages are evicted from main memory and delivered to the swap area as I/O requests by a kernel thread named kswapd. This process is called swap-out. When a swapped-out page is demanded, the system triggers a page fault and the page is swapped back into memory through a new I/O request. This process is called swap-in. Based on the location of the swap area, memory swapping can be classified into two categories: local swapping and remote swapping. In local swapping, the system allocates the swap area in local flash-based storage (i.e., flash-based swapping), while in remote swapping, the swap area resides in the memory of a remote device, as shown in Figure 1.
Fig. 1.
Fig. 1. Memory swapping in mobile devices: (a) Local Swapping: pages are swapped between the local main memory and flash-based swap area through the generic block layer. (b) Remote Swapping: pages are swapped between the local main memory and the remote swap area through the remote I/O virtual block layer.
For local swapping [20, 27, 33, 44, 47], memory pages are moved between DRAM and local storage through the I/O block layer. However, due to the read/write interference and limited endurance of flash devices [21, 30, 35, 37, 46], system performance is significantly affected by swapping-induced I/Os. To address this issue, remote swapping has been proposed to alleviate memory pressure with the aid of remote device memory [3, 6, 9, 19, 28, 31, 38]. In remote swapping, when the system faces memory pressure, victim memory pages are swapped out to a remote device [36]. Memory pages are transferred between main memory and the remote swap area through a remote I/O virtual block layer built on a typical socket (e.g., the network block device (NBD)). Another type of memory swapping in mobile systems is compression-based swapping (i.e., ZRAM [23]), which compresses inactive memory pages and stores them in a separate memory area. When the application later accesses the compressed pages, they are decompressed back into memory. ZRAM is faster than traditional flash-based swapping because it avoids long-latency I/O accesses. However, compressing memory pages still consumes memory capacity, while compression/decompression costs CPU cycles. Many commercial mobile operating systems (e.g., Android) enable this mechanism by default [15]. In practice, however, Android does not actively exploit it because the Android low memory killer daemon (lmkd) [2, 14] is usually triggered first to reclaim memory space before swapping occurs.

2.2 Process Management in Android

Android is a widely used mobile system based on Linux. Process management in Android is specially designed to ensure smooth application execution. First, Android divides all processes into two categories: system-service and application-specific. System-service processes provide general services to support application running. An application-specific process starts running when its application is started. Second, to determine which process should be terminated when memory is insufficient, Android sorts processes in order of importance according to the components running in each process and their running states. Briefly, there are four types [16]. (1) A foreground process is being used by the user. (2) A visible process is doing work that the user is currently aware of. (3) A service process is not directly visible to the user but generally provides support for foreground applications (e.g., background network activity). (4) A cached process is not needed temporarily and is cached in memory or a swap device.

2.3 Related Work

Remote Swapping System. Many previous works adopted remote swapping to relieve system memory pressure. Sprite [36] presented a network operating system that provided sharing, flexibility, and high performance to networked workstations through a set of kernel calls. In its virtual memory system, it used ordinary files for backing storage to simplify process migration, to share backing storage between workstations, and to capitalize on server caches. In the data center, recent studies [4, 6, 19, 31, 38, 42, 43] leverage remote swapping to implement memory disaggregation based on RDMA. With the development of mobile devices and wireless communication technologies (e.g., 5G and Wi-Fi Direct), users often own multiple mobile devices and the performance of remote I/O has improved significantly, while mobile devices still suffer from memory pressure. For mobile devices, remote swapping can expand the limited physical memory and avoid the disadvantages of traditional memory swapping, and it has been widely studied. The energy consumption of remote swapping on mobile devices is addressed in [3]. CloudSwap [9] made use of cloud resources to relieve the memory pressure of local devices and enable memory-oblivious app caching; moreover, two cloud-assisted prefetching schemes were adopted to reduce remote swap-in latency. Different from this work, we use a page filtering strategy based on application characteristics to ensure efficient page transfer between mobile devices without the intervention of cloud resources or the help of local storage devices. MobileSwap [28] built a dedicated network channel to improve the network throughput of remote swapping under existing network infrastructure. LegoSwap [26] further proposed elastic swap area management to ensure user interaction.
Unlike these two, this paper does not seek to change the network itself, but aims to realize the network-aware remote paging based on the application characteristics to boost user experience.
Application Running Characteristic Analysis. Prior studies have been conducted to analyze application running characteristics to design a more efficient memory management mechanism [25, 27, 44]. Marvin [25] combines the upper-layer garbage collection (GC) of the system with the underlying swap mechanism to optimize memory management due to the gap between the system GC and the swap mechanism. SEAL [27] builds a hybrid swap system including local storage and memory by exploiting the page access characteristics of the different running stages of the application. lcSWAP [44] revisits swap I/O size based on page compressibility to reduce write stress. Unlike these works, ACR-Swap\(^+\) identifies the swapping characteristics of different processes of an application and distinguishes between system-service processes and application-specific processes. And then ACR-Swap\(^+\) analyzes the performance characteristics of remote swapping systems in different network environments.
The proposed methodology is most relevant to the work presented in [29]. Compared to that work, the present work (i) further analyzes the swap-in characteristics of system-service processes, including the swap-in frequency and the impact of running states on swap-in; (ii) redesigns the original page filtering and prefetching mechanisms to account for previously overlooked system-service processes; (iii) clarifies the impact of network traffic on remote swapping and proposes a network traffic-aware remote paging scheme; (iv) redesigns the experiments on the latest platform and provides an in-depth analysis of the evaluation results, including the impact on application caching capability and switching time, which are critical to user experience.

3 Studies on Access Characteristics of Applications

As mentioned above, many previous works improve the performance of remote swapping from different aspects. Unlike them, we study the access characteristics of applications regarding swap-in, which will be used to optimize the user experience. Precisely, this section first measures the swap-in frequency of various processes, and then analyzes the swap-in distribution of different processes in different running states. We adopt two Google Pixel 6 smartphones as our platform; each is equipped with an octa-core ARM CPU, 8 GB DRAM, and 128 GB of UFS 3.1 storage. To deploy the remote swapping system, one device is configured as the local device and the other as the remote device. Details are given in Section 7.1.

3.1 Swap-in Frequency for Various Processes

3.1.1 Swap-in Frequency between System Services and Applications.

In this part, to collect the swap-in traces, 12 pre-installed popular applications are run repeatedly for ten rounds. In each round, applications are switched to the foreground in random order, and each of them runs in the foreground for five minutes. Then, the average number of Swap-ins over the ten rounds is collected. Figure 2 presents the number of Swap-ins for each process. The processes are classified into two types: system-service and application-specific. The results show that the swap-in frequency of system-service processes is much higher than that of application-specific processes. For example, the number of Swap-ins of the thread “CrRenderMain”, used for interface rendering, reaches 32,147, while an application-specific process such as Facebook has only 2,641. They are of different orders of magnitude, indicating the potential of storing the pages of system-service processes locally.
Fig. 2.
Fig. 2. System running process characterization for Swap-ins.
Observation 1.
The frequency of Swap-ins corresponding to system-service processes is much higher than that of application-specific processes.

3.1.2 Swap-in Frequency among Various System Services.

With the dumpsys command [11], we found that hundreds of system-service processes generally run in the mobile system. This number is many times the number of applications (i.e., 20 to 30 [32]) users use regularly. Therefore, system-service processes directly affect the performance of the swap subsystem. To further understand the swap-in frequency of system services, we collected the 14 system services with the highest frequency in the above experiments. As shown in Figure 3, the swap-in frequency of different system services varies greatly. For example, the highest, “CrRenderMain” (i.e., 32,141), is 669% of the commonly used “searchbox:search” (i.e., 4,798). This is because some system services are more common while others only correspond to infrequently used functions. For example, “RenderThread” is responsible for UI rendering and is constantly called while applications run, whereas system services like “inputmethod”, responsible for input methods, or “searchbox:search”, providing the search entry, are called only when the user needs them.
Fig. 3.
Fig. 3. System Services characterization for Swap-ins.
Observation 2.
Several system-service processes that serve nearly all applications are swapped in more frequently than others that specialize in certain functions.

3.2 Swap-in Disparity among Various Processes at Different States

3.2.1 Applications Perform Different Swap-in at Different States.

In this study, we analyze the Swap-ins of applications in different states. For the Android system, the running state of an application usually corresponds to either the cached process or the foreground process. To simplify the distinction, an application in the cached state is defined as background, and an application switching from the cached state to the foreground is defined as from background to foreground. Figure 4 shows the percentages of Swap-ins for applications in different states. The number of Swap-ins performed by applications in the background and foreground is minimal, accounting for 15.6% of the entire activity cycle on average. In contrast, while an app switches from background to foreground, the number of Swap-ins increases significantly and a large amount of data is swapped in, reaching 84.4% on average.
Fig. 4.
Fig. 4. Breakdown of Swap-ins for applications at different running states.
Observation 3.
The number of Swap-ins differs significantly across application running states.

3.2.2 Swap-in Disparity for System Services at Various States.

To further understand the swap-in characteristics of system services, we also analyze their Swap-ins in different states. To simplify the distinction, a system service in the cached state is defined as background, and a system service in any other state (i.e., visible, service, and foreground processes) is defined as non-background. Figure 5 shows the percentages of Swap-ins for system services in different states. First, most system services perform swap-in in a non-background state. For example, when “systemui” and “surfaceflinger” perform swap-in, the process is entirely in a non-background state. Second, some system services run in the background constantly, so these processes stay in the background when performing swap-in.
Fig. 5.
Fig. 5. Breakdown of Swap-ins for system services at different running states.
Observation 4.
Most of the Swap-ins of system-service processes occur in the non-background state.

4 Access Characteristic-Guided Remote Swapping (ACR-Swap+)

In this section, ACR-Swap\(^+\) is introduced to show how to use access characteristics of applications to optimize remote swapping. It includes SPS, RSP, and NaRP.

4.1 Overview

The basic idea of ACR-Swap\(^+\) is to partition the swap area into a local swap area (LSA) and a remote swap area (RSA). Then, based on the observations presented in Section 3, swapped data should be carefully placed between these two areas. Figure 6 shows the architecture of ACR-Swap\(^+\). Two critical components are implemented: LSA and RSA. LSA exposes a conventional frontswap interface [18] to the kernel virtual memory and consists of three components: system service-aware page sifting, running state-oriented prefetching, and network traffic-aware remote paging. The system service-aware page sifting module classifies pages according to their owning processes and places selected processes in LSA based on the observations in Section 3.1 (i.e., swap-in frequency). The running state-oriented prefetching module enables data to be swapped in from RSA based on the observations in Section 3.2 (i.e., the application running states). RSA is built from remote device memory and consists of the remote stub and the page pool. The remote stub is responsible for responding to LSA's requests, and the page pool manages the memory pool on the remote device. The local stub is responsible for communicating with remote devices and delivering pages.
Fig. 6.
Fig. 6. Architecture and workflow of ACR-Swap\(^+\).

4.2 System Service-aware Page Sifting (SPS)

When facing memory pressure, pages are frequently swapped between memory and the swap area. For remote swapping, excessive remote paging will magnify the shortcomings of remote I/O, which will inevitably affect the user experience. Unlike the simple treatment of system-service processes in previous work [29], SPS adopts a system service-aware page identification and placement strategy for processes based on the observation in Section 3.1. The core idea is to prioritize the data placement of system-service processes with frequent Swap-ins in LSA to avoid excessive remote swapping operations. When the free space of LSA reaches a certain threshold during memory reclamation, application-specific processes are preferentially swapped out to RSA. Figure 7 shows the design of SPS. There are three challenges for the above design: First, when the kswapd thread reclaims memory, it cannot directly obtain the process information corresponding to the page. In this case, it is impossible to classify and place pages according to the types of processes. Second, the numerous system-service processes need to be further sifted to improve the efficiency of remote swapping. Third, setting the threshold for page transfers from LSA to RSA needs to be transparent and efficient to the user. To solve the above challenges, the page identification, system service sifting and threshold determination schemes are presented as follows.
Fig. 7.
Fig. 7. The proposed SPS scheme.

4.2.1 Page Identification.

In LSA, pages of system-service and application-specific processes are identified and placed. When the system suffers memory pressure and the selected pages start to be swapped out, the page identification module is responsible for identifying and placing pages. Specifically, this module first obtains each page's process information through the page reverse mapping mechanism [22]. Then, process-specific pages are grouped into system-service and application-specific pages according to their task names. These pages are compressed and stored in two memory pools of LSA, one for system-service processes and the other for application-specific processes. ACR-Swap\(^+\) creates only these two memory pools rather than one per process. The main reason is that if each process used its own memory pool, maintaining the pool information would be complex and inefficient given the large number of processes. Moreover, when pages of LSA need to be swapped out to RSA, it would be troublesome for the system to find the corresponding application-specific process among many process memory pools. Conversely, maintaining only two classes of memory pools is cheap and easy to retrieve from.
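The two-pool placement above can be illustrated with a minimal Python sketch. It assumes the reverse-mapping step already yields the owning task name for each evicted page; the task names and the set of system services are illustrative, not the actual kernel data structures:

```python
# Hedged sketch of SPS page identification: pages evicted during reclaim are
# grouped into two compressed pools in LSA by the type of their owning process.
# SYSTEM_SERVICES and the sample task names are illustrative assumptions.

SYSTEM_SERVICES = {"CrRenderMain", "RenderThread", "systemui", "surfaceflinger"}

def classify_page(task_name):
    """Map a page's owning task (from reverse mapping) to one of two pools."""
    return "system-service" if task_name in SYSTEM_SERVICES else "application-specific"

def sift_pages(evicted):
    """evicted: list of (page_id, task_name) tuples produced by kswapd."""
    pools = {"system-service": [], "application-specific": []}
    for page_id, task in evicted:
        # In the real system the page would be compressed before storage.
        pools[classify_page(task)].append(page_id)
    return pools

pools = sift_pages([(1, "RenderThread"), (2, "com.facebook.katana"), (3, "systemui")])
```

Keeping only two pools, as the text argues, makes the later LSA-to-RSA eviction a simple scan of one list instead of a search across per-process pools.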

4.2.2 System Service Sifting.

Placing all system-service processes in the LSA will result in limited pages for remote swapping, which cannot effectively relieve memory pressure. To further clarify which system-service processes can be placed in the LSA, a system service sifting mechanism is designed to perform a deeper analysis of the system services. Since there are numerous system services with various swap-in frequencies, system-service processes are further sifted to perform more aggressive remote paging. Specifically, the system service sifting mechanism includes two modules: a swap-in frequency recorder and a target system service selector. The former is responsible for recording the swap-in frequency of different system services. The latter is responsible for selecting the target system service to perform remote paging according to the collected swap-in frequency. The workflow of system service sifting is as follows.
First, the swap-in frequency recorder enables each system service to save its swap-in frequency. Moreover, the average swap-in frequency over all application processes is calculated as a benchmark for selecting system services. Second, when the system needs to perform remote paging, the target system service selector determines which system services are preferentially swapped out by comparing the swap-in frequencies of the system services against this benchmark. Specifically, when the swap-in frequency of a particular system-service process is lower than the average frequency of the application processes, that system service is marked as swap-out available.
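The selection rule above reduces to a single comparison against the application-process average; a minimal sketch, with all counts purely illustrative:

```python
# Hedged sketch of system service sifting: a service becomes a remote
# swap-out candidate only when its recorded swap-in frequency is below
# the average swap-in frequency of application processes.

def swap_out_candidates(service_freq, app_freq):
    """service_freq / app_freq: dicts mapping process name -> swap-in count."""
    benchmark = sum(app_freq.values()) / len(app_freq)  # avg application frequency
    return {name for name, freq in service_freq.items() if freq < benchmark}

candidates = swap_out_candidates(
    {"CrRenderMain": 32000, "inputmethod": 500},  # system services (illustrative)
    {"facebook": 3000, "youtube": 1000},          # applications (illustrative)
)
```

Here the benchmark is 2000 Swap-ins, so only the infrequently used "inputmethod" is marked swap-out available, while the heavily used "CrRenderMain" stays local.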

4.2.3 Threshold Determination.

As the kswapd thread continues to work, more and more pages are stored in LSA. When the remaining memory (\(RM\)) of LSA falls below the predefined threshold \(TH\), the pages in LSA should be swapped to RSA to relieve local memory pressure, and the system prioritizes page reclamation from the application-specific memory pool. If the memory pressure of LSA still cannot be relieved, many system-service pages occupy the local memory, so the system reclaims pages of system-service processes in LRU order. Note that \(TH\) is set so that the remaining LSA can always accommodate the anonymous data caused by an app switch, which ensures that the running of the foreground app is not affected by remote paging. Thus, \(TH\) must cover the maximum memory required for an application to run normally (e.g., 512 MB [25]) to ensure transparent paging. Besides, ACR-Swap\(^+\) performs page forwarding before LSA is used up, for user experience considerations: if page reclamation were performed only when LSA is exhausted, the memory required by the foreground app would have to wait for reclamation to complete. By reserving space for foreground app switching before LSA is used up, the underlying page forwarding remains transparent to the upper app. Therefore, the swap-out operation is transparent to the user.
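The threshold-driven forwarding decision can be sketched as follows, assuming page-granularity bookkeeping; the threshold value and pool contents are illustrative stand-ins for the real \(RM\) and \(TH\):

```python
# Hedged sketch of threshold determination: once remaining LSA capacity
# drops below TH, victims are taken from the application-specific pool
# first, then from the system-service pool in LRU order.

TH_PAGES = 4  # illustrative stand-in for the ~512 MB app-switch reserve [25]

def pick_victims(remaining_pages, app_pool, sys_pool_lru, need):
    """Return pages to forward to RSA; pools are lists of page ids,
    sys_pool_lru ordered oldest-first (LRU)."""
    if remaining_pages >= TH_PAGES:
        return []                              # reserve intact, nothing forwarded
    victims = app_pool[:need]                  # prefer application-specific pages
    if len(victims) < need:                    # fall back to system-service pages
        victims += sys_pool_lru[:need - len(victims)]
    return victims
```

Because the check fires while LSA still has free space, forwarding overlaps with normal operation instead of stalling a foreground app switch.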

4.3 Running State-oriented Prefetching (RSP)

Traditional memory swapping employs a read-ahead scheme, which swaps in an aligned block of pages containing the faulted page (typically eight pages) at once. If the pages swapped in by read-ahead are later accessed in the cache, they provide superior access performance and improve user experience. However, even though read-ahead schemes have been significantly improved [5, 40, 41, 45], they are not designed for remote swapping across mobile devices. To this end, ACR-Swap\(^+\) proposes RSP based on the swap-in access characteristics. The basic idea is to dynamically regulate the prefetching window size based on the application running states. In this way, an application can achieve different swap-in throughput in different running states, which effectively hides the latency of remote I/O and avoids cache pollution.
The previous work [29] neglected to prefetch pages of system-service processes. Since system services are swapped in more frequently than applications, efficient prefetching needs to take them into account. Because there are no apparent running-state changes when system-service processes perform swap-in, the previous RSP is not suitable for them. However, system-service processes are often called during application switching. Therefore, the newly proposed RSP utilizes this feature to synchronously regulate the prefetching windows of all processes based on the application running states. There are at least two challenges to this basic idea: First, the switching of the application running state must be monitored accurately. Second, the prefetching window size must be carefully regulated. To solve these issues, an application running-state monitoring scheme and an N-size prefetching algorithm are proposed as follows.

4.3.1 Monitoring Application Running State.

In our current implementation, we classify the application running state into two categories, foreground and background, for two main reasons. (1) Since application-specific processes have various running states, the system overhead would be considerable if prefetching were dynamically adjusted for each state. (2) During the system's operation, only the foreground process directly interacts with the user. From the perspective of optimizing the user experience, we prioritize foreground applications and regard the others as background applications. It is worth mentioning that we use the value of \(oom\_score\_adj\) [14] to judge the priority of a process; e.g., \(oom\_score\_adj\) equal to 0 means the application is in the foreground.
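The two-level classification can be sketched as follows. The procfs reader is shown for context and assumes a typical Linux/Android environment; only the classification rule itself comes from the text:

```python
# Hedged sketch of running-state monitoring via oom_score_adj, where 0
# marks the foreground process on Android and larger values mark
# lower-priority (background/cached) processes.

def classify_state(oom_score_adj):
    """Two-level classification used by RSP: foreground vs. background."""
    return "foreground" if oom_score_adj == 0 else "background"

def read_oom_score_adj(pid):
    """Read a process's oom_score_adj from procfs (Linux/Android only;
    shown as an assumption about a typical deployment)."""
    with open(f"/proc/{pid}/oom_score_adj") as f:
        return int(f.read().strip())
```

A caller would combine the two, e.g. `classify_state(read_oom_score_adj(pid))`, polling only on swap-in faults to keep monitoring overhead low.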

4.3.2 N-Size Prefetching Algorithm.

Algorithm 1 shows the N-size prefetching algorithm for applications, which adaptively changes the prefetching window size. \(P_{runstate}\) and \(P_{pid}\) represent the process running state and process id, respectively. \(PW_{size}\) is the prefetching window size, with a default value of 1, since memory compression is generally performed page by page. Moreover, to implement adaptive prefetching for system-service processes, an additional symbol is added: \(Signal_{app\_switch}\) represents the signal that an application is switching from background to foreground. If \(Signal_{app\_switch}\) is true, the system-service processes are about to be called.
For each window adjustment, the N-size algorithm first checks whether \(P_{runstate}\) of the corresponding process has changed. If it has changed and the process state becomes \(foreground\) (lines 7-11), the prefetching window size is expanded by \(n\) times (e.g., \(n\) is set to 2 in our current implementation), and \(Signal_{app\_switch}\) is set to true. If not, the window size and \(Signal_{app\_switch}\) remain unchanged (lines 12-14). If \(P_{pid}\) remains unchanged (lines 16-18), the application-specific process is still switching to the foreground, and the prefetching window size is again increased by \(n\) times. Note that the prefetching window returns to the default value when there is no application switching, so there is no need for a step-by-step decrease (lines 20-21). Furthermore, since TCP transmits up to 64 KB of data at a time, RSP limits the prefetching window size to no more than 16 (lines 24-25).
Algorithm 2 shows the N-size prefetching algorithm for system services. Once \(Signal_{app\_switch}\) is true, the prefetching window of a system-service process increases like that of an application process (lines 2-6). Moreover, to improve the accuracy of prefetching, we redesigned the prefetching in units of the per-app page list to exploit process locality [9].
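Both algorithms can be condensed into one small state machine; a simplified sketch under the paper's stated constants (growth factor \(n=2\), default window 1, cap of 16 pages from the 64 KB TCP limit), with the state-tracking details reduced to assumptions:

```python
# Hedged sketch of the N-size prefetching algorithms (Algorithms 1 and 2):
# the app window doubles while a foreground switch is in progress, resets
# otherwise, and system-service windows mirror it via Signal_app_switch.

N, PW_MAX, PW_DEFAULT = 2, 16, 1

class Prefetcher:
    def __init__(self):
        self.pw = PW_DEFAULT        # current application prefetch window (pages)
        self.app_switch = False     # Signal_app_switch in the paper
        self.last_pid = None

    def on_fault(self, pid, runstate):
        """Adjust the window on each swap-in fault (Algorithm 1, simplified)."""
        if runstate == "foreground":
            # app switched (or is still switching) to foreground: grow window,
            # capped at 16 pages (64 KB per TCP transfer / 4 KB pages)
            self.pw = min(self.pw * N, PW_MAX)
            self.app_switch = True
        else:
            # no switch in progress: return to default, clear the signal
            self.pw = PW_DEFAULT
            self.app_switch = False
        self.last_pid = pid
        return self.pw

    def service_window(self):
        """Algorithm 2: system services grow with the app window on a switch."""
        return self.pw if self.app_switch else PW_DEFAULT
```

The sketch shows why no step-by-step decrease is needed: the first fault outside a foreground switch snaps the window straight back to the default.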

5 Network Traffic-aware Remote Paging (NaRP)

In this section, we first analyze how much remote swapping performance is affected by network traffic in real usage scenarios. Then, a network license algorithm for remote swapping is designed to perform remote paging at different degrees, ensuring stable remote swapping.

5.1 Problem Statement and Motivation

Remote swapping performs well in an ideal laboratory environment [28], but in real use cases it faces various kinds of external interference that can significantly degrade the user experience. Different wireless networks often have different performance characteristics in terms of bandwidth, latency, jitter, and packet loss. Specifically, first, mobile devices may be used at home, at school, or in other public places, and the networks in these places differ in bandwidth and delay [8]. Second, there is often more than one user on the same LAN. In particular, when other users are performing network-intensive tasks (e.g., file sharing and screen casting), these tasks easily compete with remote swapping for network resources. To evaluate the performance of remote swapping in real network environments, we constructed network conditions under different scenarios, covering different locations (e.g., a family network and a laboratory network) and different numbers of users on the same network.
Figure 8 shows the swap throughput under different network states and usage scenarios. Figure 8(a) shows that remote swapping reaches its highest throughput of 517 Mbps under the ideal laboratory network. In a public network such as the coffee-shop network, however, the swap throughput drops by 47.9%. Correspondingly, the network bandwidth fluctuates considerably across networks. This reveals that differences in network conditions alone can significantly affect remote swapping; since mobile devices are inevitably used at home, at school, or in public places, such differences cannot be avoided. Figure 8(b) shows the throughput variation of remote swapping under another scenario. Under the same local area network, there is often more than one user with a mobile device, and as more users use their devices on the LAN, the throughput of remote swapping continues to decline. For example, when 10 other users on the LAN are using their devices, the swap throughput drops by 42.7%. This is because other users inevitably compete with remote swapping for network resources, especially when they are performing network-intensive tasks (e.g., file sharing or screen casting).
Fig. 8.
Fig. 8. Various swap throughput among different network conditions: (a) The swap throughput at different locations, i.e., under different networks. Net.1: family network. Net.2: laboratory network. Net.3: campus network. Net.4: office network. Net.5: coffee network. (b) The swap throughput under the same LAN with different users.
In conclusion, current remote swapping is susceptible to performance fluctuations caused by changes in network traffic, which will seriously impair user experience.

5.2 Design of NaRP

To solve the above issue, NaRP is proposed. The basic idea is to divide network traffic into different levels and then perform remote paging at corresponding degrees to ensure the stability of remote swapping. Two challenges must be solved: First, network traffic varies across environments and over time, making it difficult to quantify into discrete levels. Second, even if network traffic is well graded, undesired network fluctuation will still exist, so the system must be redesigned to accommodate different degrees of remote paging. To address these challenges, the network traffic grading and network license algorithm schemes are proposed as follows.

5.2.1 Grading Network Traffic.

In our design, network traffic is divided into three levels based on network bandwidth. We use exactly three levels because ACR-Swap\(^+\) supports three degrees of remote paging: disabling remote paging entirely, the SPS proposed in this paper, and the original PPS mechanism [29]. The advantage of this division is that the degree of remote swapping does not need to be tuned separately for each network condition. Two crucial bandwidth thresholds are involved in this design. The maximum value indicates the bandwidth above which network traffic supports full remote swapping; the minimum value indicates the bandwidth below which network traffic cannot support remote swapping at all. They are determined by two practical requirements [12]. First, if an application fails to respond to user input within 5 seconds, the system displays Application Not Responding (ANR); this corresponds to the minimum bandwidth. Second, if an application takes more than 200 ms to respond to user input, the user perceives the application as slow; this corresponds to the maximum bandwidth. Combining the number of pages that must be swapped in during application switching (30MB on average) with the above latency requirements, the minimum and maximum network bandwidths in this design are 48 Mbps and 1200 Mbps, respectively. Moreover, the network bandwidth check only happens during application switching to save system overhead.
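The two thresholds follow directly from the 30MB average swap-in volume and the two latency bounds. The back-of-the-envelope arithmetic can be checked as follows; this sketch simply treats 1 MB as 8 Mb and ignores protocol overhead, and the function name is illustrative.

```c
/* Back-of-the-envelope check of the two NaRP bandwidth thresholds:
 * moving the ~30MB swapped in during an application switch within
 * the 5s ANR bound and the 200ms smoothness bound, respectively. */
#include <assert.h>

/* Bandwidth in Mbps needed to move `mb` megabytes within `ms` milliseconds. */
static int required_bw_mbps(int mb, int ms)
{
    return mb * 8 * 1000 / ms;
}
```

`required_bw_mbps(30, 5000)` yields the 48 Mbps minimum, and `required_bw_mbps(30, 200)` the 1200 Mbps maximum used in this design.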

5.2.2 Network License Algorithm.

Algorithm 3 shows the network license algorithm, which performs varying degrees of remote swapping based on network traffic. \(N_{bw}\) represents the bandwidth of the wireless network used by the device. \(T_{remote}\) is defined as other remote tasks under the same LAN, and its default value is set to false. \(Signal_{app\_switch}\), a global variable representing application switching (defined in Algorithm 1), is also utilized in this algorithm. When an application switches from background to foreground, \(Signal_{app\_switch}\) is set to \(true\). Likewise, when the system detects other network-intensive tasks on the LAN, \(T_{remote}\) is set to \(true\). In our design, file sharing and screen casting are selected as the network-intensive tasks, which are triggered manually to simulate the corresponding scenarios.
Whenever \(Signal_{app\_switch}\) or \(T_{remote}\) is true, the system enables a network traffic check to judge the network bandwidth (lines 1-3). As mentioned earlier, network bandwidth is divided into three levels using two thresholds, the minimum and maximum values. When the current network bandwidth \(N_{bw}\) is less than the minimum value, the system is not allowed to perform remote paging (lines 4-6). When \(N_{bw}\) is between the two thresholds, the system performs preferential remote paging (lines 7-9). When \(N_{bw}\) exceeds the maximum value, the system can perform full-power remote paging (lines 10-12).
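Under the two thresholds above, the bandwidth-to-mode mapping at the core of the license check can be sketched as follows; the enum and function names are illustrative, not the kernel's actual symbols.

```c
/* Sketch of the three-level decision in the network license algorithm:
 * map the measured bandwidth N_bw to a remote-paging mode. */
#include <assert.h>

#define BW_MIN_MBPS 48     /* below: remote paging not allowed        */
#define BW_MAX_MBPS 1200   /* above: full-power remote paging allowed */

enum paging_mode {
    PAGING_OFF,            /* N_bw < minimum: no remote paging         */
    PAGING_PREFERENTIAL,   /* in between: preferential remote paging   */
    PAGING_FULL            /* N_bw > maximum: full-power remote paging */
};

static enum paging_mode network_license(int n_bw_mbps)
{
    if (n_bw_mbps < BW_MIN_MBPS)
        return PAGING_OFF;
    if (n_bw_mbps <= BW_MAX_MBPS)
        return PAGING_PREFERENTIAL;
    return PAGING_FULL;
}
```

In the full algorithm this check runs only when \(Signal_{app\_switch}\) or \(T_{remote}\) is raised, keeping the bandwidth-probing overhead low.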

6 Implementation and Discussion

6.1 Implementation of ACR-Swap+

ACR-Swap\(^+\) is implemented on real mobile devices. Several independent kernel modules implement the main functions: common_network provides common network services between local and remote devices, local_swap enables SPS, RSP, and NaRP on the local device, and remote_swap provides remote memory on the remote device. System-native functions (e.g., try_to_unmap) are used in preference to obtain information for high compatibility. ACR-Swap\(^+\) introduces only a few system source code modifications (2,210 lines of C in the kernel space and 105 lines of Java/C++ in the user space). To deploy ACR-Swap\(^+\) on off-the-shelf devices, the key implementation details are presented as follows.
First, LSA and RSA are implemented as follows. LSA is deployed in physical memory as a dedicated frontswap interface [18]. Frontswap is designed for swapping at page granularity rather than supporting general block I/O, and it strives to complete swapping operations while minimizing context switches to other tasks. Meanwhile, a virtual network block device is created and formatted as a swap area, and pages swapped into this partition are actually forwarded to the remote device. Specifically, the source code of boot.img is modified; after porting the recompiled image to the mobile device and rebooting, a customized frontswap interface and a block interface appear in the system: \(localswap\), visible in the directory “/sys/module”, and \(remoteswap\), visible in the directory “/dev/block”. \(localswap\) provides a transcendent-memory interface for swapped pages, responsible for filtering stored pages and selectively forwarding them to \(remoteswap\) under memory pressure. RSA acts as the \(remoteswap\) backend corresponding to the actual swap area: it temporarily stores swapped-out pages and responds to client-side read requests.
Second, due to frequent page delivery between the local and remote devices, effective page indexing is essential for swap-area management. In the implementation, the flags \(local\) and \(remote\) are used to index pages located in LSA and RSA, respectively. The original swap-related code and the corresponding mapping table are modified to support page indexing across the two swap areas. Specifically, when a page is located in LSA, the address bits of its PTE are set to \(local\).
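The local/remote flag can be illustrated with a spare bit in the swap entry. This is a hypothetical sketch: the bit position and helper names are arbitrary choices for illustration, while the real implementation modifies the kernel's swap mapping table.

```c
/* Hypothetical sketch of the two-area page indexing: one spare bit of
 * the swap entry records whether a swapped page resides in LSA (local)
 * or RSA (remote). Bit 62 is an arbitrary choice for illustration. */
#include <assert.h>
#include <stdint.h>

#define SWP_REMOTE (1ULL << 62)

static uint64_t swp_mark_remote(uint64_t entry) { return entry | SWP_REMOTE; }
static uint64_t swp_mark_local(uint64_t entry)  { return entry & ~SWP_REMOTE; }
static int      swp_is_remote(uint64_t entry)   { return (entry & SWP_REMOTE) != 0; }
```

Because the flag occupies an otherwise unused bit, the swap offset itself is untouched, so a page migrated between LSA and RSA keeps its index.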
Third, SPS adopts the page reverse mapping mechanism to obtain the process ID of each page so as to distinguish system-service processes. Moreover, the swap-in frequency of each process is recorded to support the selection of victim system-service processes when SPS performs system-service sifting; a two-dimensional array in the kernel holds this information. RSP determines the running state of a process based on \(oom\_score\_adj\) when a page is swapped in. To support NaRP, we periodically detect the network bandwidth and, depending on it, execute remote paging at different degrees.

6.2 Possible Disconnection and Security Concerns

Due to mobility, the connection between the local and remote devices can be lost at any time. If not handled properly, a disconnection may induce multiple problems, such as a crash of the local system or the remote system monitoring indefinitely. Therefore, a time-out mechanism is adopted to detect disconnection. At regular intervals, the remote paging latency is measured; if the local device does not receive an acknowledgment before a certain threshold, or the remote device does not hear from the local one, either side triggers a disconnection event [7]. For the remote device, network disconnection is equivalent to killing the kernel thread that updates and maintains the anonymous client memory. The remote device then cleans up the residuals of the disconnected client process, and the corresponding memory is gradually reclaimed by the remote side. Preserving remote memory across a network disconnection instead of reclaiming it is left as future work. For the local device, we try to make the disconnection as transparent to applications as possible. Specifically, we first update the corresponding PTEs of the swapped pages and then enable the local storage as a swap area after the disconnection. It is worth mentioning that some system-service processes are kept in local memory, which further reduces the impact on the system when the device is disconnected.
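A minimal sketch of the time-out check follows; the threshold value and field names are illustrative, since the real mechanism measures remote paging latency at regular intervals inside the kernel.

```c
/* Sketch of time-out based disconnection detection: a peer is declared
 * disconnected if the latest probe is unanswered past a threshold.
 * ACK_TIMEOUT_MS is an illustrative value, not the system's setting. */
#include <assert.h>
#include <stdbool.h>

#define ACK_TIMEOUT_MS 3000

struct link_state {
    long probe_ms;   /* when the last remote-paging probe was sent */
    long ack_ms;     /* when the last acknowledgment was received  */
};

static bool link_timed_out(const struct link_state *l, long now_ms)
{
    if (l->ack_ms >= l->probe_ms)   /* latest probe already answered */
        return false;
    return now_ms - l->probe_ms > ACK_TIMEOUT_MS;
}
```

On a positive result, the local side would fall back to local swapping after updating the PTEs, while the remote side cleans up the client's residual state.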
Another assumption of this work is that remote swapping is performed only among user-owned trusted devices. For such a scenario, ACR-Swap\(^+\) simply adopts an authentication mechanism that asks the owner(s) of the local and remote devices to authorize the remote paging, after which ACR-Swap\(^+\) assumes that both systems are not malicious; security issues related to untrusted devices are out of our scope. For scenarios with high security requirements, ACR-Swap\(^+\) provides an optional data encryption interface to protect user privacy. Specifically, ACR-Swap\(^+\) enables an encryption kernel module based on the Linux crypto framework [34] to encrypt/decrypt pages. This module uses the AES acceleration hardware on mobile systems to provide secure multi-device communication with 128-bit AES encryption. Once the module is enabled, the system generates a public/private key pair to securely share the AES session key generated at runtime. When pages are shared between the local and remote devices, the functions crypto_skcipher_encrypt and crypto_skcipher_decrypt are called to achieve secure data transmission. Since data security is not the main concern of this paper, the encryption feature is turned off by default.

7 Evaluation

7.1 Experiment Setup

In our experiments, we adopt two Google Pixel 6 smartphones with LineageOS 20.0 [1] (based on Android 13 and kernel 5.10) as our platform. Each device is equipped with an octa-core ARM CPU, 8 GB of DRAM, and 128 GB of UFS 3.1 storage. To deploy the remote swapping system, one device is configured as the local device and the other as the remote device. All devices are connected to the same Wi-Fi network with a bandwidth of 1200 Mbps; to avoid interference, all network functions except Wi-Fi are disabled. We selected widely used mobile applications, as listed in Table 1, and conducted experiments in two steps: (1) we installed the pre-selected 40 applications (12 switching applications and 28 background applications) on the device before each experiment; (2) we used the adb [10] and logcat [13] commands to collect evaluation results while performing automated tests with UI Automator [17], which emulates users' UI touches based on scripts. We performed the same automated tests for ten rounds and calculated the average.
Table 1.
Category          Foreground Applications           Auto user inputs
Browser           Chrome                            Browse and read posts
Social Network    Facebook, X                       Browse and read posts
Multimedia        Youtube, Tiktok                   Watch videos
Business Utility  Amazon, CNN, Uber, Gmap, Camera   Browse and search / Take a photo
Game              Angry Birds, Candy Crush          Play a stage
Table 1. Applications and Automated User Interaction
\(*\)Background applications: Browser (Firefox, Opera), Social Network (WhatsApp, Instagram, Skype, WeChat, LinkedIn, Quora), Multimedia (Spotify, MXPlayer, Netflix, Capcut), Online shopping (Wish, Taobao, eBay, AliPay, BOA, Paypal), Business Utility (Booking, Gmail, New York Times, BBC News, OfficeMobile, GoogleDrive), and Game (Hill Climb Racing, Boom Beach, ClashRoyale, Call of Duty).
The following seven schemes are evaluated to show the advantages of ACR-Swap\(^+\): (1) ZRAM represents the default baseline of mobile systems, which inevitably kills processes when compressed memory is almost exhausted. (2) Local Swapping (LS) represents traditional flash-based swapping, which stores swapped-out pages in local storage. (3) Remote Swapping (RS) represents the original remote swapping, which creates the swap area on a remote device and stores swapped-out data there to relieve local memory pressure. (4) CloudSwap (CS) represents an enhanced remote swapping scheme [9]; we reproduce the parts of its design relevant to this paper, i.e., using fast local storage as a cache for read-intensive swap pages while storing prefetch-enabled, write-intensive swap pages in cloud storage. (5) MobileSwap (MS) represents an enhanced remote swapping scheme [28] that includes resource-dedicated swapping for fast swapping among devices and app-aware swapping for network-connectivity considerations. (6) ACR-Swap represents the state-of-the-art scheme [29], which includes PPS (storing system services locally) and AGP (adaptive prefetching for applications). (7) ACR-Swap\(^+\) is the proposed scheme, which includes SPS, RSP, and NaRP. In the evaluation, ZRAM is set to 1GB of swap space. The swap area of LS, RS, and MS is set to 3GB; for fairness, CS, ACR-Swap, and ACR-Swap\(^+\) maintain a 1GB local swap area and a 2GB remote swap area. Moreover, ACR-Swap and ACR-Swap\(^+\) set LSA's \(TH\) to 512MB for simplicity.

7.2 Application Caching and Switching Latency Optimization

In the following, application caching capability and switching latency are evaluated to show the advantages of the proposed scheme.

7.2.1 Application Caching.

We repeatedly launch the pre-installed applications for ten rounds and check how many can be cached in the background. The order of application switching changes randomly in each round.
To estimate application caching capability, the number of cached applications and the write traffic are collected, as shown in Figure 9. First, ZRAM can only cache a limited number of applications (14 out of 40): when the limited ZRAM space is almost used up, the LMK [14] mechanism starts to work, frequently killing applications and deteriorating the user experience. In contrast, remote swapping can cache more applications without worrying about write traffic or swap space size. However, traditional RS suffers from massive remote I/O, so applications cannot respond in time, resulting in additional LMK kills [2]. Further optimizations of RS, such as CS, MS, and ACR-Swap, improve caching capability: CS adopts a hybrid local-and-cloud swapping architecture with a redesigned swap-in read-ahead mechanism, while MS designs a dedicated network channel to optimize RS. Unlike them, ACR-Swap\(^+\) significantly reduces the impact of massive remote I/O through an in-depth analysis of application swapping characteristics; in particular, compared to ACR-Swap, ACR-Swap\(^+\) adds the optimization of system services and the consideration of network traffic. Second, the remote-memory-based design incurs almost no additional write traffic. The experimental results show that, compared with ZRAM, the caching capacity of ACR-Swap\(^+\) increases by 170%, and its write traffic is 49% lower than that of LS.
Fig. 9.
Fig. 9. The amount of write traffic and application caching capability comparison among ZRAM, LS, RS, CS, MS, ACR-Swap, and ACR-Swap\(^+\).

7.2.2 Application Switching Latency.

To understand the benefit of ACR-Swap\(^+\) for application switching, we repeatedly launched the pre-installed switching applications in the foreground and calculated the average application switching latency over ten rounds. Figure 10 shows the results for the remote swapping schemes. First, application switching latency is minimal with ZRAM, whereas RS increases it by 350%, indicating that RS leaves vast room for optimization. Second, although previous works, including CS, MS, and ACR-Swap, effectively reduce this latency, a large gap to ZRAM remains. Third, the results show that ACR-Swap\(^+\) reduces application switching latency by 69.4%, 51.7%, 46.2%, and 21.6% compared with RS, CS, MS, and ACR-Swap, respectively; its performance is almost the same as ZRAM. This is because, unlike RS, CS, and MS, ACR-Swap\(^+\) leverages page access characteristics to optimize the user experience. On the one hand, ACR-Swap\(^+\) realizes efficient page identification and placement between LSA and RSA through SPS, significantly reducing remote paging and improving swap throughput. On the other hand, the dynamic prefetching window of RSP further exploits read-ahead to hide remote paging latency. Compared with ACR-Swap, ACR-Swap\(^+\) additionally reconsiders the system-service processes to further improve swap-out and swap-in efficiency.
Fig. 10.
Fig. 10. Average application switching latency of RS, CS, MS, ACR-Swap, and ACR-Swap\(^+\).

7.3 Performance Analysis

To further understand the benefits of ACR-Swap\(^+\), SPS and RSP are evaluated as follows.

7.3.1 Throughput Improvement with SPS.

SPS is designed to reduce frequent swapping operations over the network channel through page sifting. We use swap throughput to measure the effectiveness of SPS. To collect it, we chose six switching applications (e.g., Chrome and Facebook) to switch between foreground and background, with 26 applications running in the background to increase memory pressure. All applications run under each remote swapping scheme, respectively. For accuracy, we run each application category for ten rounds and report the average. The swap area size and \(TH\) are set as in the application caching experiment.
Figure 11(a) shows the swap throughput among the evaluated schemes. SPS significantly improves application swap throughput, reaching 3.66x that of RS. This is because SPS places pages with frequent swapping operations, which belong to the system-service processes used during application switching, in local memory, dramatically reducing the time it takes to swap pages in and out. Compared with CS and MS, SPS improves swap throughput by 2.02x and 1.54x, respectively. Since CS decides whether a page is placed locally based on how often the application swaps it in, and applications swap in far fewer pages than system-service processes, its throughput improvement is limited. Moreover, compared with the PPS of ACR-Swap, SPS increases throughput by 16.2%, thanks to the further sifting of system-service processes. To show more detail and the robustness of the proposed scheme, we further analyzed the number of remote paging operations across different application switches. Figure 11(b) shows that the proposed scheme significantly reduces remote paging, which further explains why SPS effectively improves swap throughput: SPS reduces remote paging by an average of 48.1% compared to RS and 17.3% compared to PPS.
Fig. 11.
Fig. 11. Throughput Improvement with SPS: (a) The improvement in swap throughput among RS, CS, MS, PPS (the previous scheme of ACR-Swap without system-service sifting), and SPS. (b) The reduction of remote paging among RS, CS, MS, PPS, and SPS. The results are normalized to the paging of RS.

7.3.2 Swap-in Optimization with RSP.

RSP is proposed to hide remote paging latency through page prefetching. We study its efficiency by evaluating swap-in latency, with an evaluation method similar to the above; all applications are run under each prefetching method. We compare RSP against the following prefetching mechanisms: (1) No Prefetching (NP), which swaps in page by page; (2) the Linux Read-Ahead (LRA) scheme, which swaps in a fixed number of pages (i.e., eight); (3) the Application-aware Read-Ahead (ARA) scheme [9], which swaps in units of the app list rather than swap slots; and (4) the Adaptive Granularity Prefetching (AGP) scheme [29], which adaptively prefetches for applications without considering system services. Specifically, we compare the page swap-in latency during the switching of individual applications under these prefetching mechanisms, as shown in Figure 12. RSP achieves an average read latency reduction of 65.2%, 36.2%, and 17.8% compared to LRA, ARA, and AGP, respectively. This improvement stems from fully utilizing the access characteristics of the application. ARA can also reduce switching latency to some extent, but it misses the best time to prefetch.
Fig. 12.
Fig. 12. Average app page read latency of NP, LRA, ARA, AGP, and RSP.

7.4 Guaranteed Swap Throughput with NaRP

NaRP is designed to keep remote swapping performance stable under unpredictable network conditions. To understand its benefit, we use swap throughput as the metric and compare it under different network conditions. To collect it, we deploy the different remote swapping schemes in each of the five locations (representing different LAN connections). As in the above experiments, six applications were switched back and forth, and the average swap throughput during application switching was calculated.
Figure 13 shows the swap throughput of the remote swapping schemes under various network conditions. First, the swap throughput of every scheme fluctuates across network conditions. For example, compared to Net.2, the throughput of RS under Net.5 drops by 36.7%, while CS, MS, and ACR-Swap also suffer performance losses of 35.7%, 20.2%, and 25.1%, respectively. Most importantly, ACR-Swap\(^+\) achieves the slightest performance fluctuation of only 5.1%. Thanks to its network-aware remote paging design, ACR-Swap\(^+\) realizes dynamic remote paging and effectively avoids the influence of network jitter on the system. Second, under the same network conditions, the swap throughput of previous designs also differs significantly, whereas ACR-Swap\(^+\) guarantees a swap throughput of 838 Mbps even under the worst network conditions.
Fig. 13.
Fig. 13. The swap throughput comparison among RS, CS, MS, ACR-Swap, and ACR-Swap\(^+\) under various networks.

7.5 Sensitivity Studies

To further understand ACR-Swap\(^+\), several sensitivity studies are conducted on the size of LSA, the \(TH\) for remote paging in SPS, and the prefetching window size in RSP.

7.5.1 Local Swap Area Size.

The user experience benefit of ACR-Swap\(^+\) is impacted by the LSA size. In this evaluation, we vary the size of LSA (from 128 MB to 2 GB) while keeping RSA fixed at 2GB. Figure 14 shows that the average application switching latency decreases as LSA expands: the larger the LSA, the more pages are placed locally, and direct memory access is more efficient than reading pages across devices. On the other hand, this reduction does not come for free. As the LSA size keeps growing, the system incurs increasing overhead from frequent page compression and decompression in LSA during swapping, resulting in excessive CPU usage. For example, with a 2GB LSA, the CPU overhead reaches 41% when the system runs ACR-Swap\(^+\), which brings extra energy overhead.
Fig. 14.
Fig. 14. The trade-off for the size of LSA.

7.5.2 TH for Remote Paging in SPS.

In the design of SPS, when the remaining memory of LSA is less than \(TH\), the system performs remote paging. To show the advantage of SPS, the threshold for remote paging should be chosen reasonably. In this evaluation, LSA is set to 1GB, \(TH\) is varied from 100MB to 400MB, and RSA is kept at 2GB. Figure 15(a) shows that increasing \(TH\) can significantly increase swap throughput, but the effect varies across applications. For example, YouTube's swap throughput keeps declining as \(TH\) increases: a larger \(TH\) means more pages are swapped out remotely, and YouTube relies heavily on network I/O. For CNN, in contrast, the impact of changing \(TH\) is almost negligible.
Fig. 15.
Fig. 15. The trade-off for the size of \(TH\) and prefetching window size. (a) The various swap throughput under different \(TH\). (b) The app read latency for different prefetching window.

7.5.3 Prefetching Window Size in RSP.

The performance benefit of RSP is affected by the prefetching window size. In this experiment, we enable various window expansion factors (i.e., \(n\) = 2, 4, 8, 16) and measure the average swap-in read time to verify the impact of different prefetching sizes on applications. Figure 15(b) shows the swap-in read time under different read-ahead windows. The swap-in behavior of the same application differs under different prefetching window settings during switching; for example, Chrome shows a 29% difference in swap-in read time between n=4 and n=16. This is because a larger prefetching window increases read bandwidth but can also introduce additional overhead due to excessive prefetching, so the window size needs to be carefully designed.

7.6 Overhead Analysis

While ACR-Swap\(^+\) effectively optimizes memory management with cross-device swapping, several trade-offs remain in the design. This section quantifies three types of overhead in ACR-Swap\(^+\): (1) the performance overhead of reverse mapping in the SPS design; (2) the memory overhead of system-service sifting in the SPS design; and (3) the CPU overhead of managing the two-level framework.

7.6.1 Performance Overhead.

To enable ACR-Swap\(^+\), the system obtains the process-related information for each page, including the process ID and the process running state, through the RMAP (reverse mapping) mechanism in the Linux kernel. Since a page's reverse mapping is resolved every time the page is swapped out, this introduces additional time overhead. Our experiments show that this operation takes only microseconds, which is negligible compared to the hundreds of milliseconds of application switching latency.

7.6.2 Memory Overhead.

ACR-Swap\(^+\) performs system-service sifting based on the swap-in frequency of different processes. To record the swap-in frequency of system-service processes, we maintain a two-dimensional unsigned-integer array in the kernel that records each process ID and its corresponding swap-in frequency; each entry occupies 8 bytes. Our analysis found that only about 100 system-service processes (as opposed to application-specific processes) are frequently swapped in, so the array requires only about 800 bytes of memory, which is almost negligible.
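The 800-byte figure can be sketched directly; the struct and table names are illustrative, while the kernel keeps the same information as a two-dimensional unsigned-integer array.

```c
/* Sketch of the swap-in frequency table used for system-service
 * sifting: ~100 tracked processes x 8 bytes/entry = ~800 bytes. */
#include <assert.h>
#include <stdint.h>

#define MAX_TRACKED 100   /* frequently swapped-in system services */

struct swapin_rec {
    uint32_t pid;    /* process ID      */
    uint32_t freq;   /* swap-in counter */
};

static struct swapin_rec swapin_table[MAX_TRACKED];
```

Two packed 32-bit fields give exactly 8 bytes per entry with no padding, so the whole table fits in 800 bytes.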

7.6.3 CPU Overhead.

Remote memory swapping introduces extra CPU overhead because pages must be passed through sockets. Compared with traditional remote swapping, the page compression performed while ACR-Swap\(^+\) is working increases this overhead; at the same time, since ACR-Swap\(^+\) performs remote paging selectively, it significantly reduces the socket workload. Overall, ACR-Swap\(^+\) increases CPU overhead by 4.1% compared to traditional remote swapping, which is well worth the benefits.

8 Conclusion

In this work, we proposed ACR-Swap\(^+\), which optimizes remote swapping on mobile devices by exploiting the access characteristics of applications. We first studied these access characteristics, including the swap-in frequency of different processes and the swap-in disparity at different application running states, and then analyzed the impact of network traffic on remote swapping. Based on these observations, three schemes, SPS, RSP, and NaRP, are proposed to improve the user experience by reducing the negative impact of massive remote I/O. Experimental results show that ACR-Swap\(^+\) outperforms existing schemes with marginal overhead.

References

[1]
2024. Build LineageOS for Google Pixel 6. https://wiki.lineageos.org/devices/oriole/build
[3]
Andrea Acquaviva, Emanuele Lattanzi, and Alessandro Bogliolo. 2006. Power-aware network swapping for wireless palmtop PCs. IEEE Transactions on Mobile Computing (TMC) 5, 5 (2006), 571–582.
[4]
Marcos K. Aguilera, Nadav Amit, Irina Calciu, Xavier Deguillard, Jayneel Gandhi, Stanko Novaković, Arun Ramanathan, Pratap Subrahmanyam, Lalith Suresh, Kiran Tati, Rajesh Venkatasubramanian, and Michael Wei. 2018. Remote regions: A simple abstraction for remote memory. In 2018 USENIX Annual Technical Conference (ATC’18), 775–787.
[5]
Hasan Al Maruf and Mosharaf Chowdhury. 2020. Effectively prefetching remote memory with leap. In 2020 USENIX Annual Technical Conference (ATC’20). 843–857.
[6]
Emmanuel Amaro, Christopher Branner-Augmon, Zhihong Luo, Amy Ousterhout, Marcos K. Aguilera, Aurojit Panda, Sylvia Ratnasamy, and Scott Shenker. 2020. Can far memory improve job throughput?. In Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys’20). 1–16.
[7]
Ardalan Amiri Sani, Kevin Boos, Min Hong Yun, and Lin Zhong. 2014. Rio: A system solution for sharing I/O between mobile systems. In Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys’14). 259–272.
[8]
Duc Hoang Bui, Kilho Lee, Sangeun Oh, Insik Shin, Hyojeong Shin, Honguk Woo, and Daehyun Ban. 2013. GreenBag: Energy-efficient bandwidth aggregation for real-time streaming in heterogeneous mobile wireless networks. In 2013 IEEE 34th Real-Time Systems Symposium (RTSS’13). 57–67.
[9]
Dongju Chae, Joonsung Kim, Youngsok Kim, Jangwoo Kim, Kyung-Ah Chang, Sang-Bum Suh, and Hyogun Lee. 2016. CloudSwap: A cloud-assisted swap mechanism for mobile devices. In 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’16). 462–472.
[10]
Android Developers. 2024. Android Debug Bridge (adb). https://developer.android.com/tools/adb
[12]
Android Developers. 2024. Keeping your App Responsive. https://developer.android.com/training/articles/perf-anr
[13]
Android Developers. 2024. Logcat Command-line Tool. https://developer.android.com/tools/logcat
[14]
Android Developers. 2024. Low Memory Killer Daemon: Android Developers Docs. https://source.android.com/docs/core/perf/lmkd
[15]
Android Developers. 2024. Low Memory Management: Android Developers. https://developer.android.com/topic/performance/memory-management
[17]
Android Developers. 2024. Write Automated Tests with UI Automator. https://developer.android.com/training/testing/other-components/ui-automat
[18]
Linux Memory Management Documentation. 2024. Frontswap. https://www.kernel.org/doc/html/latest/vm/frontswap.html
[19]
Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdhury, and Kang G. Shin. 2017. Efficient memory disaggregation with Infiniswap. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI’17). 649–667.
[20]
Weichao Guo, Kang Chen, Huan Feng, Yongwei Wu, Rui Zhang, and Weimin Zheng. 2015. MARS: Mobile application relaunching speed-up through flash-aware page swapping. IEEE Transactions on Computers (TC) 65, 3 (2015), 916–928.
[21]
Sangwook Shane Hahn, Sungjin Lee, Inhyuk Yee, Donguk Ryu, and Jihong Kim. 2018. FastTrack: Foreground App-Aware I/O management for improving user experience of Android smartphones. In 2018 USENIX Annual Technical Conference (ATC’18). 15–28.
[22]
Linux Kernel. 2024. Linux Reverse Mapping (rmap). https://lwn.net/Articles/23732/
[23]
Linux Kernel. 2024. zram: Compressed RAM-based Block Devices. https://www.kernel.org/doc/html/latest/admin-guide/blockdev/zram.html
[24]
Sang-Hoon Kim, Jinkyu Jeong, and Jin-Soo Kim. 2017. Application-aware swapping for mobile systems. ACM Transactions on Embedded Computing Systems (TECS) 16, 5s (2017), 1–19.
[25]
Niel Lebeck, Arvind Krishnamurthy, Henry M. Levy, and Irene Zhang. 2020. End the senseless killing: Improving memory management for mobile operating systems. In 2020 USENIX Annual Technical Conference (ATC’20). 873–887.
[26]
Changlong Li, Yu Liang, Liang Shi, Chao Wang, Chun Jason Xue, and Xuehai Zhou. 2024. Flexible and efficient memory swapping across mobile devices with LegoSwap. IEEE Transactions on Parallel and Distributed Systems (TPDS) 35, 1 (2024), 140–153.
[27]
Changlong Li, Liang Shi, Yu Liang, and Chun Jason Xue. 2020. SEAL: User experience-aware two-level swap for mobile devices. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) 39, 11 (2020), 4102–4114.
[28]
Changlong Li, Liang Shi, and Chun Jason Xue. 2021. MobileSwap: Cross-device memory swapping for mobile devices. In 2021 58th ACM/IEEE Design Automation Conference (DAC’21). 115–120.
[29]
Wentong Li, Yina Lv, Changlong Li, and Liang Shi. 2022. Access characteristic guided remote swapping for user experience optimization on mobile devices. In 2022 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI’22). 186–193.
[30]
Wentong Li, Liang Shi, Hang Li, Changlong Li, and Edwin Hsing-Mean Sha. 2023. IOSR: Improving I/O efficiency for memory swapping on mobile devices via scheduling and reshaping. ACM Transactions on Embedded Computing Systems (TECS) 22, 5s (2023), 1–23.
[31]
Shuang Liang, Ranjit Noronha, and Dhabaleswar K. Panda. 2005. Swapping to remote memory over infiniband: An approach using a high performance network block device. In 2005 IEEE International Conference on Cluster Computing (CLUSTER’05). 1–10.
[32]
Yu Liang, Jinheng Li, Rachata Ausavarungnirun, Riwei Pan, Liang Shi, Tei-Wei Kuo, and Chun Jason Xue. 2020. Acclaim: Adaptive memory reclaim to improve user experience in Android systems. In 2020 USENIX Annual Technical Conference (ATC’20). 897–910.
[33]
Geunsik Lim, Donghyun Kang, Myungjoo Ham, and Young Ik Eom. 2023. SWAM: Revisiting swap and OOMK for improving application responsiveness on mobile devices. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking (MobiCom’23). Article 16, 15 pages.
[34]
Stephan Mueller and Marek Vasut. 2024. Crypto API. https://www.kernel.org/doc/html/latest/crypto/
[35]
David T. Nguyen, Gang Zhou, Guoliang Xing, Xin Qi, Zijiang Hao, Ge Peng, and Qing Yang. 2015. Reducing smartphone application delay through read/write isolation. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys’15). 287–300.
[36]
J. K. Ousterhout, A. R. Cherenson, F. Douglis, M. N. Nelson, and B. B. Welch. 1988. The Sprite network operating system. Computer 21, 2 (1988), 23–36.
[37]
Vijayan Prabhakaran, Mahesh Balakrishnan, and Ted Wobber. 2010. Extending SSD lifetimes with disk-based write caches. In 8th USENIX Conference on File and Storage Technologies (FAST’10), Vol. 10. 101–114.
[38]
Yifan Qiao, Chenxi Wang, Zhenyuan Ruan, Adam Belay, Qingda Lu, Yiying Zhang, Miryung Kim, and Guoqing Harry Xu. 2023. Hermit: Low-latency, high-throughput, and transparent remote memory via feedback-directed asynchrony. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI’23). 181–198.
[39]
Murali Ramanujam, Harsha V. Madhyastha, and Ravi Netravali. 2021. Marauder: Synergized caching and prefetching for low-risk mobile app acceleration. In Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys’21). 350–362.
[40]
Junhee Ryu, Dongeun Lee, Kang G. Shin, and Kyungtae Kang. 2023. Fast application launch on personal Computing/Communication devices. In 21st USENIX Conference on File and Storage Technologies (FAST’23). 425–440.
[41]
Sam Son, Seung Yul Lee, Yunho Jin, Jonghyun Bae, Jinkyu Jeong, Tae Jun Ham, Jae W. Lee, and Hongil Yoon. 2021. ASAP: Fast mobile application switch via adaptive prepaging. In 2021 USENIX Annual Technical Conference (ATC’21). 365–380.
[42]
Chenxi Wang, Haoran Ma, Shi Liu, Yifan Qiao, Jonathan Eyolfson, Christian Navasca, Shan Lu, and Guoqing Harry Xu. 2022. MemLiner: Lining up tracing and application for a far-memory-friendly runtime. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI’22). 35–53.
[43]
Chenxi Wang, Yifan Qiao, Haoran Ma, Shi Liu, Wenguang Chen, Ravi Netravali, Miryung Kim, and Guoqing Harry Xu. 2023. Canvas: Isolated and adaptive swapping for multi-applications on remote memory. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI’23). 161–179.
[44]
Yong-Xuan Wang, Chung-Hsuan Tsai, and Li-Pin Chang. 2021. Killing processes or killing flash? Escaping from the dilemma using lightweight, compression-aware swap for mobile devices. ACM Transactions on Embedded Computing Systems (TECS) 20, 5s (2021), 1–24.
[45]
Yair Wiseman and Song Jiang. 2009. Advanced Operating Systems and Kernel Applications: Techniques and Technologies. IGI Global.
[46]
Tao Zhang, Aviad Zuck, Donald E. Porter, and Dan Tsafrir. 2019. Apps can quickly destroy your mobile’s flash: Why they don’t, and how to keep it that way. In Proceedings of the 17th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys’19). 207–221.
[47]
Xiao Zhu, Duo Liu, Kan Zhong, Jinting Ren, and Tao Li. 2017. SmartSwap: High-performance and user experience friendly swapping in mobile systems. In 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC’17). 1–6.

Published In

ACM Transactions on Architecture and Code Optimization, Volume 21, Issue 4
December 2024, 665 pages
EISSN: 1544-3973
DOI: 10.1145/3613648
Editor: David Kaeli

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 November 2024
Online AM: 16 September 2024
Accepted: 24 August 2024
Revised: 14 July 2024
Received: 26 December 2023
Published in TACO Volume 21, Issue 4


Author Tags

  1. Mobile device
  2. memory management
  3. remote swapping
  4. user experience


Funding Sources

  • NSFC
  • Shanghai Science and Technology Project
