1 Introduction
Recent years have witnessed the enormous popularity of cloud storage services, such as Dropbox, Google Drive, iCloud Drive, and Microsoft OneDrive. They have not only provided a convenient and pervasive data store for billions of Internet users, but also become a critical component of other online applications. Their popularity brings a large volume of network traffic overhead to both the client and cloud sides [
18,
38]. Thus, a lot of efforts have been made to improve their network-level
efficiencies, such as batched sync, deferred sync, delta sync, compression, and deduplication [
12,
13,
17,
38,
39,
57]. Among these efforts, delta sync is of particular importance for its fine granularity (i.e., the client only sends the changed content of a file to the cloud, instead of the entire file), thus achieving significant traffic savings in the presence of users’ file edits [
19,
40,
41].
Unfortunately, today delta sync is only available for PC clients
and/or mobile apps in
state-of-the-art commercial cloud storage services
(as detailed in Section 2), but not for the web—the most pervasive and OS-independent access method.
Taking Dropbox as an example, after a file
f is edited into a new version
\(f^{\prime }\) by
a user, Dropbox’s PC client will apply delta sync to automatically upload only the altered bits to the cloud. In contrast, Dropbox’s web interface requires users to manually upload the
entire content of
\(f^{\prime }\) to the cloud.
This gap significantly affects web-based user experiences in terms of both sync speed and traffic cost [
19,
46,
47].
The web is a popular access method for cloud storage services [
20,
30,
32]: All major services support web-based access, while only providing PC clients and mobile apps for a limited set of OS distributions and devices. One reason is
that many users do not want to install PC clients or mobile apps on their devices, to avoid the extra storage and CPU/memory overhead; in comparison, almost every device has a web browser. In particular, for emerging cloud-oriented systems and devices (e.g., Chrome OS and Chromebooks), web browsers are perhaps the only option to access cloud storage.
To understand the fundamental obstacles of web-based delta sync, we implement a delta sync solution, WebRsync, using state-of-the-art web techniques including JavaScript, WebSocket, and HTML5 File APIs [
31,
45] to achieve fast file I/O and transfer. WebRsync implements the algorithm of
rsync [
49] and works with all modern web browsers. Our experiments show that the sync speed of WebRsync is much lower than that of PC client-based
rsync, since WebRsync is severely affected by the low execution efficiency of JavaScript inside web browsers.
Thus, our first approach to optimizing WebRsync is to leverage Native Client [
25], a sandbox for efficiently and securely executing compiled C/C++ code in a web browser. We embody this approach by developing WebRsync-native, and our experiments show that it can achieve sync time comparable to that of PC client-based
rsync. Nevertheless, using Native Client requires the client side to download and install browser plug-ins, which substantially impairs the usability and pervasiveness of WebRsync-native.
Driven by the above observations, our second effort towards efficient web-based delta sync is to “reverse” the WebRsync process by handing computation-intensive operations over to the server side. The resulting solution is named
WebR2sync (Web-based Reverse rsync). It significantly cuts the computation burden on the web client but brings considerable computation overhead to the server side. To this end, we make two further efforts to optimize the server-side computation overhead. First, we exploit the locality of users’ file edits, which can help bypass most (up to
\(\sim\)90%) chunk search operations in real usage scenarios. Second, by leveraging lightweight checksum algorithms SipHash [
9] and Spooky [
11] instead of MD5, we can reduce the complexity of chunk comparison by
\(\sim\)5 times. The solution is referred to as WebR2sync+.
WebR2sync+, however, breaks the original workflow of rsync due to the reverse scheme. Hence, mainstream cloud storage systems would need to refactor their architectures to adopt WebR2sync+, which greatly restricts its industrial applicability. Also, the collision probability of SipHash is much higher than that of MD5, so WebR2sync+ runs the risk of making more sync errors than other solutions, which greatly impairs its system reliability.
The above-described solutions (WebRsync, WebRsync-native, and WebR2sync+) all wrestle with the tradeoffs among efficiency, applicability, and usability, as
comparatively listed in Table
1. Nevertheless, we still wish to explore the possibility of building an efficient web-based delta sync solution without sacrificing applicability
or usability.
Recently, an emerging technique (or web language) called
WebAssembly [
54] (or WASM for short) has been attracting increasing attention in the design of web applications. WASM is a portable binary instruction format (for web programming languages) that is efficient in both encoding size and load time. The unique advantages of WASM make it possible for web-based delta sync applications to enjoy near-native runtime speed without requiring significant cloud-side or client-side changes. Thereby, we explore how to leverage WASM to enable efficient web-based delta sync in a user-friendly and easy-to-apply manner (i.e., to make the best of both worlds).
Observing that the calculation of MD5 checksums incurs the most computation overhead at the client side, we leverage WASM to carry out this operation, resulting in the preliminary solution called WASMrsync. To avoid blocking the main thread that may make the web page unresponsive, WASM APIs are designed to be asynchronous. Therefore, we have to implement WASMrsync with a quasi-asynchronous strategy, i.e., to implement the asynchronous WASM API invocations in a synchronous manner using the await/async mechanism of JavaScript. Unfortunately, there are overheads associated with running the await/async method: The current execution context has to be captured, there is a thread transition, and a state machine is built through which the code runs. Worse still, this await/async property of a WASM API invocation is propagated to the wrapping function of the invocation, which introduces unnecessary overhead for executing even non-WASM code in the wrapping function and thus severely increases the overall sync time. To mitigate this shortcoming, we strategically devise sync-async code decoupling that extracts the synchronous non-WASM code from asynchronous modules to avoid unnecessary await/async overhead.
Moreover, to improve the loading efficiency of asynchronous WASM modules, we adopt streaming compilation (a novel WASM API) to download and compile WASM modules in parallel. When using WASM, the browser typically needs to download, compile, and instantiate a WASM module successively, and then call functions exported from that module through JavaScript. With streaming compilation, the browser can start compiling WASM modules while the browser is still downloading the module bytes. As soon as the browser downloads all bytes of a single function, the function can be passed to a background thread to compile it, which essentially improves the loading efficiency.
In addition, when handling large files, we notice that WASMrsync suffers from time- and memory-consuming file construction operations at the server side, which stem from the conventional In-situ Separate Memory Allocation (ISMA) mechanism. Hence, we devise the Informed In-place File Construction (IIFC) mechanism by (1) encoding the action for every chunk of the updated file (Add or Copy) in a dependency graph, (2) topologically sorting the graph, and (3) delivering the processed information from the client side to the server side. As a result, we only need to perform a single incremental memory allocation based on the memory space occupied by the old file.
The eventual solution is named WASMrsync+. We evaluate the performance of WebRsync-native, WebR2sync+, and WASMrsync+ by building prototype systems based on a Dropbox-like system architecture. The results show that WebRsync-native can significantly reduce the sync time of WebRsync—in fact, close to the sync time of rsync—while sacrificing usability. WebR2sync+ outpaces WebRsync by an order of magnitude, also approaching the performance of rsync, but at the cost of limited applicability. In contrast, WASMrsync+ achieves sync time comparable to WebR2sync+ (the former only takes 10%\(\sim\)20% more sync time than the latter) and saves \(\sim\)50% server-side memory usage, without impairing reliability, applicability, or usability.
In summary, our work makes the following contributions:
•
We elaborate a methodology for studying web- and WASM-based delta sync solutions in depth, involving the key metrics and workloads for quantifying their performance, as well as an automated tool for accurately quantifying the stagnation of web browsers (Section
3).
•
We propose a variety of web-based delta sync solutions in a step-by-step optimization manner, to reveal the challenges and opportunities of supporting delta sync
in traditional frameworks (Section
4).
•
We explore the
practical feasibility of leveraging WASM to enable efficient web-based delta sync in a user-friendly and easy-to-apply manner by developing the first workable
WASM-based delta sync solution (WASMrsync) (Section
5).
•
We implement an efficient WASM-based delta sync solution (WASMrsync+) without sacrificing applicability
or usability by devising both the client- and server-side optimizations (Section
6).
Finally, we compare in Table 1 all the web- and WASM-based delta sync solutions proposed and implemented in this article, so that readers can easily see how they differ. We
also make all their source code publicly available at
https://WASMDeltaSync.github.io.
2 Related Work and Status Quo
Delta sync, known as delta encoding or delta compression, is a way of storing or transmitting data in the form of differences (deltas) between different versions of a file, rather than the complete content of the file [
56]. It is very useful for network applications where file modifications or incremental data updates frequently happen, e.g., storing multiple versions of a file, distributing consecutive user edits to a file, and transmitting video sequences [
28]. In the past four decades, a variety of delta sync algorithms or solutions have been put forward, such as UNIX
diff [
27],
Vcdiff [
33], WebExpress [
26], Optimistic Deltas [
10],
rsync [
49], and
content defined chunking (CDC) [
34].
Due to its efficiency and flexibility,
rsync has become the de facto delta sync protocol widely used in practice. It was originally proposed by Tridgell and Mackerras in 1996 as an algorithm for efficient remote update of data over a high-latency, low-bandwidth network link [
52]. Then, in 1999, Tridgell thoroughly discussed its design, implementation, and performance in Reference [
51]. Being a standard Linux utility included in all popular Linux distributions,
rsync has also been ported to Windows, FreeBSD, NetBSD, OpenBSD, and MacOS [
49].
According to a real-world usage dataset [
38], the majority (84%) of files are modified by the users at least once, thus confirming the importance of delta sync on the network-level efficiency of cloud storage services. Among all mainstream cloud storage services, Dropbox was the first to adopt delta sync (more specifically,
rsync) in around 2009 in its PC client-based file sync process [
40]. Then, SugarSync, iCloud Drive, OneDrive, and Seafile followed the design choice of Dropbox by utilizing delta sync (
rsync or CDC) to reduce their PC clients’ and cloud servers’ sync traffic. After that, two academic cloud storage systems, namely, QuickSync [
13] and DeltaCFS [
64], further implemented delta sync (
rsync and CDC, respectively) for mobile apps.
Drago et al. studied the system architecture of Dropbox and conducted large-scale measurements based on ISP-level traces of Dropbox network traffic [
18]. They observed that the Dropbox traffic was as much as one-third of the YouTube traffic, which strengthens the necessity of Dropbox’s adopting delta sync. Li et al. investigated in detail the delta sync process of Dropbox through various types of controlled benchmark experiments and found it suffers from both traffic and computation overuse problems in the presence of frequent, short data updates [
40]. To this end, they designed an efficient batched synchronization algorithm called
UDS (update-batched delayed sync) to reduce the traffic usage, and further extended UDS with a backwards compatible Linux kernel modification to reduce the CPU usage (recall that delta sync is computation intensive).
Despite the wide adoption of delta sync (particularly rsync) in cloud storage services, practical delta sync techniques are currently only available for PC clients and mobile apps rather than web browsers. We conduct a qualitative study of delta sync support in state-of-the-art cloud storage services. The target services are selected for either their popularity (Dropbox, Google Drive, Microsoft OneDrive, iCloud Drive, and Box.com) or their representativeness in terms of the techniques used (SugarSync, Seafile, QuickSync, and DeltaCFS). For each service, we examined its delta sync support with different access methods, using its latest-version (as of March 2021) Windows PC client, Android app, and Chrome web browser. The only exception was iCloud Drive, for which we used its latest-version MacOS client, iOS app, and Safari web browser.
To examine a specific service with a specific access method, we first uploaded a 1 MB
highly compressed new file (
f) to the cloud (so the resulting network traffic would be slightly larger than 1 MB). Next, on the user side, we appended a single byte to
f to generate an updated file
\(f^{\prime }\). Afterwards, we synchronized
\(f^{\prime }\) from the user to the cloud with the specific access method and meanwhile recorded the network traffic consumption. In this way, we can reveal if delta sync is applied by measuring the traffic consumption—if the traffic consumption was larger than 1 MB, the service did not adopt delta sync; otherwise (i.e., the traffic consumption was just tens of KBs), the service had implemented delta sync.
Based on the examination results listed in Table
2, we have the following observations: First, delta sync has been widely adopted in the majority of PC clients of cloud storage services. However, it has never been used by the mobile apps of any popular cloud storage services, though two academic services [
13,
64] have implemented delta sync in their mobile apps and proved its efficacy. In fact, as the battery capacity and energy efficiency of mobile devices continue to improve, we expect delta sync to be widely adopted by mobile apps in the near future [
37]. Finally, none of the studied cloud storage services supports
web-based delta sync, despite web browsers constituting the most pervasive and OS-independent method for accessing Internet services.
To this end, we first introduce the general idea of web-based delta sync with basic motivation, preliminary design, and early-stage performance evaluation using limited workloads and metrics [
61]. Nevertheless, in practice, we notice that leveraging conventional web technologies alone cannot enable efficient web-based delta sync without sacrificing practicality (i.e., reliability, applicability, and usability). Fortunately, the emergence of WebAssembly [
54] sheds light on how to enable web-based applications in both efficient and practical manners. Many useful WebAssembly-based browser applications have been proposed, such as interactive 3D visualization [
21], resource accounting [
24], audio and video software [
36], and games [
63]. We therefore go a step further and introduce WebAssembly-based delta sync with a preliminary design, implementation, optimizations, and performance evaluation using limited workloads and metrics. In this article, our work builds on Reference [
62] while
going beyond it in terms of techniques, evaluations, and presentation.
6 WASMrsync+: WebAssembly-based Delta Sync Made Efficient and Practical
In this section, we present WASMrsync+, the resulting WebAssembly-based practical solution of supporting delta sync under current web frameworks. As discussed in Section
5, the performance gap between WASMrsync and WebR2sync+ reveals the need for a more comprehensive solution with optimizations on both the client and server sides. To bridge this gap, we make two further efforts: improving the client-side sync efficiency and devising a mechanism that mitigates the performance bottleneck at the server side.
6.1 Client-side Optimizations
Devising strategic sync-async code decoupling at the client. By inspecting the execution flow of WASMrsync, we find that the quasi-asynchronous use of WASM in the rsync workflow leads to considerable performance degradation. As demonstrated in Figure
3, when the client receives the checksum list of
f from the server, the client first performs chunk
search and
comparison operations on
\(f^{\prime }\) in a byte-by-byte manner on checksums (including rolling and MD5 checksums) and then generates both the
matching tokens and
literal bytes. Inevitably, this requires us to implement the chunk
search and
comparison operations at the client side in a synchronous way.
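For concreteness, the weak (rolling) checksum that drives this byte-by-byte chunk search can be sketched as follows. This is an illustrative mod-2\(^{16}\) variant in the spirit of the rsync algorithm, not the actual WASMrsync client code; the function and variable names are ours.

```javascript
// Rolling weak checksum in the style of rsync (mod 2^16 arithmetic).
// Illustrative sketch only, not the WASMrsync client source.
const M = 1 << 16;
const mod = (x) => ((x % M) + M) % M;

// Compute the weak checksum of buf[start .. start+len) from scratch.
function weakChecksum(buf, start, len) {
  let a = 0, b = 0;
  for (let i = 0; i < len; i++) {
    a = mod(a + buf[start + i]);
    b = mod(b + (len - i) * buf[start + i]);
  }
  return { a, b, sum: b * M + a };
}

// Slide the window one byte to the right in O(1): drop outByte (the byte
// leaving the window) and account for inByte (the byte entering it).
function roll(prev, outByte, inByte, len) {
  const a = mod(prev.a - outByte + inByte);
  const b = mod(prev.b - len * outByte + a);
  return { a, b, sum: b * M + a };
}
```

Only windows whose weak checksum hits a match in the server's checksum list proceed to the stronger (MD5) comparison, which is the computation-heavy part handed to WASM.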
However, to avoid blocking the main thread (which would make the web page unresponsive), WASM APIs are designed to be asynchronous [
55], e.g., WebAssembly.compile introduced in Section
5. Thus, we have to implement WASMrsync with a quasi-asynchronous strategy, i.e., to implement the asynchronous WASM API invocations in a synchronous manner using the await/async mechanism of JavaScript. Unfortunately, there are overheads associated with the await/async machinery: (1) the current execution context has to be captured; (2) there is a thread transition; and (3) a state machine is built through which the code runs. Worse still, the await/async property of a WASM API invocation propagates to the function wrapping the invocation, which introduces unnecessary overhead for executing even the non-WASM code in the wrapping function and thus severely increases the overall sync time.
We overcome this shortcoming through strategic sync-async code decoupling, which extracts the synchronous non-WASM code from asynchronous modules to avoid unnecessary await/async overhead. Specifically, we re-examine and modify the system implementation to avoid using async/await for very short methods and non-WASM exported functions, and to avoid await statements in tight loops (running the whole loop asynchronously instead). We also flatten multi-layer nested callbacks into single-layer callbacks to minimize the workload caused by propagating the aforementioned await/async property.
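A minimal sketch of this decoupling, with illustrative names of our own (not the WASMrsync+ source): once the WASM module is loaded, its exported MD5 function is synchronous, so only the one-time load needs await — the hot per-chunk loop does not.

```javascript
// Before: the WASM-backed checksum is awaited inside a tight per-chunk
// loop, so every iteration pays the async state-machine cost, and the
// async property propagates to every function that wraps this one.
async function processChunksCoupled(chunks, wasmMd5Async) {
  const tokens = [];
  for (const c of chunks) {
    tokens.push(await wasmMd5Async(c)); // context capture per chunk
  }
  return tokens;
}

// After: the hot loop is plain synchronous code calling the exported
// (synchronous) WASM function directly, with no async wrapper.
function processChunksDecoupled(chunks, md5Sync) {
  const tokens = [];
  for (const c of chunks) {
    tokens.push(md5Sync(c)); // plain synchronous call
  }
  return tokens;
}

async function syncFile(chunks, loadWasmMd5) {
  const md5Sync = await loadWasmMd5(); // the single await, outside the loop
  return processChunksDecoupled(chunks, md5Sync);
}
```

The design point is that the asynchrony stops at `syncFile`: the extracted synchronous code never builds an await state machine per chunk.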
Adopting streaming compilation in loading WASM modules at the client. As shown in Figure
19, when leveraging WASM, developers usually need to download a WASM module, compile it, instantiate it, and then use the exported module in JavaScript. For consistency and to keep the main thread free, every operation in the code snippet is designed to be asynchronous. The asynchronous property of WASM APIs offers us an opportunity to process these operations in parallel to improve the overall loading efficiency. Hence, we adopt
streaming compilation [
44] when loading WASM modules to realize this parallelization.
Streaming compilation is an emerging web-accelerating technique that allows the browser to compile code while it is still being downloaded.
We embody this adoption by replacing WebAssembly.compile and WebAssembly.instantiate with the state-of-the-art WASM API WebAssembly.instantiateStreaming, which can compile and instantiate a WASM module directly from a streamed underlying source in one go. Specifically, with streaming compilation, we can begin compiling the
md5.wasm module while its bytes are still being downloaded. Note that this API has been supported by V8 (JavaScript engine) [
2] and Chrome 61 [
1].
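To make the contrast concrete, here is a hedged sketch of both loading styles (not the WASMrsync+ source). Since md5.wasm is not available here, a tiny hand-assembled module exporting `add` stands in for it so the snippet is self-contained; in a browser, the argument to instantiateStreaming would be `fetch("md5.wasm")`.

```javascript
// A minimal valid WASM module exporting add(i32, i32) -> i32, used as a
// self-contained stand-in for md5.wasm in this sketch.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,              // \0asm, v1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,        // type (i32,i32)->i32
  0x03, 0x02, 0x01, 0x00,                                      // func 0 : type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,        // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: i32.add
]);

// Sequential: all bytes must arrive before compilation can even start.
async function loadSequential(bytes) {
  const module = await WebAssembly.compile(bytes);
  const instance = await WebAssembly.instantiate(module);
  return instance.exports;
}

// Streaming: compilation overlaps the transfer. A Response with the
// application/wasm MIME type simulates the network source (Node 18+ or
// browser); in production this would be fetch("md5.wasm").
async function loadStreaming(bytes) {
  const response = new Response(bytes, {
    headers: { "Content-Type": "application/wasm" },
  });
  const { instance } = await WebAssembly.instantiateStreaming(response);
  return instance.exports;
}
```

Note that instantiateStreaming requires the HTTP response to carry the exact `application/wasm` MIME type, otherwise it rejects.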
To quantify the real impact of the client-side optimizations on the performance of the final product, we conduct extensive experimental evaluations in Section
6.3 in terms of our proposed metrics. The experimental results show that the client-side optimizations have the most significant effect on reducing the sync time of the final solution.
6.2 Server-side Optimizations
Designing the Informed In-place File Construction mechanism at the server. By adopting the above client-side optimizations, we make WASM more compatible with web-based delta sync and improve the sync efficiency. Nevertheless, when handling large files (
\(\ge\)10 MB), we additionally notice that WASMrsync suffers from time- and memory-consuming file construction at the server side. To quantify the proportion of file construction time in the entire server-side sync time, we perform random append, insert, and cut operations of different edit sizes (1 B, 10 B, 100 B, 1 KB, 10 KB, 100 KB, and 1 MB) on large files of different sizes (10 MB, 20 MB, 40 MB, 60 MB, 80 MB, and 100 MB). These large files are generated to simulate real-world files. For each large file, we take the average sync time of appending/inserting/cutting the different edit sizes as the final result. According to our measurement results (as exemplified in Figure
23), for such files, the construction time usually accounts for about 54
\(\%\)–83
\(\%\) of the entire sync time, and the proportion increases with the file size.
In particular, WASMrsync still follows the algorithm of rsync and may inherit some of the inherent defects of rsync. Thus, to demystify this phenomenon, we carefully investigate the internals of rsync and find the root cause is that the server side adopts the conventional In-situ Separate Memory Allocation (ISMA) mechanism to construct the updated file \(f^{\prime }\). In detail, when parsing the “patch” of every newly submitted file chunk, the server needs to (1) reallocate a memory region that is able to contain both the previous parsed chunk (stored in another memory region) and the new chunk, (2) copy the parsed chunk data from its memory region to the reallocated one, and (3) append the new chunk data. This means that the server side needs to perform excessive memory allocation/copy operations and maintain two memory spaces (\(M_{new}\) and \(M_{old}\)) when constructing an updated file. For example, when the chunk size is set to 64 KB, inserting 1 MB of data into a 100 MB file requires the server side to maintain a total memory space of 201 MB (100 MB for old, 101 MB for new) and perform about 3,156 (2 \(\times\) \(\frac{101 MB}{64 KB}\)) memory allocation/copy operations. For space-constrained servers, maintaining two memory spaces also significantly limits the number of concurrent clients supported by them. Further, we note that ISMA is largely associated with a lack of information about the updated file (i.e., the size of the updated file and the actions to chunks).
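The figures quoted above can be reproduced with a quick back-of-the-envelope script (assuming, as the text's numbers do, decimal units with 1 MB = 1,000 KB):

```javascript
// Back-of-the-envelope check of the ISMA cost of inserting 1 MB into a
// 100 MB file with a 64 KB chunk size (decimal units: 1 MB = 1,000 KB).
const oldSizeMB = 100;
const newSizeMB = 101;
const chunkKB = 64;

// One memory allocation plus one copy per parsed chunk of the new file.
const memOps = Math.round(2 * (newSizeMB * 1000) / chunkKB); // about 3,156

// ISMA keeps both the old and the new file in memory simultaneously.
const peakMemMB = oldSizeMB + newSizeMB; // 201 MB
```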
Based on the above-mentioned observations and investigations, to reduce the overhead introduced by ISMA, we devise the Informed In-place File Construction (IIFC) mechanism and deploy this strategy in WASMrsync. The objectives of the IIFC mechanism include: (1) performing a single incremental memory allocation to prevent the excessive time-consuming memory allocation/copy operations; and (2) reusing the memory space already occupied by the old file to eliminate the need for additional memory space when constructing the updated file. To meet the two optimization objectives, we first modify the client to send the size of the updated file to inform the server in advance. Thus, the server can conduct a single incremental memory allocation based on the memory space occupied by the old file according to the informed size of the updated file. Then, we encode the actions to matched and unmatched file chunks as Copy and Add according to the chunk searching and comparing results, respectively, and send the encoding information to the server along with the literal bytes. Based on the encoding information and the literal bytes, the server copies the unmatched data chunks from their original memory regions to the new ones and adds the changed data chunks to the new memory regions.
Performing in-place file construction is extremely challenging, because we must account for hazards that arise when conducting a
Copy action.
Copy not only reads and copies a chunk of a file, but also overwrites some bytes that may exist in other data chunks. The overwritten chunks cannot be used in future
Copy actions, because they no longer contain the original data. Figure
24(a) plots a simple example of file synchronization. The matched chunks will be moved from their original location (bottom) to their new location (top). To ensure the correctness of sync results,
B2 must be completed before
B1 and
B3, because the destinations of these chunks would corrupt the source of
B2. Since the new
B2 occupies the original region of
B4,
B4 needs to be copied before
B2. Abstracting these constraints, we obtain the dependency graph shown in Figure
24(b).
To avoid these hazards, inspired by Figure
24, we modify the workflow of WASMrsync and embody the IIFC mechanism by taking the following steps: First, after encoding the actions to chunks of the updated file (
Copy or
Add) according to the chunk searching and comparing results, the client buffers
Copy and
Add actions in memory. For matched chunks, we buffer their source and target offsets; unmatched chunks require an extra field for the length of the chunk data. Thus, we obtain the original chunk dependency graph, though it may contain dependency cycles among
Copy actions.
Second, to ensure the correctness of sync results, we need to topologically sort the original dependency graph and break cycles as they are detected. To this end, we devise a DFS-based algorithm. The algorithm constructs a directed graph (denoted as \(G = (N, E)\)) in which edges \(E = (e_1, e_2, e_3, ...)\) represent ordering constraints among Copy actions and nodes \(N = (n_1, n_2, n_3, ...)\) indicate matched chunks. The unmatched chunks are self-describing and require no data from the old region, so Add actions need no reordering. A topological sort of the graph determines an execution order for processing Copy actions at the server. When a total topological ordering proves impossible because of cycles in the graph, we convert the action of one randomly chosen node in each cycle from Copy to Add and send the corresponding chunk data explicitly. In the end, we obtain a new chunk dependency graph with no cycles. After the above steps, the client side delivers the new chunk dependency graph, followed by the literal bytes, to the server side. Finally, the server applies the Copy or Add actions as it receives the data: it seeks to the given offset and either copies a matched chunk from its original memory region or inserts the unmatched data chunk.
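The DFS-based sort with cycle breaking can be sketched in JavaScript as follows. This is a simplified model under our own assumptions (equal-size chunks, one flat buffer, and a hypothetical `literalOf` helper that fetches a chunk's literal bytes), not the actual WASMrsync+ source, which also interleaves this step with the rsync token stream.

```javascript
// Each Copy moves one fixed-size chunk from srcOff to dstOff inside the
// same buffer. deps[i] lists the copies that must be applied before copy
// i, because copy i's destination overwrites their source regions.
function orderCopies(copies, chunkSize, literalOf) {
  const deps = copies.map(() => []);
  copies.forEach((a, i) => {
    copies.forEach((b, j) => {
      if (i !== j &&
          a.dstOff < b.srcOff + chunkSize &&
          b.srcOff < a.dstOff + chunkSize) {
        deps[i].push(j); // j reads a region that i will overwrite
      }
    });
  });

  const WHITE = 0, GRAY = 1, BLACK = 2; // unvisited / on stack / done
  const color = copies.map(() => WHITE);
  const order = []; // Copy execution order, dependencies first
  const adds = [];  // cycle members demoted to Add (data sent literally)

  function visit(i) {
    color[i] = GRAY;
    for (const j of deps[i]) {
      if (color[j] === GRAY) {
        // Cycle detected: break it by demoting one Copy to an Add.
        adds.push({ dstOff: copies[j].dstOff, data: literalOf(copies[j]) });
        color[j] = BLACK; // j is no longer scheduled as a Copy
      } else if (color[j] === WHITE) {
        visit(j);
      }
    }
    // Skip i if it was itself demoted while on the stack.
    if (color[i] === GRAY) { color[i] = BLACK; order.push(i); }
  }
  copies.forEach((_, i) => { if (color[i] === WHITE) visit(i); });

  // In this sketch, all Add actions (demoted or literal) are applied after
  // every Copy, so their writes cannot clobber a pending Copy's source.
  return { order, adds };
}
```

For the swap-like hazard of Figure 24, this demotes one of the two mutually dependent copies to an Add and schedules the other first.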
It is worth mentioning that when handling a file larger than the server’s memory with the IIFC, we devise a sequential processing strategy. Specifically, once one chunk dependency tree (smaller than memory) is formed, we write the content of these involved chunks to the new file and free up memory space in real time. When all the chunk dependency trees are sequentially generated and written to the new file in the same way, the entire file synchronization is finished.
Here, the side effects of IIFC mainly lie in two aspects. On the one hand, buffering Copy and Add actions consumes additional memory space (usually ranging from several to tens of KB), but it is much less than \(M_{new}\) required in traditional rsync for large files (\(\ge\)10 MB). On the other hand, the process of detecting and resolving the dependency costs additional time to conduct an analysis of all Copy actions before transmitting data to the server. Traditional rsync overlaps detecting and transmitting matching tokens and literal bytes by sending data to the server immediately, while we have to wait until all Copy actions are found and analyzed before transmitting data. Nevertheless, this process is usually completed in tens of milliseconds, so the trivial additional sync time is acceptable, considering the significant reductions in memory usage and sync time.
To evaluate the impact of the server-side optimization (IIFC) on the performance of the final product in terms of our proposed metrics, we conduct extensive experimental evaluations in Section
6.3. The experimental results show that the reduction in memory usage and the increase in service throughput are mainly due to this server-side optimization (i.e., the IIFC mechanism); the detailed results are analyzed in Section
6.3.
6.3 WASMrsync+: The Final Efficient and Practical Product
The integration of WASMrsync, the client-side optimizations, and the server-side optimization creates WASMrsync+. The client side of WASMrsync+ is implemented based on the HTML5 File APIs, the WebSocket protocol, and WASM. In total, it is written in 2,200 lines of JavaScript code and 500 lines of C code. The former deals with the client-side workflow of rsync, and the latter implements the calculation of MD5 checksums to prepare for adopting WASM. The server side of WASMrsync+ is developed based on the node.js framework with 1,500 lines of node.js code and a series of C processing modules with 600 lines of C code for handling user requests and embodying the server-side optimization (IIFC); its architecture also follows the server architecture of Dropbox. Similar to the other solutions, the web service of WASMrsync+ runs on a VM server rented from Aliyun ECS, and the file content is hosted on object storage rented from Aliyun OSS [
6]. More details about the server, client, and network configurations have been described in Section
3.3 and Figure
2.
It is worth recalling that even though mainstream cloud storage systems (e.g., Dropbox) do not support delta sync in web browsers, they still have web clients. Thus, although WASMrsync+ also requires deployment at both the
web client side and the server side, its optimizations, including leveraging WASM to calculate MD5 and the implementation of IIFC, can all be packaged as dynamic libraries that mainstream cloud storage systems can directly apply
to their web clients and servers, respectively.
As illustrated in Figure 18, mainstream cloud storage systems do not need to refactor their architectures
(i.e., reversing the components ① and ② and modifying interfaces between them like WebR2sync+), but only need to replace the original functional modules
(MD5 and File Construction (FC)) with our encapsulated dynamic libraries
(WASM-MD5 and IIFC), which essentially improves the applicability of WASMrsync+.
Note that, since WASM-MD5 and IIFC are highly encapsulated JavaScript and C dynamic libraries, they can be easily integrated into these systems’ web clients and servers by following the conventional way of loading JavaScript modules and C libraries. To validate the effectiveness of WASMrsync+, we evaluate the performance of WASMrsync+, in comparison to WebRsync, WebR2sync, WebR2sync+, WASMrsync, and (PC client-based)
rsync under the aforementioned workloads and metrics in Section
3.
Sync efficiency. We measure the efficiency of WASMrsync+ in terms of the time for completing the sync. Figure 25 shows the sync time for different types of file operations under a simple workload. The sync time of WASMrsync+ is substantially shorter than that of WASMrsync (by 2 to 5 times) and WebRsync (by 13 to 19 times) for every type of operation. Note that Figure 25 is plotted on a log scale. In other words, WASMrsync+ outpaces WebRsync by around an order of magnitude, approaching the speed of WebR2sync+.
Similar to Figure 22, we further break down the sync time of WASMrsync+ into three stages and compare it with WebR2sync+ in Figure 26. Comparing Figures 22 and 26, we notice that both the client-side and server-side sync times are largely reduced. In particular, the server-side file construction time is reduced by nearly 3–15 times, which demonstrates the effectiveness of the IIFC mechanism. This indicates that the computation overheads of WASMrsync at both the client and server sides are substantially reduced in WASMrsync+.
To further explore the respective effects of the client- and server-side optimizations on the sync efficiency of WASMrsync+, we additionally measure the sync time of WASMrsync, WASMrsync with the client-side optimization (WASMrsync-CO), WASMrsync with the server-side optimization (WASMrsync-SO), and WASMrsync+. As demonstrated in Figure 27 (also plotted on a log scale), the sync time of WASMrsync-CO and WASMrsync-SO lies between that of WASMrsync+ and WASMrsync. In other words, both the client- and server-side optimizations effectively reduce the sync time of WASMrsync, with the client-side optimization having the more significant effect.
Although the time for calculating MD5 can be significantly reduced by using WASM, it is still about 16%–33% longer than that of SipHash according to our measurement results. Meanwhile, even though the IIFC mechanism effectively reduces the file construction time of WASMrsync+, it still introduces additional time for detecting and resolving chunk dependencies. Thus, the sync time of WASMrsync+ is a bit higher (by only 10%–20%) than that of WebR2sync+. Even so, WASMrsync+ achieves sync time comparable to WebR2sync+ while preserving system reliability and industrial applicability.
Computation overhead. Moreover, we record the client-side and server-side CPU utilizations in Figures 28 and 29, respectively. At the client side, WebRsync consumes the most CPU resources, while WebR2sync+ consumes the least because it shifts the most computation-intensive chunk search and comparison operations from the client to the server. PC client-based rsync consumes nearly half the CPU resources of WebRsync. The CPU utilizations of WASMrsync and WASMrsync+ lie between those of rsync and WebR2sync+. This matches our expectation, because WASMrsync and WASMrsync+ perform the most computation-intensive chunk search and comparison operations at the client side to preserve the original workflow of rsync, and they use the heavyweight but more reliable checksum function, MD5. Nevertheless, the client-side CPU utilization of WASMrsync is significantly lower than that of WebRsync, demonstrating the effectiveness of adopting WASM at the client side. The client-side CPU utilization of WASMrsync+ is lower than that of WASMrsync (by nearly 10%–20%), which proves the efficacy of our two-fold client-side optimizations. Owing to the moderate (\(\lt\)\(40\%\)) CPU utilizations, neither the WASMrsync nor the WASMrsync+ client exhibits stagnation, nor do the WebR2sync and WebR2sync+ clients.
On the server side, WebR2sync consumes the most CPU resources, because the most computation-intensive chunk search and comparison operations are shifted from the client to the server. In contrast, the server-side CPU utilizations of WASMrsync and WASMrsync+ are both lower than that of WebR2sync+, because they do not move the expensive chunk search and comparison operations from the client side to the server side. In particular, WASMrsync+ consumes the least CPU resources, which also validates the efficacy of our server-side optimizations.
Sync traffic. Figure 30 illustrates the sync traffic consumed by the different approaches. For any type of edit, the sync traffic (between 1 KB and 120 KB) is significantly less than the average file size (\(\sim\)1 MB), confirming the power of delta sync in improving the network-level efficiency of cloud storage services. For the same edit size, the sync traffic of an append operation is usually less than that of an insert operation, because the former yields more matching tokens and fewer literal bytes (refer to Figure 3). Besides, when the edit size is relatively large (10 KB or 100 KB), a cut operation consumes much less sync traffic than an append/insert operation, because a cut operation produces only matching tokens and no literal bytes. It is worth mentioning that although WASMrsync+ spends a small amount of additional traffic to transfer the dependency graph of matched chunks required by the IIFC mechanism, this cost is negligible compared to the overall performance improvement of WASMrsync+.
Memory usage. Furthermore, to validate the efficacy of IIFC, we record the memory usage of all the proposed solutions under heavy workloads (refer to Section 3.2). Considering that the edit sizes (10 KB–1 MB) are relatively small compared to the sizes of large files (10 MB–100 MB), for every large file we take the average amount of memory required across the various edit sizes as the final experimental result. As depicted in Figure 31, for any size of large file, all solutions except WASMrsync+ require server-side memory roughly twice the size of the file. In contrast, WASMrsync+ only needs to allocate nearly the same amount of memory as the file size to correctly complete the overall sync process, which demonstrates the power of the IIFC mechanism. In other words, WASMrsync+ saves \(\sim\)50% of server-side memory usage compared to the other solutions.
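The memory saving comes from constructing the new file version in place over the old one rather than in a separate full-size buffer. The following is a simplified sketch of the idea behind IIFC (our illustration, not the actual server code): matched-chunk copies are applied in an order that never overwrites bytes a later copy still needs to read, and when a dependency cycle makes no order safe, only the conflicting source bytes are staged in a small side buffer:

```javascript
// In-place file construction sketch: apply matched-chunk copies over the
// old file buffer itself, resolving read-after-write dependencies.
// copies: [{dst, src, len}] read old bytes; literals: [{dst, bytes}] carry new data.
function inPlaceConstruct(buf, copies, literals) {
  const pending = copies.map(c => ({ ...c, data: null }));
  const overlaps = (a1, a2, b1, b2) => a1 < b2 && b1 < a2;
  while (pending.length > 0) {
    // A copy is safe if no other pending copy still reads from its target range.
    let idx = pending.findIndex(x => pending.every(y =>
      y === x || y.data !== null ||
      !overlaps(y.src, y.src + y.len, x.dst, x.dst + x.len)));
    if (idx < 0) {
      // Dependency cycle: stage the source bytes of the copies that read
      // from the first copy's target range, then apply the first copy.
      idx = 0;
      const x = pending[0];
      for (const y of pending) {
        if (y !== x && y.data === null &&
            overlaps(y.src, y.src + y.len, x.dst, x.dst + x.len)) {
          y.data = buf.slice(y.src, y.src + y.len);
        }
      }
    }
    const op = pending.splice(idx, 1)[0];
    buf.set(op.data ?? buf.slice(op.src, op.src + op.len), op.dst);
  }
  // Literal (changed) bytes carry their own data and can be written last.
  for (const l of literals) buf.set(l.bytes, l.dst);
}
```

For example, swapping the two halves of a file is a pure cycle: each half must be read before the other overwrites it, so one half is staged while the construction still uses only the original buffer plus the staged chunk, rather than a second full-size buffer.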
Service throughput. Finally, we measure the service throughput of WASMrsync+ in terms of the number of concurrent clients it can support. In general, as the number of concurrent clients increases, the main burden imposed on the server comes from the high CPU utilization across all cores. When the CPU utilizations on all cores approach 100%, we record the number of concurrent clients at that time as the service throughput. As shown in Figure 32, WASMrsync+ can simultaneously support 7,600–9,100 web clients' delta sync using a standard VM server instance under regular workloads, more than WebR2sync+ can support (6,800–8,500). The throughput of WASMrsync+ is 3–4 times that of WebR2sync/rsync and \(\sim\)17 times that of NoRsync, where NoRsync means that no delta sync is used for synchronizing file edits, i.e., the entire content of the edited file is directly uploaded to the cloud. Also, we measure the service throughput of each solution under intensive workloads (which mix the three types of edits; refer to Section 3.2).
In the web-based delta sync scenario, the service throughput is affected not only by the file size but also by the number of sub-edits. Although the average file size in the intensive workloads is 23 KB, the involved files contain many sub-edits, so the CPU utilization of the server can still reach 100%. As illustrated in Figure 33, under the intensive workloads, WebR2sync supports fewer concurrent users than the rsync scheme. This is reasonable, because WebR2sync moves more computational tasks from the client side to the server side. Meanwhile, WebR2sync+ can support more concurrent users than rsync thanks to our two-fold server-side optimizations: exploiting the locality of file edits in chunk search and replacing MD5 with SipHash in chunk comparison. This result also indicates the effectiveness of our proposed server-side optimizations.
The results in Figure 33 also show that even under the intensive workloads, WASMrsync+ can simultaneously support the most concurrent users among all solutions (about 810 web clients' delta sync) using a single VM server instance, which proves the effectiveness of leveraging WASM together with the corresponding client-side and server-side optimizations. To further clarify the respective effects of the client- and server-side optimizations on the service throughput of WASMrsync+, we also measure the throughput of WASMrsync with IIFC. As shown in Figures 32 and 33, the service throughput of WASMrsync with IIFC is very close to that of WASMrsync+. Thus, we can reasonably conclude that the increase in the service throughput of WASMrsync+ is mainly due to the adoption of the IIFC mechanism.