Abstract
Virtualization makes it possible to save the entire execution state of a running virtual machine (VM), which makes checkpointing flexible and practical for HPC and data center servers. However, system-level checkpointing must write a large amount of data to disk. Moreover, the overhead grows linearly with the size of the VM's memory, which leads to heavy disk I/O and poor system scalability. To address this, we propose a novel fast VM checkpointing approach, named Fast Incremental checkpoinTing with Delta memOry Compression (FITDOC). Based on a study of the run-time memory characteristics of different workloads, FITDOC measures dirty data at fine granularity (in units of 8 bytes) rather than at the conventional page granularity. FITDOC uses a dirty page logging mechanism to record the dirty pages, and a delta memory compression mechanism to eliminate redundant memory data from checkpoint files. To locate the dirty data, FITDOC uses two mechanisms. First, by analyzing how dirty pages are distributed in the dirty bitmap, we propose a fast dirty bitmap scanning method to locate the dirty pages. Second, a multi-threaded data comparison mechanism locates the data that has actually changed within each dirty page. The experimental results show that, compared with Xen's default system-level checkpointing algorithm, FITDOC reduces checkpointing time by 70.54 % on average for a VM with 1 GB of memory, and achieves greater improvements for VMs with larger memory configurations. FITDOC reduces the size of checkpoint data by 52.88 % on average compared with Remus's incremental solution, which works at page granularity. Compared with the default dirty bitmap scanning method in Xen, FITDOC reduces scanning time by 91.13 % on average.
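The two ideas named in the abstract, scanning a dirty bitmap and storing only the changed 8-byte units of a dirty page, can be illustrated with a minimal sketch. This is not FITDOC's implementation: the abstract does not describe the scanning algorithm or the delta format, so the word-skipping scan, the (offset, word) delta layout, the 4096-byte page size, and every function name below are assumptions made for illustration. The multi-threaded comparison is omitted.

```python
# Illustrative sketch only; names, page size, and delta layout are assumptions,
# not FITDOC's actual design.
WORD = 8          # comparison granularity in bytes (the abstract's 8-byte unit)
PAGE_SIZE = 4096  # assumed page size


def dirty_page_numbers(bitmap: bytes) -> list[int]:
    """Return the indices of set bits in a dirty bitmap.

    The bitmap is read 64 bits at a time, so a run of clean pages
    (an all-zero word) is skipped with a single test.
    """
    pages = []
    for off in range(0, len(bitmap), 8):
        word = int.from_bytes(bitmap[off:off + 8], "little")
        base = off * 8  # page number of bit 0 in this word
        while word:
            low = word & -word                      # isolate lowest set bit
            pages.append(base + low.bit_length() - 1)
            word ^= low                             # clear that bit
    return pages


def page_delta(old: bytes, new: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, data) pairs for each 8-byte unit that differs."""
    return [
        (i, new[i:i + WORD])
        for i in range(0, len(new), WORD)
        if new[i:i + WORD] != old[i:i + WORD]
    ]


def apply_delta(old: bytes, delta: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the new page from the previous copy plus the delta."""
    page = bytearray(old)
    for off, data in delta:
        page[off:off + len(data)] = data
    return bytes(page)
```

In this sketch a page in which only two 8-byte units changed yields a delta of two entries instead of a full 4096-byte copy, which is the kind of saving that fine-grained tracking aims at when most of a dirty page is unchanged.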
Acknowledgments
This paper is partly supported by the NSFC under Grant No. 61370104 and No. 61433019, MOE-Intel Special Research Fund of Information Technology under Grant MOE-INTEL-2012-01, and Chinese Universities Scientific Fund under Grant No. 2014TS008.
Additional information
A preliminary version containing some of the results in this paper was published at CSE 2014.
About this article
Cite this article
Du, Y., Shi, X., Jin, H. et al. FITDOC: fast virtual machines checkpointing with delta memory compression. J Supercomput 72, 3328–3347 (2016). https://doi.org/10.1007/s11227-015-1429-5
DOI: https://doi.org/10.1007/s11227-015-1429-5