Container-based Hadoop Distributed File System (HDFS) storage is widely used in cloud data center networks, but traditional HDFS has a single point of failure that can make the whole cluster unavailable. In this paper, we study the storage reliability of Docker container-based HDFS clusters in the presence of a single point of failure. First, we investigate a data volume-based persistence solution for Hadoop under the single-point-failure and single-backup strategy of the HDFS cluster. Second, we propose an HDFS-based replica placement algorithm for data storage that takes into account the performance of the host and container nodes. Third, we design the KADC-KNN data segmentation algorithm to store the persistent data of Docker containers effectively. Extensive experimental results show that the proposed approach ensures stable storage and fast migration of cluster data. Compared with the state-of-the-art algorithm, the proposed data volume persistence algorithm, DVPS, improves data reliability by 19.8%. The data partitioning algorithm KADC-KNN improves partitioning accuracy by 20.2% with lower time overhead.
We thank the editors and the anonymous reviewers for their useful feedback that improved this paper.
This work is supported by the Natural Science Foundation of China under grants No. 62172291, 62102196, and 62102195, the Natural Science Foundation of Jiangsu Province (No. BK20200753), the Jiangsu Postdoctoral Science Foundation Funded Project (No. 2021K096A), the Future Network Scientific Research Fund Project (FNSRFP-2021-YB-60), the Natural Science Fund for Colleges and Universities in Jiangsu Province (21KJB520026), the Fundamental Research Funds for the Central Universities JL (No. 93K172020K25, 93K172021K03), the Innovative Research Team Project of Suzhou Institute of Industrial Technology (2021KYTD003), and the Qing Lan Project of Jiangsu Province.
