Hmfs: efficient support of small files processing over HDFS

C Yan, T Li, Y Huang, Y Gan - Algorithms and Architectures for Parallel Processing: 14th International Conference, ICA3PP 2014, Dalian, China, August 24-27, 2014 - Springer
Abstract
The storage and access of massive numbers of small files is one of the challenges in the design of distributed file systems. The Hadoop distributed file system (HDFS) is primarily designed for reliable storage and fast access of very large files, but it suffers a performance penalty as the number of small files grows. This paper proposes a middleware called Hmfs to improve the efficiency of storing and accessing small files on HDFS. It consists of three layers: file operation interfaces that make it easier for software developers to submit different file requests, file management tasks that merge small files into large ones or extract small files from large ones in the background, and file buffers that improve I/O performance. Hmfs boosts file upload speed with an asynchronous write mechanism and file download speed with a prefetching and caching strategy. The experimental results show that Hmfs achieves high storage and access speeds for massive numbers of small files on HDFS.
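The background merging of small files into large ones that the abstract describes can be illustrated with a minimal sketch built on Hadoop's SequenceFile API. This is not the paper's implementation; the class name SmallFileMerger, its method, and the choice of keying each record by the original file name are assumptions made here for illustration.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

/**
 * Illustrative sketch: pack every small file under a source directory
 * into one SequenceFile, keyed by the original file name, so that a
 * single large HDFS file replaces many small ones. Hmfs performs this
 * kind of merging asynchronously in its file-management layer; the
 * names used here are hypothetical.
 */
public class SmallFileMerger {

    public static void merge(Configuration conf, Path srcDir, Path mergedFile)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(mergedFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus status : fs.listStatus(srcDir)) {
                if (!status.isFile()) {
                    continue;
                }
                // Read the whole small file into memory.
                byte[] content = new byte[(int) status.getLen()];
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    IOUtils.readFully(in, content, 0, content.length);
                }
                // Key: original file name; value: raw file contents.
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(content));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        merge(new Configuration(), new Path(args[0]), new Path(args[1]));
    }
}
```

Packing many small files into one large container file in this way reduces the per-file metadata kept by the NameNode and turns many small reads and writes into large sequential ones, which is the performance problem the abstract identifies; retrieving an individual small file then amounts to locating its record by key, which is where prefetching and caching can help on the read path.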