Abstract
High-speed networks and rapidly improving microprocessor performance make networks of workstations an extremely important tool for parallel computing, speeding up the execution of scientific applications. Shared memory is an attractive programming model for designing parallel and distributed applications, because the programmer can focus on algorithmic development rather than on data partitioning and communication. Motivated by this characteristic, systems that provide the shared memory abstraction on physically distributed memory machines have been developed, known as Distributed Shared Memory (DSM). DSM uses dedicated software to combine a number of hardware resources into one computing environment. Such an environment not only provides an easy way to execute parallel applications, but also combines available computational resources to speed up the execution of these applications. DSM systems need to maintain data consistency in memory, which usually leads to communication overhead. A number of strategies exist to reduce this overhead and improve overall performance. Strategies such as prefetching have been shown to perform well in DSM systems, since they can reduce the latency of accessing data held on remote nodes. On the other hand, these strategies also transfer unnecessary pages to remote nodes. In this paper, we focus on the access patterns observed during execution of a parallel application, and analyze the data types and behavior of parallel applications. We propose an adaptive data classification scheme that improves the prefetching strategy, with the goal of improving overall performance. The scheme classifies data according to the sequence in which pages are accessed, so that the home node uses the past access patterns of remote nodes to decide whether it needs to transfer related pages to them.
Experimental results show that the proposed method increases the accuracy of data access in the effective prefetch strategy by reducing the number of page faults and misprefetches. With the proposed classification scheme, performance improves by about 9–25% over the same benchmark applications running on the original JIAJIA DSM system.
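The idea of a home node using past access sequences to decide which related pages to send can be illustrated with a minimal sketch. This is not the classification scheme evaluated in the paper, whose details the abstract does not give. It assumes a simple rule: a page is sent along with a request only if the same remote node has asked for it immediately after the requested page at least `min_repeats` times. The class `HomeNode`, the method `on_page_request`, and the threshold `min_repeats` are all hypothetical names introduced for illustration.

```python
from collections import defaultdict


class HomeNode:
    """Hypothetical home node (illustration only, not the paper's scheme).

    It records, per remote node, which page was requested right after
    which, and prefetches only successors that have repeated often enough.
    """

    def __init__(self, min_repeats=2):
        # Assumed threshold; the abstract does not specify one.
        self.min_repeats = min_repeats
        # remote node id -> last page that node requested
        self.last_page = {}
        # (remote node id, page) -> {successor page: times observed}
        self.successors = defaultdict(lambda: defaultdict(int))

    def on_page_request(self, node, page):
        """Return the pages to send: the requested page plus predicted ones."""
        prev = self.last_page.get(node)
        if prev is not None and prev != page:
            # Learn that this node asked for `page` right after `prev`.
            self.successors[(node, prev)][page] += 1
        self.last_page[node] = page

        # Prefetch only successors seen at least `min_repeats` times,
        # so pages without a stable access pattern are not transferred.
        predicted = [p for p, count in self.successors[(node, page)].items()
                     if count >= self.min_repeats]
        return [page] + sorted(predicted)


home = HomeNode(min_repeats=2)
replies = [home.on_page_request(node=1, page=p) for p in (10, 11, 10, 11, 10)]
```

In this sketch the history is kept per remote node, so a pattern learned from one node does not trigger transfers to another. A higher threshold trades fewer unnecessary transfers for more page faults before a pattern is recognized.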
Cite this article
Wang, HH., Li, KC., Lu, SH. et al. Towards implementation of a novel scheme for data prefetching on distributed shared memory systems. J Supercomput 47, 111–126 (2009). https://doi.org/10.1007/s11227-008-0180-6
DOI: https://doi.org/10.1007/s11227-008-0180-6