Abstract
To exploit high-speed interconnects such as InfiniBand, it is important to minimize communication overhead. The most significant source of overhead is the registration of communication memory.
In this paper, we analyze the memory registration process inside the Mellanox InfiniBand driver and discuss possible ways around this bottleneck. We evaluate and characterize the most time-consuming parts of the execution path of the memory registration function using the Read Time Stamp Counter (RDTSC) instruction. We present measurements on AMD Opteron and Intel Xeon systems with different types of Host Channel Adapters for PCI-X and PCI-Express. We conclude with first results on using Linux hugepage support to shorten the time needed to register a memory region.
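As an illustration of the measurement approach, the following is a minimal sketch of timing a code region with the time stamp counter. It is not the authors' instrumentation: the names read_ticks, page_count and time_touch are hypothetical, the page-touching loop is only a stand-in for a real registration step, and the 4096-byte page size is an assumption. On non-x86 targets the sketch falls back to clock_gettime, so the unit is then nanoseconds rather than cycles.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the CPU time stamp counter (RDTSC); the unit is TSC ticks. */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 targets: monotonic clock in nanoseconds. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Number of pages needed to cover len bytes (rounded up). */
size_t page_count(size_t len, size_t page_size) {
    assert(page_size > 0);
    return (len + page_size - 1) / page_size;
}

/*
 * Hypothetical stand-in for the region being profiled: write one byte
 * per page so that every page of the buffer is touched. Returns the
 * elapsed ticks, or 0 if the buffer could not be allocated.
 */
uint64_t time_touch(size_t len) {
    const size_t page_size = 4096; /* assumed page size */
    unsigned char *buf = malloc(len);
    if (buf == NULL) {
        return 0;
    }
    /* volatile keeps the compiler from dropping the stores */
    volatile unsigned char *p = buf;

    uint64_t start = read_ticks();
    for (size_t off = 0; off < len; off += page_size) {
        p[off] = 1;
    }
    uint64_t end = read_ticks();

    free(buf);
    return end - start;
}
```

A single reading is noisy, so in practice one would call time_touch several times for the same length and compare the minimum or median across runs.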
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mietke, F., Rex, R., Baumgartl, R., Mehlan, T., Hoefler, T., Rehm, W. (2006). Analysis of the Memory Registration Process in the Mellanox InfiniBand Software Stack. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds) Euro-Par 2006 Parallel Processing. Euro-Par 2006. Lecture Notes in Computer Science, vol 4128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823285_13
DOI: https://doi.org/10.1007/11823285_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37783-2
Online ISBN: 978-3-540-37784-9
eBook Packages: Computer Science (R0)