Abstract
Software caching has been shown to be a robust approach for multi-core systems that lack hardware support for transparent data transfers between local and global memories. A software cache gives the user a transparent view of the memory architecture and considerably improves the programmability of such systems. However, this software approach can suffer from poor performance because of the considerable overhead of the software mechanisms that maintain memory consistency. This paper presents a set of alternatives to reduce this overhead. It introduces a specific write-back mechanism that relies on some degree of speculation about the number of threads actually modifying the same cache lines. A case study based on the Cell BE processor is described. Performance evaluation indicates that the optimized software-cache structures, combined with the proposed code optimizations, yield speedups of 20% to 40% compared to a traditional software cache approach.
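The abstract does not detail the write-back mechanism, so the following C sketch is only a generic illustration of why speculating on the number of writers can pay off; it is not the paper's implementation. It assumes a per-byte dirty mask on each cached line: when a line is guessed to have a single writer, the whole line is copied back in one transfer, otherwise only the locally modified bytes are merged into global memory. All names (`sc_line`, `sc_fill`, `sc_write`, `sc_write_back`, `global_mem`, `LINE_SIZE`) are hypothetical, DMA transfers are modeled with `memcpy`, and how a wrong guess would be detected is not modeled: the guess is simply passed to `sc_fill`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 128 /* hypothetical software-cache line size, in bytes */
#define NUM_LINES 4   /* size of the simulated global memory, in lines */

/* Simulated global (main) memory; real DMA transfers are modeled by memcpy. */
static uint8_t global_mem[NUM_LINES * LINE_SIZE];

/* Hypothetical software-cache line kept in on-chip local memory. */
typedef struct {
    size_t tag;               /* byte offset of the line in global_mem */
    uint8_t data[LINE_SIZE];  /* local copy of the line */
    uint8_t dirty[LINE_SIZE]; /* dirty[i] == 1 if byte i was written locally */
    bool single_writer;       /* speculation: only this thread writes the line */
} sc_line;

/* Miss handling: bring the line into local memory and clear its dirty mask. */
static void sc_fill(sc_line *l, size_t tag, bool single_writer) {
    assert(tag % LINE_SIZE == 0 && tag < sizeof global_mem);
    l->tag = tag;
    memcpy(l->data, &global_mem[tag], LINE_SIZE);
    memset(l->dirty, 0, LINE_SIZE);
    l->single_writer = single_writer;
}

/* Store through the cache: update the local copy and mark the byte dirty. */
static void sc_write(sc_line *l, size_t off, uint8_t value) {
    assert(off < LINE_SIZE);
    l->data[off] = value;
    l->dirty[off] = 1;
}

/* Write-back on eviction or at a consistency point. */
static void sc_write_back(sc_line *l) {
    if (l->single_writer) {
        /* Speculative path: copy the whole line back in one transfer. This is
           correct only if no other thread modified bytes of this line. */
        memcpy(&global_mem[l->tag], l->data, LINE_SIZE);
    } else {
        /* Conservative path: merge only the bytes this thread modified, so
           updates by other threads to the same line are preserved. */
        for (size_t i = 0; i < LINE_SIZE; i++)
            if (l->dirty[i])
                global_mem[l->tag + i] = l->data[i];
    }
    memset(l->dirty, 0, LINE_SIZE);
}
```

If two threads cache the same line and write different bytes, the merging path preserves both updates, while a wrong single-writer guess lets the second full-line copy overwrite the first thread's byte with stale data. The speculative path saves the per-byte merge whenever the guess holds, which is the kind of trade-off the abstract alludes to.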
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vujic, N., Alvarez, L., Tallada, M.G., Martorell, X., Ayguadé, E. (2010). Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_15
DOI: https://doi.org/10.1007/978-3-642-13374-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13373-2
Online ISBN: 978-3-642-13374-9
eBook Packages: Computer Science, Computer Science (R0)